Efficient query processing in geographic web search engines
In SIGMOD, 2006
Cited by 91 (3 self)
Geographic web search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called local search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing differs in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.
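As a rough illustration of why geographic query processing combines text and spatial techniques, the sketch below runs a simple text-first strategy: intersect the inverted lists of the query terms, then discard documents whose geographic footprint does not intersect the query region. This is a minimal baseline under stated assumptions (footprints simplified to axis-aligned rectangles; `GeoTextIndex` and `Rect` are hypothetical names), not one of the algorithms proposed in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: a simplified (assumed) geographic footprint."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "Rect") -> bool:
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


class GeoTextIndex:
    """Hypothetical toy index: an inverted file for text plus one footprint per document."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.footprints = {}              # doc id -> Rect

    def add(self, doc_id, text, footprint):
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.footprints[doc_id] = footprint

    def query(self, terms, region):
        """Text-first plan: AND the posting lists (shortest first), then filter spatially."""
        lists = sorted((self.postings.get(t.lower(), set()) for t in terms), key=len)
        if not lists:
            return []
        candidates = set(lists[0])  # copy so the stored posting list is not mutated
        for plist in lists[1:]:
            candidates &= plist
        return sorted(d for d in candidates if self.footprints[d].intersects(region))
```

A text-first plan pays for every text match, including matches far outside the query region, which is one motivation for integrating the spatial filter more tightly into query processing.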
Efficient Retrieval of the Top-k Most Relevant Spatial Web Objects
Cited by 88 (16 self)
The conventional Internet is acquiring a geo-spatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables a new kind of top-k query that takes into account both location proximity and text relevancy. To our knowledge, only naive techniques exist that are capable of computing a general web information retrieval query while also taking location into account. This paper proposes a new indexing framework for location-aware top-k text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within the framework. The framework encompasses algorithms that use the proposed indexes to compute the top-k query, taking into account both text relevancy and location proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper’s proposal is scalable and capable of excellent performance.
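The pruning idea in the abstract can be sketched under assumptions: the score is a linear mix of spatial proximity and text relevance weighted by a hypothetical parameter `alpha`, text relevance lies in [0, 1], and objects are visited in increasing distance, an order an R-tree's incremental nearest-neighbor search would provide (a plain sort stands in for it here). The function name and parameters are hypothetical; this is not the paper's index.

```python
import heapq
import math


def location_aware_topk(objects, qx, qy, query_terms, k, alpha=0.5, max_dist=100.0):
    """Top-k objects by an assumed score: alpha * proximity + (1 - alpha) * text relevance.

    objects: list of (obj_id, x, y, term_weights), term_weights mapping term -> weight in [0, 1].
    proximity = 1 - dist / max_dist; text relevance = mean weight of the query terms.
    """
    if k <= 0:
        return []

    def text_rel(weights):
        return sum(weights.get(t, 0.0) for t in query_terms) / len(query_terms)

    def dist(obj):
        return math.hypot(obj[1] - qx, obj[2] - qy)

    heap = []  # min-heap of (score, obj_id) holding the current top-k
    # Visit objects nearest-first; an R-tree's incremental NN search would yield this order.
    for obj in sorted(objects, key=dist):
        obj_id, _, _, weights = obj
        proximity = 1.0 - dist(obj) / max_dist
        # Best score any remaining object could reach: it is no closer, and text relevance <= 1.
        bound = alpha * proximity + (1.0 - alpha)
        if len(heap) == k and bound <= heap[0][0]:
            break  # prune the rest of the search space
        rel = text_rel(weights)
        if rel == 0.0:
            continue  # matches none of the query keywords
        score = alpha * proximity + (1.0 - alpha) * rel
        if len(heap) < k:
            heapq.heappush(heap, (score, obj_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, obj_id))
    return [obj_id for _, obj_id in sorted(heap, reverse=True)]
```

Objects after the `break` are never scored; avoiding that work for most of the data set is what an index-driven traversal aims for.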
NewsStand: A New View on News
2008
Cited by 55 (26 self)
News articles contain a wealth of implicit geographic content that, if exposed to readers, improves understanding of today’s news. However, most articles are not explicitly geotagged with their geographic content, and few news aggregation systems expose this content to users. A new system named NewsStand is presented that collects, analyzes, and displays news stories in a map interface, thus leveraging their implicit geographic content. NewsStand monitors RSS feeds from thousands of online news sources and retrieves articles within minutes of publication. It then extracts geographic content from articles using a custom-built geotagger, and groups articles into story clusters using a fast online clustering algorithm. By panning and zooming in NewsStand’s map interface, users can retrieve stories based on both topical significance and geographic region, and see substantially different stories depending on position and zoom level.
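The abstract mentions a fast online clustering algorithm but does not describe it. A common single-pass approach is leader-follower clustering, sketched here as an assumption: each incoming article joins the most similar existing cluster if the cosine similarity of term-frequency vectors reaches a threshold; otherwise it starts a new cluster. `OnlineClusterer` and `threshold` are hypothetical names, and this is not NewsStand's published algorithm.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class OnlineClusterer:
    """Single-pass (leader-follower) clustering; each article is seen exactly once."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.centroids = []  # summed term vectors, one per cluster
        self.members = []    # article ids per cluster

    def add(self, article_id, text):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            self.centroids[best].update(vec)  # Counter.update adds the counts
            self.members[best].append(article_id)
            return best
        self.centroids.append(Counter(vec))
        self.members.append([article_id])
        return len(self.centroids) - 1
```

Because every article is compared only against existing cluster centroids, the cost per article grows with the number of clusters rather than the number of articles seen so far.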
Architecture of a Spatio-Textual Search Engine
In Proceedings of the 15th ACM International Symposium on Advances in Geographic Information Systems (ACMGIS07), ACM Press, 2007, pp. 186–193
Cited by 42 (19 self)
STEWARD (“Spatio-Textual Extraction on the Web Aiding Retrieval of Documents”), a system for extracting, querying, and visualizing textual references to geographic locations in unstructured text documents, is presented. Methods for retrieving and processing web documents, extracting and disambiguating georeferences, and identifying geographic focus are described. A brief overview of STEWARD's querying capabilities, as well as the design of an intuitive user interface, is provided. Finally, several application scenarios and future extensions to STEWARD are discussed.
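To make georeference disambiguation and geographic focus concrete, here is a toy sketch using two common heuristics, both assumptions rather than STEWARD's documented methods: an ambiguous toponym prefers a candidate in a country established by unambiguous mentions, then the most populous candidate; the focus is the most frequently mentioned resolved place. `GAZETTEER` is a hypothetical miniature gazetteer with illustrative, non-authoritative population figures.

```python
from collections import Counter
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    country: str
    population: int


# Hypothetical miniature gazetteer; populations are illustrative only.
GAZETTEER = {
    "paris": [Place("Paris", "France", 2_100_000), Place("Paris", "USA (Texas)", 25_000)],
    "london": [Place("London", "UK", 8_900_000), Place("London", "Canada", 400_000)],
    "lyon": [Place("Lyon", "France", 520_000)],
}


def resolve(toponym, context_countries=frozenset()):
    """Pick one interpretation: prefer a context country, then the largest population."""
    candidates = GAZETTEER.get(toponym.lower(), [])
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.country in context_countries, p.population))


def geographic_focus(toponyms):
    """Return the most frequently mentioned resolved place, or None if nothing resolves."""
    # Unambiguous toponyms establish the document's country context.
    context = {GAZETTEER[t.lower()][0].country for t in toponyms
               if len(GAZETTEER.get(t.lower(), [])) == 1}
    resolved = [r for r in (resolve(t, context) for t in toponyms) if r is not None]
    if not resolved:
        return None
    return Counter(resolved).most_common(1)[0][0]
```

For example, a document mentioning "Paris", "Lyon", "Paris", and "London" resolves Paris to France (Lyon supplies the context) and reports Paris, France as its focus.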
Retrieving Top-k Prestige-Based Relevant Spatial Web Objects
Cited by 36 (7 self)
The location-aware keyword query returns ranked objects that are near a query location and that have textual descriptions that match query keywords. This query occurs inherently in many types of mobile and traditional web services and applications, e.g., Yellow Pages and Maps services. Previous work considers the potential results of such a query as being independent when ranking them. However, a relevant result object with nearby objects that are also relevant to the query is likely to be preferable to a relevant object without relevant nearby objects. The paper proposes the concept of prestige-based relevance to capture both the textual relevance of an object to a query and the effects of nearby objects. Based on this, a new type of query, the Location-aware top-k Prestige-based Text retrieval (LkPT) query, is proposed that retrieves the top-k spatial web objects ranked according to both prestige-based relevance and location proximity. We propose two algorithms that compute LkPT queries. Empirical studies with real-world spatial data demonstrate that LkPT queries are more effective in retrieving web objects than a previous approach that does not consider the effects of nearby objects, and show that the proposed algorithms are scalable and significantly outperform a baseline approach.
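One way to model prestige-based relevance, assumed here by analogy with PageRank-style propagation and not necessarily the paper's definition, is p_i = (1 - d) r_i + d Σ_{j ∈ N(i)} p_j / |N(j)|, where r_i is the text relevance of object i and N(i) is the set of objects within distance `eps` of i. The sketch iterates this to a fixed point and then ranks by a linear mix of proximity and prestige; all names and parameters are hypothetical.

```python
import math


def prestige_scores(objects, query_terms, eps=1.0, d=0.5, iters=50):
    """objects: dict id -> (x, y, set_of_terms). Returns id -> prestige-based relevance.

    Assumed model: p_i = (1 - d) * r_i + d * sum_{j in N(i)} p_j / |N(j)|,
    with r_i the fraction of query terms object i contains.
    """
    ids = list(objects)
    rel = {i: len(query_terms & objects[i][2]) / len(query_terms) for i in ids}
    # Neighborhood is symmetric, so every j in N(i) has |N(j)| >= 1.
    nbrs = {i: [j for j in ids
                if j != i and math.dist(objects[i][:2], objects[j][:2]) <= eps]
            for i in ids}
    p = dict(rel)
    for _ in range(iters):  # d < 1 makes this a contraction, so it converges
        p = {i: (1 - d) * rel[i] + d * sum(p[j] / len(nbrs[j]) for j in nbrs[i])
             for i in ids}
    return p


def lkpt_topk(objects, qx, qy, query_terms, k, alpha=0.5, max_dist=10.0, **kw):
    """Rank by an assumed linear mix of spatial proximity and prestige (brute force)."""
    p = prestige_scores(objects, query_terms, **kw)

    def score(i):
        x, y, _ = objects[i]
        return alpha * (1 - math.dist((x, y), (qx, qy)) / max_dist) + (1 - alpha) * p[i]

    return sorted(objects, key=score, reverse=True)[:k]
```

With d = 0.5, a relevant object whose only neighbor is also relevant converges to prestige 1, while a relevant object whose only neighbor is irrelevant converges to 2/3, matching the intuition stated in the abstract.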
Analysis of Geographic Queries in a Search Engine Log
2008
Cited by 30 (0 self)
Geography is becoming increasingly important in web search. Search engines can often return better results to users by analyzing features such as user location or geographic terms in web pages and user queries. This is also of great commercial value, as it enables location-specific advertising and improved search for local businesses. As a result, major search companies have invested significant resources into geographic search technologies, also often called local search. This paper studies geographic search queries, i.e., text queries such as “hotel new york” that employ geographical terms in an attempt to restrict results to a particular region or location. Our main motivation is to identify opportunities for improving geographical search and related technologies, and we analyze 36 million queries from the recently released AOL query trace. First, we identify typical properties of geographic search (geo) queries based on a manual examination of several thousand queries. Based on these observations, we build a classifier that separates the trace into geo and non-geo queries. We then investigate the properties of geo queries in more detail, and relate them to web sites and users associated with such queries. We also propose a new taxonomy for geographic search queries.
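The paper builds a classifier that separates geo from non-geo queries; its features are not given in the abstract. A much simpler baseline, shown here as an assumption, flags a query as geographic when a greedy longest-match scan finds a place name from a gazetteer. `PLACE_NAMES`, `find_geo_terms`, and `is_geo_query` are hypothetical stand-ins.

```python
# Hypothetical, tiny place-name list; a real system would use a full gazetteer.
PLACE_NAMES = {"new york", "boston", "texas", "paris", "san francisco"}


def find_geo_terms(query, places=PLACE_NAMES, max_n=3):
    """Greedy longest-match scan for place names of up to max_n tokens."""
    tokens = query.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so multi-word names beat their prefixes.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in places:
                found.append(phrase)
                i += n
                break
        else:
            i += 1  # no place name starts at this token
    return found


def is_geo_query(query):
    return bool(find_geo_terms(query))
```

Pure gazetteer lookup misfires on queries where a place name is part of a person or brand name, which is one reason to prefer a classifier trained on richer features.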
Personalized Web Search with Location Preferences
Cited by 26 (3 self)
As the amount of Web information grows rapidly, search engines must be able to retrieve information according to the user’s preference. In this paper, we propose a new web search personalization approach that captures the user’s interests and preferences in the form of concepts by mining search results and their clickthroughs. Due to the important role location information plays in mobile search, we separate concepts into content concepts and location concepts, and organize them into ontologies to create an ontology-based, multi-facet (OMF) profile that precisely captures the user’s content and location interests and hence improves search accuracy. Moreover, recognizing that different users and queries may place different emphases on content and location information, we introduce the notion of content and location entropies to measure the amount of content and location information associated with a query, and click content and location entropies to measure how much the user is interested in the content and location information in the results. Accordingly, we propose to define personalization effectiveness based on the entropies and use it to balance the weights between the content and location facets. Finally, based on the derived ontologies and personalization effectiveness, we train an SVM to adapt a personalized ranking function for re-ranking future searches. We conduct extensive experiments to compare the precision produced by our OMF profiles with that of a baseline method. Experimental results show that OMF improves precision significantly compared to the baseline.
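The content and location entropies can be illustrated with the standard Shannon entropy over the concepts extracted from a query's results, H = -Σ_c p(c) log2 p(c). The base-2 logarithm, the `(content_concepts, location_concepts)` input shape, and the function names are assumptions; concept extraction and the paper's exact normalization are not shown.

```python
import math
from collections import Counter


def entropy(concept_counts):
    """Shannon entropy (in bits) of a concept frequency distribution."""
    total = sum(concept_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in concept_counts.values() if c > 0)


def query_entropies(results):
    """results: list of (content_concepts, location_concepts) pairs, one per search result."""
    content = Counter(c for concepts, _ in results for c in concepts)
    location = Counter(loc for _, locations in results for loc in locations)
    return entropy(content), entropy(location)
```

A low entropy means the results concentrate on few concepts (for example, a single city), while a high entropy means they are spread out; the click entropies would apply the same function to the clicked results only.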
Efficient continuously moving top-k spatial keyword query processing
In ICDE, 2011
Cited by 19 (4 self)
Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving top- ...
Spatial Keyword Query Processing: An Experimental Evaluation
Cited by 17 (3 self)
Geo-textual indices play an important role in spatial keyword querying. The existing geo-textual indices have not been compared systematically under the same experimental framework, which makes it difficult to determine which indexing technique best supports specific functionality. We provide an all-around survey of 12 state-of-the-art geo-textual indices. We propose a benchmark that enables the comparison of spatial keyword query performance. We also report on the findings obtained when applying the benchmark to the indices, thus uncovering new insights that may guide index selection as well as further research.
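A benchmark of the kind described needs every index to answer the same workload and to return identical results before its timings are compared. The harness below sketches that under assumptions: two hypothetical toy indices (`LinearScan` and `InvertedFirst`, not among the 12 surveyed indices) answer Boolean range-keyword queries over synthetic, seeded data, and the harness reports mean seconds per query.

```python
import random
import time
from collections import defaultdict


class LinearScan:
    """Baseline: check every object. objects: list of (oid, x, y, frozenset_of_terms)."""

    def __init__(self, objects):
        self.objects = objects

    def query(self, rect, terms):
        x1, y1, x2, y2 = rect
        return sorted(o for o, x, y, t in self.objects
                      if x1 <= x <= x2 and y1 <= y <= y2 and terms <= t)


class InvertedFirst:
    """Inverted file first, then a spatial check on the surviving candidates."""

    def __init__(self, objects):
        self.loc = {o: (x, y) for o, x, y, _ in objects}
        self.postings = defaultdict(set)
        for o, _, _, terms in objects:
            for t in terms:
                self.postings[t].add(o)

    def query(self, rect, terms):
        x1, y1, x2, y2 = rect
        cands = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(o for o in cands
                      if x1 <= self.loc[o][0] <= x2 and y1 <= self.loc[o][1] <= y2)


def synthetic(n_objects=2000, n_queries=100, vocab=50, seed=7):
    """Seeded synthetic objects and range-keyword queries in the unit square."""
    rng = random.Random(seed)
    words = [f"w{i}" for i in range(vocab)]
    objects = [(i, rng.random(), rng.random(), frozenset(rng.sample(words, 3)))
               for i in range(n_objects)]
    workload = []
    for _ in range(n_queries):
        x, y = rng.random(), rng.random()
        workload.append(((x, y, x + 0.2, y + 0.2), frozenset(rng.sample(words, 1))))
    return objects, workload


def run_benchmark(index_classes, objects, workload):
    """Return {index name: mean seconds per query}; all indices must agree on answers."""
    timings, reference = {}, None
    for cls in index_classes:
        index = cls(objects)
        start = time.perf_counter()
        answers = [index.query(rect, terms) for rect, terms in workload]
        elapsed = time.perf_counter() - start
        if reference is None:
            reference = answers
        elif answers != reference:
            raise ValueError(f"{cls.__name__} disagrees with the reference index")
        timings[cls.__name__] = elapsed / len(workload)
    return timings
```

Checking answers against a reference index guards against comparing the speed of an index that is silently returning wrong results.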
Efficient processing of top-k spatial keyword queries
In SSTD, 2011
Cited by 16 (1 self)
Given a spatial location and a set of keywords, a top-k spatial keyword query returns the k best spatio-textual objects ranked according to their proximity to the query location and relevance to the query keywords. Many applications that handle huge amounts of geotagged data, such as Twitter and Flickr, can benefit from this query. Unfortunately, the state-of-the-art approaches incur non-negligible processing costs that result in long response times. In this paper, we propose a novel index, named the Spatial Inverted Index (S2I), to improve the performance of top-k spatial keyword queries. Our index maps each distinct term to a set of objects containing the term. The objects are stored differently according to the document frequency of the term and can be retrieved efficiently in decreasing order of keyword relevance and spatial proximity. Moreover, we present algorithms that exploit S2I to process top-k spatial keyword queries efficiently. Finally, we show through extensive experiments that our approach outperforms the state-of-the-art approaches in terms of update and query cost.
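A toy sketch of the S2I idea, one spatial structure per term with a layout chosen by document frequency, follows. The abstract does not specify the structures, so the flat list for infrequent terms, the uniform grid for frequent terms, the linear scoring function, and all names (`SpatialInvertedIndexSketch`, `df_threshold`, `cell`) are assumptions. For grid-stored terms, cells are visited in decreasing order of a score upper bound so that search can stop early.

```python
import heapq
import math
from collections import defaultdict


class SpatialInvertedIndexSketch:
    """Toy S2I-style index: each term keeps its own (assumed) spatial structure."""

    def __init__(self, df_threshold=3, cell=0.25):
        self.df_threshold = df_threshold
        self.cell = cell
        self.raw = defaultdict(list)  # term -> [(oid, x, y, weight)] before build()
        self.storage = {}             # term -> ("flat", list) or ("grid", {cell: list})

    def add(self, oid, x, y, term_weights):
        for term, weight in term_weights.items():
            self.raw[term].append((oid, x, y, weight))

    def build(self):
        for term, entries in self.raw.items():
            if len(entries) <= self.df_threshold:  # infrequent term: flat list
                self.storage[term] = ("flat", list(entries))
            else:                                  # frequent term: uniform grid
                grid = defaultdict(list)
                for entry in entries:
                    key = (int(entry[1] // self.cell), int(entry[2] // self.cell))
                    grid[key].append(entry)
                self.storage[term] = ("grid", dict(grid))

    def _min_dist(self, cx, cy, qx, qy):
        # Smallest possible distance from the query point to any point in cell (cx, cy).
        x0, y0 = cx * self.cell, cy * self.cell
        dx = max(x0 - qx, 0.0, qx - (x0 + self.cell))
        dy = max(y0 - qy, 0.0, qy - (y0 + self.cell))
        return math.hypot(dx, dy)

    def topk(self, term, qx, qy, k, alpha=0.5, max_dist=2.0):
        if k <= 0:
            return []

        def score(x, y, w):
            # Assumed linear combination of spatial proximity and keyword weight.
            return alpha * (1 - math.hypot(x - qx, y - qy) / max_dist) + (1 - alpha) * w

        kind, data = self.storage.get(term, ("flat", []))
        if kind == "flat":
            best = heapq.nlargest(k, ((score(x, y, w), oid) for oid, x, y, w in data))
            return [oid for _, oid in best]

        def cell_bound(item):
            # Upper bound on the score of any object stored in this grid cell.
            (cx, cy), entries = item
            return (alpha * (1 - self._min_dist(cx, cy, qx, qy) / max_dist)
                    + (1 - alpha) * max(e[3] for e in entries))

        heap = []  # min-heap of (score, oid) holding the current top-k
        for item in sorted(data.items(), key=cell_bound, reverse=True):
            if len(heap) == k and cell_bound(item) <= heap[0][0]:
                break  # cells come in decreasing bound order, so none left can win
            for oid, x, y, w in item[1]:
                s = score(x, y, w)
                if len(heap) < k:
                    heapq.heappush(heap, (s, oid))
                elif s > heap[0][0]:
                    heapq.heapreplace(heap, (s, oid))
        return [oid for _, oid in sorted(heap, reverse=True)]
```

Splitting by document frequency keeps rare terms cheap to scan and update, while frequent terms get a structure that supports early termination.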