Results 1 - 10
of
38
Improving Recommendation for Long-tail Queries via Templates
"... The ability to aggregate huge volumes of queries over a large population of users allows search engines to build precise models for a variety of query-assistance features such as query recommendation, correction, etc. Yet, no matter how much data is aggregated, the long-tail distribution implies tha ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
(Show Context)
The ability to aggregate huge volumes of queries over a large population of users allows search engines to build precise models for a variety of query-assistance features such as query recommendation, correction, etc. Yet, no matter how much data is aggregated, the long-tail distribution implies that a large fraction of queries are rare. As a result, most query assistance services perform poorly or are not even triggered on long-tail queries. We propose a method to extend the reach of query assistance techniques (and in particular query recommendation) to long-tail queries by reasoning about rules between query templates rather than individual query transitions, as currently done in query-flow graph models. As a simple example, if we recognize that ‘Montezuma’ is a city in the rare query “Montezuma surf”andif therule‘<city> surf → <city> beach ’ has been observed, we are able to offer “Montezuma beach ” as a recommendation, even if the two queries were never observed in a same session. We conducted experiments to validate our hypothesis, first via traditional small-scale editorial assessments but more interestingly via a novel automated large scale evaluation methodology. Our experiments show that general coverage can be relatively increased by 24 % using templates without penalizing quality. Furthermore, for 36 % of the 95M queries in our query flow graph, which have no out edges and thus could not be served recommendations, we can now offer at least one recommendation in 98 % of the cases.
Context-sensitive query auto-completion
- In WWW ’11
, 2011
"... ABSTRACT Query auto completion is known to provide poor predictions of the user's query when her input prefix is very short (e.g., one or two characters). In this paper we show that context, such as the user's recent queries, can be used to improve the prediction quality considerably even ..."
Abstract
-
Cited by 20 (0 self)
- Add to MetaCart
(Show Context)
ABSTRACT Query auto completion is known to provide poor predictions of the user's query when her input prefix is very short (e.g., one or two characters). In this paper we show that context, such as the user's recent queries, can be used to improve the prediction quality considerably even for such short prefixes. We propose a context-sensitive query auto completion algorithm, NearestCompletion, which outputs the completions of the user's input that are most similar to the context queries. To measure similarity, we represent queries and contexts as high-dimensional term-weighted vectors and resort to cosine similarity. The mapping from queries to vectors is done through a new query expansion technique that we introduce, which expands a query by traversing the query recommendation tree rooted at the query. In order to evaluate our approach, we performed extensive experimentation over the public AOL query log. We demonstrate that when the recent user's queries are relevant to the current query she is typing, then after typing a single character, NearestCompletion's MRR is 48% higher relative to the MRR of the standard MostPopularCompletion algorithm on average. When the context is irrelevant, however, NearestCompletion's MRR is essentially zero. To mitigate this problem, we propose HybridCompletion, which is a hybrid of NearestCompletion with MostPopularCompletion. HybridCompletion is shown to dominate both NearestCompletion and MostPopularCompletion, achieving a total improvement of 31.5% in MRR relative to MostPopularCompletion on average.
Query Similarity by Projecting the Query-Flow Graph
, 2010
"... Defining a measure of similarity between queries is an interesting and difficult problem. A reliable query-similarity measure can be used in a variety of applications such as query recommendation, query expansion, and advertising. In this paper, we exploit the information present in query logs in or ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
(Show Context)
Defining a measure of similarity between queries is an interesting and difficult problem. A reliable query-similarity measure can be used in a variety of applications such as query recommendation, query expansion, and advertising. In this paper, we exploit the information present in query logs in order to develop a measure of semantic similarity between queries. Our approach relies on the concept of the query-flow graph, a graph-based representation of a query log. The query-flow graph aggregates query reformulations from many users: nodes in the graph represent queries, and two queries are connected if they are likely to appear as part of the same search goal. Our query-similarity measure is obtained by projecting the graph (or appropriate subgraphs extracted from it) on a low-dimensional Euclidean space. Our experiments show that the measure we obtain captures a notion of semantic similarity between queries and it is useful for diversifying query recommendations.
Sequence Clustering and Labeling for Unsupervised Query Intent Discovery
, 2012
"... One popular form of semantic search observed in several modern search engines is to recognize query patterns that trigger instant answers or domain-specific search, producing semantically enriched search results. This often requires understanding the query intent in addition to the meaning of the qu ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
One popular form of semantic search observed in several modern search engines is to recognize query patterns that trigger instant answers or domain-specific search, producing semantically enriched search results. This often requires understanding the query intent in addition to the meaning of the query terms in order to access structured data sources. A major challenge in intent understanding is to construct a domain-dependent schema and to annotate search queries based on such a schema, a process that to date has required much manual annotation effort. We present an unsupervised method for clustering queries with similar intent and for producing a pattern consisting of a sequence of semantic concepts and/or lexical items for each intent. Furthermore, we leverage the discovered intent patterns to automatically annotate a large number of queries beyond those used in clustering. We evaluated our method on 10 selected domains, discovering over 1400 intent patterns and automatically annotating 125K (and potentially many more) queries. We found that over 90 % of patterns and 80 % of instance annotations tested are judged to be correct by a majority of annotators.
Mining Concept Sequences from Large-Scale Search Logs for Context-Aware Query Suggestion
"... Query suggestion plays an important role in improving usability of search engines. Although some recently proposed methods provide query suggestions by mining query patterns from search logs, none of them models the immediately preceding queries as context systematically, and uses context informatio ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
Query suggestion plays an important role in improving usability of search engines. Although some recently proposed methods provide query suggestions by mining query patterns from search logs, none of them models the immediately preceding queries as context systematically, and uses context information effectively in query suggestions. Context-aware query suggestion is challenging in both modeling context and scaling up query suggestion using context. In this article, we propose a novel context-aware query suggestion approach. To tackle the challenges, our approach consists of two stages. In the first, offline model-learning stage, to address data sparseness, queries are summarized into concepts by clustering a click-through bipartite. A concept sequence suffix tree is then constructed from session data as a context-aware query suggestion model. In the second, online query suggestion stage, a user’s search context is captured by mapping the query sequence submitted by the user to a sequence of concepts. By looking up the context in the concept sequence suffix tree, we suggest to the user context-aware queries. We test our approach on large-scale search logs of a commercial search engine containing 4.0 billion Web queries, 5.9 billion clicks, and 1.87 billion search sessions. The experimental results clearly show that our approach outperforms three baseline methods in both coverage and quality of suggestions.
Efficient Error-tolerant Query Autocompletion
"... Query autocompletion is an important feature saving users many keystrokes from typing the entire query. In this paper we studytheproblemofqueryautocompletion thattolerates errors in users ’ input using edit distance constraints. Previous approaches index data strings in a trie, and continuously main ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
(Show Context)
Query autocompletion is an important feature saving users many keystrokes from typing the entire query. In this paper we studytheproblemofqueryautocompletion thattolerates errors in users ’ input using edit distance constraints. Previous approaches index data strings in a trie, and continuously maintain all the prefixes of data strings whose edit distance from the query are within the threshold. The major inherent problem is that the number of such prefixes is huge for the first few characters of the query and is exponential in the alphabet size. This results in slow query response even if the entire query approximately matches only few prefixes. Inthispaper, weproposeanovelneighborhoodgenerationbased algorithm, IncNGTrie, which can achieve up to two orders of magnitude speedup over existing methods for the error-tolerant queryautocompletion problem. Ourproposed algorithm only maintains a small set of active nodes, thus saving both space and time to process the query. We also study efficient duplicate removal which is a core problem in fetching query answers. In addition, we propose optimization techniques to reduce our index size, as well as discussions on several extensions to our method. The efficiency of our method is demonstrated against existing methods through extensive experiments on real datasets. 1.
Using query log and social tagging to refine queries based on latent topics
- In Proceedings of the 20th
"... An important way to improve users ’ satisfaction in Web search is to assist them to issue more effective queries. One such approach is query refinement (reformulation), which generates new queries according to the current query issued by users. A common procedure for conducting refinement is to gene ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
(Show Context)
An important way to improve users ’ satisfaction in Web search is to assist them to issue more effective queries. One such approach is query refinement (reformulation), which generates new queries according to the current query issued by users. A common procedure for conducting refinement is to generate some candidate queries first, and then a scoring method is designed to assess the quality of these candidates. Currently, most of the existing methods are context based. They rely heavily on the context relation of terms in the his-torical queries, and cannot detect and maintain the semantic consistency of queries. In this paper, we propose a graphical model to score queries. The proposed model exploits a latent topic space, which is automatically derived from the query log, to assess the semantic dependency of terms in a query. In the graphical model, both term context dependency and topic context dependency are considered. This also makes it feasible to score some queries which do not have much available historical term context information. We also uti-lize social tagging data in the candidate query generation process. Based on the observation that different users may tag the same resource with different tags of similar mean-ing, we propose a method to mine these term pairs for new candidate query construction.
The Wisdom of Advertisers: Mining Subgoals via Query Clustering
"... This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as “book flights, ” “book a hotel, ” “find good restaurants ” and “decide which sightseeing spots to visit. ” A ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as “book flights, ” “book a hotel, ” “find good restaurants ” and “decide which sightseeing spots to visit. ” As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as “do physical exercise, ” “take diet pills, ” and “control calorie intake. ” In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers ’ tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query cooccurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall.
HITSCIR System in NTCIR-9 Subtopic Mining Task
"... Web queries tend to have multiple user intents. Automatically identifying query intents will benefit search result navigation, search result diversity and personalized search. This paper presents the HITSCIR system in NTCIR-9 subtopic mining task. Firstly, the system collects query intent candidates ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
Web queries tend to have multiple user intents. Automatically identifying query intents will benefit search result navigation, search result diversity and personalized search. This paper presents the HITSCIR system in NTCIR-9 subtopic mining task. Firstly, the system collects query intent candidates from multiple resources. Secondly, Affinity Propagation algorithm is applied for clustering these query intent candidates. It could decide the number of clusters automatically. Each cluster has a representative intent candidate called exemplar. Prior preference and heuristic pair-wise preferences could be incorporated in the clustering framework. Finally, the exemplars are ranked by considering each own quality and the popularity of the clusters they represent. The NTCIR-9 evaluation results show that our system could effectively mine query intents with good relevance, diversity and readability.
Structured Query Suggestion for Specialization and Parallel Movement: Effect on Search Behaviors
"... Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it is often difficult for the user to choose from a list of query suggestions, and to understand the relation between an input query and sug ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it is often difficult for the user to choose from a list of query suggestions, and to understand the relation between an input query and suggested ones. In this paper, we propose a new method to present query suggestions to the user, which has been designed to help two popular query reformulation actions, namely, specialization (e.g. from “nikon ” to “nikon camera”) and parallel movement (e.g. from “nikon camera ” to “canon camera”). Using a query log collected from a popular commercial Web search engine, our prototype called SParQS classifies query suggestions into automatically generated categories and generates a