Clustering query refinements by user intent.
In WWW, 2010
Cited by 38 (0 self)
We address the problem of clustering the refinements of a user search query. The clusters computed by our proposed algorithm can be used to improve the selection and placement of the query suggestions proposed by a search engine, and can also serve to summarize the different aspects of information relevant to the original user query. Our algorithm clusters refinements based on their likely underlying user intents by combining document click and session co-occurrence information. At its core, our algorithm operates by performing multiple random walks on a Markov graph that approximates user search behavior. A user study performed on top search engine queries shows that our clusters are rated better than corresponding clusters computed using approaches that use only document click or only session co-occurrence information.
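The random-walk idea in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: the graph, the transition probabilities, the fixed step count, and the greedy cosine-threshold grouping are all assumptions made for illustration. Documents are treated as absorbing nodes, and two refinements are grouped when their walks end up on similar documents.

```python
from math import sqrt

def walk_distribution(start, transitions, steps=3):
    """Probability of being at each node after `steps` random-walk steps.

    Nodes without outgoing edges (here: documents) are absorbing.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for node, p in dist.items():
            out = transitions.get(node)
            if not out:  # absorbing node keeps its probability mass
                nxt[node] = nxt.get(node, 0.0) + p
                continue
            for target, w in out.items():
                nxt[target] = nxt.get(target, 0.0) + p * w
        dist = nxt
    return dist

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_refinements(refinements, transitions, threshold=0.5):
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    vectors = {r: walk_distribution(r, transitions) for r in refinements}
    clusters = []
    for r in refinements:
        for c in clusters:
            if cosine(vectors[r], vectors[c[0]]) >= threshold:
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters

# Hypothetical toy graph: refinements link to clicked documents (click data)
# and to other refinements (session co-occurrence). All values are invented.
TRANSITIONS = {
    "mars pictures": {"doc:nasa-gallery": 0.8, "mars images": 0.2},
    "mars images": {"doc:nasa-gallery": 0.7, "doc:space-photos": 0.3},
    "mars candy": {"doc:mars-inc": 1.0},
}
REFINEMENTS = ["mars pictures", "mars images", "mars candy"]
CLUSTERS = cluster_refinements(REFINEMENTS, TRANSITIONS)
```

On this toy graph the two image-related refinements fall into one cluster and "mars candy" into another, because their walks are absorbed by disjoint sets of documents.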
Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction
Cited by 21 (8 self)
Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus taking into account the issue of language ambiguity, i.e., polysemy. However, Web clustering methods typically rely on some shallow notion of textual similarity of search result snippets. As a result, text snippets with no word in common tend to be clustered separately, even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query. In this paper, we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Key to our approach is to first acquire the senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on datasets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.
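The second stage described in the abstract above, clustering results by their similarity to already-induced senses, can be sketched as follows. The sense inventories and snippets are invented, and plain word overlap stands in for the paper's semantic similarity measure, which the abstract does not specify; the sense induction step itself is not shown.

```python
def assign_to_senses(snippets, senses):
    """Put each snippet in the cluster of the sense it overlaps with most.

    `senses` maps a sense name to a set of words assumed to describe it.
    Snippets that share no word with any sense go to an "other" cluster.
    """
    clusters = {name: [] for name in senses}
    clusters["other"] = []
    for snippet in snippets:
        words = set(snippet.lower().split())
        best, best_overlap = "other", 0
        for name, sense_words in senses.items():
            overlap = len(words & sense_words)
            if overlap > best_overlap:
                best, best_overlap = name, overlap
        clusters[best].append(snippet)
    return clusters

# Hypothetical induced senses for the ambiguous query "jaguar".
SENSES = {
    "animal": {"cat", "wild", "habitat", "prey"},
    "car": {"luxury", "engine", "dealer", "model"},
}
SNIPPETS = [
    "Jaguar dealer offers new luxury model",
    "The jaguar stalks prey in its rainforest habitat",
    "Jaguar wins the match",
]
RESULT = assign_to_senses(SNIPPETS, SENSES)
```

Note that the first two snippets share only the query word itself, so a method based on snippet-to-snippet overlap would have little to separate them on; comparing each snippet to a sense description is what makes the split possible here.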
Extracting Query Facets from Search Results
Cited by 5 (1 self)
Web search queries are often ambiguous or multi-faceted, which makes a simple ranked list of results inadequate. To assist information finding for such faceted queries, we explore a technique that explicitly represents interesting facets of a query using groups of semantically related terms extracted from search results. As an example, for the query “baggage allowance”, these groups might be different airlines, different flight types (domestic, international), or different travel classes (first, business, economy). We name these groups query facets and the terms in these groups facet terms. We develop a supervised approach based on a graphical model to recognize query facets from the noisy candidates found. The graphical model learns how likely a candidate term is to be a facet term as well as how likely two terms are to be grouped together in a query facet, and captures the dependencies between the two factors. We propose two algorithms for approximate inference on the graphical model since exact inference is intractable. Our evaluation combines recall and precision of the facet terms with the grouping quality. Experimental results on a sample of web queries show that the supervised method significantly outperforms existing approaches, which are mostly unsupervised, suggesting that query facet extraction can be effectively learned.
Identifying Aspects for Web-Search Queries
Cited by 5 (0 self)
Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, Aspector computes aspects that are orthogonal to each other and have high combined coverage. Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be “semantically” related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives: related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing.
The Wisdom of Advertisers: Mining Subgoals via Query Clustering
Cited by 4 (1 self)
This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as “book flights,” “book a hotel,” “find good restaurants” and “decide which sightseeing spots to visit.” As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as “do physical exercise,” “take diet pills,” and “control calorie intake.” In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers’ tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall.
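Two of the clustering measures named in the abstract above, purity and the Rand index, can be computed as in the sketch below. These are the standard textbook definitions; the example labels are invented and do not come from the paper.

```python
from collections import Counter
from itertools import combinations

def purity(predicted, gold):
    """Fraction of items that carry the majority gold label of their cluster."""
    by_cluster = {}
    for p, g in zip(predicted, gold):
        by_cluster.setdefault(p, []).append(g)
    majority = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority / len(gold)

def rand_index(predicted, gold):
    """Fraction of item pairs on which the two groupings agree.

    A pair counts as an agreement when both groupings place the two items
    together, or both place them apart.
    """
    pairs = list(combinations(range(len(gold)), 2))
    agree = sum(
        (predicted[i] == predicted[j]) == (gold[i] == gold[j]) for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical example: five queries with gold subgoals and predicted cluster ids.
GOLD = ["flight", "flight", "hotel", "hotel", "food"]
PREDICTED = [0, 0, 0, 1, 2]
PURITY = purity(PREDICTED, GOLD)    # 4 of 5 items match their cluster's majority
RAND = rand_index(PREDICTED, GOLD)  # 7 of 10 pairs agree
```

Purity rewards many small clusters (singletons are trivially pure), which is one reason it is usually reported alongside pair-based measures such as the Rand index.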
HITSCIR System in NTCIR-9 Subtopic Mining Task
Cited by 4 (1 self)
Web queries tend to have multiple user intents. Automatically identifying query intents will benefit search result navigation, search result diversity and personalized search. This paper presents the HITSCIR system in the NTCIR-9 subtopic mining task. First, the system collects query intent candidates from multiple resources. Second, the Affinity Propagation algorithm is applied to cluster these query intent candidates. It can determine the number of clusters automatically. Each cluster has a representative intent candidate called an exemplar. Prior preferences and heuristic pair-wise preferences can be incorporated in the clustering framework. Finally, the exemplars are ranked by considering both their own quality and the popularity of the clusters they represent. The NTCIR-9 evaluation results show that our system can effectively mine query intents with good relevance, diversity and readability.
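The final ranking step in the abstract above combines an exemplar's own quality with the popularity of its cluster. The abstract does not say how the two are combined, so the sketch below assumes a simple weighted sum, with popularity measured as the cluster's share of all candidates. The function name, the `alpha` weight, and the example data are all hypothetical.

```python
def rank_exemplars(clusters, quality, alpha=0.5):
    """Rank exemplars by alpha * quality + (1 - alpha) * cluster popularity.

    `clusters` maps each exemplar to the list of candidates in its cluster
    (including the exemplar itself); `quality` maps exemplars to a score
    in [0, 1]. Popularity is the cluster's share of all candidates.
    """
    total = sum(len(members) for members in clusters.values())
    scores = {
        exemplar: alpha * quality[exemplar] + (1 - alpha) * len(members) / total
        for exemplar, members in clusters.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical clustering output for the query "jaguar".
CLUSTERS = {
    "jaguar car price": ["jaguar car price", "jaguar xf", "jaguar dealer"],
    "jaguar animal facts": ["jaguar animal facts", "jaguar habitat"],
    "jaguar os x": ["jaguar os x"],
}
QUALITY = {"jaguar car price": 0.6, "jaguar animal facts": 0.9, "jaguar os x": 0.7}

BALANCED = rank_exemplars(CLUSTERS, QUALITY)             # quality and popularity
POPULARITY_ONLY = rank_exemplars(CLUSTERS, QUALITY, alpha=0.0)
```

With equal weights the high-quality "jaguar animal facts" exemplar ranks first; with `alpha=0.0` the ranking follows cluster size alone and "jaguar car price" moves to the top.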
K2q: Generating natural language questions from keywords with user refinements
In IJCNLP, 2011
Cited by 2 (0 self)
Garbage in and garbage out. A Q&A system must receive a well-formulated question that matches the user’s intent or she has no chance to receive satisfactory answers. In this paper, we propose a keywords to questions (K2Q) system to assist a user to articulate and refine questions. K2Q generates candidate questions and refinement words from a set of input keywords. After specifying some initial keywords, a user receives a list of candidate questions as well as a list of refinement words. The user can then select a satisfactory question, or select a refinement word to generate a new list of candidate questions and refinement words. We propose a User Inquiry Intent (UII) model to describe the joint generation process of keywords and questions for ranking questions, suggesting refinement words, and generating questions that may not have previously appeared. An empirical study shows UII to be useful and effective for the K2Q task.
Unsupervised Extraction of Template Structure in Web Search Queries
Cited by 1 (0 self)
Web search queries are an encoding of the user’s search intent and extracting structured information from them can facilitate central search engine operations like improving the ranking of search results and advertisements. Not surprisingly, this area has attracted a lot of attention in the research community in the last few years. The problem is, however, made challenging by the fact that search queries tend to be extremely succinct: a condensation of user search needs to the bare-minimum set of keywords. In this paper we consider the problem of extracting, with no manual intervention, the hidden structure behind the observed search queries in a domain: the origins of the constituent keywords as well as the manner in which the individual keywords are assembled together. We formalize important properties of the problem and then give a principled solution based on generative models that satisfies these properties. Using manually labeled data we show that the query templates extracted by our solution are superior to those discovered by strong baseline methods. The query templates extracted by our approach have potential uses in many search engine tasks: query answering, advertisement matching and targeting, to name a few. In this paper we study one such task, estimating Query-Advertisability, and empirically demonstrate that using extracted template information can improve performance over and above the current state-of-the-art.
Extending Faceted Search to the General Web
Cited by 1 (0 self)
Faceted search helps users by offering drill-down options as a complement to the keyword input box, and it has been used successfully for many vertical applications, including e-commerce and digital libraries. However, this idea is not well explored for general web search, even though it holds great potential for assisting multi-faceted queries and exploratory search. In this paper, we explore this potential by extending faceted search into the open-domain web setting, which we call Faceted Web Search. To tackle the heterogeneous nature of the web, we propose to use query-dependent automatic facet generation, which generates facets for a query instead of the entire corpus. To incorporate user feedback on these query facets into document ranking, we investigate both Boolean filtering and soft ranking models. We evaluate Faceted Web Search systems by their utility in assisting users to clarify search intent and find subtopic information. We describe how to build reusable test collections for such tasks, and propose an evaluation method that considers both gain and cost for users. Our experiments testify to the potential of Faceted Web Search, and show Boolean filtering feedback models, which are widely used in conventional faceted search, are less effective than soft ranking models.
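The contrast between the two feedback models named in the abstract above can be sketched as follows. The scoring rule, the OR-style filter condition, the `weight` parameter, and the example data are assumptions for illustration; the paper's actual models are not given in the abstract.

```python
def boolean_filter(ranked_docs, doc_terms, selected):
    """Keep only documents containing at least one selected facet term.

    The original ranking order is preserved; everything else is dropped.
    """
    return [d for d in ranked_docs if doc_terms[d] & selected]

def soft_rerank(scores, doc_terms, selected, weight=1.0):
    """Boost documents per matching facet term instead of discarding others."""
    adjusted = {
        d: s + weight * len(doc_terms[d] & selected) for d, s in scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)

# Hypothetical results for the query "baggage allowance"; the user has
# selected the facet term "united".
SCORES = {"d1": 3.0, "d2": 2.5, "d3": 1.0}
DOC_TERMS = {
    "d1": {"delta"},
    "d2": {"united", "international"},
    "d3": {"united"},
}
SELECTED = {"united"}
RANKED = sorted(SCORES, key=SCORES.get, reverse=True)

FILTERED = boolean_filter(RANKED, DOC_TERMS, SELECTED)
RERANKED = soft_rerank(SCORES, DOC_TERMS, SELECTED)
```

Filtering removes "d1" outright, whereas soft ranking only demotes it below "d2". If facet terms are extracted automatically and are therefore noisy, a hard filter can discard relevant documents, which is one plausible reading of why the abstract reports soft ranking as more effective.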
Heterogeneous Graph-Based Intent Learning with Queries, Web Pages and Wikipedia Concepts
Cited by 1 (0 self)
The problem of learning user search intents has attracted intensive attention from both industry and academia. However, state-of-the-art intent learning algorithms suffer from different drawbacks when only using a single type of data source. For example, query text has difficulty in distinguishing ambiguous queries; search logs are biased by the order of search results and by users’ noisy click behaviors. In this work, we leverage, for the first time, three types of objects, namely queries, web pages and Wikipedia concepts, collaboratively for learning generic search intents, and construct a heterogeneous graph to represent multiple types of relationships between them. A novel unsupervised method called heterogeneous graph-based soft-clustering is developed to derive an intent indicator for each object based on the constructed heterogeneous graph. With the proposed co-clustering method, one can enhance the quality of intent understanding by taking advantage of different types of data, which complement each other, and make the implicit intents easier to interpret with explicit knowledge from Wikipedia concepts. Experiments on two real-world datasets demonstrate the power of the proposed method, where it achieves a 9.25% improvement in terms of NDCG on the search ranking task and a 4.67% enhancement in terms of Rand index on the object co-clustering task compared to the best state-of-the-art method.