Learning query intent from regularized click graphs
- In SIGIR 2008
, 2008
Cited by 114 (12 self)
This work presents the use of click graphs in improving query intent classifiers, which are critical if vertical search and general-purpose search services are to be offered in a unified user interface. Previous works on query classification have primarily focused on improving feature representation of queries, e.g., by augmenting queries with search engine results. In this work, we investigate a completely orthogonal approach — instead of enriching feature representation, we aim at drastically increasing the amounts of training data by semi-supervised learning with click graphs. Specifically, we infer class memberships of unlabeled queries from those of labeled ones according to their proximities in a click graph. Moreover, we regularize the learning with click graphs by content-based classification to avoid propagating erroneous labels. We demonstrate the effectiveness of our algorithms in two different applications, product intent and job intent classification. In both cases, we expand the training data with automatically labeled queries by over two orders of magnitude, leading to significant improvements in classification performance. An additional finding is that with a large amount of training data obtained in this fashion, classifiers using only query words/phrases as features can work remarkably well.
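The propagation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the update rule, the `alpha` mixing weight, the 0.5 default score, and all names (`propagate_intent`, `clicks`, `seeds`, `prior`) are assumptions made for illustration. Labeled queries keep their labels, unlabeled queries take the click-weighted average score of the URLs they share with other queries, and an optional content-based prior pulls each score back to limit the spread of erroneous labels.

```python
from collections import defaultdict


def propagate_intent(clicks, seeds, prior=None, alpha=0.8, iterations=20):
    """Spread binary intent scores over a bipartite query-URL click graph.

    clicks: {(query, url): positive click count}
    seeds:  {query: 1.0 or 0.0} for manually labeled queries
    prior:  optional {query: score in [0, 1]} from a content-based classifier
    alpha:  weight of the graph neighborhood versus the prior (assumed value)
    """
    prior = prior or {}
    by_query = defaultdict(dict)
    by_url = defaultdict(dict)
    for (query, url), count in clicks.items():
        by_query[query][url] = count
        by_url[url][query] = count

    def anchor(query):
        # Seed label if present, else the content-based prior, else "unknown".
        if query in seeds:
            return seeds[query]
        return prior.get(query, 0.5)

    scores = {query: anchor(query) for query in by_query}
    for _ in range(iterations):
        # Each URL takes the click-weighted mean score of its queries.
        url_scores = {
            url: sum(c * scores[q] for q, c in qs.items()) / sum(qs.values())
            for url, qs in by_url.items()
        }
        updated = {}
        for query, urls in by_query.items():
            if query in seeds:
                updated[query] = seeds[query]  # labeled queries stay clamped
                continue
            neighborhood = (
                sum(c * url_scores[u] for u, c in urls.items()) / sum(urls.values())
            )
            # Regularize: mix the graph estimate with the prior.
            updated[query] = alpha * neighborhood + (1 - alpha) * anchor(query)
        scores = updated
    return scores
```

Queries whose final score clears a chosen threshold could then be added to the training set as automatically labeled examples; the threshold itself is a tuning choice not specified here.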
Robust Classification of Rare Queries Using Web Knowledge Search
- Proceedings of NTCIR-9 Workshop Meeting
, 2007
Cited by 68 (11 self)
We propose a methodology for building a practical robust query classification system that can identify thousands of query classes with reasonable accuracy, while dealing in real-time with the query volume of a commercial web search engine. We use a blind feedback technique: given a query, we determine its topic by classifying the web search results retrieved by the query. Motivated by the needs of search advertising, we primarily focus on rare queries, which are the hardest from the point of view of machine learning, yet in aggregation account for a considerable fraction of search engine traffic. Empirical evaluation confirms that our methodology yields a considerably higher classification accuracy than previously reported. We believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to a better user experience.
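The blind feedback step described above (classify the retrieved results, then aggregate to a query label) might look roughly like the following sketch. The `search` and `classify_page` callables are hypothetical stand-ins for a search backend and a page classifier, and the reciprocal-rank vote weighting is an assumption, not something the abstract specifies.

```python
from collections import Counter


def classify_query(query, search, classify_page, top_k=10):
    """Infer a query's class distribution from the classes of its search results.

    search:        hypothetical callable, query -> ranked list of pages
    classify_page: hypothetical callable, page -> {class_label: score}
    """
    votes = Counter()
    for rank, page in enumerate(search(query)[:top_k]):
        weight = 1.0 / (rank + 1)  # assumed: higher-ranked results count more
        for label, score in classify_page(page).items():
            votes[label] += weight * score
    total = sum(votes.values())
    if total == 0:
        return {}  # no results, or no classifiable results
    return {label: value / total for label, value in votes.items()}
```

Because the query itself is never classified directly, this approach can handle rare queries that have too little training data of their own, at the cost of one retrieval per query.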
Sources of Evidence for Vertical Selection
Cited by 59 (14 self)
Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection, predicting relevant verticals (if any) for queries issued to the search engine’s main web search page. In contrast to prior query classification and resource selection tasks, vertical selection is associated with unique resources that can inform the classification decision. We focus on three sources of evidence: (1) the query string, from which features are derived independent of external resources, (2) logs of queries previously issued directly to the vertical, and (3) corpora representative of vertical content. We focus on 18 different verticals, which differ in terms of semantics, media type, size, and level of query traffic. We compare our method to prior work in federated search and retrieval effectiveness prediction. An in-depth error analysis reveals unique challenges across different verticals and provides insight into vertical selection for future work.
Understanding User’s Query Intent with Wikipedia
- WWW 2009, Madrid. Track: Search / Session: Query Categorization
, 2009
Cited by 57 (3 self)
Understanding the intent behind a user’s query can help a search engine automatically route the query to corresponding vertical search engines to obtain particularly relevant content, thus greatly improving user satisfaction. There are three major challenges in the query intent classification problem: (1) intent representation; (2) domain coverage; and (3) semantic interpretation. Current approaches to predicting the user’s intent mainly utilize machine learning techniques. However, meeting all these challenges with statistical machine learning approaches is difficult and often requires substantial human effort. In this paper, we propose a general methodology for the problem of query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. The Wikipedia
Search advertising using web relevance feedback
- In Proc 17th. Intl. Conf. on Information and Knowledge Management
, 2008
Cited by 43 (15 self)
The business of Web search, a $10 billion industry, relies heavily on sponsored search, whereby a few carefully selected paid advertisements are displayed alongside algorithmic search results. A key technical challenge in sponsored search is to select ads that are relevant for the user’s query. Identifying relevant ads is challenging because queries are usually very short, and because users, consciously or not, choose terms intended to lead to optimal Web search results and not to optimal ads. Furthermore, the ads themselves are short and usually formulated to capture the reader’s attention rather than to facilitate query matching. Traditionally, matching of ads to queries employed standard information retrieval techniques using the bag of words approach. Here we propose to go beyond the bag of words, and augment both queries and ads with additional knowledge-rich features. We use Web search results initially returned for the query to create a pool of relevant documents. Classifying these documents with respect to an external taxonomy and identifying salient named entities give rise to two new feature types. Empirical evaluation based on over 9,000 query-ad pairwise judgments confirms that using augmented queries produces highly relevant ads. Our methodology also relaxes the requirement for each ad to explicitly specify the exhaustive list of queries (“bid phrases”) that can trigger it.
Classification-enhanced ranking
, 2010
Cited by 36 (13 self)
Many have speculated that classifying web pages can improve a search engine’s ranking of results. Intuitively results should be more relevant when they match the class of a query. We present a simple framework for classification-enhanced ranking that uses clicks in combination with the classification of web pages to derive a class distribution for the query. We then go on to define a variety of features that capture the match between the class distributions of a web page and a query, the ambiguity of a query, and the coverage of a retrieved result relative to a query’s set of classes. Experimental results demonstrate that a ranker learned with these features significantly improves ranking over a competitive baseline. Furthermore, our methodology is agnostic with respect to the classification space and can be used to derive query classes for a variety of different taxonomies.
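As a rough illustration of the features this abstract describes, the sketch below derives a query's class distribution from clicked pages and computes two simple signals: the match between a query and a page, and query ambiguity. The specific formulas (click-weighted averaging, a dot product for match, Shannon entropy for ambiguity) and all function names are assumptions; the paper's exact feature definitions are not given in the abstract. Each page's class distribution is assumed to sum to 1.

```python
import math


def query_class_distribution(click_counts, page_classes):
    """Click-weighted average of the class distributions of clicked pages.

    click_counts: {url: clicks received for this query}
    page_classes: {url: {class_label: probability}}, each summing to 1
    """
    totals = {}
    clicks_used = 0
    for url, clicks in click_counts.items():
        distribution = page_classes.get(url)
        if not distribution:
            continue  # skip pages that have no classification
        clicks_used += clicks
        for label, probability in distribution.items():
            totals[label] = totals.get(label, 0.0) + clicks * probability
    if clicks_used == 0:
        return {}
    return {label: value / clicks_used for label, value in totals.items()}


def class_match(query_dist, page_dist):
    """Assumed match feature: probability mass the two distributions share."""
    return sum(p * page_dist.get(label, 0.0) for label, p in query_dist.items())


def query_ambiguity(query_dist):
    """Assumed ambiguity feature: Shannon entropy (bits) of the query's classes."""
    return -sum(p * math.log2(p) for p in query_dist.values() if p > 0)
```

Nothing in these functions depends on a particular label set, which mirrors the abstract's point that the approach is agnostic to the classification space.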
Query enrichment for web-query classification
- ACMTOIS
, 2006
Cited by 34 (2 self)
Web search queries are typically short and ambiguous. Classifying these queries into target categories is a difficult but important problem. In this paper, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to the target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics (precision, F1 measure, which combines precision and recall, and creativity, which is judged by the organizers) among a total of 33 teams worldwide. In this paper, we show that, despite an abundance of ambiguous queries and a lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and experimental evaluation. Our best F1 and precision results are 42.4% and 44.4%, respectively, which are 9.6%
A study of mobile search queries in Japan
- In Proc. of the WWW2007
, 2007
Cited by 21 (0 self)
In this paper we study the characteristics of search queries on mobile phones in Japan, comparing them with previous results of generic search queries in Japan and mobile search queries in the USA. We confirm some results while finding some interesting differences in query distribution, use of the different script languages, and query topics.
Varying approaches to topical web query classification
- Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
, 2007
Cited by 20 (0 self)
Topical classification of web queries has drawn recent interest because of the promise it offers in improving retrieval effectiveness and efficiency. However, much of this promise depends on whether classification is performed before or after the query is used to retrieve documents. We examine two previously unaddressed issues in query classification: pre- vs. post-retrieval classification effectiveness and the effect of training explicitly from classified queries vs. bridging a classifier trained using a document taxonomy. Bridging classifiers map the categories of a document taxonomy onto those of a query classification problem to provide sufficient training data. We find that training classifiers explicitly from manually classified queries outperforms the bridged classifier by 48% in F1 score. Also, a pre-retrieval classifier using only the query terms performs merely 11% worse than the bridged classifier which requires snippets from retrieved documents.
Adaptation of offline vertical selection predictions in the presence of user feedback
- In SIGIR 2009
, 2009
Cited by 20 (11 self)
Web search results often integrate content from specialized corpora known as verticals. Given a query, one important aspect of aggregated search is the selection of relevant verticals from a set of candidate verticals. One drawback to previous approaches to vertical selection is that methods have not explicitly modeled user feedback. However, production search systems often record a variety of feedback information. In this paper, we present algorithms for vertical selection which adapt to user feedback. We evaluate algorithms using a novel simulator which models performance of a vertical selector situated in realistic query traffic.