Results 11 - 20
of
176
Ranking with Ordered Weighted Pairwise Classification
"... In ranking with the pairwise classification approach, the loss associated to a predicted ranked list is the mean of the pairwise classification losses. This loss is inadequate for tasks like information retrieval where we prefer ranked lists with high precision on the top of the list. We propose to ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
(Show Context)
In ranking with the pairwise classification approach, the loss associated to a predicted ranked list is the mean of the pairwise classification losses. This loss is inadequate for tasks like information retrieval where we prefer ranked lists with high precision on the top of the list. We propose to optimize a larger class of loss functions for ranking, based on an ordered weighted average (OWA) (Yager, 1988) of the classification losses. Convex OWA aggregation operators range from the max to the mean depending on their weights, and can be used to focus on the top ranked elements as they give more weight to the largest losses. When aggregating hinge losses, the optimization problem is similar to the SVM for interdependent output spaces. Moreover, we show that OWA aggregates of marginbased classification losses have good generalization properties. Experiments on the Letor 3.0 benchmark dataset for information retrieval validate our approach. 1.
Classification-enhanced ranking
, 2010
"... Many have speculated that classifying web pages can improve a search engine’s ranking of results. Intuitively results should be more relevant when they match the class of a query. We present a simple framework for classification-enhanced ranking that uses clicks in combination with the classificatio ..."
Abstract
-
Cited by 36 (13 self)
- Add to MetaCart
Many have speculated that classifying web pages can improve a search engine’s ranking of results. Intuitively results should be more relevant when they match the class of a query. We present a simple framework for classification-enhanced ranking that uses clicks in combination with the classification of web pages to derive a class distribution for the query. We then go on to define a variety of features that capture the match between the class distributions of a web page and a query, the ambiguity of a query, and the coverage of a retrieved result relative to a query’s set of classes. Experimental results demonstrate that a ranker learned with these features significantly improves ranking over a competitive baseline. Furthermore, our methodology is agnostic with respect to the classification space and can be used to derive query classes for a variety of different taxonomies.
Relationship identification for social network discovery
- In AAAI 07: Proceedings of the 22nd National Conference on Artificial Intelligence
, 2007
"... In recent years, informal, online communication has transformed the ways in which we connect and collaborate with friends and colleagues. With millions of individuals communicating online each day, we have a unique opportunity to observe the formation and evolution of roles and relationships in netw ..."
Abstract
-
Cited by 35 (2 self)
- Add to MetaCart
In recent years, informal, online communication has transformed the ways in which we connect and collaborate with friends and colleagues. With millions of individuals communicating online each day, we have a unique opportunity to observe the formation and evolution of roles and relationships in networked groups and organizations. Yet a number of challenges arise when attempting to infer the underlying social network from data that is often ambiguous, incomplete and context-dependent. In this paper, we consider the problem of collaborative network discovery from domains such as intelligence analysis and litigation support where the analyst is attempting to construct a validated representation of the social network. We specifically address the challenge of relationship identification where the objective is to identify relevant communications that substantiate a given social relationship type. We propose a supervised ranking approach to the problem and assess its performance on a manager-subordinate relationship identification task using the Enron email corpus. By exploiting message content, the ranker routinely cues the analyst to relevant communications relationships and message traffic that are indicative of the social relationship.
BoltzRank: Learning to Maximize Expected Ranking Gain
"... Ranking a set of retrieved documents according to their relevance to a query is a popular problem in information retrieval. Methods that learn ranking functions are difficult to optimize, as ranking performance is typically judged by metrics that are not smooth. In this paper we propose a new listwi ..."
Abstract
-
Cited by 35 (5 self)
- Add to MetaCart
(Show Context)
Ranking a set of retrieved documents according to their relevance to a query is a popular problem in information retrieval. Methods that learn ranking functions are difficult to optimize, as ranking performance is typically judged by metrics that are not smooth. In this paper we propose a new listwise approach to learning to rank. Our method creates a conditional probability distribution over rankings assigned to documents for a given query, which permits gradient ascent optimization of the expected value of some performance measure. The rank probabilities take the form of a Boltzmann distribution, based on an energy function that depends on a scoring function composed of individual and pairwise potentials. Including pairwise potentials is a novel contribution, allowing the model to encode regularities in the relative scores of documents; existing models assign scores at test time based only on individual documents, with no pairwise constraints between documents. Experimental results on the LETOR3.0 data set show that our method out-performs existing learning approaches to ranking. 1.
Structured learning for non-smooth ranking losses
- In SIGKDD Conference
, 2008
"... Learning to rank from relevance judgment is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has been applied recently to optimize important non-decomposable ranking cr ..."
Abstract
-
Cited by 35 (2 self)
- Add to MetaCart
(Show Context)
Learning to rank from relevance judgment is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has been applied recently to optimize important non-decomposable ranking criteria like AUC (area under ROC curve) and MAP (mean average precision). We propose new, almost-lineartime algorithms to optimize for two other criteria widely used to evaluate search systems: MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain) in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. E.g., MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
Supervised Semantic Indexing
"... Abstract. We present a class of models that are discriminatively trained to directly map from the word content in a query-document or documentdocument pair to a ranking score. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, un ..."
Abstract
-
Cited by 34 (8 self)
- Add to MetaCart
(Show Context)
Abstract. We present a class of models that are discriminatively trained to directly map from the word content in a query-document or documentdocument pair to a ranking score. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI our models are trained with a supervised signal directly on the task of interest, which we argue is the reason for our superior results. We provide an empirical study on Wikipedia documents, using the links to define document-document or query-document pairs, where we obtain state-of-the-art performance using our method. Key words: supervised, semantic indexing, document ranking 1
Inferring and using location metadata to personalize Web search
- In SIGIR’11
, 2011
"... Personalization of search results offers the potential for significant improvements in Web search. Among the many observable user attributes, approximate user location is particularly simple for search engines to obtain and allows personalization even for a first-time Web search user. However, actin ..."
Abstract
-
Cited by 34 (13 self)
- Add to MetaCart
(Show Context)
Personalization of search results offers the potential for significant improvements in Web search. Among the many observable user attributes, approximate user location is particularly simple for search engines to obtain and allows personalization even for a first-time Web search user. However, acting on user location information is difficult, since few Web documents include an address that can be interpreted as constraining the locations where the document is relevant. Furthermore, many Web documents – such as local news stories, lottery results, and sports team fan pages – may not correspond to physical addresses, but the location of the user still plays an important role in document relevance. In this paper, we show how to infer a more general location relevance which uses not only physical location but a more general notion of locations of interest for Web pages. We compute this information using implicit user behavioral data, characterize the most location-centric pages, and show how location information can be incorporated into Web search ranking. Our results show that a substantial fraction of Web search queries can be significantly improved by incorporating location-based features.
Directly optimizing evaluation measures in learning to rank
- In SIGIR ’08: Proceedings of the 31th annual international ACM SIGIR conference on Research and development in information retrieval
, 2008
"... One of the central issues in learning to rank for information retrieval is to develop algorithms that construct ranking models by directly optimizing evaluation measures used in information retrieval such as Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG). Several such ..."
Abstract
-
Cited by 33 (8 self)
- Add to MetaCart
(Show Context)
One of the central issues in learning to rank for information retrieval is to develop algorithms that construct ranking models by directly optimizing evaluation measures used in information retrieval such as Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG). Several such algorithms including SVM map and AdaRank have been proposed and their effectiveness has been verified. However, the relationships between the algorithms are not clear, and furthermore no comparisons have been conducted between them. In this paper, we conduct a study on the approach of directly optimizing evaluation measures in learning to rank for Information Retrieval (IR). We focus on the methods that minimize loss functions upper bounding the basic loss function defined on the IR measures. We first provide a general framework for the study and analyze the existing algorithms of SVM map and AdaRank within the framework. The framework is based on upper bound analysis and two types of upper bounds are discussed. Moreover, we show that we can derive new algorithms on the basis of this analysis and create one example algorithm called PermuRank. We have also conducted comparisons between SVM map, AdaRank, PermuRank, and conventional methods of Ranking SVM and Rank-Boost, using benchmark datasets. Experimental results show that the methods based on direct optimization of evaluation measures can always outperform conventional methods of Ranking SVM and RankBoost. However, no significant difference exists among the performances of the direct optimization methods themselves.
Modeling the Impact of Short- and Long-Term Behavior on Search Personalization
"... User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual’s history of queries and clicked documents. Previous studies have explored how sho ..."
Abstract
-
Cited by 30 (9 self)
- Add to MetaCart
(Show Context)
User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual’s history of queries and clicked documents. Previous studies have explored how short-term behavior or long-term behavior can be predictive of relevance. Ours is the first study to assess how short-term (session) behavior and long-term (historic) behavior interact, and how each may be used in isolation or in combination to optimally contribute to gains in relevance through search personalization. Our key findings include: historic behavior provides substantial benefits at the start of a search session; short-term session behavior contributes the majority of gains in an extended search session; and the combination of session and historic behavior out-performs using either alone. We also characterize how the relative contribution of each model changes throughout the duration of a session. Our findings have implications for the design of search systems that leverage user behavior to personalize the search experience.
Enhancing Single-document Summarization by Combining RankNet and Third-party Sources
"... We present a new approach to automatic summarization based on neural nets, called NetSum. We extract a set of features from each sentence that helps identify its importance in the document. We apply novel features based on news search query logs and Wikipedia entities. Using the RankNet learning alg ..."
Abstract
-
Cited by 30 (2 self)
- Add to MetaCart
(Show Context)
We present a new approach to automatic summarization based on neural nets, called NetSum. We extract a set of features from each sentence that helps identify its importance in the document. We apply novel features based on news search query logs and Wikipedia entities. Using the RankNet learning algorithm, we train a pair-based sentence ranker to score every sentence in the document and identify the most important sentences. We apply our system to documents gathered from CNN.com, where each document includes highlights and an article. Our system significantly outperforms the standard baseline in the ROUGE-1 measure on over 70 % of our document set. 1