Results 1 - 6 of 6
The K-armed Dueling Bandits Problem
Cited by 4 (3 self)
We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting assumes only that (noisy) binary feedback about the relative reward of two chosen strategies is available. This type of relative feedback is particularly appropriate in applications where absolute rewards have no natural scale or are difficult to measure (e.g., user-perceived quality of a set of retrieval results, taste of food, product attractiveness), but where pairwise comparisons are easy to make. We propose a novel regret formulation in this setting and present an algorithm that achieves (almost) information-theoretically optimal regret bounds (up to a constant factor).
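The feedback model in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the algorithm proposed in the paper: the preference matrix `pref`, the uniform pair-selection strategy, and the particular regret definition (how far the best arm's win probability against each chosen arm exceeds 1/2) are all assumptions made for illustration.

```python
import random


def duel(i, j, pref, rng):
    """Noisy binary feedback: True if arm i beats arm j in this comparison.

    pref[i][j] is the (assumed, hidden) probability that i beats j.
    """
    return rng.random() < pref[i][j]


def comparison_regret(i, j, best, pref):
    """Assumed regret of dueling (i, j): the best arm's winning margin
    over 1/2 against each of the two chosen arms, summed."""
    return (pref[best][i] - 0.5) + (pref[best][j] - 0.5)


def simulate_uniform(pref, best, rounds, seed=0):
    """Baseline that picks a random pair of distinct arms every round.

    Returns the win-count matrix and the cumulative regret.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    total_regret = 0.0
    for _ in range(rounds):
        i, j = rng.sample(range(k), 2)
        if duel(i, j, pref, rng):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        total_regret += comparison_regret(i, j, best, pref)
    return wins, total_regret
```

Only the outcome of each duel is observed by the learner; `pref` is used here solely to generate feedback and to score regret. Because the uniform baseline never stops comparing suboptimal arms, its regret grows linearly with the number of rounds, which is the behavior a dueling-bandit algorithm aims to avoid.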
Online learning for recency search ranking using real-time user feedback
- In CIKM, 2010
Cited by 1 (1 self)
Traditional machine-learned ranking algorithms for web search are trained in batch mode, which assumes static relevance of documents for a given query. Although such a batch-learning framework has been tremendously successful in commercial search engines, in scenarios where relevance of documents to a query changes over time, such as ranking recent documents for a breaking news query, the batch-learned ranking functions do have limitations. Users' real-time click feedback becomes a better and more timely proxy for the varying relevance of documents than the editorial judgments provided by human editors. In this paper, we propose an online learning algorithm that can quickly learn the best reranking of the top portion of the original ranked list based on users' real-time click feedback. In order to devise our algorithm and evaluate it accurately, we collected exploration bucket data that removes positional biases on clicks on the documents for recency-classified queries. Our initial experimental results show that our scheme is more capable of quickly adjusting the ranking to track the varying relevance of documents reflected in the click feedback, compared to batch-trained ranking functions.
NEW LEARNING FRAMEWORKS FOR INFORMATION RETRIEVAL
- 2011
Recent advances in machine learning have enabled the training of increasingly complex information retrieval models. This dissertation proposes principled approaches to formalize the learning problems for information retrieval, with an eye towards developing a unified learning framework. This will conceptually simplify the overall development process, making it easier to reason about higher level goals and properties of the retrieval system. This dissertation advocates two complementary approaches, structured prediction and interactive learning, to learn feature-rich retrieval models that can perform well in practice.
Research Statement of Yisong Yue Objective
My core research interests lie in statistical machine learning, with a primary application focus in the field of information retrieval and access. In particular, I am interested in developing principled learning methods with theoretical foundations that will not only lead to practical systems of immediate benefit, but also push our ability to reason about the increasingly sophisticated information systems of the future. More broadly, I am interested in developing general learning approaches that can be applied to automate prediction tasks in a wide range of application domains. Managing digital information is a growing problem in every application domain, ranging from integrating biological data, browsing digital libraries, organizing personal content, and searching in specialty domains to filtering news feeds or Twitter updates. Current design methodologies are labor intensive and require extensive hands-on expertise. This inherently limits the scope and reasoning power of the systems that we can efficiently deploy today. Moving forward, I am particularly interested in developing and applying new machine learning approaches. Existing learning approaches have proven to be invaluable with their ability to combine coarse human feedback (e.g., “this document is relevant”) with statistical regularities of the prediction domain in order to derive effective models. This is evidenced by their widespread commercial adoption, and I am convinced that progress in
Online Learning with Preference Feedback
We propose a new online learning model for learning with preference feedback. The model is especially suited for applications like web search and recommender systems, where preference data is readily available from implicit user feedback (e.g. clicks). In particular, at each time step a potentially structured object (e.g. a ranking) is presented to the user in response to a context (e.g. query), providing him or her with some unobserved amount of utility. As feedback the algorithm receives an improved object that would have provided higher utility. We propose a learning algorithm with provable regret bounds for this online learning setting and demonstrate its effectiveness on a web-search application. The new learning model also applies to many other interactive learning problems and admits several interesting extensions.
Large-Scale Validation and Analysis of Interleaved Search Evaluation
Interleaving is an increasingly popular technique for evaluating information retrieval systems based on implicit user feedback. While a number of isolated studies have analyzed how this technique agrees with conventional offline evaluation approaches and other online techniques, a complete picture of its efficiency and effectiveness is still lacking. In this paper we extend and combine the body of empirical evidence regarding interleaving, and provide a comprehensive analysis of interleaving using data from two major commercial search engines and a retrieval system for scientific literature. In particular, we analyze the agreement of interleaving with manual relevance judgments and observational implicit feedback measures, estimate the statistical efficiency of interleaving, and explore the relative performance of different interleaving variants. We also show how to learn improved credit-assignment functions for clicks that further increase the sensitivity of interleaving.
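To make the idea of interleaving concrete, here is a minimal sketch of one commonly described variant, team-draft interleaving, together with the simplest possible click credit rule (one click, one vote). This is an illustrative assumption: the abstract does not say which variants or credit functions the paper analyzes, and the function names below are hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng):
    """Merge two rankings by letting teams A and B alternately draft
    their highest-ranked document that has not been picked yet.

    Returns the interleaved list and a dict mapping each document
    to the team ("A" or "B") that contributed it.
    """
    rankings = {"A": ranking_a, "B": ranking_b}
    all_docs = set(ranking_a) | set(ranking_b)
    interleaved, team = [], {}
    count = {"A": 0, "B": 0}
    while len(interleaved) < len(all_docs):
        # The team with fewer picks drafts next; ties are broken by coin flip.
        if count["A"] != count["B"]:
            side = "A" if count["A"] < count["B"] else "B"
        else:
            side = rng.choice(["A", "B"])
        remaining = [d for d in rankings[side] if d not in team]
        if not remaining:
            # This team has nothing left to offer, so the other team drafts.
            side = "B" if side == "A" else "A"
            remaining = [d for d in rankings[side] if d not in team]
        doc = remaining[0]
        interleaved.append(doc)
        team[doc] = side
        count[side] += 1
    return interleaved, team


def credit(team, clicked):
    """Baseline credit assignment: each click counts once for the team
    that contributed the clicked document."""
    score = {"A": 0, "B": 0}
    for doc in clicked:
        if doc in team:
            score[team[doc]] += 1
    return score
```

A ranker is judged better in a given session if its team collects more credit. A learned credit-assignment function, as mentioned in the abstract, would replace the uniform weighting in `credit` with click-dependent weights.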

