Bayesian inference for Plackett-Luce ranking models
Cited by 32 (0 self)
This paper gives an efficient Bayesian method for inferring the parameters of a Plackett-Luce ranking model. Such models are parameterised distributions over rankings of a finite set of objects, and have typically been studied and applied within the psychometric, sociometric and econometric literature. The inference scheme is an application of Power EP (expectation propagation). The scheme is robust and can be readily applied to large-scale data sets. The inference algorithm extends to variations of the basic Plackett-Luce model, including partial rankings. We show a number of advantages of the EP approach over the traditional maximum likelihood method. We apply the method to aggregate rankings of NASCAR racing drivers over the 2002 season, and also to rankings of movie genres.
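For readers unfamiliar with the model named in this abstract, the following is a minimal sketch of the standard Plackett-Luce likelihood for one complete ranking: items are chosen one at a time, each with probability proportional to its positive weight among the items not yet placed. The function name and the `weights` mapping are illustrative assumptions; the sketch does not implement the paper's Power EP inference.

```python
import math


def plackett_luce_log_prob(ranking, weights):
    """Log-probability of a full ranking (best item first) under Plackett-Luce.

    `weights` maps each item to a positive score (hypothetical input format).
    """
    log_p = 0.0
    remaining = 0.0
    # Walk from the last-placed item upward so that `remaining` is always the
    # total weight of the items still available at that stage of the ranking.
    for item in reversed(ranking):
        remaining += weights[item]
        log_p += math.log(weights[item]) - math.log(remaining)
    return log_p
```

With equal weights on three items, every one of the six possible orderings receives probability 1/6, which is a quick sanity check on the implementation.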
Pairwise Ranking Aggregation in a Crowdsourced Setting
Cited by 27 (1 self)
Inferring rankings over elements of a set of objects, such as documents or images, is a key learning problem for such important applications as Web search and recommender systems. Crowdsourcing services provide an inexpensive and efficient means to acquire preferences over objects via labeling by sets of annotators. We propose a new model to predict a gold-standard ranking that hinges on combining pairwise comparisons via crowdsourcing. In contrast to traditional ranking aggregation methods, the approach learns about and folds into consideration the quality of contributions of each annotator. In addition, we minimize the cost of assessment by introducing a generalization of the traditional active learning scenario to jointly select the annotator and pair to assess while taking into account the annotator quality, the uncertainty over ordering of the pair, and the current model uncertainty. We formalize this as an active learning strategy that incorporates an exploration-exploitation tradeoff and implement it using an efficient online Bayesian updating scheme. Using simulated and real-world data, we demonstrate that the active learning strategy achieves significant reductions in labeling cost while maintaining accuracy.
TrueSkill Through Time: Revisiting the History of Chess
Cited by 27 (3 self)
We extend the Bayesian skill rating system TrueSkill to infer entire time series of skills of players by smoothing through time instead of filtering. The skill of each participating player in, say, every year is represented by a latent skill variable which is affected by the relevant game outcomes that year, and coupled with the skill variables of the previous and subsequent year. Inference in the resulting factor graph is carried out by approximate message passing (EP) along the time series of skills. As before, the system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. We extend the system to estimate player-specific draw margins. Based on these models we present an analysis of the skill curves of important players in the history of chess over the past 150 years. Results include plots of players' lifetime skill development as well as the ability to compare the skills of different players across time. Our results indicate that (a) the overall playing strength has increased over the past 150 years, and (b) modelling a player's ability to force a draw provides significantly better predictive power.
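As background for the filtering baseline that this abstract contrasts with smoothing, here is a rough sketch of a single-game Gaussian skill update in the style of TrueSkill for two players. It assumes no draws, no teams, and no skill drift between games; the function names and the `beta` performance-noise parameter are illustrative assumptions, and the paper's time-series EP smoothing and draw-margin estimation are not implemented here.

```python
import math


def _pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def trueskill_win_update(winner, loser, beta):
    """Update (mean, variance) skill beliefs after one decisive game.

    `winner` and `loser` are (mu, sigma_squared) tuples (hypothetical format).
    Draws are not modelled in this sketch.
    """
    mu_w, var_w = winner
    mu_l, var_l = loser
    # Variance of the performance difference between the two players.
    c2 = 2.0 * beta ** 2 + var_w + var_l
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    # Moment-matching corrections for a Gaussian truncated to "winner beat loser".
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)
    new_winner = (mu_w + var_w / c * v, var_w * (1.0 - var_w / c2 * w))
    new_loser = (mu_l - var_l / c * v, var_l * (1.0 - var_l / c2 * w))
    return new_winner, new_loser
```

An unexpected win (small or negative `t`) moves the means further than an expected one, and both variances shrink after every observed game.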
Error Correcting Tournaments, 2008
Cited by 26 (4 self)
We present a family of adaptive pairwise tournaments that are provably robust against large error fractions when used to determine the largest element in a set. The tournaments use nk pairwise comparisons but have only O(k + log n) depth, where n is the number of players and k is the robustness parameter (for reasonable values of n and k). These tournaments also give a reduction from multiclass to binary classification in machine learning, yielding the best known analysis for the problem.
Measure Transformer Semantics for Bayesian Machine Learning
Cited by 15 (3 self)
The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
Target Assistance for Subtly Balancing Competitive Play, 2011
Cited by 14 (4 self)
In games where skills such as targeting are critical to winning, it is difficult for players with different skill levels to have a competitive and engaging experience. Although several mechanisms for accommodating different skill levels have been proposed, traditional approaches can be too obvious and can change the nature of the game. For games involving aiming, we propose the use of target assistance techniques (such as area cursors, target gravity, and sticky targets) to accommodate skill imbalances. We compared three techniques in a study, and found that area cursors and target gravity significantly reduced score differential in a shooting-gallery game. Further, less skilled players reported having more fun when the techniques helped them be more competitive, and even after they learned assistance was given, felt that this form of balancing was good for group gameplay. Our results show that target assistance techniques can make target-based games more competitive for shared play.
Learning Consensus Opinion: Mining Data from a Labeling Game, 2009
Cited by 11 (1 self)
We consider the problem of identifying the consensus ranking for the results of a query, given preferences among those results from a set of individual users. Once consensus rankings are identified for a set of queries, these rankings can serve for both evaluation and training of retrieval and learning systems. We present a novel approach to collecting the individual user preferences over image-search results: we use a collaborative game in which players are rewarded for agreeing on which image result is best for a query. Our approach is distinct from other labeling games because we are able to elicit directly the preferences of interest with respect to image queries extracted from query logs. As a source of relevance judgments, this data provides a useful complement to click data. Furthermore, the data is free of positional biases and is collected by the game without the risk of frustrating users with non-relevant results; this risk is prevalent in standard mechanisms for debiasing clicks. We describe data collected over 34 days from a deployed version of this game that amounts to about 18 million expressed preferences between pairs. Finally, we present several approaches to modeling this data in order to extract the consensus rankings from the preferences and better sort the search results for targeted queries.
Ranking Mechanisms in Twitter-like Forums
Cited by 10 (0 self)
We study the problem of designing a mechanism to rank items in forums by making use of user reviews such as thumb and star ratings. We compare mechanisms where forum users rate individual posts with mechanisms where the user is asked to perform a pairwise comparison and state which item is better. The main metric used to evaluate a mechanism is the ranking accuracy versus the cost of reviews, where the cost is measured as the average number of reviews used per post. We show that for many reasonable probability models, there is no thumb-based (or star-based) ranking mechanism that can produce approximately accurate rankings with a bounded number of reviews per item. On the other hand, we provide a review mechanism based on pairwise comparisons which achieves approximate rankings with bounded cost. We have implemented a system, shoutvelocity [5], which is a Twitter-like forum in which items (i.e., tweets in Twitter) are rated by using comparisons. For each new item, the user who posts the item is required to compare two previous entries. This ensures that over a sequence of n posts, we get at least n comparisons, requiring one review per item on average. Our mechanism uses this sequence of comparisons to obtain a ranking estimate. It ensures that every item is reviewed at least once and that winning entries are reviewed more often to obtain better estimates of top items.
Real-time Multiattribute Bayesian Preference Elicitation with Pairwise Comparison Queries
Cited by 10 (5 self)
Preference elicitation (PE) is an important component of interactive decision support systems that aim to make optimal recommendations to users by actively querying their preferences. In this paper, we outline five principles important for PE in real-world problems: (1) real-time, (2) multiattribute, (3) low cognitive load, (4) robust to noise, and (5) scalable. In light of these requirements, we introduce an approximate PE framework based on TrueSkill for performing efficient closed-form Bayesian updates and query selection for a multiattribute utility belief state. This is a novel PE approach that naturally facilitates the efficient evaluation of value of information (VOI) heuristics for use in query selection strategies. Our best VOI query strategy satisfies all five principles (in contrast to related work) and performs on par with the most accurate (and often computationally intensive) algorithms on experiments with synthetic and real-world datasets.
Hipster Wars: Discovering Elements of Fashion Styles, ECCV, 2014
Cited by 9 (1 self)
The clothing we wear and our identities are closely tied, revealing to the world clues about our wealth, occupation, and socio-identity. In this paper we examine questions related to what our clothing reveals about our personal style. We first design an online competitive Style Rating Game called Hipster Wars to crowdsource reliable human judgments of style. We use this game to collect a new dataset of clothing outfits with associated style ratings for 5 style categories: hipster, bohemian, pin-up, preppy, and goth. Next, we train models for between-class and within-class classification of styles. Finally, we explore methods to identify clothing elements that are generally discriminative for a style, and methods for identifying items in a particular outfit that may indicate a style.