Results 1–10 of 37
Multi-Label Prediction via Compressed Sensing
, 2009
Abstract

Cited by 96 (3 self)
We consider multi-label prediction problems with large output spaces under the assumption of output sparsity – that the target vectors have small support. We develop a general theory for a variant of the popular ECOC (error-correcting output code) scheme, based on ideas from compressed sensing, for exploiting this sparsity. The method can be regarded as a simple reduction from multi-label regression problems to binary regression problems. It is shown that the number of subproblems need only be logarithmic in the total number of label values, making this approach radically more efficient than others. We also state and prove performance guarantees for this method, and test it empirically.
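The reduction described above can be sketched end to end in a few lines: compress a sparse 0/1 label vector with a random projection (in the full method, regressors are trained to predict each compressed coordinate from the input), then decode a prediction back to a label set with a sparse-recovery routine. The orthogonal-matching-pursuit decoder and all dimensions below are illustrative choices, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 1000, 100, 5                         # label space, measurements, sparsity
A = rng.standard_normal((m, d)) / np.sqrt(m)   # random sensing matrix

def encode(y):
    """Compress a sparse 0/1 label vector into m real-valued targets."""
    return A @ y

def omp_decode(z, k):
    """Orthogonal matching pursuit: recover a k-sparse label vector from z."""
    residual, support = z.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))  # best-matching column
        coef, *_ = np.linalg.lstsq(A[:, support], z, rcond=None)
        residual = z - A[:, support] @ coef
    y_hat = np.zeros(A.shape[1])
    y_hat[support] = coef
    return y_hat

y = np.zeros(d)
y[rng.choice(d, size=k, replace=False)] = 1.0  # a 5-sparse label vector
y_hat = omp_decode(encode(y), k)
print(np.allclose(y, y_hat, atol=1e-6))        # noiseless recovery succeeds w.h.p.
```

Note the compression: 100 regression targets stand in for 1000 labels, matching the logarithmic-in-d scaling the abstract describes.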
Label Embedding Trees for Large Multi-Class Tasks
Abstract

Cited by 85 (2 self)
Multi-class classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible. This problem can be alleviated by imposing (or learning) a structure over the set of classes. We propose an algorithm for learning a tree structure of classifiers which, by optimizing the overall tree loss, provides superior accuracy to existing tree labeling methods. We also propose a method that learns to embed labels in a low-dimensional space that is faster than non-embedding approaches and has superior accuracy to existing embedding approaches. Finally, we combine the two ideas, resulting in the label embedding tree that outperforms alternative methods, including One-vs-Rest, while being orders of magnitude faster.
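The test-time saving from a tree of classifiers can be made concrete with a toy sketch: a balanced binary tree over n labels needs only ⌈log2 n⌉ router evaluations per prediction instead of n one-vs-rest scores. The threshold `router` below is a placeholder for a learned node classifier, not the paper's learned tree.

```python
def predict(node, x, counter):
    """Descend the label tree, calling one router per internal node visited."""
    while isinstance(node, tuple):         # internal node: (router, left, right)
        router, left, right = node
        counter[0] += 1
        node = right if router(x) else left
    return node                            # leaf: a label

def build(labels):
    """Balanced tree over sorted labels; routers are placeholder threshold tests."""
    if len(labels) == 1:
        return labels[0]
    mid = len(labels) // 2
    lo, hi = labels[:mid], labels[mid:]
    # hypothetical router: a real system would train a binary classifier here
    router = lambda x, t=hi[0]: x >= t
    return (router, build(lo), build(hi))

tree = build(list(range(1024)))
calls = [0]
label = predict(tree, 700, calls)
print(label, calls[0])   # 700 10 -- only log2(1024)=10 router calls, not 1024
```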
Relating reinforcement learning performance to classification performance
 In 22nd International Conference on Machine Learning (ICML)
, 2005
Abstract

Cited by 38 (4 self)
We prove a quantitative connection between the expected sum of rewards of a policy and binary classification performance on created subproblems. This connection holds without any unobservable assumptions (no assumption of independence, small mixing time, fully observable states, or even hidden states), and the resulting statement is independent of the number of states or actions. The statement is critically dependent on the size of the rewards and the prediction performance of the created classifiers. We also provide some general guidelines for obtaining good classification performance on the created subproblems. In particular, we discuss possible methods for generating training examples for a classifier learning algorithm.
Information, Divergence and Risk for Binary Experiments
 Journal of Machine Learning Research
, 2009
Abstract

Cited by 37 (8 self)
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these various objects, and in so doing identify their primitives, which are all related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
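One concrete instance of the relationship between f-divergences and variational divergence mentioned above is the classical Pinsker inequality, tv(p, q) ≤ sqrt(KL(p‖q)/2), which is easy to spot-check numerically. This is only a sanity check of the classical bound, not the paper's generalised inequalities.

```python
import numpy as np

rng = np.random.default_rng(1)

def kl(p, q):
    """KL divergence (in nats) between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def tv(p, q):
    """Total variation distance, in [0, 1]."""
    return 0.5 * float(np.sum(np.abs(p - q)))

# Pinsker's inequality: tv(p, q) <= sqrt(kl(p, q) / 2), checked on random pairs
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    q = rng.dirichlet(np.ones(5))
    assert tv(p, q) <= np.sqrt(kl(p, q) / 2) + 1e-12
print("Pinsker holds on 1000 random distribution pairs")
```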
Error Correcting Tournaments
, 2008
Abstract

Cited by 26 (4 self)
We present a family of adaptive pairwise tournaments that are provably robust against large error fractions when used to determine the largest element in a set. The tournaments use nk pairwise comparisons but have only O(k + log n) depth, where n is the number of players and k is the robustness parameter (for reasonable values of n and k). These tournaments also give a reduction from multiclass to binary classification in machine learning, yielding the best known analysis for the problem.
Conditional Probability Tree Estimation Analysis and Algorithms
Abstract

Cited by 20 (3 self)
We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic-depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works successfully on a dataset with roughly 10^6 labels.
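The structure of this reduction is easy to sketch: each internal node holds a binary regressor predicting the probability of descending right, and P(label | x) is the product of branch probabilities along the root-to-leaf path, so an O(log n) descent suffices for any single label. The constant-valued regressors below are hypothetical stand-ins for trained models.

```python
def leaf_probs(node, x, p=1.0):
    """P(label | x) = product of branch probabilities on the root-to-leaf path."""
    if not isinstance(node, tuple):
        return {node: p}                 # leaf: a label
    p_right, left, right = node
    r = p_right(x)                       # a learned binary regressor in practice
    out = leaf_probs(left, x, p * (1 - r))
    out.update(leaf_probs(right, x, p * r))
    return out

# hypothetical node regressors (constants standing in for trained models)
tree = (lambda x: 0.8,
        (lambda x: 0.5, "cat", "dog"),
        (lambda x: 0.25, "car", "bus"))

probs = leaf_probs(tree, x=None)         # cat/dog ~0.1 each, car ~0.6, bus ~0.2
print(abs(sum(probs.values()) - 1.0) < 1e-9)   # a proper distribution over labels
```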
The Offset Tree for Learning with Partial Labels
Abstract

Cited by 14 (4 self)
We present an algorithm, called the offset tree, for learning in situations where the loss associated with different decisions is not known, but was randomly probed. The algorithm is an optimal reduction from this problem to binary classification. In particular, it has regret at most (k − 1) times the regret of the binary classifier it uses, where k is the number of decisions, and no reduction to binary classification can do better. We test the offset tree empirically and discover that it generally results in superior (or equal) performance compared to several plausible alternative approaches.
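At prediction time the offset tree behaves like a single-elimination tournament: k − 1 binary decisions, one per internal node, pick between pairs of surviving actions (the training-time offset construction on the randomly probed losses is the substance of the paper and is omitted here). The `duel` function below is a hypothetical stand-in for a trained node classifier.

```python
def tournament_predict(actions, duel, x):
    """Single-elimination tournament: k - 1 binary decisions select one of k
    actions; `duel(a, b, x)` stands in for a trained node classifier."""
    survivors = list(actions)
    while len(survivors) > 1:
        nxt = []
        for i in range(0, len(survivors) - 1, 2):
            nxt.append(duel(survivors[i], survivors[i + 1], x))
        if len(survivors) % 2:
            nxt.append(survivors[-1])    # odd action out gets a bye
        survivors = nxt
    return survivors[0]

# hypothetical duel: prefer the action closer to a context-dependent target x
duel = lambda a, b, x: a if abs(a - x) <= abs(b - x) else b
print(tournament_predict(range(7), duel, x=5))   # 5, decided in k - 1 = 6 duels
```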
Contextual Sequence Prediction with Application to Control Library Optimization
Abstract

Cited by 13 (1 self)
Sequence optimization, where the items in a list are ordered to maximize some reward, has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each “slot” in the sequence. Our approach leverages recent work on submodular function maximization to provide a formal regret reduction from submodular sequence optimization to simple cost-sensitive prediction. We apply our contextual sequence prediction algorithm to optimize control libraries and demonstrate results on two robotics problems: manipulator trajectory prediction and mobile robot path planning.
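A minimal sketch of the per-slot reduction, with a toy coverage reward standing in for the learned cost-sensitive predictors: each slot is filled greedily by whichever item has the highest marginal score given the context and the items already placed.

```python
def fill_sequence(items, score, slots, context):
    """Fill each slot with the item of highest marginal score; `score` stands
    in for the per-slot classifiers/regressors learned in the real method."""
    chosen = []
    for _ in range(slots):
        remaining = [i for i in items if i not in chosen]
        chosen.append(max(remaining, key=lambda i: score(i, chosen, context)))
    return chosen

# toy submodular objective: each item covers a set; marginal gain = newly covered
covers = {"a": {1, 2}, "b": {2, 3, 4}, "c": {5}, "d": {1, 5}}
gain = lambda i, chosen, ctx: len(
    covers[i] - set().union(*(covers[j] for j in chosen), ctx))
print(fill_sequence(list(covers), gain, slots=3, context=set()))  # ['b', 'd', 'a']
```

Because the coverage gain is submodular, this greedy slot-by-slot choice is exactly the structure the regret reduction analyzes.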
Machine Learning Techniques—Reductions Between Prediction Quality Metrics
Abstract

Cited by 9 (0 self)
Machine learning involves optimizing a loss function on unlabeled data points given examples of labeled data points, where the loss function measures the performance of a learning algorithm. We give an overview of techniques, called reductions, for converting a problem of minimizing one loss function into a problem of minimizing another, simpler loss function. This tutorial discusses how to create robust reductions that perform well in practice. The reductions discussed here can be used to solve any supervised learning problem with a standard binary classification or regression algorithm available in any machine learning toolkit. We also discuss common design flaws in folklore reductions.
Correlation Clustering Revisited: The “True” Cost of Error Minimization Problems
Abstract

Cited by 6 (2 self)
Correlation Clustering was defined by Bansal, Blum, and Chawla as the problem of clustering a set of elements based on a possibly inconsistent binary similarity function between element pairs. Their setting is agnostic in the sense that a ground-truth clustering is not assumed to exist, and the only reasonable way to measure the cost of a solution is by comparing it with the input similarity function. This problem has been studied in theory and in application, and has subsequently been proven to be APX-Hard. In this work we assume that there does exist an unknown correct clustering of the data; this is the case in applications such as record linkage in databases. In this setting, we argue that it is more reasonable to measure the accuracy of the output clustering against the unknown underlying true clustering. This corresponds to the intuition that in real life an action is penalized or rewarded based on reality and not on our noisy perception thereof. The traditional combinatorial optimization version of the problem only offers an indirect solution to our revisited version, via a triangle inequality argument applied to the distances between the output clustering, the input similarity function, and the underlying ground truth. In the revisited version, we show that it is possible to shortcut the traditional optimization detour and obtain a factor-2 approximation. This factor could not have possibly been obtained by using a solution …
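The cost measure at the heart of the combinatorial version is easy to make concrete: a clustering is charged for every pair on which its co-membership decision contradicts the input similarity function (the revisited version would instead compare against the ground truth). A minimal sketch, with an illustrative toy similarity:

```python
from itertools import combinations

def disagreements(clustering, sim):
    """Correlation-clustering cost: pairs where co-membership contradicts the
    binary similarity function `sim(a, b) -> bool`."""
    cluster_of = {x: i for i, c in enumerate(clustering) for x in c}
    cost = 0
    for a, b in combinations(sorted(cluster_of), 2):
        together = cluster_of[a] == cluster_of[b]
        if together != sim(a, b):
            cost += 1
    return cost

# noisy similarity over true clusters {0,1,2} and {3,4}, with pair (0,1) flipped
truth = [{0, 1, 2}, {3, 4}]
noisy = lambda a, b: (a < 3) == (b < 3) and (a, b) != (0, 1)
print(disagreements(truth, noisy))   # 1: only the flipped pair (0,1) disagrees
```

Even the true clustering pays for the flipped pair, which is exactly the mismatch between input-relative and truth-relative cost that the paper targets.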