Results 1–10 of 24
Label Embedding Trees for Large Multi-Class Tasks
"... Multiclass classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible. This problem can be alleviated by imposing (or learning) a structure over the set of classes. We propose an algorithm f ..."
Abstract

Cited by 85 (2 self)
 Add to MetaCart
Multiclass classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible. This problem can be alleviated by imposing (or learning) a structure over the set of classes. We propose an algorithm for learning a tree structure of classifiers which, by optimizing the overall tree loss, provides superior accuracy to existing tree labeling methods. We also propose a method that learns to embed labels in a low-dimensional space that is faster than non-embedding approaches and has superior accuracy to existing embedding approaches. Finally, we combine the two ideas, resulting in the label embedding tree that outperforms alternative methods, including One-vs-Rest, while being orders of magnitude faster.
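The tree-based test-time speedup described above can be sketched as follows. This is a toy illustration, not the paper's trained model: the label embeddings are random, the tree is a fixed balanced split rather than a learned one, and routing simply compares distances to child centroids.

```python
import numpy as np

# Toy sketch: random "learned" label embeddings, a balanced binary tree over
# the labels, and routing by nearest child centroid. Prediction touches
# O(log K) nodes instead of scoring all K classes.
rng = np.random.default_rng(0)
K, d = 8, 3                              # number of classes, embedding dim
label_emb = rng.normal(size=(K, d))      # hypothetical label embeddings

def build_tree(labels):
    """Split the label list in half recursively; leaves are class ids."""
    if len(labels) == 1:
        return labels[0]
    mid = len(labels) // 2
    return (build_tree(labels[:mid]), build_tree(labels[mid:]))

def centroid(node):
    """Mean embedding of the labels under a node (splits here are even)."""
    if isinstance(node, int):
        return label_emb[node]
    return 0.5 * (centroid(node[0]) + centroid(node[1]))

def predict(tree, x):
    """Route x to a leaf with one centroid comparison per level."""
    while not isinstance(tree, int):
        left, right = tree
        d_left = np.linalg.norm(x - centroid(left))
        d_right = np.linalg.norm(x - centroid(right))
        tree = left if d_left <= d_right else right
    return tree

tree = build_tree(list(range(K)))
pred = predict(tree, label_emb[5] + 0.01 * rng.normal(size=d))
```

In the paper both the tree structure and the node classifiers are learned to minimize the overall tree loss; this sketch only shows why traversal cost is logarithmic in K.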
Analysis of a Classification-based Policy Iteration Algorithm
"... Wepresentaclassificationbasedpolicyiteration algorithm, called Direct PolicyIteration, and provide its finitesample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered poli ..."
Abstract

Cited by 31 (9 self)
 Add to MetaCart
We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy w.r.t. any of its members. The analysis reveals a trade-off between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.
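One policy-improvement step of a classification-based scheme of this flavor can be sketched on a toy problem. Everything here is illustrative, not the paper's setting: the chain MDP, the single-rollout value estimates, and the threshold "policy space" are hypothetical stand-ins.

```python
# Sketch of one classification-based policy-improvement step: estimate action
# values at each state by a truncated rollout under the current policy, label
# each state with its empirically greedy action, then fit a classifier from a
# restricted policy space to those labels.
N_STATES, ACTIONS, GAMMA, HORIZON = 10, (0, 1), 0.9, 20   # 0: left, 1: right

def step(s, a):
    """Toy chain MDP: move left/right; reward 1 for being in the last state."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def rollout_q(s, a, policy):
    """One truncated-rollout estimate of Q(s, a) under `policy`."""
    s, r = step(s, a)
    total, disc = r, GAMMA
    for _ in range(HORIZON):
        s, r = step(s, policy(s))
        total += disc * r
        disc *= GAMMA
    return total

def improve(policy):
    """Rollouts -> classification dataset -> fit a threshold policy."""
    data = [(s, max(ACTIONS, key=lambda a: rollout_q(s, a, policy)))
            for s in range(N_STATES)]
    # The "policy space": rules of the form "go right iff s >= t".
    best_t = min(range(N_STATES + 1),
                 key=lambda t: sum((1 if s >= t else 0) != a for s, a in data))
    return lambda s: 1 if s >= best_t else 0

pi1 = improve(lambda s: 0)          # one improvement step from "always left"
```

The paper's bound is exactly about how the number of rollouts, the number of such improvement steps, and the capacity of this policy space trade off against each other.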
Efficient Optimal Learning for Contextual Bandits
"... We address the problem of learning in an online setting where the learner repeatedly observes features x, selects among K actions, and receives reward r for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses an oracle which returns an optimal policy ..."
Abstract

Cited by 26 (2 self)
 Add to MetaCart
(Show Context)
We address the problem of learning in an online setting where the learner repeatedly observes features x, selects among K actions, and receives reward r for the action taken. We provide the first efficient algorithm with optimal regret. Our algorithm uses an oracle which returns an optimal policy given rewards for all actions for each x. The algorithm has running time polylog(N), where N is the number of policies that we compete with. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm whose regret is additive, rather than multiplicative, in the feedback delay, unlike all previous work.
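The full oracle-based algorithm is beyond a short sketch, but a core ingredient of this line of work is evaluating any candidate policy offline from logged bandit data via inverse-propensity scoring (IPS), which is what lets an argmax oracle search a large policy class against a single dataset. The uniform logging policy and the hidden reward rule below are illustrative.

```python
import random

# Sketch of off-policy evaluation with inverse-propensity scoring (IPS):
# every logged record stores the propensity p of the logged action, so any
# policy's expected reward can be estimated from the same log.
rng = random.Random(0)
K = 3  # number of actions

# Logged data: (context x, logged action a, observed reward r, propensity p).
log = []
for _ in range(2000):
    x = rng.random()
    a = rng.randrange(K)                             # uniform logging policy
    r = 1.0 if a == (0 if x < 0.5 else 2) else 0.0   # hidden "best action" rule
    log.append((x, a, r, 1.0 / K))

def ips_value(policy):
    """Unbiased estimate of the average reward `policy` would have earned."""
    return sum(r / p for x, a, r, p in log if policy(x) == a) / len(log)

good = lambda x: 0 if x < 0.5 else 2     # matches the hidden rule
bad = lambda x: 1                        # never matches it
```

A policy that matches the hidden rule scores near 1 under IPS; one that never matches scores 0, so an oracle maximizing this estimate recovers a good policy without new interaction.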
Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
"... Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be nontransitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measu ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
(Show Context)
Given a set V of n elements, we wish to linearly order them using pairwise preference labels, which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: the number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε^-1)) preference labels for a regret of ε times the optimal loss. This is strictly better, and often significantly better, than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: what is a provably correct way to sample preference labels?
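The gap between adaptive and exhaustive querying can be illustrated (this is not the paper's algorithm, which also handles noisy, non-transitive labels) with a randomized quicksort whose comparisons are the preference queries: with a noiseless oracle it recovers the order after roughly O(n log n) of the n(n-1)/2 possible queries.

```python
import random

queries = 0
def prefer(u, v):
    """Pairwise-preference oracle; counts how many labels are requested."""
    global queries
    queries += 1
    return u < v                      # hidden true order (noiseless here)

def rank(items, rng):
    """Randomized quicksort whose comparisons are the preference queries."""
    if len(items) <= 1:
        return list(items)
    pivot = items[rng.randrange(len(items))]
    less, more = [], []
    for u in items:
        if u == pivot:
            continue
        (less if prefer(u, pivot) else more).append(u)
    return rank(less, rng) + [pivot] + rank(more, rng)

rng = random.Random(0)
n = 128
items = list(range(n))
rng.shuffle(items)
order = rank(items, rng)
```

Here n = 128 gives 8128 possible pairs, but the adaptive strategy asks for only on the order of n log n of them.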
Machine Learning Techniques—Reductions Between Prediction Quality Metrics
"... Abstract Machine learning involves optimizing a loss function on unlabeled data points given examples of labeled data points, where the loss function measures the performance of a learning algorithm. We give an overview of techniques, called reductions, for converting a problem of minimizing one los ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
(Show Context)
Machine learning involves optimizing a loss function on unlabeled data points given examples of labeled data points, where the loss function measures the performance of a learning algorithm. We give an overview of techniques, called reductions, for converting a problem of minimizing one loss function into a problem of minimizing another, simpler loss function. This tutorial discusses how to create robust reductions that perform well in practice. The reductions discussed here can be used to solve any supervised learning problem with a standard binary classification or regression algorithm available in any machine learning toolkit. We also discuss common design flaws in folklore reductions.
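A concrete example of a reduction in this sense is one-vs-all, which converts K-class classification into K binary problems solved by any binary learner. The sketch below is illustrative: the "binary learner" is a toy nearest-centroid scorer and the Gaussian data is synthetic.

```python
import numpy as np

# One-vs-all reduction: train one binary scorer per class ("class k vs rest"),
# predict by the highest binary score. Any binary learner can be plugged in;
# here a toy centroid-margin scorer stands in.
rng = np.random.default_rng(0)
means = np.array([[4.0, 0.0], [-4.0, 0.0], [0.0, 4.0], [0.0, -4.0]])
K = len(means)
X = np.vstack([m + rng.normal(size=(50, 2)) for m in means])
y = np.repeat(np.arange(K), 50)

def train_binary(X, z):
    """Toy binary learner: score = margin between the two class centroids."""
    pos, neg = X[z == 1].mean(axis=0), X[z == 0].mean(axis=0)
    return lambda x: np.linalg.norm(x - neg) - np.linalg.norm(x - pos)

scorers = [train_binary(X, (y == k).astype(int)) for k in range(K)]

def predict(x):
    return int(np.argmax([s(x) for s in scorers]))

acc = np.mean([predict(x) == c for x, c in zip(X, y)])
```

One-vs-all is precisely the kind of folklore reduction whose robustness (e.g., how binary errors compound into multiclass errors) this tutorial analyzes.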
Multiclass Learnability and the ERM Principle
In COLT, 2011
"... Multiclass learning is an area of growing practical relevance, for which the currently available theory is still far from providing satisfactory understanding. We study the learnability of multiclass prediction, and derive upper and lower bounds on the sample complexity of multiclass hypothesis clas ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
(Show Context)
Multiclass learning is an area of growing practical relevance, for which the currently available theory is still far from providing satisfactory understanding. We study the learnability of multiclass prediction, and derive upper and lower bounds on the sample complexity of multiclass hypothesis classes in different learning models: batch/online, realizable/unrealizable, full information/bandit feedback. Our analysis reveals a surprising phenomenon: in the multiclass setting, in sharp contrast to binary classification, not all Empirical Risk Minimization (ERM) algorithms are equally successful. We show that there exist hypothesis classes for which some ERM learners have lower sample complexity than others. Furthermore, there are classes that are learnable by some ERM learners, while other ERM learners will fail to learn them. We propose a principle for designing good ERM learners, and use this principle to prove tight bounds on the sample complexity of learning symmetric multiclass hypothesis classes (that is, classes that are invariant under any permutation of label names). We demonstrate the relevance of the theory by analyzing the sample complexity of two widely used hypothesis classes: generalized linear multiclass models and reduction trees. We also obtain some practically relevant conclusions.
Label Partitioning For Sublinear Ranking
"... We consider the case of ranking a very large set of labels, items, or documents, which is common to information retrieval, recommendation, and largescale annotation tasks. We present a general approach for converting an algorithm which has linear time in the size of the set to a sublinear one via l ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
(Show Context)
We consider the case of ranking a very large set of labels, items, or documents, which is common to information retrieval, recommendation, and large-scale annotation tasks. We present a general approach for converting an algorithm whose running time is linear in the size of the set to a sublinear one via label partitioning. Our method consists of learning an input partition and a label assignment to each partition of the space such that precision at k, the loss function of interest in this setting, is optimized. Experiments on large-scale ranking and recommendation tasks show that our method not only makes the original linear-time algorithm computationally tractable, but can also improve its performance.
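A minimal sketch of the partitioning idea, with a crude k-means stand-in for the learned partitioner and random per-label scoring vectors: at test time the ranker picks one partition in O(P) and scores only its roughly K/P labels instead of all K.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, P = 1000, 16, 10
W = rng.normal(size=(K, d))              # hypothetical per-label scorers

# Crude partitioner (a stand-in for the learned one): k-means on the label
# vectors, so similar labels land in the same partition.
centroids = W[rng.choice(K, size=P, replace=False)].copy()
for _ in range(5):
    assign = np.argmin(((W[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for p in range(P):
        if np.any(assign == p):
            centroids[p] = W[assign == p].mean(axis=0)
assign = np.argmin(((W[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

def rank_sublinear(x, topk=5):
    """Pick one partition in O(P), then rank only its ~K/P labels."""
    p = int(np.argmax(centroids @ x))
    cand = np.where(assign == p)[0]
    return cand[np.argsort(-(W[cand] @ x))][:topk]

x = rng.normal(size=d)
top = rank_sublinear(x)
```

The paper instead learns the input-to-partition map and the label assignment jointly to optimize precision at k; the sketch only shows where the sublinear cost comes from.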
A hierarchical classifier applied to multi-way sentiment detection
In COLING 2010: Proceedings of the 23rd International Conference on Computational Linguistics, 2010
"... This paper considers the problem of documentlevel multiway sentiment detection, proposing a hierarchical classifier algorithm that accounts for the interclass similarity of tagged sentimentbearing texts. This type of classifier also provides a natural mechanism for reducing the feature space of ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
(Show Context)
This paper considers the problem of document-level multi-way sentiment detection, proposing a hierarchical classifier algorithm that accounts for the inter-class similarity of tagged sentiment-bearing texts. This type of classifier also provides a natural mechanism for reducing the feature space of the problem. Our results show that this approach improves on state-of-the-art predictive performance for movie reviews with three-star and four-star ratings, while simultaneously reducing training times and memory requirements.
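The hierarchical routing can be sketched with a toy two-stage rule: a coarse polarity decision followed by an intensity decision within the chosen branch, so neighbouring (highly similar) classes are separated by a dedicated second-stage classifier. The 1-D sentiment-score feature and the thresholds are hypothetical.

```python
# Toy two-stage hierarchical classifier for 4-way star ratings: stage 1
# decides polarity, stage 2 decides intensity within that branch only.
def polarity(x):
    """Stage 1: negative (1-2 stars) vs positive (3-4 stars)."""
    return "neg" if x < 0.0 else "pos"

def refine(branch, x):
    """Stage 2: intensity within the chosen branch."""
    if branch == "neg":
        return 1 if x < -1.0 else 2
    return 4 if x > 1.0 else 3

def stars(x):
    return refine(polarity(x), x)

print([stars(x) for x in (-2.5, -0.3, 0.4, 1.7)])   # → [1, 2, 3, 4]
```

In the paper each stage is a learned text classifier with its own (reduced) feature set rather than a fixed threshold; the sketch shows only the routing structure.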
Discriminative probabilistic prototype learning
"... In this paper we propose a simple yet powerful method for learning representations in supervised learning scenarios where an input datapoint is described by a set of feature vectors and its associated output may be given by soft labels indicating, for example, class probabilities. We represent an ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
(Show Context)
In this paper we propose a simple yet powerful method for learning representations in supervised learning scenarios where an input datapoint is described by a set of feature vectors and its associated output may be given by soft labels indicating, for example, class probabilities. We represent an input datapoint as a K-dimensional vector, where each component is a mixture of probabilities over its corresponding set of feature vectors. Each probability indicates how likely a feature vector is to belong to one-out-of-K unknown prototype patterns. We propose a probabilistic model that parameterizes these prototype patterns in terms of hidden variables, and therefore it can be trained with conventional approaches based on likelihood maximization. More importantly, both the model parameters and the prototype patterns can be learned from data in a discriminative way. We show that our model can be seen as a probabilistic generalization of learning vector quantization (LVQ). We apply our method to the problems of shape classification, hyperspectral imaging classification, and people's work-class categorization, showing the superior performance of our method compared to the standard prototype-based classification approach and other competitive benchmarks.
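The representation step can be sketched as follows (illustrative only: the paper learns the prototypes discriminatively, whereas here they are random): a set of feature vectors is mapped to a K-dimensional vector of average prototype posteriors, computed as a softmax over negative squared distances.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 5
prototypes = rng.normal(size=(K, d))     # illustrative prototype patterns

def soft_assign(f):
    """Posterior over the K prototypes for one feature vector
    (softmax over negative squared distances)."""
    logits = -((prototypes - f) ** 2).sum(axis=1)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def represent(feature_set):
    """Map a set of feature vectors to its K-dim prototype representation."""
    return np.mean([soft_assign(f) for f in feature_set], axis=0)

bag = rng.normal(size=(7, d))            # one datapoint = a set of 7 vectors
rep = represent(bag)
```

Each entry of `rep` is a probability, so the representation is itself a distribution over prototypes; in hard-assignment form this collapses to classic LVQ-style nearest-prototype coding.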
Condensed Filter Tree for Cost-Sensitive Multi-Label Classification
"... Different realworld applications of multilabel classification often demand different evaluation criteria. We formalize this demand with a general setup, costsensitive multilabel classification (CSMLC), which takes the evaluation criteria into account during learning. Nevertheless, most existi ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
Different real-world applications of multi-label classification often demand different evaluation criteria. We formalize this demand with a general setup, cost-sensitive multi-label classification (CSMLC), which takes the evaluation criteria into account during learning. Nevertheless, most existing algorithms can only focus on optimizing a few specific evaluation criteria, and cannot systematically deal with different ones. In this paper, we propose a novel algorithm, called condensed filter tree (CFT), for optimizing any criteria in CSMLC. CFT is derived by reducing CSMLC to the famous filter tree algorithm for cost-sensitive multiclass classification via constructing the label powerset. We successfully cope with the difficulty of having exponentially many extended classes within the powerset for representation, training, and prediction by carefully designing the tree structure and focusing on the key nodes. Experimental results across many real-world datasets validate that CFT is competitive with special-purpose algorithms on special criteria and reaches better performance on general criteria.
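The label-powerset construction underlying this reduction can be sketched directly: with L labels, every subset becomes one extended class, and any evaluation criterion induces a cost over extended classes that a cost-sensitive multiclass learner can then optimize. Hamming loss below is just one example criterion; CFT's tree structure for coping with the 2^L classes is not reproduced here.

```python
from itertools import combinations

# Label-powerset view: with L labels, each of the 2^L subsets is one
# "extended class", and an evaluation criterion becomes a cost matrix over
# those classes for a cost-sensitive multiclass learner.
L = 3
powerset = [frozenset(c) for r in range(L + 1)
            for c in combinations(range(L), r)]    # 2^L extended classes

def hamming_cost(true_set, pred_set):
    """Cost of predicting `pred_set` when `true_set` is correct."""
    return sum((i in true_set) != (i in pred_set) for i in range(L))

true = frozenset({0, 2})
costs = {p: hamming_cost(true, p) for p in powerset}
best = min(costs, key=costs.get)
print(len(powerset), best == true)    # → 8 True
```

Swapping `hamming_cost` for any other criterion (e.g., one derived from F1) changes only the cost table, which is exactly the flexibility the CSMLC setup formalizes.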