Results 1-10 of 71
On the consistency of multiclass classification methods
In Proceedings of the 18th Conference on Computational Learning Theory (COLT), 2005
Cited by 67 (2 self)

Abstract
Binary classification is a well studied special case of the classification problem. Statistical properties of binary classifiers, such as consistency, have been investigated in a variety of settings. Binary classification methods can be generalized in many ways to handle multiple classes. It turns out that one can lose consistency in generalizing a binary classification method to deal with multiple classes. We study a rich family of multiclass methods and provide a necessary and sufficient condition for their consistency. We illustrate our approach by applying it to some multiclass methods proposed in the literature.
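A toy way to see what consistency (classification calibration) demands, sketched in Python rather than taken from the paper: fix a class-probability vector, minimize the conditional surrogate risk of a one-vs-all scheme by brute force over a coarse score grid, and check that the minimizer predicts the majority class. The one-vs-all logistic surrogate and the grid are illustrative choices, not the family of methods analyzed in the paper.

```python
import itertools
import math

def conditional_risk(scores, p, phi):
    # Expected one-vs-all surrogate risk at a single point: under true class y,
    # phi sees class y's score with a positive sign and every other class's
    # score with a negative sign.
    return sum(
        py * sum(phi(s if k == y else -s) for k, s in enumerate(scores))
        for y, py in enumerate(p)
    )

def calibrated_argmax(p, phi, grid):
    # Brute-force the conditional-risk minimiser over a coarse score grid
    # and return the class that minimiser would predict.
    best = min(itertools.product(grid, repeat=len(p)),
               key=lambda s: conditional_risk(s, p, phi))
    return max(range(len(p)), key=lambda k: best[k])

logistic = lambda m: math.log(1.0 + math.exp(-m))  # a calibrated binary surrogate
p = (0.2, 0.5, 0.3)
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
predicted = calibrated_argmax(p, logistic, grid)   # agrees with argmax of p
```

For a calibrated surrogate like the logistic, the predicted class matches the most probable class; the paper's point is that plausible multiclass generalizations of calibrated binary losses can fail this check.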
Learning from Ambiguously Labeled Images
Cited by 50 (4 self)

Abstract
In many image and video collections, we have access only to partially labeled data. For example, personal photo collections often contain several faces per image and a caption that only specifies who is in the picture, but not which name matches which face. Similarly, movie screenplays can tell us who is in the scene, but not when and where they are on the screen. We formulate the learning problem in this setting as partially supervised multiclass classification where each instance is labeled ambiguously with more than one label. We show theoretically that effective learning is possible under reasonable assumptions even when all the data is weakly labeled. Motivated by the analysis, we propose a general convex learning formulation based on minimization of a surrogate loss appropriate for the ambiguous label setting. We apply our framework to identifying faces culled from web news sources and to naming characters in TV series and movies. We experiment on a very large dataset consisting of 100 hours of video, and in particular achieve 6% error for character naming on 16 episodes of LOST.
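As a hedged sketch (not necessarily the authors' exact surrogate), one convex loss for the ambiguous-label setting penalizes the gap between the average score over the candidate label set and the best score of any label outside it:

```python
import math

def ambiguous_logistic_loss(scores, candidates):
    # Convex surrogate for ambiguously labeled data: compare the average
    # score over the candidate label set against the best score outside it,
    # through a logistic link. Convex because log(1 + exp(.)) is convex and
    # nondecreasing, and its argument is convex in the score vector.
    inside = sum(scores[y] for y in candidates) / len(candidates)
    outside = max(s for y, s in enumerate(scores) if y not in candidates)
    return math.log1p(math.exp(outside - inside))

scores = [2.0, 0.5, -1.0, 0.0]                       # one score per class
loss_good = ambiguous_logistic_loss(scores, {0, 1})  # candidate set covers the top score
loss_bad = ambiguous_logistic_loss(scores, {2, 3})   # candidate set misses it
```

The loss is small when the classifier's favored label sits inside the candidate set and large otherwise, which is the qualitative behavior a surrogate for this setting needs.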
Statistical analysis of Bayes optimal subset ranking
IEEE Transactions on Information Theory, 2008
Cited by 48 (0 self)

Abstract
The ranking problem has become increasingly important in modern applications of statistical methods in automated decision making systems. In particular, we consider a formulation of the statistical ranking problem which we call subset ranking, and focus on the DCG (discounted cumulated gain) criterion that measures the quality of items near the top of the rank list. Similar to error minimization for binary classification, direct optimization of natural ranking criteria such as DCG leads to non-convex optimization problems that can be NP-hard. Therefore a computationally more tractable approach is needed. We present bounds that relate the approximate optimization of DCG to the approximate minimization of certain regression errors. These bounds justify the use of convex learning formulations for solving the subset ranking problem. The resulting estimation methods are not conventional, in that we focus on the estimation quality in the top portion of the rank list. We further investigate the asymptotic statistical behavior of these formulations. Under appropriate conditions, the consistency of the estimation schemes with respect to the DCG metric can be derived.
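For concreteness, one common form of the DCG criterion the abstract refers to can be computed as follows; the linear-gain variant shown here is an assumption, since DCG is also often defined with gains of 2^rel - 1:

```python
import math

def dcg(relevances, k=None):
    # Discounted cumulated gain of a rank list: graded relevance near the
    # top of the list counts most, discounted by a log of the position
    # (position i, 0-indexed, gets weight 1 / log2(i + 2)).
    if k is not None:
        relevances = relevances[:k]
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

# Relevance grades read off in ranked order; moving relevant items
# toward the top increases DCG.
good_order = [3, 2, 0, 1]
bad_order = [0, 1, 2, 3]
```

The truncation parameter `k` reflects the abstract's emphasis on the top portion of the rank list: items past position `k` contribute nothing.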
Multi-instance multi-label learning
 Artificial Intelligence
Cited by 38 (16 self)

Abstract
In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects and thus cannot capture more information from real objects by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms. InsDif works by transforming single instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning the single instances or single-label examples directly.
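The degeneration strategy mentioned above can be sketched roughly as follows: reduce one multi-instance multi-label problem to one multi-instance binary problem per label. The data layout here is hypothetical, chosen only to illustrate the reduction:

```python
def degenerate_miml(examples, labels):
    # Hedged sketch of the degeneration idea: each MIML example is a
    # (bag_of_instances, label_set) pair; emit, for every label, a
    # multi-instance binary task whose target says whether that label
    # appears in the example's label set.
    tasks = {lab: [] for lab in labels}
    for bag, label_set in examples:
        for lab in labels:
            tasks[lab].append((bag, lab in label_set))
    return tasks

# Toy data: an image is a bag of region descriptors with several labels.
examples = [
    (["patch1", "patch2"], {"lion", "grass"}),
    (["patch3"], {"grass"}),
]
tasks = degenerate_miml(examples, {"lion", "grass"})
```

Each resulting task can then be handed to an ordinary multi-instance learner, which is the step where, as the abstract notes, information about label co-occurrence may be lost.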
Multiclass AdaBoost
Statistics and Its Interface, 2009
Cited by 38 (2 self)

Abstract
Boosting has been a very successful technique for solving the two-class classification problem. In going from two-class to multiclass classification, most algorithms have been restricted to reducing the multiclass classification problem to multiple two-class problems. In this paper, we develop a new algorithm that directly extends the AdaBoost algorithm to the multiclass case without reducing it to multiple two-class problems. We show that the proposed multiclass AdaBoost algorithm is equivalent to a forward stagewise additive modeling algorithm that minimizes a novel exponential loss for multiclass classification. Furthermore, we show that the exponential loss is a member of a class of Fisher-consistent loss functions for multiclass classification. As shown in the paper, the new algorithm is extremely easy to implement and is highly competitive in terms of misclassification error rate.
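A small illustration of the flavor of such a direct multiclass extension, using a SAMME-style classifier weight (this matches the general description in the abstract, but the exact update should be checked against the paper): the extra log(K - 1) term means a weak learner only needs to beat random guessing over K classes, not reach 50% accuracy.

```python
import math

def samme_alpha(err, n_classes):
    # Classifier weight in a SAMME-style multiclass boosting round.
    # The log(n_classes - 1) term shifts the break-even point: the weight
    # is positive whenever err < 1 - 1/n_classes, i.e. whenever the weak
    # learner beats uniform random guessing over n_classes classes.
    return math.log((1.0 - err) / err) + math.log(n_classes - 1)

# With K = 5 classes, random guessing errs 80% of the time, so a weak
# learner with 75% error is still informative and gets positive weight:
alpha = samme_alpha(0.75, 5)
```

With K = 2 the extra term vanishes and the weight reduces to the usual binary AdaBoost formula, zero at 50% error.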
On the Consistency of Ranking Algorithms
Cited by 31 (1 self)

Abstract
We present a theoretical analysis of supervised ranking, providing necessary and sufficient conditions for the asymptotic consistency of algorithms based on minimizing a surrogate loss function. We show that many commonly used surrogate losses are inconsistent; surprisingly, we show inconsistency even in low-noise settings. We present a new value-regularized linear loss, establish its consistency under reasonable assumptions on noise, and show that it outperforms conventional ranking losses in a collaborative filtering experiment.

The goal in ranking is to order a set of inputs in accordance with the preferences of an individual or a population. In this paper we consider a general formulation of the supervised ranking problem in which each training example consists of a query q, a set of inputs x, sometimes called results, and a weighted graph G representing preferences over the results. The learning task is to discover a function that provides a query-specific ordering of the inputs that best respects the observed preferences. This query-indexed setting is natural for tasks like web search in which a different ranking is needed for each query. Following existing literature, we assume the existence of a scoring function f(x,q) that gives a score to each result in x; the scores are sorted to produce a ranking (Herbrich et al., 2000; Freund et al., 2003). We assume simply that the observed preference graph G is a directed acyclic graph (DAG). Finally, we cast our work in a decision-theoretic framework in which ranking procedures are evaluated via a loss function L(f(x,q),G).
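A generic pairwise surrogate of the kind analyzed here can be sketched as follows: sum a convex margin penalty over the edges of the preference DAG. The hinge penalty is only one illustrative choice; the paper's point is precisely that such surrogates are not automatically consistent.

```python
def pairwise_surrogate_risk(scores, edges, phi):
    # Surrogate loss summed over the edges of a preference DAG:
    # edge (i, j) means result i is preferred to result j, and phi
    # penalises the score margin scores[i] - scores[j].
    return sum(phi(scores[i] - scores[j]) for i, j in edges)

hinge = lambda m: max(0.0, 1.0 - m)

# A small preference DAG over three results: 0 > 1 > 2.
edges = [(0, 1), (0, 2), (1, 2)]

respects = pairwise_surrogate_risk([3.0, 1.5, 0.0], edges, hinge)  # ordering agrees
violates = pairwise_surrogate_risk([0.0, 1.5, 3.0], edges, hinge)  # ordering reversed
```

A score vector that orders the results as the DAG prefers incurs zero loss once every pairwise margin exceeds one, while a reversed ordering is penalized on every edge.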
How to compare different loss functions and their risks
, 2006
Cited by 24 (2 self)

Abstract
Many learning problems are described by a risk functional which in turn is defined by a loss function, and a straightforward and widely known approach to learn such problems is to minimize a (modified) empirical version of this risk functional. However, in many cases this approach suffers from substantial problems such as computational requirements in classification or robustness concerns in regression. In order to resolve these issues many successful learning algorithms try to minimize a (modified) empirical risk of a surrogate loss function instead. Of course, such a surrogate loss must be “reasonably related” to the original loss function since otherwise this approach cannot work well. For classification, good surrogate loss functions have recently been identified, and the relationship between the excess classification risk and the excess risk of these surrogate loss functions has been exactly described. However, beyond the classification problem little is known so far about good surrogate loss functions. In this work we establish a general theory that provides powerful tools for comparing excess risks of different loss functions. We then apply this theory to several learning problems including (cost-sensitive) classification, regression, density estimation, and density level detection.
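A minimal numeric illustration of the kind of comparison the abstract describes, using the classical binary result (the excess zero-one risk is bounded by the excess hinge risk) rather than this paper's general theory; the closed-form Bayes risks below are standard for these two losses.

```python
def conditional_risks(eta, f):
    # Conditional risks at a single point with P(Y = 1 | x) = eta and a
    # real-valued prediction f: zero-one risk of sign(f), and hinge risk.
    zero_one = eta if f < 0 else (1.0 - eta) if f > 0 else 0.5
    hinge = eta * max(0.0, 1.0 - f) + (1.0 - eta) * max(0.0, 1.0 + f)
    return zero_one, hinge

def excess_risks(eta, f):
    # Standard pointwise Bayes risks: min(eta, 1 - eta) for zero-one loss,
    # and 2 * min(eta, 1 - eta) for the hinge, attained at f = sign(2*eta - 1).
    z, h = conditional_risks(eta, f)
    return z - min(eta, 1.0 - eta), h - 2.0 * min(eta, 1.0 - eta)

# The excess zero-one risk never exceeds the excess hinge risk:
checks = [excess_risks(0.7, f) for f in (-2.0, -0.5, 0.1, 1.0)]
```

This is the sort of "comparison inequality" whose analogues beyond classification the paper sets out to establish.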
New multicategory boosting algorithms based on multicategory Fisher-consistent losses
Submitted to the Annals of Applied Statistics
Cited by 21 (0 self)

Abstract
Fisher-consistent loss functions play a fundamental role in the construction of successful binary margin-based classifiers. In this paper we establish the Fisher-consistency condition for multicategory classification problems. Our approach uses the margin vector concept which can be regarded as a multicategory generalization of the binary margin. We characterize a wide class of smooth convex loss functions that are Fisher-consistent for multicategory classification. We then consider using the margin-vector-based loss functions to derive multicategory boosting algorithms. In particular, we derive two new multicategory boosting algorithms by using the exponential and logistic regression losses.
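The margin-vector idea can be made concrete with the exponential loss: minimizing the conditional risk sum_y p_y exp(-f_y) under the sum-to-zero constraint has a closed-form solution whose argmax matches the argmax of p, which is exactly what Fisher consistency requires. This is a one-loss illustration via a Lagrange multiplier, not the paper's full characterization.

```python
import math

def exp_loss_minimizer(p):
    # Minimise the conditional margin-vector exponential risk
    #   sum_y p_y * exp(-f_y)   subject to   sum_y f_y = 0.
    # Setting the Lagrangian gradient to zero gives p_y * exp(-f_y) = const,
    # hence f_y = log(p_y) - mean(log p).
    logs = [math.log(py) for py in p]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

p = [0.1, 0.6, 0.3]
f = exp_loss_minimizer(p)   # argmax of f matches argmax of p
```

Because log is increasing, the largest margin coordinate always belongs to the most probable class, so minimizing the surrogate risk recovers the Bayes rule.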
Composite Multiclass Losses
Cited by 19 (8 self)

Abstract
We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a “proper composite loss”, which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We determine the stationarity condition, Bregman representation, order-sensitivity, and existence and uniqueness of the composite representation for multiclass losses. We subsume existing results on “classification calibration” by relating it to properness and show that the simple integral representation for binary proper losses cannot be extended to multiclass losses.
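The "proper composite" structure can be illustrated with its most familiar instance, which is standard rather than taken from this paper: multiclass log loss is a proper loss on probability estimates composed with the softmax inverse link.

```python
import math

def softmax(v):
    # Inverse link: map raw real-valued scores to a probability vector.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def log_loss(q, y):
    # A proper loss on probability estimates q: its expected value is
    # minimised only when q matches the true class probabilities.
    return -math.log(q[y])

def composite_loss(scores, y):
    # Proper composite structure: proper loss composed with a link.
    return log_loss(softmax(scores), y)

# Numerical check of properness at one point:
p = [0.2, 0.5, 0.3]
expected = lambda q: sum(py * log_loss(q, y) for y, py in enumerate(p))
```

Reporting the true probability vector minimizes the expected proper loss, and the link lets the learner work with unconstrained scores instead of the probability simplex.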
Robust truncated hinge loss support vector machines
 Journal of the American Statistical Association, 102, 974–983, 2007
Cited by 19 (1 self)

Abstract
The support vector machine (SVM) has been widely applied for classification problems in both machine learning and statistics. Despite its popularity, however, SVM has some drawbacks in certain situations. In particular, the SVM classifier can be very sensitive to outliers in the training sample. Moreover, the number of support vectors (SVs) can be very large in many applications. To circumvent these drawbacks, we propose the robust truncated hinge loss SVM (RSVM), which uses a truncated hinge loss. The RSVM is shown to be more robust to outliers and to deliver more accurate classifiers using a smaller set of SVs than the standard SVM. Our theoretical results show that the RSVM is Fisher-consistent, even when there is no dominating class, a scenario that is particularly challenging for multicategory classification. Similar results are obtained for a class of margin-based classifiers.
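A sketch of a truncated hinge of the kind described, with a hypothetical truncation point s = -1 (the paper's recommended choice of s may differ): the loss agrees with the hinge for margins above s but is capped below it, limiting any single outlier's influence.

```python
def hinge(u):
    # Standard hinge loss on the margin u = y * f(x).
    return max(0.0, 1.0 - u)

def truncated_hinge(u, s=-1.0):
    # Truncated hinge: identical to the hinge for margins above the
    # truncation point s, but capped at hinge(s) below it, so one
    # far-side outlier cannot dominate the fitted classifier.
    return min(hinge(u), hinge(s))

margins = [2.0, 0.5, -0.5, -5.0]
plain = [hinge(u) for u in margins]
robust = [truncated_hinge(u) for u in margins]
```

The badly misclassified point (margin -5) contributes loss 6 to the ordinary hinge but only the cap of 2 to the truncated one, which is the robustness the abstract describes; the price is that the truncated loss is no longer convex.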