Abstract:
We study how closely the optimal Bayes error rate can be approached by a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The measure of closeness is characterized by the loss function used in the estimation. We show that such a classification scheme can generally be regarded as a (non-maximum-likelihood) conditional in-class probability estimate, and we use this analysis to compare various convex loss functions that have appeared in the literature. Furthermore, this theoretical insight allows us to design good loss functions with desirable properties. Another aspect of our analysis is to demonstrate the consistency of certain classification methods based on convex risk minimization. This study sheds light on the good performance of some recently proposed linear classification methods, including boosting and support vector machines. It also shows their limitations and suggests possible improvements.
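As a minimal sketch of the idea that convex risk minimization behaves like a conditional probability estimate, the snippet below fixes η = P(Y = 1 | x) at a single point x, numerically minimizes the pointwise risk η φ(f) + (1 − η) φ(−f) for a few common convex losses φ, and maps the minimizer f* back to η where that is possible. The particular loss set, the inverse-link functions, and the ternary-search solver are illustrative choices of ours, not code from the paper.

```python
import math

# Convex surrogate losses phi(v), where v = y * f(x) is the margin.
LOSSES = {
    "logistic": lambda v: math.log1p(math.exp(-v)),
    "exponential": lambda v: math.exp(-v),    # boosting-style loss
    "least_squares": lambda v: (1.0 - v) ** 2,
    "hinge": lambda v: max(0.0, 1.0 - v),     # SVM loss
}

# Inverse links mapping the risk minimizer f* back to eta = P(Y=1 | x).
# The hinge loss has no entry: its minimizer only encodes sign(2*eta - 1).
INVERSE_LINK = {
    "logistic": lambda f: 1.0 / (1.0 + math.exp(-f)),
    "exponential": lambda f: 1.0 / (1.0 + math.exp(-2.0 * f)),
    "least_squares": lambda f: (f + 1.0) / 2.0,
}


def conditional_risk(phi, eta, f):
    """Pointwise risk eta*phi(f) + (1 - eta)*phi(-f) at a single x."""
    return eta * phi(f) + (1.0 - eta) * phi(-f)


def minimize_convex(g, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for a minimizer of a convex function g on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2  # by convexity a minimizer lies in [lo, m2]
        else:
            lo = m1  # by convexity a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)


def optimal_f(name, eta):
    """Minimizer f* of the conditional risk for loss `name` at fixed eta."""
    phi = LOSSES[name]
    return minimize_convex(lambda f: conditional_risk(phi, eta, f))


if __name__ == "__main__":
    eta = 0.8
    for name in LOSSES:
        f_star = optimal_f(name, eta)
        if name in INVERSE_LINK:
            eta_hat = INVERSE_LINK[name](f_star)
            print(f"{name}: f* = {f_star:+.4f}, recovered eta = {eta_hat:.4f}")
        else:
            print(f"{name}: f* = {f_star:+.4f} (only sign(2*eta - 1) is recoverable)")
```

For the logistic, exponential, and least-squares losses the minimizer is a strictly increasing function of η (log(η/(1 − η)), ½ log(η/(1 − η)), and 2η − 1, respectively), so η can be read back from f*. For the hinge loss the minimizer collapses to sign(2η − 1): this still yields the Bayes decision, but not a probability estimate.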