Download:
|
by Robert E. Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee
http://www.comp.nus.edu.sg/~leews/publications/SchapireFrBaLe98.ps.gz
Abstract:
One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.
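The margin definition in the abstract can be illustrated with a minimal Python sketch. It assumes an unweighted vote in which each base classifier casts one vote; the function name `vote_margin` and the example labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def vote_margin(votes, true_label):
    """Margin of one example under an unweighted vote (illustrative sketch).

    Returns the number of votes for the correct label minus the largest
    number of votes received by any single incorrect label.
    """
    counts = Counter(votes)
    correct = counts.get(true_label, 0)
    # Largest vote count among incorrect labels; 0 if no incorrect label got a vote.
    best_wrong = max(
        (count for label, count in counts.items() if label != true_label),
        default=0,
    )
    return correct - best_wrong


if __name__ == "__main__":
    # Five hypothetical base classifiers vote on an example whose true label is "a".
    print(vote_margin(["a", "a", "b", "c", "a"], "a"))  # 3 correct, at most 1 for a wrong label -> 2
```

A positive margin means the vote classifies the example correctly, and a larger value means it does so with more votes to spare; a negative margin means some incorrect label outvoted the correct one.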
Citations
|
4514
|
Statistical Learning Theory
– Vapnik
- 1998
|
|
2438
|
Classification and Regression Trees
– Breiman, Friedman, et al.
- 1984
|
|
2138
|
UCI Repository of Machine Learning Databases
– Merz, Murphy
- 1996
|
|
1453
|
Bagging Predictors
– Breiman
- 1996
|
|
1133
|
A decision-theoretic generalization of on-line learning and an application to boosting
– Freund, Schapire
- 1997
|
|
1004
|
Experiments with a new boosting algorithm
– Freund, Schapire
- 1996
|
|
982
|
Support-vector networks
– Cortes, Vapnik
- 1995
|
|
688
|
A training algorithm for optimal margin classifiers
– Boser, Guyon, et al.
- 1992
|
|
654
|
On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab
– Vapnik, Červonenkis
- 1971
|
|
453
|
The strength of weak learnability
– Schapire
- 1990
|
|
389
|
Improved boosting algorithms using confidence-rated predictions
– Schapire, Singer
- 1998
|
|
298
|
Boosting a weak learning algorithm by majority
– Freund
- 1995
|
|
296
|
An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, submitted to Machine Learning
– Dietterich
- 1998
|
|
293
|
What size net gives valid generalization
– Baum, Haussler
- 1989
|
|
232
|
Bagging, boosting and C4.5
– Quinlan
- 1996
|
|
176
|
On the density of families of sets
– Sauer
- 1972
|
|
129
|
The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network
– Bartlett
- 1998
|
|
125
|
Error-correcting output coding corrects bias and variance
– Kong, Dietterich
- 1995
|
|
106
|
Game theory, on-line prediction and boosting
– Freund, Schapire
- 1996
|
|
93
|
A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training
– Jones
- 1992
|
|
88
|
Boosting decision trees
– Drucker, Cortes
- 1996
|
|
84
|
An empirical evaluation of bagging and boosting
– Maclin, Opitz
- 1997
|
|
75
|
Using output codes to boost multiclass learning problems
– Schapire
- 1997
|
|
47
|
Improving regressors using boosting techniques
– Drucker
- 1997
|
|
45
|
Solving multiclass learning problems via error-correcting output codes
– Dietterich, Bakiri
- 1995
|
|
45
|
Efficient Agnostic Learning in Neural Networks with Bounded Fan-In
– Lee, Bartlett, et al.
- 1996
|
|
28
|
Prediction games and arcing classifiers
– Breiman
- 1997
|
|
15
|
A framework for structural risk minimisation
– Shawe-Taylor, Bartlett, et al.
- 1996
|
|
14
|
Bias, variance and prediction error for classification rules
– Tibshirani
- 1996
|
|
13
|
Boosting in the limit: Maximizing the margin of learned ensembles
– Grove, Schuurmans
- 1998
|
|
11
|
Rates of convex approximation in non-Hilbert spaces. Constructive Approximation
– Donahue, Gurvits, et al.
- 1997
|
|
2
|
Bounds for the uniform deviation of empirical measures
– Devroye
- 1982
|
|
2
|
Structural risk minimization over data-dependent hierarchies
– Shawe-Taylor, Bartlett, et al.
- 1996
|