  The Annals of Statistics, to appear. Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods

by Robert E. Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee
http://www.comp.nus.edu.sg/~leews/publications/SchapireFrBaLe98.ps.gz

Abstract:

One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.
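
To make the margin definition in the abstract concrete, here is a minimal Python sketch. It assumes each base classifier casts one unweighted vote and reports the raw vote difference; the function name vote_margin and the sample labels are hypothetical, and any weighting or normalization of votes goes beyond what the abstract states.

```python
from collections import Counter


def vote_margin(votes, true_label):
    """Margin of one example under simple (unweighted) voting.

    votes: labels predicted by the individual base classifiers.
    Returns the number of votes for the correct label minus the largest
    number of votes received by any single incorrect label.
    """
    counts = Counter(votes)
    correct = counts.get(true_label, 0)
    # Largest vote count among incorrect labels (0 if there are none).
    best_wrong = max(
        (n for label, n in counts.items() if label != true_label),
        default=0,
    )
    return correct - best_wrong


# Five base classifiers vote on one example whose true label is "a".
votes = ["a", "a", "b", "c", "a"]
margin = vote_margin(votes, "a")  # 3 correct votes, at most 1 for a wrong label
print(margin)
```

Under this definition a positive margin means the vote classifies the example correctly, a negative margin means it is misclassified, and a larger magnitude indicates a more lopsided vote.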

Citations

4514 Statistical Learning Theory – Vapnik - 1998
2438 Classification and Regression Trees – Breiman, Friedman, et al. - 1984
2138 UCI Repository of Machine Learning Databases – Merz, Murphy - 1996
1453 Bagging Predictors – Breiman - 1996
1133 A decision-theoretic generalization of on-line learning and an application to boosting – Freund, Schapire - 1997
1004 Experiments with a new boosting algorithm – Freund, Schapire - 1996
982 Support-vector networks – Cortes, Vapnik - 1995
688 A training algorithm for optimal margin classifiers – Boser, Guyon, et al. - 1992
654 On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab – Vapnik, Červonenkis - 1971
453 The strength of weak learnability – Schapire - 1990
389 Improved boosting algorithms using confidence-rated predictions – Schapire, Singer - 1998
298 Boosting a weak learning algorithm by majority – Freund - 1995
296 An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, submitted to Machine Learning – Dietterich - 1998
293 What size net gives valid generalization – Baum, Haussler - 1989
232 Bagging, boosting and C4.5 – Quinlan - 1996
176 On the density of families of sets – Sauer - 1972
129 The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network – Bartlett - 1998
125 Error-correcting output coding corrects bias and variance – Kong, Dietterich - 1995
106 Game theory, on-line prediction and boosting – Freund, Schapire - 1996
93 A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training – Jones - 1992
88 Boosting decision trees – Drucker, Cortes - 1996
84 An empirical evaluation of bagging and boosting – Maclin, Opitz - 1997
75 Using output codes to boost multiclass learning problems – Schapire - 1997
47 Improving regressors using boosting techniques – Drucker - 1997
45 Solving multiclass learning problems via error-correcting output codes – Dietterich, Bakiri - 1995
45 Efficient Agnostic Learning in Neural Networks with Bounded Fan-In – Lee, Bartlett, et al. - 1996
28 Prediction games and arcing classifiers – Breiman - 1997
15 A framework for structural risk minimisation – Shawe-Taylor, Bartlett, et al. - 1996
14 Bias, variance and prediction error for classification rules – Tibshirani - 1996
13 Boosting in the limit: Maximizing the margin of learned ensembles – Grove, Schuurmans - 1998
11 Rates of convex approximation in non-Hilbert spaces. Constructive Approximation – Donahue, Gurvits, et al. - 1997
2 Bounds for the uniform deviation of empirical measures – Devroye - 1982
2 Structural risk minimization over data-dependent hierarchies – Shawe-Taylor, Bartlett, et al. - 1996