  Boosting with early stopping: convergence and consistency (2003) [10 citations — 3 self]

by Tong Zhang, Bin Yu
Annals of Statistics
http://www.stat.berkeley.edu/users/binyu/ps/zhang.final.ps

Abstract:

Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on cross-validation (CV) or a test set. This paper studies numerical convergence, consistency, and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early stopping strategies under which boosting is shown to be consistent based on iid samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights to practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step sizes, as known in practice through the works of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with ε → 0 step size becomes an L1-margin maximizer when left to run to convergence.
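
The procedure the abstract describes (greedy minimization of an empirical loss over the linear span of basis functions, with a restricted step size and early stopping on a test set) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the squared loss, the coordinate basis functions f_j(x) = x[j], the hold-out stopping rule with a patience window, the function names, and the default values of `step`, `max_iter`, and `patience`. The stopping strategies analyzed in the paper are not reproduced.

```python
import random


def predict(weights, x):
    """Additive model: a linear combination of the basis functions f_j(x) = x[j]."""
    return sum(w * xj for w, xj in zip(weights, x))


def mean_squared_error(weights, X, y):
    return sum((yi - predict(weights, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)


def boost_with_early_stopping(X_train, y_train, X_val, y_val,
                              step=0.1, max_iter=500, patience=20):
    """Greedy least-squares boosting with a shrunken step and hold-out early stopping.

    Illustrative assumptions: squared loss, coordinate basis functions,
    and a patience-based stopping rule on a validation set.
    """
    n_features = len(X_train[0])
    weights = [0.0] * n_features
    best_weights = weights[:]
    best_val = mean_squared_error(weights, X_val, y_val)
    best_iter = 0

    for t in range(1, max_iter + 1):
        residuals = [yi - predict(weights, xi) for xi, yi in zip(X_train, y_train)]

        # Greedy search: pick the basis function whose one-dimensional
        # least-squares fit to the residuals reduces the training loss most.
        best_j, best_coef, best_gain = None, 0.0, 0.0
        for j in range(n_features):
            num = sum(r * xi[j] for r, xi in zip(residuals, X_train))
            den = sum(xi[j] ** 2 for xi in X_train)
            if den > 0.0 and num * num / den > best_gain:
                best_j, best_coef, best_gain = j, num / den, num * num / den
        if best_j is None:
            break  # no basis function reduces the training loss any further

        # Restricted step size: move only a fraction of the full greedy step.
        weights[best_j] += step * best_coef

        # Early stopping: remember the iterate with the lowest validation loss
        # and stop once it has not improved for `patience` iterations.
        val = mean_squared_error(weights, X_val, y_val)
        if val < best_val:
            best_weights, best_val, best_iter = weights[:], val, t
        elif t - best_iter >= patience:
            break

    return best_weights, best_iter, best_val


def make_data(n, seed):
    """Synthetic regression data (hypothetical): y = 2*x0 - x1 + Gaussian noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(n)]
    y = [2.0 * x[0] - 1.0 * x[1] + rng.gauss(0.0, 0.5) for x in X]
    return X, y
```

Calling `boost_with_early_stopping(*make_data(200, 0), *make_data(100, 1))` returns the selected weights, the iteration at which the validation loss was lowest, and that loss. Setting `step=1.0` recovers the unrestricted greedy step, which makes it easy to compare against the shrunken version.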

Citations

4514 Statistical Learning Theory – Vapnik - 1998
1133 A decision-theoretic generalization of on-line learning and an application to boosting – Freund, Schapire - 1997
674 The Elements of Statistical Learning – Hastie, Tibshirani, et al. - 2001
635 Generalized Additive Models – Hastie, Tibshirani - 1990
543 Additive logistic regression: a statistical view of boosting – Friedman, Hastie, et al.
483 Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods – Schapire, Freund, et al. - 1997
401 Matching pursuits with time-frequency dictionaries – Mallat, Zhang - 1993
389 Improved boosting algorithms using confidence-rated predictions – Schapire, Singer - 1998
257 Universal approximation bounds for superposition of a sigmoid function – Barron - 1993
199 Arcing classifiers – Breiman - 1998
176 Greedy function approximation: a gradient boosting machine – Friedman - 2001
124 Probability in Banach Spaces – Ledoux, Talagrand - 1991
118 Logistic regression, adaboost and bregman distances – Collins, Schapire, et al. - 2002
108 Prediction games and arcing algorithms – Breiman - 1999
105 Functional gradient techniques for combining hypotheses – Mason, Baxter, et al. - 1999
93 A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training – Jones - 1992
93 Multilayer Feedforward Networks with Nonpolynomial Activation Function can Approximate any Function – Leshno, Lin, et al. - 1993
93 Rademacher and gaussian complexities: Risk bounds and structural results – Bartlett, Mendelson - 2002
89 Boosting the margin: a new explanation for the effectiveness of voting methods – Schapire, Freund, et al. - 1998
87 Boosting in the limit: Maximizing the margin of learned ensembles – Grove, Schuurmans - 1998
70 Empirical margin distributions and bounding the generalization error of combined classifiers – Koltchinskii, Panchenko
51 Improved boosting algorithms using confidence-rated predictions – Schapire, Singer - 1999
50 Statistical behavior and consistency of classification methods based on convex risk minimization – Zhang
50 Local Rademacher complexities – Bartlett, Bousquet, et al. - 2005
45 Efficient Agnostic Learning in Neural Networks with Bounded Fan-In – Lee, Bartlett, et al. - 1996
41 Text categorization based on regularized linear classification methods. Information Retrieval 4:5–31 – Zhang, Oles - 2001
38 Boosting with the L2 loss: Regression and classification – Bühlmann, Yu - 2003
30 Process consistency for AdaBoost – Jiang - 2000
27 Convexity, classification and risk bounds – Bartlett, Jordan, et al. - 2003
23 On the rate of convergence of regularized boosting classifiers – Blanchard, Lugosi, et al. - 2003
18 The consistency of greedy algorithms for classification – Mannor, Meir, et al. - 2002
16 Some infinity theory for predictor ensembles – Breiman - 2000
14 Some local measures of complexity of convex hulls and generalization bounds – Bousquet, Koltchinskii, et al. - 2002
13 Sequential greedy approximation for certain convex optimization problems – Zhang - 2002
13 A loss function analysis for classification methods in text categorization – Li, Yang - 2003
12 The elements of statistical learning. Springer series in statistics – Hastie, Tibshirani, et al. - 2001
12 Greedy algorithms for classification - consistency, convergence rates, and adaptivity – Mannor, Meir, et al. - 2003
10 Weak convergence and empirical processes. Springer Series in Statistics – van der Vaart, Wellner - 1996
10 Generalization error bounds for Bayesian mixture algorithms – Meir, Zhang - 2003
7 On the Bayes-risk consistency of boosting methods – Lugosi, Vayatis - 2001
5 Boosting with the L2 loss: regression and classification – Bühlmann, Yu - 2001
4 Consistency for L2 boosting and matching pursuit with trees and tree-type basis functions – Bühlmann - 2002
3 On the Bayes-risk consistency of boosting methods – Lugosi, Vayatis - 2001
2 Consistency for L2 boosting and matching pursuit with trees and tree-type basis functions – Bühlmann - 2002
2 Complexities of convex combinations and bounding the generalization error in classification. The manuscript can be downloaded from http://www-math.mit.edu/#panchenk/research.html – Koltchinskii, Panchenko - 2003
2 Further Explanation of the Effectiveness of Voting Methods: The Game between Margins and Weights – Koltchinskii, Panchenko, et al. - 2001
1 Arcing classifiers. The Annals of Statistics, 26:801--849 – Breiman - 1998
1 Further explanation of the effectiveness of voting methods: The game between margins and weights – Koltchinskii, Panchenko, et al. - 2001