Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize a loss function empirically in a greedy fashion. The resulting estimator takes an additive functional form and is built iteratively by applying a base estimator (or learner) to updated samples that depend on the previous iterations. An unusual regularization technique, early stopping, is employed based on cross-validation or a test set. This paper studies numerical convergence, consistency, and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights into practical aspects of boosting. As a by-product, these results also reveal the importance of restricting the greedy search step sizes, as known in practice through the works of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with step size ε → 0 becomes an L1-margin maximizer when left to run to convergence.
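The procedure described above (greedy minimization over the linear span of basis functions, a restricted step size, and early stopping on held-out data) can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes squared loss rather than a general loss, treats the columns of a design matrix as the basis functions, and uses a simple patience rule for stopping. The function name `greedy_boost` and the parameters `step_cap` and `patience` are hypothetical names chosen for this illustration.

```python
import numpy as np


def greedy_boost(X_train, y_train, X_val, y_val,
                 max_steps=200, step_cap=0.1, patience=10):
    """Greedy least-squares boosting over the span of the columns of X_train.

    Assumptions made for illustration only: squared loss, columns with
    nonzero norm, and early stopping by validation loss with a patience rule.
    """
    n_features = X_train.shape[1]
    coef = np.zeros(n_features)
    col_norm2 = (X_train ** 2).sum(axis=0)  # squared norm of each basis column

    best_coef, best_val, since_best = coef.copy(), np.inf, 0
    for _ in range(max_steps):
        residual = y_train - X_train @ coef
        # Exact line-search step for every basis function under squared loss.
        steps = (X_train.T @ residual) / col_norm2
        # Greedy choice: the basis function whose full step reduces
        # the training loss the most.
        j = int(np.argmax(steps ** 2 * col_norm2))
        # Restricted step size: cap the magnitude of each update.
        coef[j] += np.clip(steps[j], -step_cap, step_cap)

        # Early stopping: keep the coefficients with the lowest held-out loss.
        val_loss = float(np.mean((y_val - X_val @ coef) ** 2))
        if val_loss < best_val:
            best_coef, best_val, since_best = coef.copy(), val_loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_coef, best_val
```

The basis function is selected by its unrestricted gain while the applied step is capped, a simplification that keeps the sketch short. A smaller `step_cap` requires more iterations but makes the stopping time a finer-grained regularization parameter.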