by Jerome Friedman, Trevor Hastie, Robert Tibshirani
Annals of Statistics
http://stat.stanford.edu/~jhf/ftp/boost.ps.Z
Abstract:
Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms can often be dramatically improved by sequentially applying them to reweighted versions of the input data and taking a weighted majority vote of the resulting sequence of classifiers. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multi-class generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multi-class generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable for large-scale data mining applications.
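The reweight-and-vote procedure the abstract describes can be illustrated with a minimal sketch of discrete AdaBoost in its common textbook form. This is not code from the paper: the function names (`fit_stump`, `stump_predict`, `adaboost`, `predict`), the choice of one-split decision stumps as the base classifier, and the toy interval data are all assumptions made for illustration. Labels are assumed to be coded as -1 and +1.

```python
import numpy as np

def fit_stump(X, y, w):
    """Search all one-split stumps; return (feature, threshold, sign, weighted error)."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = np.where(X[:, j] <= t, -1, 1)
            err = w[pred != y].sum()
            # Weights sum to 1, so flipping the stump's sign gives error 1 - err.
            for sign, e in ((1, err), (-1, 1.0 - err)):
                if e < best[3]:
                    best = (j, t, sign, e)
    return best

def stump_predict(X, j, t, sign):
    """Predict -1/+1 from a single threshold on feature j."""
    return sign * np.where(X[:, j] <= t, -1, 1)

def adaboost(X, y, n_rounds=20):
    """Fit stumps sequentially to reweighted data; return [(alpha, j, t, sign), ...]."""
    n = len(y)
    w = np.full(n, 1.0 / n)          # start from uniform observation weights
    ensemble = []
    for _ in range(n_rounds):
        j, t, sign, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)     # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)    # vote weight of this classifier
        pred = stump_predict(X, j, t, sign)
        w *= np.exp(-alpha * y * pred)           # upweight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, t, sign))
    return ensemble

def predict(ensemble, X):
    """Weighted majority vote of the fitted stumps."""
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy data (an assumption): the positive class is an interval, so no single
# threshold separates the classes, but a weighted vote of stumps can.
X = np.linspace(0.0, 1.0, 60).reshape(-1, 1)
y = np.where((X[:, 0] > 0.3) & (X[:, 0] < 0.7), 1, -1)

single = adaboost(X, y, n_rounds=1)
boosted = adaboost(X, y, n_rounds=100)
print("one stump, training accuracy:", (predict(single, X) == y).mean())
print("100 rounds, training accuracy:", (predict(boosted, X) == y).mean())
```

The score accumulated inside `predict` is a sum of simple functions of the inputs, which is the additive-model structure the abstract refers to; the sketch makes no attempt to reproduce the paper's likelihood-based variants or its tree-based formulation.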
Citations
2438  Classification and Regression Trees – Breiman, Friedman, et al. – 1984
1453  Bagging Predictors – Breiman – 1996
1133  A decision-theoretic generalization of on-line learning and an application to boosting – Freund, Schapire – 1997
1004  Experiments with a new boosting algorithm – Schapire – 1996
635  Generalized Additive Models – Hastie, Tibshirani – 1990
520  Generalized Linear Models – McCullagh, Nelder – 1989
483  Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods – Schapire, Freund, et al. – 1997
453  The strength of weak learnability – Schapire – 1990
403  An introduction to computational learning theory – Kearns, Vazirani – 1994
401  Matching pursuits with time-frequency dictionaries – Mallat, Zhang – 1993
335  Very simple classification rules perform well on most commonly used data sets – Holte – 1993
298  Boosting a weak learning algorithm by majority – Freund – 1995
296  An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, submitted to Machine Learning – Dietterich – 1998
267  Projection Pursuit Regression – Friedman, Stuetzle – 1981
157  Multivariate adaptive regression splines (with discussion). The Annals of Statistics – Friedman – 1991
108  Prediction games and arcing algorithms – Breiman – 1999
86  variance, and arcing classifiers – Breiman – 1996
77  Another approach to polychotomous classification – Friedman – 1996
72  Flexible discriminant analysis by optimal scoring – Hastie, Tibshirani, et al. – 1994
37  Linear smoothers and additive models (with discussion) – Buja, Hastie, et al. – 1989
11  Classification by pairwise coupling. The Annals of Statistics – Hastie, Tibshirani – 1998
5  Nearest neighbor pattern classification – Cover – 1967