  Additive logistic regression: a statistical view of boosting (2000) [543 citations — 10 self]

by Jerome Friedman, Trevor Hastie, Robert Tibshirani
Annals of Statistics
http://stat.stanford.edu/~jhf/ftp/boost.ps.Z

Abstract:

Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms can often be dramatically improved by sequentially applying them to reweighted versions of the input data, and taking a weighted majority vote of the sequence of classifiers thereby produced. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multi-class generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multi-class generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable for large-scale data mining applications.
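
The reweight-and-vote procedure described in the abstract can be illustrated with a minimal sketch of two-class discrete AdaBoost using one-split decision stumps as the base classifier. This is an illustration under stated assumptions, not the authors' code: the function names (fit_stump, stump_predict, adaboost, predict), the exhaustive stump search, the clipping constant 1e-12, and the toy data are all hypothetical choices made here, and the sketch does not cover the paper's LogitBoost or multi-class variants.

```python
import math


def fit_stump(X, y, w):
    """Exhaustively search for the stump with the smallest weighted error.

    A stump is (feature index, threshold, sign, weighted error); it predicts
    `sign` when x[feature] <= threshold and `-sign` otherwise.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (s if row[j] <= t else -s) != yi)
                if best is None or err < best[3]:
                    best = (j, t, s, err)
    return best


def stump_predict(stump, row):
    j, t, s, _ = stump
    return s if row[j] <= t else -s


def adaboost(X, y, rounds=10):
    """Fit stumps sequentially to reweighted data; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform weights
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        # Clip the error away from 0 and 1 so the log below stays finite
        # (the constant is an arbitrary choice for this sketch).
        err = min(max(stump[3], 1e-12), 1.0 - 1e-12)
        alpha = 0.5 * math.log((1.0 - err) / err)   # vote weight of this stump
        ensemble.append((alpha, stump))
        if stump[3] <= 1e-12:              # a perfect stump: nothing left to reweight
            break
        # Increase the weight of misclassified points, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted majority vote: the sign of the additive score."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D data that no single stump can separate.
    X = [[0], [1], [2], [3], [4], [5]]
    y = [1, 1, -1, -1, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([predict(model, row) for row in X])
```

On this toy set a single stump misclassifies two of the six points, while the three-round vote reproduces every training label. The score summed inside predict is the additive expansion that the abstract relates to additive modeling on the logistic scale.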

Citations

2438 Classification and Regression Trees – Breiman, Friedman, et al. - 1984
1453 Bagging Predictors – Breiman - 1996
1133 A decision-theoretic generalization of on-line learning and an application to boosting – Freund, Schapire - 1997
1004 Experiments with a new boosting algorithm – Freund, Schapire - 1996
635 Generalized Additive Models – Hastie, Tibshirani - 1990
520 Generalized Linear Models – McCullagh, Nelder - 1989
483 Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods – Schapire, Freund, et al. - 1997
453 The strength of weak learnability – Schapire - 1990
403 An introduction to computational learning theory – Kearns, Vazirani - 1994
401 Matching pursuits with time-frequency dictionaries – Mallat, Zhang - 1993
335 Very simple classification rules perform well on most commonly used data sets – Holte - 1993
298 Boosting a weak learning algorithm by majority – Freund - 1995
296 An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, submitted to Machine Learning – Dietterich - 1998
267 Projection Pursuit Regression – Friedman, Stuetzle - 1981
157 Multivariate adaptive regression splines (with discussion). The Annals of Statistics – Friedman - 1991
108 Prediction games and arcing algorithms – Breiman - 1999
86 Bias, variance, and arcing classifiers – Breiman - 1996
77 Another approach to polychotomous classification – Friedman - 1996
72 Flexible discriminant analysis by optimal scoring – Hastie, Tibshirani, et al. - 1994
37 Linear smoothers and additive models (with discussion) – Buja, Hastie, et al. - 1989
11 Classification by pairwise coupling. The Annals of Statistics – Hastie, Tibshirani - 1998
5 Nearest neighbor pattern classification – Cover - 1967