  Why does bagging work? A Bayesian account and its implications (1997) [26 citations, 6 self]

by Pedro Domingos
In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining
http://www.cs.washington.edu/homes/pedrod/kdd97.ps.gz

Abstract:

The error rate of decision-tree and other classification learners can often be much reduced by bagging: learning multiple models from bootstrap samples of the database, and combining them by uniform voting. In this paper we empirically test two alternative explanations for this, both based on Bayesian learning theory: (1) bagging works because it is an approximation to the optimal procedure of Bayesian model averaging, with an appropriate implicit prior; (2) bagging works because it effectively shifts the prior to a more appropriate region of model space. All the experimental evidence contradicts the first hypothesis, and confirms the second.

Bagging

Bagging (Breiman 1996a) is a simple and effective way to reduce the error rate of many classification learning algorithms. For example, in the empirical study described below, it reduces the error of a decision-tree learner in 19 of 26 databases, by 4% on average. In the bagging procedure, given a training set of size s, a "bootstrap" replicate of it is constructed by taking s samples with replacement from the training set. Thus a new training set of the same size is produced, where each of the original examples may appear once, more than once, or not at all. On average, 63% of the original examples will appear in the bootstrap sample. The learning algorithm is then applied to this training set. This procedure is repeated m times, and the resulting m models are aggregated by uniform voting.

Bagging is one of several "multiple model" approaches that have recently received much attention (see, for example, Chan, Stolfo, & Wolpert 1996). Other procedures of this type include boosting (Freund & Schapire 1996) and stacking (Wolpert 1992). Two related explanations have been proposed for bagging's success, both in a classical statistical framework.
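
The bagging procedure described in the excerpt (draw s examples with replacement, learn a model, repeat m times, combine by uniform voting) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `seed` parameter, and the one-dimensional threshold learner `learn_stump` (a stand-in for the decision-tree learner used in the study) are all hypothetical. The helper `unique_fraction` estimates the share of original examples that appear in a bootstrap replicate, which for large s approaches 1 - 1/e, the "63%" quoted above.

```python
import random
from collections import Counter


def bootstrap_replicate(training_set, rng):
    """Draw s examples with replacement from a training set of size s."""
    s = len(training_set)
    return [training_set[rng.randrange(s)] for _ in range(s)]


def bag(learn, training_set, m, seed=0):
    """Learn m models, each from its own bootstrap replicate."""
    rng = random.Random(seed)
    return [learn(bootstrap_replicate(training_set, rng)) for _ in range(m)]


def vote(models, x):
    """Aggregate the models' predictions for x by uniform (unweighted) voting."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def learn_stump(examples):
    """Hypothetical base learner (assumption, not from the paper).

    Examples are (x, label) pairs with label in {0, 1}. The learned rule is
    "predict 1 if x > t else 0", with t chosen to minimize training error.
    """
    candidates = [float("-inf")] + [x for x, _ in examples]
    best_t, best_err = None, None
    for t in candidates:
        err = sum((x > t) != bool(y) for x, y in examples)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: int(x > t)


def unique_fraction(s, trials, seed=0):
    """Average fraction of the s original examples present in a replicate."""
    rng = random.Random(seed)
    originals = list(range(s))
    total = 0.0
    for _ in range(trials):
        total += len(set(bootstrap_replicate(originals, rng))) / s
    return total / trials


if __name__ == "__main__":
    # Toy data: the label is 1 exactly when x >= 5.
    data = [(x, int(x >= 5)) for x in range(10)]
    models = bag(learn_stump, data, m=25)
    print([vote(models, x) for x in range(10)])
    print(round(unique_fraction(1000, 200), 3))
```

Any callable that maps a list of examples to a prediction function can be passed as `learn`, so a real decision-tree learner could replace `learn_stump` without changing `bag` or `vote`.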

Citations

3307 C4.5: Programs for machine learning – Quinlan - 1993
2195 UCI Repository of Machine Learning Databases – Blake, Merz - 1998
1504 Bagging Predictors – Breiman - 1996
1031 Experiments with a new boosting algorithm – Freund, Schapire - 1996
594 Bayesian Theory – Bernardo, Smith - 1994
156 On bias, variance, 0/1-loss, and the curse-of-dimensionality – Friedman - 1997
140 Constructing optimal binary decision trees is NP-complete – Hyafil, Rivest - 1976
138 Bias plus variance decomposition for zero-one loss functions – Kohavi - 1996
127 Error-correcting output coding corrects bias and variance – Kong, Dietterich - 1995
89 Bias, variance and arcing classifiers – Breiman - 1996
82 A Theory of Learning Classification Rules – Buntine - 1990
46 Knowledge acquisition from examples via multiple models – Domingos - 1997
38 On finding the most probable model – Cheeseman - 1990
30 Bayesian model averaging – Madigan, Raftery, et al. - 1996