On the size of training set and the benefit from ensemble (2004)

by Zhi-hua Zhou, Dan Wei, Gang Li, Honghua Dai
In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’04)
http://cs.nju.edu.cn/people/zhouzh/zhouzh.files/Publication/pakdd04a.pdf

Abstract:

In this paper, the impact of the size of the training set on the benefit from ensemble, i.e., the gains obtained by employing ensemble learning paradigms, is empirically studied. Experiments on Bagged/Boosted J4.8 decision trees with/without pruning show that enlarging the training set tends to improve the benefit from Boosting but does not significantly impact the benefit from Bagging. This phenomenon is then explained from the view of bias-variance reduction. Moreover, it is shown that even for Boosting, the benefit does not always increase consistently with the training set size, since single learners may sometimes learn relatively more than ensembles do from additional, randomly provided training data. Furthermore, it is observed that the benefit from an ensemble of unpruned decision trees is usually bigger than that from an ensemble of pruned decision trees. This phenomenon is then explained from the view of error-ambiguity balance.
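
The error-ambiguity balance mentioned in the abstract is commonly stated, following Krogh and Vedelsby (1995), for a weighted-average regression ensemble: the ensemble's squared error equals the weighted mean error of its members minus their weighted mean ambiguity (their spread around the ensemble output). The sketch below checks that identity numerically on one example. The function name `error_ambiguity` and the sample predictions are illustrative assumptions, and the paper's own classification experiments with J4.8 trees are not reproduced here.

```python
from typing import Sequence, Tuple


def error_ambiguity(
    preds: Sequence[float], weights: Sequence[float], target: float
) -> Tuple[float, float, float]:
    """Return (ensemble_error, mean_member_error, mean_ambiguity) for one example.

    Hypothetical helper for illustration; assumes squared error and a
    weighted-average ensemble, as in the regression setting.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so the weights sum to 1

    ensemble = sum(wi * p for wi, p in zip(w, preds))  # ensemble output
    ensemble_error = (ensemble - target) ** 2

    # Weighted mean of the members' own squared errors.
    mean_error = sum(wi * (p - target) ** 2 for wi, p in zip(w, preds))
    # Weighted mean of the members' squared deviation from the ensemble output.
    mean_ambiguity = sum(wi * (p - ensemble) ** 2 for wi, p in zip(w, preds))

    return ensemble_error, mean_error, mean_ambiguity


# Made-up predictions from three members for a target value of 1.0.
e, m, a = error_ambiguity([0.2, 1.0, 1.9], [1.0, 1.0, 1.0], target=1.0)

# The decomposition: ensemble error = mean member error - mean ambiguity.
assert abs(e - (m - a)) < 1e-12
```

Because the ambiguity term is never negative, the ensemble error under this decomposition cannot exceed the weighted mean member error. Members that disagree more reduce the ensemble error further, as long as their individual errors do not grow by as much.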
