An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants (1999)

by Eric Bauer, Ron Kohavi
Venue: Machine Learning
Citations: 692 (2 self)

BibTeX

@ARTICLE{Bauer99anempirical,
    author = {Eric Bauer and Ron Kohavi},
    title = {An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants},
    journal = {Machine Learning},
    year = {1999},
    volume = {36},
    pages = {105--139}
}


Abstract

Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers on artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how the different methods and variants influence these two terms. This decomposition shows that Bagging reduces the variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduce both the bias and the variance of unstable methods but increase the variance of Naive-Bayes, which is very stable. We also observe that Arc-x4 behaves differently from AdaBoost when reweighting is used instead of resampling, indicating a fundamental difference between the two. The voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbation (Wagging), and backfitting of the data. We find that Bagging improves when probabilistic estimates are used in conjunction with no pruning, and when the data are backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size across AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods yield large and significant reductions in mean-squared error. Finally, we explore practical problems that arise in implementing boosting algorithms, including numerical instabilities and underflows.
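The Bagging procedure the abstract refers to can be illustrated with a minimal sketch (hypothetical code, not the paper's implementation; the `one_nn` base learner and all names are illustrative): each base model is trained on a bootstrap resample of the training data, and predictions are combined by unweighted majority vote.

```python
import random
from collections import Counter

def bagged_predict(train, test_x, base_learner, n_bags=25, seed=0):
    # Bagging: fit each base model on a bootstrap resample (sampling the
    # training set with replacement), then combine by majority vote.
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        sample = [rng.choice(train) for _ in train]  # resample with replacement
        models.append(base_learner(sample))
    votes = Counter(m(test_x) for m in models)
    return votes.most_common(1)[0][0]

def one_nn(sample):
    # Hypothetical unstable base learner: 1-nearest-neighbour on (x, label) pairs.
    def model(x):
        return min(sample, key=lambda p: abs(p[0] - x))[1]
    return model

# Toy usage on a 1-D, two-class dataset of (feature, label) pairs:
train = [(0.0, -1), (1.0, -1), (5.0, 1), (6.0, 1)]
label = bagged_predict(train, 0.5, one_nn)
```

Because each bag sees a slightly different resample, the vote averages away the base learner's sensitivity to individual training points, which is the variance-reduction effect the study measures.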
We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only "hard" areas but also outliers and noise.
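The instance reweighting described above can be sketched with a toy AdaBoost over decision stumps (hypothetical illustrative code assuming the standard AdaBoost-style multiplicative update, not the authors' implementation). Note the small-epsilon clamp on the weighted error, a common guard against the kind of numerical underflow the abstract mentions.

```python
import math

def stump_train(xs, ys, w):
    # Exhaustively pick the threshold/polarity stump minimizing weighted error.
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (pol if xi >= t else -pol) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best  # (weighted error, threshold, polarity)

def stump_predict(t, pol, x):
    return pol if x >= t else -pol

def adaboost(xs, ys, rounds=10):
    n = len(xs)
    w = [1.0 / n] * n            # uniform initial instance weights
    ensemble = []
    for _ in range(rounds):
        err, t, pol = stump_train(xs, ys, w)
        err = max(err, 1e-10)    # clamp: avoid log/divide underflow at err -> 0
        if err >= 0.5:
            break                # weak learner no better than chance: stop
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Reweight: misclassified instances gain weight, correct ones lose it.
        # This is what concentrates weight on "hard" regions, outliers, and noise.
        w = [wi * math.exp(-alpha * yi * stump_predict(t, pol, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    s = sum(a * stump_predict(t, pol, x) for a, t, pol in ensemble)
    return 1 if s >= 0 else -1
```

Printing `w` after each round would reproduce, in miniature, the effect the paper's scatterplots visualize: weight piling up on repeatedly misclassified instances.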

