Why averaging classifiers can protect against overfitting (2001) [8 citations — 0 self]
Abstract:
We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, this algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction of an algorithm that predicts with the best hypothesis. By allowing the algorithm to abstain from predicting on some examples, we show that the predictions it makes when it does not abstain are very reliable. Finally, we show that the probability that the algorithm abstains is at most about twice the generalization error of the best hypothesis in the class.
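The averaging rule described in the abstract can be illustrated with a minimal sketch. It assumes a small finite hypothesis class of one-dimensional threshold classifiers, and the names `averaged_predict`, `eta`, and `threshold` are hypothetical choices for illustration. The abstention rule used here (a cutoff on the normalized weighted vote) is a simplification and may differ from the exact rule analyzed in the paper.

```python
import math


def averaged_predict(hypotheses, train_x, train_y, x, eta=1.0, threshold=0.6):
    """Predict with an exponentially weighted vote over all hypotheses.

    Returns +1 or -1, or 0 to signal abstention.
    eta and threshold are illustrative parameters, not values from the paper.
    """
    weights = []
    for h in hypotheses:
        # Count the training examples that h misclassifies.
        mistakes = sum(1 for xi, yi in zip(train_x, train_y) if h(xi) != yi)
        # Weight decays exponentially with the number of training mistakes.
        weights.append(math.exp(-eta * mistakes))

    # Normalized weighted vote, a value in [-1, 1].
    vote = sum(w * h(x) for w, h in zip(weights, hypotheses)) / sum(weights)

    # Abstain when the weighted vote is too close to a tie.
    if abs(vote) < threshold:
        return 0
    return 1 if vote > 0 else -1


# Hypothetical hypothesis class: h_t(x) = +1 if x >= t else -1, for t = 0..4.
hypotheses = [lambda x, t=t: 1 if x >= t else -1 for t in range(5)]
train_x = [0.5, 1.5, 2.5, 3.5]
train_y = [-1, -1, 1, 1]

print(averaged_predict(hypotheses, train_x, train_y, 3.9))  # 1
print(averaged_predict(hypotheses, train_x, train_y, 0.1))  # -1
print(averaged_predict(hypotheses, train_x, train_y, 2.2))  # 0 (abstains)
```

In this toy setup the point 2.2 lies near the decision boundary, where the low-error hypotheses disagree, so the vote is close to a tie and the sketch abstains. Points far from the boundary receive a confident prediction.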
Citations
| 1453 | Bagging Predictors – Breiman - 1996 |
| 483 | Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods – Schapire, Freund, et al. - 1997 |
| 437 | The weighted majority algorithm – Littlestone, Warmuth - 1994 |
| 230 | Reducing multiclass to binary: A unifying approach for margin classifiers – Allwein, Schapire, et al. |
| 228 | How to use expert advice – Cesa-Bianchi, Freund, et al. - 1997 |
| 223 | On the method of bounded differences – McDiarmid - 1989 |
| 151 | On bias, variance, 0/1-loss, and the curse-of-dimensionality – Friedman - 1997 |
| 117 | Bayesian Methods for Adaptive Models – MacKay - 1992 |
| 38 | The heuristic of instability in model selection – Breiman - 1996 |
| 17 | A PAC analysis of a Bayesian estimator – Shawe-Taylor, Williamson - 1997 |
| 1 | A Probabilistic Theory of Pattern Recognition – Devroye, Györfi, Lugosi - 1996 |

