  Generalization Error of Combined Classifiers

by Llew Mason, Peter L. Bartlett, Mostefa Golea
http://syseng.anu.edu.au/~lmason/combined.ps

Abstract:

We derive an upper bound on the generalization error of classifiers which can be represented as thresholded convex combinations of thresholded convex combinations of functions. Such classifiers include single hidden-layer threshold networks and voted combinations of decision trees (such as those produced by boosting algorithms). The derived bound depends on the proportion of training examples with margin less than some threshold and the average complexity of the combined functions (where the average is over the weights assigned to each function in the convex combination). The complexity of the individual functions in the combination depends on their closeness to threshold. By representing a decision tree as a thresholded convex combination of weighted leaf functions, we apply this result to bound the generalization error of combinations of decision trees. Previous bounds depend on the margin of the combined classifier and the average complexity of the decision trees in the combination, where the complexity of each decision tree depends on the total number of leaves. Our bound also depends on the margin of the combined classifier and the average complexity of the decision trees, but our measure of complexity for an individual decision tree is based on the distribution of training examples over leaves and can be significantly smaller than the total number of leaves.
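
To make the first quantity in the bound concrete, the following is a minimal Python sketch of "the proportion of training examples with margin less than some threshold" for a voted classifier. It is an illustration only, not the authors' code: the function names `margins` and `margin_error`, the encoding of base-classifier outputs and labels as values in {-1, +1}, and the normalization of the weights to sum to one are all assumptions made for this example. The complexity term of the bound is not reproduced here.

```python
def margins(votes, weights, labels):
    """Return the margin y * f(x) of each training example.

    Assumptions of this sketch (not taken from the paper):
      votes[i][t] is the {-1, +1} output of base classifier t on example i,
      weights[t] is the nonnegative vote weight of base classifier t,
      labels[i] is the {-1, +1} label of example i.
    """
    total = sum(weights)  # normalize so the combination is convex
    result = []
    for row, y in zip(votes, labels):
        f = sum(w * h for w, h in zip(weights, row)) / total
        result.append(y * f)
    return result


def margin_error(votes, weights, labels, theta):
    """Fraction of training examples whose margin is below theta."""
    m = margins(votes, weights, labels)
    return sum(1 for v in m if v < theta) / len(m)


# Hypothetical toy data: three examples, three base classifiers.
votes = [[1, 1, 1], [1, -1, 1], [-1, -1, 1]]
weights = [0.5, 0.25, 0.25]
labels = [1, 1, 1]
```

With `theta = 0` the function returns the ordinary training error of the voted classifier; raising `theta` also counts examples that are classified correctly but only by a narrow vote.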
