by Llew Mason, Peter L. Bartlett, Mostefa Golea
Download: http://syseng.anu.edu.au/~lmason/combined.ps
Abstract:
We derive an upper bound on the generalization error of classifiers which can be represented as thresholded convex combinations of thresholded convex combinations of functions. Such classifiers include single hidden-layer threshold networks and voted combinations of decision trees (such as those produced by boosting algorithms). The derived bound depends on the proportion of training examples with margin less than some threshold and the average complexity of the combined functions (where the average is over the weights assigned to each function in the convex combination). The complexity of the individual functions in the combination depends on their closeness to threshold. By representing a decision tree as a thresholded convex combination of weighted leaf functions, we apply this result to bound the generalization error of combinations of decision trees. Previous bounds depend on the margin of the combined classifier and the average complexity of the decision trees in the combination, where the complexity of each decision tree depends on the total number of leaves. Our bound also depends on the margin of the combined classifier and the average complexity of the decision trees, but our measure of complexity for an individual decision tree is based on the distribution of training examples over leaves and can be significantly smaller than the total number of leaves.
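The abstract names two empirical quantities without defining them in code: the proportion of training examples whose margin falls below a threshold, and the distribution of training examples over a tree's leaves. The following is a minimal sketch of both, under stated assumptions. It does not reproduce the paper's bound or its per-tree complexity measure. It assumes labels in {-1, +1}, base classifiers returning -1 or +1, and non-negative weights normalised to a convex combination, with the margin taken as y * f(x). "Less than" is read as a strict inequality. All function names (`voted_margins`, `margin_error`, `leaf_distribution`, `tree_as_leaf_sum`), the toy stumps, the toy tree, and the data are hypothetical. `tree_as_leaf_sum` shows only the plain identity that a tree's output is a sum of label-times-indicator leaf functions. It is not the paper's weighted leaf construction.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: an example is (feature vector, label in {-1, +1}).
Example = Tuple[Sequence[float], int]
Hypothesis = Callable[[Sequence[float]], int]  # must return -1 or +1
LeafFn = Callable[[Sequence[float]], str]      # maps an input to a leaf id


def voted_margins(hypotheses: Sequence[Hypothesis],
                  weights: Sequence[float],
                  data: Sequence[Example]) -> List[float]:
    """Margin y * f(x) per example, with f = sum_i alpha_i * h_i and
    alpha = weights normalised to a convex combination (assumption)."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    alphas = [w / total for w in weights]
    return [y * sum(a * h(x) for a, h in zip(alphas, hypotheses))
            for x, y in data]


def margin_error(margins: Sequence[float], theta: float) -> float:
    """Proportion of examples whose margin is strictly less than theta."""
    return sum(1 for m in margins if m < theta) / len(margins)


def leaf_distribution(leaf_of: LeafFn,
                      xs: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Empirical fraction of training inputs that reach each leaf."""
    counts = Counter(leaf_of(x) for x in xs)
    return {leaf: c / len(xs) for leaf, c in counts.items()}


def tree_as_leaf_sum(leaf_of: LeafFn, leaf_label: Dict[str, int],
                     x: Sequence[float]) -> int:
    """Tree output as sum over leaves of label * indicator(x reaches leaf).
    Exactly one indicator is 1, so this equals the reached leaf's label."""
    reached = leaf_of(x)
    return sum(label * (1 if leaf == reached else 0)
               for leaf, label in leaf_label.items())


def leaf_of(x: Sequence[float]) -> str:
    """A hypothetical depth-2 tree on 1-D inputs with leaves L, RL, RR."""
    if x[0] <= 0.5:
        return "L"
    return "RL" if x[0] <= 0.8 else "RR"


# Toy usage (all values hypothetical): three threshold stumps on 1-D inputs.
stumps = [lambda x: 1 if x[0] > 0.5 else -1,
          lambda x: 1 if x[0] > 0.2 else -1,
          lambda x: 1 if x[0] > 0.8 else -1]
data = [([0.1], -1), ([0.3], -1), ([0.6], 1), ([0.9], 1)]
labels = {"L": -1, "RL": 1, "RR": 1}

margins = voted_margins(stumps, [2, 1, 1], data)
print(margins)                      # [1.0, 0.5, 0.5, 1.0]
print(margin_error(margins, 0.6))   # 0.5
print(leaf_distribution(leaf_of, [x for x, _ in data]))
# {'L': 0.5, 'RL': 0.25, 'RR': 0.25}
print([tree_as_leaf_sum(leaf_of, labels, x) for x, _ in data])
# [-1, -1, 1, 1]
```

In this toy run every example is classified correctly, yet half of them have margin below 0.6. This is the kind of low-margin proportion a margin-based bound charges for. The leaf distribution shows how unevenly the training examples occupy the tree's leaves.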