
## Additive Logistic Regression: a Statistical View of Boosting (1998)


### Download Links

- [stat.stanford.edu]
- [utstat.toronto.edu]
- [www-stat.stanford.edu]

### Other Repositories/Bibliography

Venue: Annals of Statistics

Citations: 1703 (25 self)

### Citations

5785 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984

Citation Context: ...p(x)) − f(x))² (32). The population algorithm described here translates immediately to an implementation on data when E(·|x) is replaced by a regression method, such as regression trees (Breiman et al. 1984). While the role of the weights is somewhat artificial in the L₂ case, they are not in any implementation; w(x) is constant when conditioned on x, but the w(x_i) in a terminal node of a tree, for ...

3550 | Bagging predictors - Breiman - 1996

Citation Context: ...M (Breiman, Friedman, Olshen & Stone 1984) as the base classifier. This adaptation grows fixed-size trees in a "best-first" manner (see Section 7, page 32). Included in the figure is the bagged tree (Breiman 1996), which averages trees grown on bootstrap-resampled versions of the training data. Bagging is purely a variance-reduction technique, and since trees tend to have high variance, bagging often produces ...
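The bagging procedure this context describes (fit base learners on bootstrap resamples of the training data, then combine by vote) can be sketched in a few lines. The decision-stump base learner and the toy data below are illustrative assumptions, not the paper's setup, which uses CART trees:

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Exhaustively fit the best single-threshold decision stump."""
    best = None  # (error, feature, threshold, label_above, label_below)
    for j in range(len(X[0])):
        for t in {x[j] for x in X}:
            for above, below in ((1, 0), (0, 1)):
                preds = [above if x[j] >= t else below for x in X]
                err = sum(p != yi for p, yi in zip(preds, y))
                if best is None or err < best[0]:
                    best = (err, j, t, above, below)
    _, j, t, above, below = best
    return lambda x: above if x[j] >= t else below

def bag(X, y, n_learners=25, seed=0):
    """Bagging (Breiman 1996): fit each learner on a bootstrap
    resample of the data, then predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        learners.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: Counter(h(x) for h in learners).most_common(1)[0][0]

# toy 1-D data: class 1 iff x >= 5
X = [[v] for v in range(10)]
y = [0] * 5 + [1] * 5
predict = bag(X, y)
print(predict([2]), predict([7]))  # 0 1 on this separable toy set
```

Because each stump sees a different resample, the individual learners vary, and voting averages that variance away; this is the variance-reduction effect the snippet refers to.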

3401 | A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting - Freund, Schapire - 1997

Citation Context: ...generalization error. This theory (Schapire 1990) has evolved in the machine learning community, initially based on the concepts of PAC learning (Kearns & Vazirani 1994), and later from game theory (Freund 1995, Breiman 1997). Early versions of boosting "weak learners" (Schapire 1990) are far simpler than those described here, and the theory is more precise. The bounds and the theory associated with the Ada...

3037 | Generalized Linear Models - McCullagh, Nelder - 1983

2379 | Generalized Additive Models - Hastie, Tibshirani - 1990

2165 | Experiments with a new boosting algorithm - Freund, Schapire - 1996

1642 | Matching pursuit with time-frequency dictionaries - Mallat, Zhang - 1993

884 | Boosting the margin: A new explanation for the effectiveness of voting methods, The Annals of Statistics - Schapire, Freund, et al. - 1998

850 | The strength of weak learnability - Schapire - 1990

Citation Context: ...imate for f_m(x) in Section 3. Freund & Schapire (1996) and Schapire & Singer (1998) provide some theory to support their algorithms, in the form of upper bounds on generalization error. This theory (Schapire 1990) has evolved in the machine learning community, initially based on the concepts of PAC learning (Kearns & Vazirani 1994), and later from game theory (Freund 1995, Breiman 1997). Early versions of boo...

684 | An introduction to computational learning theory - Kearns, Vazirani - 1994

600 | An experimental comparison of three methods for constructing ensembles of decision trees - Dietterich - 1998

Citation Context: ...A variety of other examples (not shown) exhibit similar behavior with all boosting methods. Note that other committee approaches to classification, such as bagging (Breiman 1996) and randomized trees (Dietterich 1998), while admitting parallel implementations, cannot take advantage of this approach to reduce computation. [Section 9, "Concluding remarks"] In order to understand a learning procedure statistically it is necessar...

547 | Projection pursuit regression - Friedman, Stuetzle - 1981

538 | Very Simple Classification Rules Perform Well on Most Commonly Used Datasets - Holte - 1993

Citation Context: ...elicit performance differences among the methods being tested. Such complicated boundaries are not likely to occur often in practice. Many practical problems involve comparatively simple boundaries (Holte 1993); in such cases performance differences will still be situation-dependent, but correspondingly less pronounced. [Section 6, "Some experiments with data"] In this section we show the results of running the four fi...

508 | Boosting a Weak Learning Algorithm by Majority - Freund - 1995

Citation Context: ...generalization error. This theory (Schapire 1990) has evolved in the machine learning community, initially based on the concepts of PAC learning (Kearns & Vazirani 1994), and later from game theory (Freund 1995, Breiman 1997). Early versions of boosting "weak learners" (Schapire 1990) are far simpler than those described here, and the theory is more precise. The bounds and the theory associated with the Ada...

298 | Multivariate adaptive regression splines (with discussion) - Friedman - 1991

Citation Context: ...cient number of boosts, the stump-based model achieved superior performance. More generally, one can consider an expansion of the decision boundary function in a functional ANOVA decomposition (Friedman 1991): B(x) = \sum_j f_j(x_j) + \sum_{j,k} f_{jk}(x_j, x_k) + \sum_{j,k,l} f_{jkl}(x_j, x_k, x_l) + \dots (43). The first sum represents the closest function to B(x) that is additive in the original features, the fir...
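The first (purely additive) sum in this ANOVA decomposition is exactly what backfitting-style algorithms for additive models estimate: cycle through coordinates, fitting each f_j to the partial residual with all other components removed. A toy sketch under assumed simplifications (features treated as categorical so each "smoother" is a per-level mean; invented toy data, not the paper's procedure):

```python
def backfit(X, y, n_iter=10):
    """Backfitting for an additive model f(x) = mean + sum_j f_j(x_j).
    Each coordinate 'smoother' is a per-level mean, i.e. features are
    treated as categorical (an illustrative simplification)."""
    n, p = len(X), len(X[0])
    mean_y = sum(y) / n
    f = [dict() for _ in range(p)]  # each f_j stored as a lookup table
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: remove the mean and every f_k except f_j
            r = [y[i] - mean_y - sum(f[k].get(X[i][k], 0.0)
                                     for k in range(p) if k != j)
                 for i in range(n)]
            # least-squares update of f_j: per-level mean of residuals
            sums, counts = {}, {}
            for i in range(n):
                v = X[i][j]
                sums[v] = sums.get(v, 0.0) + r[i]
                counts[v] = counts.get(v, 0) + 1
            f[j] = {v: sums[v] / counts[v] for v in sums}
    return lambda x: mean_y + sum(f[j].get(x[j], 0.0) for j in range(p))

# toy data that is exactly additive: y = x1 + 2*x2, so the first
# ANOVA sum alone reproduces the target with no interaction terms
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 2, 1, 3]
pred = backfit(X, y)
print(round(pred([1, 1]), 6))  # 3.0
```

Because this toy target has no interaction terms f_{jk}, the additive fit is exact; with interactions present, the backfit recovers only the first sum of the decomposition.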

181 | Another approach to polychotomous classification - Friedman - 1996

Citation Context: ...pooled complement classes. Even if the decision boundaries separating all class pairs are relatively simple, pooling classes can produce complex decision boundaries that are difficult to approximate (Friedman 1996). By considering all of the classes simultaneously, the symmetric multi-class model is better able to take advantage of simple pairwise boundaries when they exist (Hastie & Tibshirani 1998). As note...

169 | Prediction games and arcing algorithms - Breiman - 1999

Citation Context: ...on error. This theory (Schapire 1990) has evolved in the machine learning community, initially based on the concepts of PAC learning (Kearns & Vazirani 1994), and later from game theory (Freund 1995, Breiman 1997). Early versions of boosting "weak learners" (Schapire 1990) are far simpler than those described here, and the theory is more precise. The bounds and the theory associated with the AdaBoost algorith...

140 | Flexible discriminant analysis by optimal scoring - Hastie, Tibshirani, et al. - 1994

103 | Bias, variance and arcing classifiers - Breiman - 1996

Citation Context: ...M (Breiman, Friedman, Olshen & Stone 1984) as the base classifier. This adaptation grows fixed-size trees in a "best-first" manner (see Section 7, page 32). Included in the figure is the bagged tree (Breiman 1996a), which averages trees grown on bootstrap-resampled versions of the training data. Bagging is purely a ... [footnote 1: "Essentially the same as AdaBoost.M1 for binary data (Freund & Schapire 1996)"] Discrete AdaBoost...

85 | Linear Smoothers and Additive Models (with discussion and rejoinder) - Buja, Hastie, et al. - 1989

38 | Classification by Pairwise Coupling, The Annals of Statistics 26(2) - Hastie, Tibshirani - 1998

10 | Nearest neighbor pattern classification - Cover - 1967