25 citations found.
L. Breiman. Bias, variance and arcing classifiers. University of California, Dept. of Statistics, Technical Report, 1996.


This paper is cited in the following contexts:
Any Two Learning Algorithms Are (Almost) Exactly Identical - Wolpert (2000)

....most importantly though, the results of this paper spotlight a glaring gap in current understanding of the mathematics of supervised learning. By NFL, no supervised learning algorithm can be justified without a priori assumptions concerning the prior P(f). In particular, this is true of boosting [3], support vector machines [4], bagging [2], stacking [1], cross validation, and similar currently popular techniques. Indeed, by the results of this paper, the widespread popularity of those techniques implicitly reflects an extremely large amount of information imputed to the prior. For some ....

L. Breiman. Bias, variance and arcing classifiers. University of California, Dept. of Statistics, Technical Report, 1996.


A Computational Environment for Extracting Rules from.. - Baranauskas, Monard.. (2000)

....phase, two important concepts are applied that concern classifiers: the bias plus variance decomposition of a classifier and ensembles of classifiers. They will be shortly described in Sections 3.1 and 3.2. 3.1 The Bias plus Variance Decomposition. The classifier Fundamental Decomposition [8, 6] principle states that the classifier error can be viewed as three basic components: the minimum error that can be obtained by the ideal classifier: the lower bound on the expected error of any learning algorithm; the bias which measures how closely the learning algorithm's average guess, over ....

....lead to improved predictive accuracy when classifying instances that are not among the training set. There is considerable diversity in the methods used to assemble the ensembles, including stacking [24], windowing [21], bagging [7], wagging [4], and more recently boosting [22, 14, 15] and arcing [6, 8]. Usually, classifiers are combined using a majority or weighted vote mechanism. It has been suggested that both bagging and boosting reduce error by reducing the variance term in Equation 1 [8]. Also, [15] argue that boosting attempts to reduce the error in the bias term in the equation since it ....

[Article contains additional citation context not shown here]

Breiman, L. (1996c). Bias, variance and arcing classifiers. Technical Report 460, Statistics Department, University of California.


Generalization in Threshold Networks, Combined Decision.. - Mason, Bartlett, Golea

....a quantity which can be significantly smaller than the total number of leaves. 1. Introduction. The idea of combining multiple classifiers by weighted majority vote has recently received much attention following the empirical success of techniques such as boosting [3, 4], bagging [1] and arcing [2]. Despite the significant performance gains exhibited over a single classifier, the theoretical basis for such improved performance has, until recently, been weak. In [8] Schapire et al. derive an upper bound for the generalization error of any convex combination of binary classifiers which depends ....

Leo Breiman. Bias, variance and arcing classifiers. Technical Report 460, Statistics Department, University of California, Berkeley, April 1996.


Incremental Support Vector Machine Learning: a Local.. - Ralaivola.. (2001)   (4 citations)

....selection criterion. Looking at the bound, we can see that a value of must be chosen: in order to do that, we take a time varying value defined as g; 5 Experimental Results Experiments were conducted on three different binary classification problems: Banana [14] Ringnorm [2] and Diabetes. Datasets are available at www.first.gmd.de raetsch . For each problem, we tested LISVM for different values of the threshold . The main points we want to assess are the classification accuracy our algorithm is able to achieve, the appropriateness of the proposed criterion to ....

L. Breiman. Bias, variance and arcing classifiers. Technical Report 460, University of California, Berkeley, CA, USA, 1996.


Mining Decision Trees from Data Streams in a Mobile Environment - Kargupta, Park (2001)

....different ways to compute the output of the ensemble. Averaging the outputs of the individual models with uniform weight is probably the simplest possibility. Perrone and Cooper [19, 17] refer to this method as Basic Ensemble Method (BEM) or naive Bagging. Breiman proposed an Arcing method, Arc-fx [2][3], for mining from large data sets and stream data. It is fundamentally based on the idea of Arcing (adaptive resampling) by giving higher weights to those instances that are usually misclassified. We consider both of these ensemble learning techniques to create an ensemble of decision trees ....

L. Breiman. Bias, variance and arcing classifiers. Technical Report 460, Statistics Department, University of California at Berkeley, 1996.


Parallelizing Boosting and Bagging - Yu, Skillicorn (2001)   (1 citation)

....One approach to predictor generation is to use small subsets of the training data, build predictors based on each subset, and then deploy a predictor that combines the predictions of each of the individual predictors, using either voting or regression. The advantage of such ensemble approaches [4] is that the combined prediction tends to have much smaller variance than that of a single monolithic predictor trained on the same dataset. Techniques based on small subsets have been shown to have the greatest known accuracies on several problems [6] 1 When the subsets are chosen ....

L. Breiman. Bias, variance and arcing classifiers. Technical Report 460, Department of Statistics, University of California, Berkeley, 1996.


Generalization in Threshold Networks, Combined Decision.. - Mason, Bartlett, Golea

....less than the maximum depth and total number of leaves respectively. 1. Introduction. The idea of combining multiple classifiers by weighted majority vote has recently received much attention following the empirical success of techniques such as boosting [3, 4], bagging [1] and arcing [2]. Despite the significant performance gains exhibited over a single classifier, the theoretical basis for such improved performance has, until recently, been weak. In [8] Schapire et al. derive an upper bound for the generalization error of any convex combination of binary classifiers which ....

Leo Breiman. Bias, variance and arcing classifiers. Technical Report 460, Statistics Department, University of California, Berkeley, April 1996.


Bias, Variance and Prediction Error for Classification Rules - Robert Tibshirani.. (1996)   (1 citation)

....rules. Robert Tibshirani, Department of Preventive Medicine and Biostatistics and Department of Statistics, University of Toronto, Toronto, Canada. April 3, 1996. © University of Toronto. Abstract: We study the notions of bias and variance for classification rules. Following Efron (1978) and Breiman (1996), we develop a decomposition of prediction error into its natural components. Then we derive bootstrap estimates of these components and illustrate how they can be used to describe the error behaviour of a classifier in practice. In the process we also obtain a bootstrap estimate of the error ....

....Bayes rule. Define the aggregated predictor by $C_A(t) = E_F\,C(t; X)$ (9). We imagine drawing an infinite collection of training sets and applying the classifier C(t; X) to each. The elements of $C_A(t)$ are the proportions of times each class is predicted in the infinite collection, at input t. Breiman (1996) coined the term aggregated, and called its bootstrap estimate $\hat{C}_A(t) = E_{\hat{F}}[C(t; X)]$ the bagged (bootstrap aggregated) predictor. It is called a bootstrap smoothed estimate in Efron and Tibshirani (1995). We discuss this more in section 3. In order to define the bias and variance of C, we ....

[Article contains additional citation context not shown here]

Breiman, L. (1996), Bias, variance and arcing classifiers, Technical report, University of California, Berkeley.


On the Optimality of the Simple Bayesian Classifier under.. - Pazzani (1997)   (107 citations)

....the relative behavior of estimation algorithms: those with greater representational power, and thus greater ability to respond to the sample, tend to have lower bias, but also higher variance. Recently, several authors (Kong & Dietterich, 1995; Kohavi & Wolpert, 1996; Tibshirani, 1996; Breiman, 1996; Friedman, 1996) have proposed similar bias-variance decompositions for zero-one loss functions. In particular, Friedman (1996) has shown, using normal approximations to the class probabilities, that the bias-variance interaction now takes a very different form. Zero-one loss can be highly ....

Breiman, L. (1996). Bias, variance and arcing classifiers (Technical Report 460). Statistics Department, University of California at Berkeley, Berkeley, CA. ftp://ftp.stat.berkeley.edu/users/breiman/arcall.ps.Z.


Decimated Input Ensembles for Improved Generalization - Tumer, Oza (1999)   (4 citations)

....the individual classifiers to be combined, it is important to have classifiers that have complementary information, i.e. have the lowest possible correlation [1], [14], [20], [30]. There are many methods for actively promoting diversity among the classifiers to be pooled, including bagging [5], [6], boosting [10], [11], cross validation partitioning [18], [30], and error correcting output codes [9]. In this work, we present input decimation, a method that: • reduces the dimensionality of the data, thus lessening the impact of the curse of dimensionality; • reduces the correlation ....

....Many combining methods incorporate (explicitly or implicitly) a correlation reduction method to improve generalization. Some partition the training set much like one does when using cross validation and train one classifier on each partition [18], [30]. A better known method, bagging [5], [6], constructs several sets of m training examples drawn randomly with replacement out of the original set of m training examples and trains one classifier using each of these resampled training sets. Boosting [10], [11] also subsamples the input space, but the training samples are selected based ....

L. Breiman. Bias, variance and arcing classifiers. Technical Report 460, Department of Statistics, University of California, Berkeley, 1996.
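
The bagging procedure described in the second Tumer and Oza excerpt (several sets of m examples drawn with replacement from the original m, one classifier per resampled set) can be sketched roughly as follows. This is only an illustration under stated assumptions: the one-dimensional threshold "stump" base learner, the toy data, the majority-vote combiner and all function names are hypothetical choices made here, not taken from the cited papers.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw m examples with replacement from the m original examples."""
    m = len(data)
    return [data[rng.randrange(m)] for _ in range(m)]

def train_stump(sample):
    """Hypothetical base learner: a 1-D threshold rule fit by exhaustive search.

    Each example is (x, y) with y in {0, 1}. The rule predicts `label_hi`
    when x >= threshold and the other label otherwise.
    """
    best = None
    for candidate, _ in sample:
        for label_hi in (0, 1):
            errors = sum(
                (label_hi if x >= candidate else 1 - label_hi) != y
                for x, y in sample
            )
            if best is None or errors < best[0]:
                best = (errors, candidate, label_hi)
    _, best_thr, best_hi = best
    return lambda x: best_hi if x >= best_thr else 1 - best_hi

def bag(data, n_models, seed=0):
    """Train one base classifier per bootstrap resample of the data."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def vote(models, x):
    """Combine the individual predictions by simple majority vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]

# Toy data (an assumption for illustration): label is 1 exactly when x >= 5.
data = [(x, int(x >= 5)) for x in range(10)]
models = bag(data, n_models=25)
```

Calling `vote(models, x)` then returns the ensemble's label for a new input `x`. Swapping in a different base learner only requires replacing `train_stump` with another function that maps a sample to a callable classifier.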


Improving Bagging Performance by Increasing Decision Tree.. - Pfahringer, al.

....and an inherently parallel nature which encourages efficient implementation on multiprocessors, an advantage not shared by boosting. However, boosting seems empirically to give better results, and explanations have been sought for this difference in terms of a decomposition into bias and variance [4, 2]. Furthermore, it has recently been noticed that boosting tends to produce a more diverse set of classifiers than bagging, and this has been cited as a factor in increased performance [11]. The present paper explores the proposition that the higher performance of boosting can be explained by the ....

....(as claimed in [11], for example) that boosting assigns smaller weights to hypotheses with lower accuracy. A good hypothesis for the current probability distribution might have a considerable error rate on the original training set represented by the initial probability distribution. Unlike both [4] and [11], our implementation does not resample and restart in cases where $\epsilon_t$ exceeds 0.5. In preliminary experiments we found that crossing this threshold is a very good indicator that boosting is failing. Failure may be due to excessive noise in the training data, or to inadequate attributes, ....

[Article contains additional citation context not shown here]

Breiman L.: Bias, Variance and Arcing Classifiers, University of California, Statistics Department, Berkeley, CA, Technical Report 460, 1996.


Integrating Multiple Learned Models for Improving and Scaling.. - Most Modern

....on estimating the error of bagging (Wolpert & Macready, 1997; Tibshirani, 1996; Breiman, 1996c). Bauer and Kohavi's article provides a large-scale empirical comparison of a number of voting-based algorithms for combining classifiers. Using fourteen data sets, they investigated variants of bagging (Breiman, 1996a) and boosting (Schapire, 1990) with decision tree (three variants) and Naive Bayes algorithms as the base inducers. They analyzed error rates through a decomposition of bias and variance and observed their influence on misclassification. Their results provide some insights on earlier observations (Breiman, 1996a) that ....

Breiman, L. (1996b). Bias, variance and arcing classifiers. Tech. rep. 460, University of California, Berkeley, CA.


Bayesian Neural Networks for Classification: How Useful is.. - Penny, Roberts (1999)   (5 citations)

....ARD is only seen to be beneficial for the Ionosphere data. For the other data sets the number of spurious inputs was not sufficient to upset non-ARD methods. 3.4 Committees of networks. The error made by a classifier may be split up into two components: a bias component and a variance component (Breiman, 1996). If the classifiers are of sufficient complexity, such as MLPs with large numbers of hidden units, then the bias component will be small. The variance component, however, will be large. But if the networks are used in committees the variance component can be reduced, thus reducing the overall ....

Breiman, L. (1996) Bias, Variance and Arcing classifiers. Technical Report 460, Statistics Department, University of California.


Bias, Variance and Prediction Error for Classification Rules - Tibshirani (1996)   (1 citation)

....point t, C(t; X) outputs (.45, .55) or (.9, .1) with probabilities 2/3 and 1/3. Then $C_A(t) = (.6, .4)$ and so predicts the first class. But if C(t; X) outputs the class indicator, that is (0, 1) or (1, 0) with probabilities 2/3 and 1/3, then $C_A(t) = (1/3, 2/3)$ and so predicts the second class. Breiman (1996) coined the term aggregated, and called its bootstrap estimate $\hat{C}_A(t) = E_{\hat{F}}[C(t; X)]$ the bagged (bootstrap aggregated) predictor. It is called a bootstrap smoothed estimate in Efron and Tibshirani (1995). Bagging mimics aggregation by averaging C(t; X) over training sets drawn from $\hat{F}$ ....

....the term aggregated, and called its bootstrap estimate $\hat{C}_A(t) = E_{\hat{F}}[C(t; X)]$ the bagged (bootstrap aggregated) predictor. It is called a bootstrap smoothed estimate in Efron and Tibshirani (1995). Bagging mimics aggregation by averaging C(t; X) over training sets drawn from $\hat{F}$. In Breiman (1996), bagging is seen to reduce classification error by about 20% on average over a collection of problems. Breiman also reports very little difference when bagging was applied to the class probability estimates, or the corresponding indicator vector for the maximum probability. We discuss bagging ....

[Article contains additional citation context not shown here]

Breiman, L. (1996), Bias, variance and arcing classifiers, Technical report, University of California, Berkeley.
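
The two-class aggregation example in the first Tibshirani excerpt above can be checked with a few lines of exact arithmetic. The sketch below assumes the extraction-damaged values read as (.45, .55) and (.9, .1) with probabilities 2/3 and 1/3; the function names are hypothetical and the code only illustrates the averaging step, not any procedure from the paper.

```python
from fractions import Fraction

def aggregate(outputs):
    """Probability-weighted average of classifier output vectors.

    `outputs` is a list of (vector, probability) pairs, one pair per
    distinct output the classifier can produce at the input point t.
    """
    n_classes = len(outputs[0][0])
    return [sum(p * vec[k] for vec, p in outputs) for k in range(n_classes)]

def predicted_class(vec):
    """Index of the largest component (0 = first class, 1 = second class)."""
    return max(range(len(vec)), key=lambda k: vec[k])

two_thirds, one_third = Fraction(2, 3), Fraction(1, 3)

# Case 1: the classifier outputs class-probability estimates.
prob_outputs = [
    ((Fraction(45, 100), Fraction(55, 100)), two_thirds),
    ((Fraction(9, 10), Fraction(1, 10)), one_third),
]
# Case 2: the classifier outputs only the class indicator vector.
indicator_outputs = [((0, 1), two_thirds), ((1, 0), one_third)]

agg_prob = aggregate(prob_outputs)            # [3/5, 2/5]
agg_indicator = aggregate(indicator_outputs)  # [1/3, 2/3]
```

Under these assumed values the averaged probability estimates favour the first class while the averaged indicator vectors favour the second, which is the contrast the excerpt draws.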


On the Optimality of the Simple Bayesian Classifier under.. - Domingos, Pazzani (1997)   (107 citations)

....the relative behavior of estimation algorithms: those with greater representational power, and thus greater ability to respond to the sample, tend to have lower bias, but also higher variance. Recently, several authors (Kong & Dietterich, 1995; Kohavi & Wolpert, 1996; Tibshirani, 1996; Breiman, 1996; Friedman, 1996) have proposed similar bias-variance decompositions for zero-one loss functions. In particular, Friedman (1996) has shown, using normal approximations to the class probabilities, that the bias-variance interaction now takes a very different form. Zero-one loss can be highly ....

Breiman, L. (1996). Bias, variance and arcing classifiers (Technical Report 460). Statistics Department, University of California at Berkeley, Berkeley, CA. ftp://ftp.stat.berkeley.edu/users/breiman/arcall.ps.Z.


An Empirical Comparison Of two Sampling Techniques For Training.. - Lipnickas (2000)

No context found.

L. Breiman. Bias, variance and arcing classifiers. Technical report 460, Statistics Department, University of California, Berkeley, 1996.


A Unified Bias-Variance Decomposition and its Applications - Domingos (2000)   (9 citations)

No context found.

Breiman, L. (1996b). Bias, variance and arcing classifiers (Technical Report 460). Statistics Department, University of California at Berkeley, Berkeley, CA.


A Unified Bias-Variance Decomposition for Zero-One and Squared Loss - Domingos (2000)   (9 citations)

No context found.

Breiman, L. 1996b. Bias, variance and arcing classifiers. Technical Report 460, Statistics Department, University of California at Berkeley, Berkeley, CA.


On Combining Artificial Neural Nets - Sharkey (1996)   (26 citations)

No context found.

Breiman, L. (1996a) Bias, Variance and Arcing Classifiers. Technical Report 460.


Bayesian Integration of Rule Models - Domingos

No context found.

Breiman, L. (1996b). Bias, variance and arcing classifiers. Technical Report 460, Statistics Department, University of California at Berkeley, Berkeley, CA. ftp://ftp.stat.berkeley.edu/users/breiman/arcall.ps.Z.


Do Not Forget: Full Memory in Memory-Based Learning of.. - van den Bosch, Daelemans (1998)   (3 citations)

No context found.

Breiman, L. 1996b. Bias, variance and arcing classifiers.


Multi-Agent Reinforcement Learning: Weighting and Partitioning - Sun, Peterson (1999)   (5 citations)

No context found.

L. Breiman, (1996c). Bias, variance and arcing classifiers. Technical Report 460. University of California, Berkeley.


The Out-of-Bootstrap Method for Model Averaging and Selection - Sunil Rao (1996)   (4 citations)

No context found.

Breiman, L. (1996b), Bias, variance and arcing classifiers, Technical report, University of California, Berkeley.


Complements to 'Pattern Recognition and Neural Networks' - Ripley (1996)   (484 citations)

No context found.

Breiman, L. (1996c) Bias, variance and arcing classifiers. Technical report, Department of Statistics, University of California at Berkeley.


Why Does Bagging Work? A Bayesian Account and its Implications - Domingos

No context found.

Breiman, L. 1996b. Bias, variance and arcing classifiers.


CiteSeer.IST - Copyright Penn State and NEC