34 citations found.
Breiman, L. (1996a): The heuristic of instability in model selection. Annals of Statistics, 24, 2350--2383.

This paper is cited in the following contexts:

Distributed Data Mining Systems - Prodromidis (1999)

....classifiers into an ensemble meta classifier by learning how they predict, i.e. by observing their input/output behavior. Several methods for integrating ensembles of models have been studied, including techniques that combine the set of models in some linear fashion [Ali & Pazzani, 1996; Breiman, 1994; 1996; Freund & Schapire, 1995; Krogh & Vedelsby, 1995; LeBlanc & Tibshirani, 1993; Littlestone & Warmuth, 1989; Opitz & Shavlik, 1996; Perrone & Cooper, 1993; Schapire, 1990; Tresp & Taniguchi, 1995], techniques that employ referee functions to arbitrate among the predictions generated by the ....

....number of examples and hence as the number of the base classifiers increases (the overall time complexity of SCANN is O( M K) 2.3.4 Other Meta Learning Techniques. For completeness we briefly outline some other popular methods for forming ensembles of classifiers. Bagging: Bagging [Breiman, 1994] employs sampling techniques to generate many training subsets of different distribution over which it computes multiple models. The method combines the models using unweighted majority voting. Boosting: Boosting [Freund & Schapire, 1995; Schapire, 1990] learns a set of classifiers in a ....

Breiman, L. 1994. Heuristics of instability in model selection. Technical report, Department of Statistics, University of California at Berkeley.


Effective and Efficient Pruning of Meta-Classifiers in a .. - Prodromidis, Stolfo.. (1998)

....Machine Learning and KDD literature that compute, evaluate and combine ensembles of classifiers [10]. Most of these algorithms, however, generate their classifiers by applying the learning algorithms on variants of the same data set or use different approaches to combine the derived classifiers [2, 11, 15, 16, 17, 26, 27, 35, 39]. Breiman [3] and LeBlanc and Tibshirani [19], for example, acknowledge the value of using multiple predictive models to increase accuracy, but they rely on cross validation data and analytical methods (e.g. least squares regression) to compute the best linear combination of the available ....

L. Breiman. Heuristics of instability in model selection. Technical report, Department of Statistics, University of California at Berkeley, 1994.


Effective Pruning of Neural Network Classifier Ensembles - Lazarevic, Obradovic

....When predictions from these versions are combined, more stable and smoother predictions are generated. When applied to neural networks, these techniques can yield dramatic improvements in generalization performance [4, 5]. The reason for this is that neural networks are inherently unstable [1, 6], i.e. small changes in training set and/or parameter selection may produce large changes in performance. Most combination methods for classifiers assume that the classifiers forming the classifier ensemble have to be both diverse and accurate. The diversity assumption means that the ....

Breiman, L., Heuristic of instability in model selection, Technical Report, Statistics Department, University of California at Berkeley, 1996.


Analysis of a Pseudo-Bayesian Prediction Method - Yoav Freund Att

....prior distribution. The prior and posterior distributions are internal to the algorithm and are not part of the world around it. We hope that this paper will shed some new light on the use of algorithms that average many hypotheses such as Bayesian algorithms and averaging methods such as bagging [2, 3]. The paper is organized as follows. We start in Section II by describing the prediction algorithm. We give the basic analysis of the algorithm in Section III. In Section IV, we bound the performance of (x) in terms of the error of the best hypothesis in the class. We conclude in ....

Leo Breiman. The heuristics of instability in model selection. The Annals of Statistics, 24:2350--2383, 1996.


Cost Complexity-based Pruning of Ensemble Classifiers - Prodromidis, Stolfo (1999)   (6 citations)

....It also provides a set of meta learning agents that combine the computed models that were learned (perhaps) at different sites. Several methods for integrating ensembles of models have been studied, including techniques that combine the set of models in some linear fashion (Ali and Pazzani, 1996; Breiman, 1994; Breiman, 1996; Freund and Schapire, 1995; Krogh and Vedelsby, 1995; Opitz and Shavlik, 1996; Perrone and Cooper, 1993; Schapire, 1990; Tresp and Taniguchi, 1995; LeBlanc and Tibshirani, 1993), techniques that employ referee functions to arbitrate among the predictions generated by the ....

Breiman, L. (1994), Heuristics of instability in model selection, Technical report, Department of Statistics, University of California at Berkeley.


Distributed Data Mining: The JAM System Architecture - Prodromidis, Stolfo..   (1 citation)

....seeks to compute a meta classifier that integrates in some principled fashion the separately learned classifiers to boost overall predictive accuracy. Several methods for integrating ensembles of models have been studied, including techniques that combine the set of models in some linear fashion [1, 3, 4, 17, 24, 25, 27, 33, 35, 51, 54], e.g. majority or weighted voting, bagging, etc., techniques that employ referee functions to arbitrate among the predictions generated by the classifiers [7, 20, 22, 50, 21, 23, 34], e.g. arbiters, mixture of experts, etc., methods that rely on principal components analysis [29, 31] e.g. ....

L. Breiman. Heuristics of instability in model selection. Technical report, Department of Statistics, University of California at Berkeley, 1994.


Cost Complexity Pruning of Ensemble Classifiers - Prodromidis, Stolfo

....distribution mechanism. It also provides a set of meta learning agents that combine the computed models that were learned (perhaps) at different sites. Several methods for integrating ensembles of models have been studied, including techniques that combine the set of models in some linear fashion [1, 2, 3, 12, 20, 27, 29, 37, 39, 21], techniques that employ referee functions to arbitrate among the predictions generated by the classifiers, 16, 17, 18, 19, 28, 36] methods that rely on principal components analysis [23, 24] or methods that apply inductive learning techniques to learn the behavior and properties of the ....

L. Breiman. Heuristics of instability in model selection. Technical report, Department of Statistics, University of California at Berkeley, 1994.


An Empirical Comparison of Voting Classification Algorithms.. - Bauer, Kohavi (1999)   (153 citations)

....the training set. For large m, this is about 1 - 1/e = 63.2%, which means that each bootstrap sample contains only about 63.2% unique instances from the training set. This perturbation causes different classifiers to be built if the inducer is unstable (e.g. neural networks, decision trees) (Breiman 1994) and the performance can improve if the induced classifiers are good and not correlated; however, Bagging may slightly degrade the performance of stable algorithms (e.g. k nearest neighbor) because effectively smaller training sets are used for training each classifier (Breiman 1996b). 4.2. ....

Breiman, L. (1994), Heuristics of instability in model selection, Technical Report Statistics Department, University of California at Berkeley.
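
The bootstrap figure quoted in the Bauer and Kohavi (1999) context above (roughly 63.2% of the training instances appearing in each bootstrap sample) can be checked numerically. The sketch below is illustrative only and is not taken from any of the cited papers; the function names `expected_unique_fraction` and `simulated_unique_fraction` are hypothetical, and it assumes a bootstrap sample of size m drawn uniformly with replacement from m instances.

```python
import random


def expected_unique_fraction(m: int) -> float:
    """Probability that a given instance appears at least once in a
    bootstrap sample of size m drawn with replacement from m instances.
    This equals 1 - (1 - 1/m)^m, which tends to 1 - 1/e as m grows."""
    return 1.0 - (1.0 - 1.0 / m) ** m


def simulated_unique_fraction(m: int, seed: int = 0) -> float:
    """Draw one bootstrap sample of size m and return the fraction of
    distinct original instances it contains (seeded for repeatability)."""
    rng = random.Random(seed)
    sample = [rng.randrange(m) for _ in range(m)]
    return len(set(sample)) / m


if __name__ == "__main__":
    for m in (10, 100, 10_000):
        print(m, expected_unique_fraction(m), simulated_unique_fraction(m))
```

For small m the exact value sits slightly above the limit 1 - 1/e, so the 63.2% figure is best read as a large-sample approximation.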


Why Averaging Classifiers Can Protect Against Overfitting - Freund, Mansour, al. (2001)   (3 citations)

....alternative to predicting using the single best hypothesis is to average the prediction of those hypotheses whose performance on the training set is close to optimal. [Footnote: Part of this work was done while visiting AT&T Labs.] Two popular methods of this type are Bayesian averaging [9] and bagging [3, 4]. There is considerable experimental evidence that such averaging can significantly reduce the amount of overfitting suffered by the learning algorithm. However, there is, we believe, a lack of theory for explaining this reduction. In the context of bagging, the common explanation is based on the ....

....prior distribution. The prior and posterior distributions are internal to the algorithm and are not part of the world around it. We hope that this paper will shed some new light on the use of algorithms that average many hypotheses such as Bayesian algorithms and averaging methods such as bagging [3, 4]. The paper is organized as follows. We start in Section 2 by describing the prediction algorithm. We give the basic analysis of the algorithm in Section 3. In Section 4, we bound the performance of (x) in terms of the error of the best hypothesis in the class. We conclude in Section 5 by giving ....

Leo Breiman. The heuristics of instability in model selection. The Annals of Statistics, 24:2350--2383, 1996.


Why Averaging Classifiers Can Protect Against Overfitting - Freund, Mansour, Schapire (2001)   (3 citations)

....is indeed the only or the best thing to do. One popular alternative to predicting using the single best hypothesis is to average the prediction of those hypotheses whose performance on the training set is close to optimal. Two popular methods of this type are Bayesian averaging [9] and bagging [3, 4]. There is considerable experimental evidence that such averaging can significantly reduce the amount of overfitting suffered by the learning algorithm. However, there is, we believe, a lack of theory for explaining this reduction. In the context of bagging, the common explanation is based on the ....

....prior distribution. The prior and posterior distributions are internal to the algorithm and are not part of the world around it. We hope that this paper will shed some new light on the use of algorithms that average many hypotheses such as Bayesian algorithms and averaging methods such as bagging [3, 4]. The paper is organized as follows. We start in Section 2 by describing the prediction algorithm. We give the basic analysis of the algorithm in Section 3. In Section 4, we bound the performance of (x) in terms of the error of the best hypothesis in the class. We conclude in Section 5 by giving ....

Leo Breiman. The heuristics of instability in model selection. The Annals of Statistics, 24:2350--2383, 1996.


Using Correspondence Analysis to Combine Classifiers - Merz (1998)   (27 citations)

....used are filled with examples and not sparsely populated or irrelevant. Again, the choice of dimensionality, K, will help to ensure that the dimensions retained contain relevant information about the predictions of the learned models. 3. The nearest neighbor algorithm is stable. Breiman [2] defines the stability of an algorithm as its sensitivity to minor changes in the training data. Stable algorithms are not sensitive to small changes in the training data, unstable algorithms are. A general heuristic is to have the Level 0 learners be unstable, thus producing ....

....in this work is limited in that it handles redundancy in the model set poorly, i.e. several very similar models will receive the same weight, possibly overpowering the vote of another model making a unique contribution. Two other methods for assigning fixed weights to each model are Bagging [2] and Boosting [34]. These methods are tightly coupled to the model generation phase rather than being general combining techniques. The goal is to generate a set of models which are likely to make uncorrelated errors (or to have higher variance) thus increasing the potential payoffs in the ....

L. Breiman. Heuristics of instability in model selection. Technical report, Department of Statistics, University of California at Berkeley, 1994.


Meta-Learning in Distributed Data Mining Systems: Issues.. - Prodromidis, Chan, al. (2000)   (34 citations)

....Learning and KDD literature that compute, evaluate and combine ensembles of classifiers [19] can be considered as weighted voting among hypothesis (models). Furthermore, most of these algorithms generate their classifiers by applying the same learning algorithms on variants of the same data set [3, 24, 30, 32, 34, 48, 49, 62, 68]. Breiman [4] and LeBlanc and Tibshirani [35], for example, acknowledge the value of using multiple predictive models to increase accuracy, but they rely on cross validation data and analytical methods (e.g. least squares regression) to compute the best linear combination of the available ....

L. Breiman. Heuristics of instability in model selection. Technical report, Department of Statistics, University of California at Berkeley, 1994.


Tuning Diversity in Bagged Neural Network Ensembles - Carney, Cunningham (1999)   (8 citations)

....from these versions are combined (averaged for example), smoother, more stable predictions are generated. When applied to neural networks, these techniques can yield dramatic improvements in generalization performance (see e.g. [5, 7]). This is because neural networks are inherently unstable [2, 4], i.e. small changes in training set and/or parameter selection can produce large changes in performance. This idea of combining predictions from multiple versions has been around for quite a while; its origins in the neural network literature can be traced back to as early as 1965 [23]. It has ....

L. Breiman, Heuristics of instability in model selection, Technical Report, Statistics Department, University of California at Berkeley, California, 1994.


A Principal Components Approach to Combining Regression Estimates - Merz, Pazzani (1998)   (14 citations)

....is generated using the same algorithm, but different training data. The data for a particular model is obtained by sampling from the original training examples according to a probability distribution. The probability distribution is defined by the particular approach, Bagging or Boosting. Bagging [1] is a method for exploiting the variance of a learning algorithm by applying it to various versions of the data set, and averaging them (uniformly) for an overall reduction in variance, or prediction error. Variations on the training data are obtained by sampling from the original training data ....

L. Breiman. Heuristics of instability in model selection. Technical report, Department of Statistics, University of California at Berkeley, 1994.


Why Averaging Classifiers Can Protect Against Overfitting - Freund, Mansour, al. (2000)   (3 citations)

....is indeed the only or the best thing to do. One popular alternative to predicting using the single best hypothesis is to average the prediction of those hypotheses whose performance on the training set is close to optimal. Two popular methods of this type are Bayesian averaging [9] and bagging [3, 4]. There is considerable experimental evidence that such averaging can significantly reduce the amount of overfitting suffered by the learning algorithm. However, there is, we believe, a lack of theory for explaining this reduction. In the context of bagging, the common explanation is based on the ....

....prior distribution. The prior and posterior distributions are internal to the algorithm and are not part of the world around it. We hope that this paper will shed some new light on the use of algorithms that average many hypotheses such as Bayesian algorithms and averaging methods such as bagging [3, 4]. The paper is organized as follows. We start in Section 2 by describing the prediction algorithm. We give the basic analysis of the algorithm in Section 3. In Section 4, we bound the performance of (x) in terms of the error of the best hypothesis in the class. We conclude in Section 5 by ....

Leo Breiman. The heuristics of instability in model selection. The Annals of Statistics, 24:2350--2383, 1996.


Stability Problems with Artificial Neural Networks and.. - Cunningham, Carney.. (1999)   (7 citations)

....of clinical applications of ANNs [2]. This is despite significant practical problems with their application. A practical problem that has come to prominence recently is that ANNs are unstable predictors. That is to say that small changes in the training data set may produce very different models [4][5][7] and consequently different performance on unseen data. Breiman suggests that these different models may result from the training of the ANN getting caught in different local minima in the error surface [5]. In this paper we show that this instability means that estimations of the ....

Breiman L., Heuristics of instability in model selection, Technical Report No. 416, Statistics Department, University of California at Berkeley, 1994.


Selection of Learning Algorithms for Trading Systems Based .. - Obradovic, Chenoweth (1996)   (1 citation)

....size in a manner which minimizes predictive information loss. The feature selection technique used in our trading system consisted of combining the results from several different statistically based methods into a final feature set. This reduces the instability problems described by Breiman in [2]. Breiman pointed out that small changes in the data set used in the feature selection process can cause drastic changes in the final set of features. To minimize this effect for multivariate financial time series, we suggested a procedure which averages the results of several feature selection ....

Breiman L., "The Heuristics of Instability in Model Selection," Technical Report No. 416, Statistics Department, University of California, Berkeley.


Effective and Efficient Pruning of Meta-Classifiers in a.. - Prodromidis, Stolfo (1999)

....Machine Learning and KDD literature that compute, evaluate and combine ensembles of classifiers [10]. Most of these algorithms, however, generate their classifiers by applying the learning algorithms on variants of the same data set or use different approaches to combine the derived classifiers [2, 11, 15, 16, 17, 26, 27, 35, 39]. Breiman [3] and LeBlanc and Tibshirani [19], for example, acknowledge the value of using multiple predictive models to increase accuracy, but they rely on cross validation data and analytical methods (e.g. least squares regression) to compute the best linear combination of the available ....

L. Breiman. Heuristics of instability in model selection. Technical report, Department of Statistics, University of California at Berkeley, 1994.


The NeuralBAG Algorithm: Optimizing Generalization.. - Carney, Cunningham (1998)   (2 citations)

....by 31 and 40 respectively. We also show that, on average, bagged networks trained using NeuralBAG outperform bagged networks trained using k fold cross validation by 17 . 1 Introduction. One fundamental weakness of neural networks is that they are unstable or exhibit high variance [3] (i.e. small changes in training set and/or parameter selection can cause large changes in performance). This instability is magnified when real world systems such as foreign exchange markets are modelled because, typically, only a limited amount of useful training data is available. One way to ....

....multiple versions of a predictor which, when combined, should perform better than a single predictor built to solve the same problem. For bagging to work it is very important that the predictor is unstable; bagging can actually degrade the performance of stable predictors [1]. Breiman showed [3] that techniques such as neural networks, classification trees and regression trees are unstable, while techniques such as k nearest neighbour methods are stable. To symbolically formalise how bagging works, let us first assume we have a predictor (in our case a neural network) constructed using a ....

L. Breiman. Heuristics of instability in model selection. Technical Report TR416, Statistics Department, University of California at Berkeley, California, 1994.


A Multi-Component Nonlinear Prediction System for the S&P.. - Chenoweth, Obradovic (1996)   (8 citations)

....processes rely on fundamentally sound statistically based techniques [11]. These techniques are practical, easy to understand, and easily implemented. However, they suffer from instability problems, meaning that small data perturbations lead to drastic changes in the final reduced feature set [3]. This problem is especially pronounced for stock market models because the data is non stationary and very noisy. For these reasons, the feature selection procedure adopted in this paper is to use several ....

L. Breiman, "The Heuristics of Instability in Model Selection," Technical Report No. 416, Statistics Department, University of California, Berkeley, CA, 1994.


An Explicit Feature Selection Strategy for Predictive.. - Chenoweth, Obradovic   (1 citation)

....In addition to computational efficiency, another important issue is how to deal with instability problems. A feature selection procedure is unstable if a small change in the data used in the selection process results in drastic changes in the selected feature set. Recently, it was suggested by Breiman [1994] that unstable procedures can be stabilized by averaging the results from several different feature selection processes. Therefore, to reduce the level of instability, this study performs feature selection using several combinations of the selection techniques and selection criteria discussed ....

Breiman, L., [1994] "The Heuristics of Instability in Model Selection," Technical Report No. 416, Statistics Department, University of California, Berkeley.


An Empirical Comparison of Voting Classification Algorithms.. - Bauer, Kohavi (1998)   (153 citations)

....from the training set. For large m, this is about 1 - 1/e = 63.2%, which means that each bootstrap sample contains only about 63.2% unique instances from the training set. This perturbation causes different classifiers to be built if the inducer is unstable (e.g. neural networks, decision trees) (Breiman 1994) and the performance can improve if the induced classifiers are good and not correlated; however, Bagging may slightly degrade the performance of stable algorithms (e.g. k nearest neighbor) because effectively smaller training sets are used for training each classifier (Breiman 1996b). 4.2. ....

Breiman, L. (1994), Heuristics of instability in model selection, Technical Report Statistics Department, University of California at Berkeley.


Error Reduction through Learning Multiple Descriptions - Ali, Pazzani (1996)   (76 citations)

....members of the ensemble are all competent (accurate). The hold back approach would seem to be an obvious approach. However, for some of the small data sets presented here, using a hold back set may decrease accuracy since there would not be enough examples to learn good models. 6. Previous work. Breiman (in press; 1994) provides a characterization of learning algorithms which are amenable to the multiple models approach. He puts forward the notion of an unstable algorithm: an algorithm for which small perturbations in the training set will lead to significant differences in predicted classifications on an ....

Breiman, L. (1994). Heuristics of instability in model selection. Technical Report, Statistics Department, University of California, Berkeley. Breiman, L. (in press). Bagging Predictors. Machine Learning, ?, ?.


Analysis of a Bias Effect in a Tree-Based Variable Importance .. - Marco Sandri And

No context found.

Breiman, L. (1996a): The heuristic of instability in model selection. Annals of Statistics, 24, 2350--2383.


Wrappers For Performance Enhancement And Oblivious Decision Graphs - Kohavi (1995)   (43 citations)

No context found.

Breiman, L. (1994b), Heuristics of instability in model selection, Technical Report Statistics Department, University of California at Berkeley.


CiteSeer.IST - Copyright Penn State and NEC