27 citations found.
A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In Advances in Neural Information Processing Systems 7, 1995.


This paper is cited in the following contexts:


An Experimental Study on Diversity for Bagging and.. - Kuncheva, Skurichina.. (2002)   (1 citation)

....combination aims at a higher accuracy than that of a single D. The literature on classifier combination highlights the necessity of measuring and using the degree of diversity, independence, orthogonality, complementarity, etc., which are intuitively desirable characteristics of a classifier team [5, 13, 16, 23, 29, 34]. Theoretically, a group of independent classifiers will improve upon the single classifier when majority vote combination is used. A dependent set of classifiers may be either better or worse [22]. Sometimes the difference is beneficial to the ensemble and yet sometimes it might be harmful. There ....
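The claim about independent classifiers under majority vote can be made concrete with the binomial distribution. The sketch below assumes L classifiers (L odd) that are each correct with the same probability p and make statistically independent errors; the function name `majority_vote_accuracy` is hypothetical and not taken from the cited paper.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p: float) -> float:
    """Probability that a strict majority of independent classifiers is correct.

    Assumes every classifier is correct with the same probability p and that
    errors are statistically independent; n_classifiers must be odd so that
    ties cannot occur.
    """
    if n_classifiers % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    majority = n_classifiers // 2 + 1
    # Sum the binomial probabilities of getting k >= majority correct votes.
    return sum(
        comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
        for k in range(majority, n_classifiers + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 11):
        print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

With p = 0.6, three voters reach a majority-vote accuracy of 0.648. The gain only holds when each classifier is better than chance (p > 0.5); for p < 0.5 adding voters makes the majority worse.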

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 231-238. MIT Press, Cambridge, MA, 1995.


Effective Pruning of Neural Network Classifier Ensembles - Lazarevic, Obradovic

....methods for classifiers assume that the classifiers forming the classifier ensemble have to be both diverse and accurate. The diversity assumption means that the classifiers have to make independent classification errors in order to improve overall prediction accuracy. Both theoretical [7, 8] and empirical work [9, 10] have shown that a good neural network ensemble is one where the individual networks are both accurate and make errors on different parts of the input space. For example, Hansen and Salamon [7] have shown that a multiple classifier system based on a simple majority ....

Krogh, A., Vedelsby, J., Neural Network Ensembles, Cross Validation and Active Learning, in Tesauro, G., Touretzky, D., and Leen, T., (Eds.), Advances in Neural Information Processing Systems, vol. 7, MIT Press, 1995.


Comparing Decomposition Methods for Classification - Masulli, Valentini (2000)

....have been obtained on two data sets from the UCI repository [11]. ECOC codes have been generated through Bose-Chaudhuri-Hocquenghem (BCH) [3] and exhaustive [5] algorithms. The comparison of the different classification decomposition methods was performed using resampling and cross validation methods [7] for estimating the expected misclassification risk. As shown in Fig. 1a, on the first database we considered, the glass data set, the decomposition methods based on CC, ECOC BCH and exhaustive ECOC perform better than those based on OPC. No significant differences can be noticed ....
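The excerpt mentions exhaustive ECOC [5] without spelling it out. A minimal sketch, assuming the standard exhaustive construction in which each of k classes receives a codeword of length 2^(k-1) - 1 and a vector of dichotomizer outputs is decoded to the class whose codeword is nearest in Hamming distance; the function names `exhaustive_ecoc`, `hamming` and `decode` are hypothetical.

```python
def exhaustive_ecoc(k: int) -> list[list[int]]:
    """Exhaustive code matrix for k >= 2 classes (one row per class).

    Row 0 is all ones; row i (i >= 1) alternates runs of 2**(k-1-i) zeros and
    ones, truncated to the code length 2**(k-1) - 1.
    """
    length = 2 ** (k - 1) - 1
    code = [[1] * length]
    for i in range(1, k):
        run = 2 ** (k - 1 - i)
        code.append([(c // run) % 2 for c in range(length)])
    return code


def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(code: list[list[int]], bits: list[int]) -> int:
    """Return the class whose codeword is closest in Hamming distance."""
    return min(range(len(code)), key=lambda i: hamming(code[i], bits))


if __name__ == "__main__":
    code = exhaustive_ecoc(4)
    for row in code:
        print("".join(map(str, row)))
    noisy = code[2][:]
    noisy[0] ^= 1  # simulate one wrong dichotomizer output
    print(decode(code, noisy))
```

For k = 4 the rows are 1111111, 0000111, 0011001 and 0101010. Every pair of codewords differs in 4 positions, so a single wrong dichotomizer output is still decoded to the correct class (the script prints 2).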

A. Krogh, J. Vedelsby. Neural networks ensembles, cross validation and active learning. In Touretzky D. S., Tesauro G., Leen T.K., editors, Advances in Neural Information Processing Systems, volume 7, pages 107-115. MIT Press, Cambridge, MA, 1995.


Mutual Information Methods for Evaluating Dependence Among.. - Masulli, Valentini (2001)

....$E$, i.e. $\bar{I}^A_E = \frac{1}{N}\sum_{i=1}^{N} I^A_{E,i}$, $\bar{I}^B_E = \frac{1}{N}\sum_{i=1}^{N} I^B_{E,i}$ (10), where $N$ is the number of evaluations. These can be obtained by running the learning algorithm several times with different initial conditions, or by using resampling techniques such as k-fold cross validation [19]. We assume that both $I^A_{E,i}$ and $I^B_{E,i}$, $i \in \{1, \dots, N\}$, are randomly drawn from normal distributions with means $\mu_A$ and $\mu_B$ respectively and unknown variance. (Footnote: here we consider $I_E$, but the same arguments can be applied to $I_{SE}$.) Hence $\bar{I}^A_E$ and $\bar{I}^B_E$ (eq. 10) are ....
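The excerpt breaks off before naming the test applied to the two means of eq. (10). Since it assumes normal samples with a single unknown variance, one natural choice is a pooled two-sample t statistic; the sketch below assumes exactly that, and `pooled_t_statistic` is a hypothetical name, not the authors' code.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample t statistic for mean(a) - mean(b) using a pooled variance.

    Assumes both samples come from normal distributions sharing one unknown
    variance; a and b hold the N per-run values I^A_{E,i} and I^B_{E,i}.
    """
    n_a, n_b = len(a), len(b)
    # Sample means over the N evaluations, as in eq. (10).
    mean_a, mean_b = mean(a), mean(b)
    # Pooled unbiased estimate of the common variance.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


if __name__ == "__main__":
    # Hypothetical per-fold values from, e.g., k-fold cross validation.
    print(pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The statistic would then be compared with a Student t distribution on $n_A + n_B - 2$ degrees of freedom.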

A. Krogh and J. Vedelsby. Neural networks ensembles, cross validation and active learning. In Touretzky D. S., Tesauro G., Leen T.K., editor, Advances in Neural Information Processing Systems, volume 7, pages 107-115. MIT Press, Cambridge, MA, 1995.


Ten Measures of Diversity in Classifier Ensembles: Limits.. - Kuncheva, Whitaker (2001)   (10 citations)

....be points for which none of the classifiers is correct. 4) It is recognized that a negative correlation should be pursued when designing classifier ensembles, and many such design methods have been proposed, predominantly altering the available training set to build the individual classifiers [3, 8, 12, 17, 19, 20, 5, 23, 24, 25, 29]. Practically, there is no unique choice of a measure of diversity or dependence. There are pairwise measures which are calculated for each pair of classifiers in D and then averaged [25, 5, 28, 29, 27, 10, 9] and non-pairwise measures that either use the idea of entropy or correlation of ....

....of a measure of diversity or dependence. There are pairwise measures which are calculated for each pair of classifiers in D and then averaged [25, 5, 28, 29, 27, 10, 9] and non-pairwise measures that either use the idea of entropy or correlation of individual outputs with the averaged output of D [1, 2, 11, 12, 19, 23], or are based on the distribution of difficulty of the data points [7, 6, 16, 30, 22, 21]. In this study we present 10 measures of classifier diversity for oracle classifier outputs: 4 pairwise and 6 non-pairwise. We give the limits of the measures for the case of L = 2 classifiers of the same ....
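To illustrate what "calculated for each pair of classifiers and then averaged" means for oracle outputs (1 = correct, 0 = wrong), here is a minimal sketch of two standard pairwise measures, Yule's Q statistic and the disagreement measure. The excerpt does not say which four pairwise measures the paper uses, so treat these as examples; all function names are hypothetical, and Q is undefined when its denominator is zero.

```python
from typing import Callable


def pairwise_counts(a: list[int], b: list[int]) -> tuple[int, int, int, int]:
    """Count (N11, N10, N01, N00) for two oracle output vectors (1 = correct)."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    return n11, n10, n01, n00


def q_statistic(a: list[int], b: list[int]) -> float:
    """Yule's Q: +1 for identical behaviour, near 0 for independent errors."""
    n11, n10, n01, n00 = pairwise_counts(a, b)
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)


def disagreement(a: list[int], b: list[int]) -> float:
    """Fraction of points on which exactly one of the two classifiers is correct."""
    n11, n10, n01, n00 = pairwise_counts(a, b)
    return (n01 + n10) / (n11 + n10 + n01 + n00)


def averaged(measure: Callable[[list[int], list[int]], float], outputs: list[list[int]]) -> float:
    """Average a pairwise measure over all pairs of classifiers in the team."""
    pairs = [(i, j) for i in range(len(outputs)) for j in range(i + 1, len(outputs))]
    return sum(measure(outputs[i], outputs[j]) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    team = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0]]
    print(averaged(disagreement, team))
```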

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 231-238. MIT Press, Cambridge, MA, 1995.


Ensembles of Nonconformist Neural Networks - van Wezel, Out, Kosters

....are bagging and bumping. In bagging, $w_k = 1/K$, $k = 1, \dots, K$, so all networks are equally important in the combination. In bumping, only the network $k^*$ with the lowest error on the training data is used for the combined forecast, so $w_{k^*} = 1$ and $w_k = 0$ for $k \neq k^*$. More sophisticated schemes (see [1, 2, 5, 4, 7]) attempt to minimize a cost function that expresses the quality of the combination as a function of the ensemble weights $w$. This cost function contains the covariance matrix $\Sigma$ of the errors of the ensemble members as a component. In fact, it can be written as $w^T \Sigma w$ (see [7]). This is just the ....
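Why the cost takes the form $w^T \Sigma w$ can be checked numerically. The sketch assumes member errors $e_k(t) = y_t - f_k(x_t)$ on a shared set of samples and weights that sum to one, so the combined error is $\sum_k w_k e_k(t)$; it takes $\Sigma$ as the matrix of mean products of member errors (uncentred, an assumption about what "covariance" means here). The bagging and bumping rules follow the excerpt; every function name is hypothetical.

```python
def error_covariance(errors: list[list[float]]) -> list[list[float]]:
    """Sigma[i][j] = mean over samples of e_i(t) * e_j(t) (uncentred)."""
    n = len(errors[0])
    return [[sum(ei[t] * ej[t] for t in range(n)) / n for ej in errors] for ei in errors]


def quadratic_form(w: list[float], sigma: list[list[float]]) -> float:
    """Compute w^T Sigma w."""
    k = len(w)
    return sum(w[i] * sigma[i][j] * w[j] for i in range(k) for j in range(k))


def combined_mse(w: list[float], errors: list[list[float]]) -> float:
    """MSE of the weighted combination, computed directly sample by sample."""
    n = len(errors[0])
    return sum(sum(wk * ek[t] for wk, ek in zip(w, errors)) ** 2 for t in range(n)) / n


def bagging_weights(k: int) -> list[float]:
    """Bagging: every network gets weight 1/K."""
    return [1.0 / k] * k


def bumping_weights(errors: list[list[float]]) -> list[float]:
    """Bumping: weight 1 on the network with the lowest training MSE, 0 elsewhere."""
    mse = [sum(e * e for e in ek) / len(ek) for ek in errors]
    best = min(range(len(errors)), key=mse.__getitem__)
    return [1.0 if k == best else 0.0 for k in range(len(errors))]


if __name__ == "__main__":
    errs = [[1.0, -1.0, 2.0], [0.0, 1.0, -1.0]]  # hypothetical training errors
    sigma = error_covariance(errs)
    for w in (bagging_weights(2), bumping_weights(errs)):
        print(w, quadratic_form(w, sigma), combined_mse(w, errs))
```

For both weight vectors the two printed numbers agree, which is the point of writing the cost as a quadratic form: once $\Sigma$ is estimated, any weighting can be scored without recombining the networks.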

A. Krogh and J. Vedelsby. Neural networks, cross validation and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing 7, pages 231-238. Morgan Kaufmann, 1995.


Measures of Diversity in Classifier Ensembles - Kuncheva, Whitaker (2003)   (15 citations)

....data point, and Level 4, where there might be points for which none of the classifiers is correct. It is recognized that a negative correlation should be pursued when designing classifier ensembles, and many such design methods have been proposed, predominantly altering the available training set [5, 11, 13, 20, 22, 23, 9, 24, 27, 28, 31]. Here we are interested in designing measures of diversity which are related to the majority vote accuracy in the case of correct/incorrect type of output. As the example in the introduction shows, the relationship between $P_{\mathrm{maj}}$ and the diversity is not straightforward (cf. [15]). The study is to ....

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 231-238. MIT Press, Cambridge, MA, 1995.


Decimated Input Ensembles for Improved Generalization - Tumer, Oza (1999)   (4 citations)

.... to have classifiers that have complementary information, i.e. have the lowest possible correlation [1, 14, 20, 30]. There are many methods for actively promoting diversity among the classifiers to be pooled, including bagging [5, 6], boosting [10, 11], cross validation partitioning [18, 30] and error correcting output codes [9]. In this work, we present input decimation, a method that (i) reduces the dimensionality of the data, thus lessening the impact of the "curse of dimensionality"; and (ii) reduces the correlation among the classifiers by training them on different ....

....combiner performance. III. Input Decimated Features. Many combining methods incorporate (explicitly or implicitly) a correlation reduction method to improve generalization. Some partition the training set much like one does when using cross validation and train one classifier on each partition [18, 30]. A better known method, bagging [5, 6], constructs several sets of m training examples drawn randomly with replacement out of the original set of m training examples and trains one classifier using each of these resampled training sets. Boosting [10, 11] also subsamples the input ....
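The two resampling schemes described above can be sketched as index generators. This assumes training examples are addressed by index 0..m-1; the function names are hypothetical. The excerpt trains one classifier per partition; a common cross-validation variant instead trains each classifier on the complement of its fold.

```python
import random


def bootstrap_sets(m: int, n_sets: int, rng: random.Random) -> list[list[int]]:
    """Bagging: each set draws m indices uniformly *with replacement*."""
    return [[rng.randrange(m) for _ in range(m)] for _ in range(n_sets)]


def cv_partitions(m: int, n_parts: int, rng: random.Random) -> list[list[int]]:
    """Cross-validation-style partitioning: shuffle, then split into disjoint folds."""
    idx = list(range(m))
    rng.shuffle(idx)
    return [idx[i::n_parts] for i in range(n_parts)]


if __name__ == "__main__":
    rng = random.Random(0)
    for s in bootstrap_sets(10, 3, rng):
        print("bag:", sorted(s))  # duplicates appear; some indices are left out
    for p in cv_partitions(10, 3, rng):
        print("fold:", sorted(p))  # every index appears in exactly one fold
```

A bootstrap set of size m contains on average only about 63% of the distinct original examples, which is one source of the diversity among bagged classifiers.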

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems-7, pages 231--238. M.I.T. Press, 1995.


Accurate Modelling with Minimised Data Collection - An.. - RayChaudhuri, Hamey (1996)

....criterion to our sample and repeat the process of having several models examine random subsamples, then after several iterations of this process the models must agree closely at some stage. Our ideas are based upon active learning concepts introduced by Cohn et al. [1, 2] and Krogh and Vedelsby [5]. The emphasis of our algorithm, however, is upon minimising data gathering without having to compromise modelling accuracy, i.e. without increasing generalisation error. Instead of adding one labeled point at a time and having all the data examined by all the neural networks in the ensemble (or ....

....Passive Data Samples (each point averaged over 20 experiments). [Fig. 7: Active Data Sampling versus Passive] [Fig. 8: Square Wave Step used in Definitive Experiments] 4. Higher Sampling Density around Significant Inputs. Krogh and Vedelsby's investigations [5] did not lead to the definition of a stopping criterion, nor was an analysis made of the amount of data collected. The studies reported in [5] concentrated upon the difference in generalisation error in the cases of active and passive learning. In this work we have added a stopping criterion to ....

[Article contains additional citation context not shown here]

Anders Krogh and Jesper Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems 7. MIT Press, Cambridge MA, 1995.


An Algorithm for Active Data Collection for Learning -.. - RayChaudhuri, Hamey (1995)   (2 citations)

....error as well as the amount of data sampled, an apparently contradictory pair of aims. We propose a means of achieving both objectives simultaneously in this work. We have based our ideas upon concepts already introduced by researchers such as Cohn et al. at MIT [2, 3] and Krogh and Vedelsby [7]. 2 Existing Methods and a Data Minimising Approach. Recently proposed schemes of active learning in the neural network literature have covered both active data subset selection and active selection of unlabeled data. 2.1 Active Data Subset Selection. Tamburini and Davoli [14] have ....

....the same time not increase our generalisation error, i.e. retain modelling accuracy. Cohn et al. [2, 3] have carried out active learning by choosing unlabeled inputs that minimise the expected value of the learner's mean squared error. We use the query by committee approach. Krogh and Vedelsby [7] have successfully applied this idea to neural network learning; however, their emphasis has been upon reducing the generalisation error of a neural network ensemble rather than minimising data collection. They began training with one example and added one point at a time corresponding to maximum ....

[Article contains additional citation context not shown here]

Anders Krogh and Jesper Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems 7. MIT Press, Cambridge MA, 1995.


Maximum Likelihood Weights for a Linear Ensemble of.. - van Wezel, Kosters, Kok (1998)

....a well known statistical method. The various combination methods are tested on several data sets, with encouraging results. KEYWORDS: ensembles, maximum likelihood, neural networks 1. Introduction Recently, neural network researchers have devoted a lot of attention to neural network ensembles [4, 6, 5]. In neural network ensembles, the outputs of different neural networks trained for the same problem are used to form a combined prediction. This combination is usually linear. Much effort has been put into determining the best coefficients of the linear combination, but the currently existing ....

....on the training data. This way, all but one of the outputs are discarded. This method is sometimes referred to as bumping. Despite their obvious simplicity, these schemes often yield a significant improvement over the average performance of the ensemble members. A third method, introduced in [6], is more sophisticated. It attempts to find optimized values for the elements of $w$. This is done by introducing two constraints on the weights, namely the constraint that the weights sum to one ($\sum_k w_k = 1$) and a positivity constraint on the weights ($\forall k: w_k \geq 0$), and then writing the error ....
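The excerpt truncates before describing how [6] solves the constrained problem. As a minimal sketch of the optimization itself, the code below minimizes $w^T C w$ subject to $\sum_k w_k = 1$ and $w_k \geq 0$ using projected gradient descent with a sort-based Euclidean projection onto the probability simplex. This solver is a swapped-in technique, not necessarily the method of [6]; C is assumed to be a symmetric, positive semi-definite matrix of member-error products, and the function names are hypothetical.

```python
def project_to_simplex(v: list[float]) -> list[float]:
    """Euclidean projection onto {w : w_k >= 0, sum_k w_k = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:  # condition holds for a prefix; keep the last threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def simplex_weights(c: list[list[float]], steps: int = 5000) -> list[float]:
    """Minimize w^T C w over the simplex by projected gradient descent."""
    k = len(c)
    # The gradient 2Cw is Lipschitz with constant 2 * lambda_max(C); the largest
    # absolute row sum bounds lambda_max, so this step size is safe.
    bound = max(sum(abs(x) for x in row) for row in c)
    lr = 1.0 / (2.0 * bound)
    w = [1.0 / k] * k  # start from the uniform (bagging) weights
    for _ in range(steps):
        grad = [2.0 * sum(c[i][j] * w[j] for j in range(k)) for i in range(k)]
        w = project_to_simplex([wi - lr * gi for wi, gi in zip(w, grad)])
    return w


if __name__ == "__main__":
    print(simplex_weights([[1.0, 0.0], [0.0, 4.0]]))
    print(simplex_weights([[1.0, 1.5], [1.5, 4.0]]))
```

With C = diag(1, 4) the weights approach (0.8, 0.2), i.e. inversely proportional to each member's error. For the second, strongly correlated matrix the unconstrained optimum would give the second network a negative weight; the positivity constraint pushes it to zero instead, giving (1, 0).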

A. Krogh and J. Vedelsby. Neural networks, cross validation and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing 7, pages 231--238. Morgan Kaufmann, 1995.


Face Recognition Using Hybrid Classifier Systems - Gutta, Wechsler (1996)   (14 citations)

.... are examples of such an approach and they are usually implemented using ensembles of networks. If one were to define ambiguity as the variation of the output of ensemble members over unlabeled ("test") data, the disagreement between networks can be quantified and corrective training can take place [6]. An active learning scheme, corresponding to corrective training, would retrain on those test examples for which the ensemble strongly disagrees. 3. Face Recognition. Identifying people seems straightforward: people do it all the time in business and in social encounters. Automated ....

Krogh, A., and Vedelsby, J., Neural Network Ensembles, Cross Validation and Active Learning, NIPS, 7, Morgan Kaufmann, 1995.


Cost-Effective Querying Leading To Dual Control - RayChaudhuri, Hamey (1996)

.... [21, 7] is a query filter that uses Cohn's general notion of selective sampling from a region of uncertainty [2]. It leads to building effective querying algorithms that have been applied to different problem scenarios by Krogh and Vedelsby [11], by Matan [13] and in our earlier work [16, 17, 18]. 1.1 The Current Investigation. Selectively sampling query-filtered data from an environment is a major step towards reducing the cost of data labeling in a learning process. However, even in a selective querying method the costs associated ....

....of hypotheses are in disagreement over the label of a point, then it becomes necessary to query the environment for its label and to add it to the training set [7]. The level of disagreement between a committee of hypotheses is therefore the criterion for querying. In Krogh and Vedelsby's paper [11] and our earlier work [16, 17, 18], the statistical variance function of the different hypotheses propounded by a committee of feedforward neural networks is computed. This function is defined to quantify the level of disagreement; hence the point corresponding to the global maximum of this ....
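The querying criterion described above can be sketched in a few lines. The sketch assumes each committee member is a callable mapping an input to a real-valued output, uses the equally weighted (population) variance of the members' outputs as the disagreement measure, and approximates the global maximum by searching a finite pool of candidate inputs. Both the candidate pool and the function names are hypothetical, not taken from the cited papers.

```python
from statistics import pvariance
from typing import Callable, Sequence

Model = Callable[[float], float]


def committee_variance(committee: Sequence[Model], x: float) -> float:
    """Disagreement at x: variance of the committee members' outputs."""
    return pvariance([f(x) for f in committee])


def select_query(committee: Sequence[Model], candidates: Sequence[float]) -> float:
    """Return the candidate input on which the committee disagrees most."""
    return max(candidates, key=lambda x: committee_variance(committee, x))


if __name__ == "__main__":
    # Hypothetical committee of three "networks" that agree only at x = 0.
    committee = [lambda x: x, lambda x: 2 * x, lambda x: 0.5 * x]
    print(select_query(committee, [0.0, 1.0, 3.0]))  # the point to label next
```

In an active learning loop the selected point would be labeled, added to the training set, and the committee retrained before the next query.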

[Article contains additional citation context not shown here]

Anders Krogh and Jesper Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems 7. MIT Press, Cambridge MA, 1995.


Unknown -   Self-citation (Ensembles)

No context found.

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In Advances in Neural Information Processing Systems 7, 1995.


Constructing Diverse Classifier Ensembles using Artificial.. - Melville, Mooney (2003)   (4 citations)  Self-citation (Ensembles)

No context found.

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In Advances in Neural Information Processing Systems 7, 1995.


Constructing Diverse Classifier Ensembles using Artificial.. - Melville, Mooney   (1 citation)  Self-citation (Ensembles)

.... methods [Hastie et al. 2001]. Constructing a diverse committee in which each hypothesis is as different as possible (decorrelated with other members of the ensemble) while still maintaining consistency with the training data is known to be a theoretically important property of a good committee [Krogh and Vedelsby, 1995]. Although all successful ensemble methods encourage diversity to some extent, few have focused directly on the goal of maximizing diversity. Existing methods that focus on achieving diversity [Opitz and Shavlik, 1996; Rosen, 1996] are fairly complex and are not general meta-learners like bagging ....

....Cross-validated learning curves support the hypothesis that DECORATEd trees generally result in greater classification accuracy for small training sets. 2 Ensembles and Diversity. In an ensemble, the combination of the output of several classifiers is only useful if they disagree on some inputs [Krogh and Vedelsby, 1995]. We refer to the measure of disagreement as the diversity of the ensemble. There have been several methods proposed to measure ensemble diversity [Kuncheva and Whitaker, 2002], usually dependent on the measure of accuracy. For regression, where the mean squared error is commonly used to ....

[Article contains additional citation context not shown here]

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In Advances in Neural Information Processing Systems 7, 1995.


That Elusive Diversity in Classifier Ensembles - Kuncheva (2003)   (6 citations)  Self-citation (Ensembles)

....5 concludes the paper. 2 Diversity. Classifiers in an ensemble should be different from each other, otherwise there is no gain in combining them. Quantifying this difference, also called diversity, orthogonality or complementarity, has been identified as an important research direction by many authors [2, 11, 14, 15, 20]. Measures of the connection between two classifier outputs can be derived from the statistical literature (e.g. [23]). There is less clarity on the subject when three or more classifiers are concerned. There is no strict definition of what is intuitively perceived as diversity. At least not in ....

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 231--238. MIT Press, Cambridge, MA, 1995.


Ensembles of Learning Machines - Valentini, Masulli (2002)   (2 citations)  Self-citation (Ensembles)

....we actually may not be able to find it. Building an ensemble using, for instance, different starting points may achieve a better approximation, even if no assurance of this is given. Another way to look at the need for ensembles is represented by the classical bias-variance analysis of the error [45, 78]: different works have shown that several ensemble methods reduce variance [15, 87] or both bias and variance [15, 39, 77]. Recently the improved generalization capabilities of different ensemble methods have also been interpreted in the framework of the theory of large margin classifiers [89, ....
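As a worked illustration of why averaging reduces variance (not taken from the cited works), assume K ensemble members whose outputs $f_k(x)$ at a fixed input each have variance $\sigma^2$ and pairwise correlation $\rho$. The variance of the simple average is then:

```latex
\operatorname{Var}\!\left(\frac{1}{K}\sum_{k=1}^{K} f_k(x)\right)
  = \frac{1}{K^2}\left(K\sigma^2 + K(K-1)\rho\sigma^2\right)
  = \rho\sigma^2 + \frac{1-\rho}{K}\,\sigma^2
```

Only the uncorrelated part shrinks as K grows; the shared part $\rho\sigma^2$ remains, which is one way to see why the diversity of the members matters.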

A. Krogh and J. Vedelsby. Neural networks ensembles, cross validation and active learning. In D.S. Touretzky, G. Tesauro, and T.K. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 107--115. MIT Press, Cambridge, MA, 1995.


Input Decimation Ensembles: Decorrelation through.. - Oza, Tumer (2001)   (8 citations)  Self-citation (Ensembles)

....level of performance in the base classifiers that constitute the ensemble and reduce their correlations. There are many ensemble methods that actively promote diversity (i.e. lower correlations in the outputs) among their base classifiers. Bagging [4], boosting [7] and cross-validation partitioning [9, 14] generate diverse base classifiers by training with different subsets of the training set. Error correcting output codes [5] generate new training sets with different class labels and use these different training sets to generate base classifiers. Merz [10] uses Principal Component Analysis [8] to ....

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems-7, pages 231-238. M.I.T. Press, 1995.


Tuning Diversity in Bagged Neural Network Ensembles - Carney, Cunningham (1999)   (8 citations)  Self-citation (Ensembles)

.... forecast combining [13]. However, it has not gained widespread use until recently, largely because it requires significant computational resources, especially when applied to learners such as neural networks. In this paper we focus on bagging. We study the general work of Krogh and Vedelsby [19] on neural network ensembles and show how their key insight, that diversity in an ensemble can significantly influence ensemble generalization performance, can be used to better understand the underlying properties of bagging. Using this, we develop an early stopping technique that effectively tunes ....

....given above does not quantify the extent to which bagging can improve the generalization performance of neural networks or under what circumstances. To study bagging in more depth and clearly illustrate its properties and limitations, we will now show how the general work of Krogh and Vedelsby [19] on neural network ensembles applies to bagged ensembles. Using the notation introduced above and the terminology introduced in [19], let us define the ambiguity of a single member of a bagged ensemble on a prediction for t as $a_b(x) = (\phi_b(x) - \phi_{\mathrm{bag}}(x))^2$ (3), and the ensemble ....

[Article contains additional citation context not shown here]

A. Krogh and J. Vedelsby, Neural network ensembles, cross-validation and active learning, in: G. Tesauro, D. Touretzky and T. Leen, eds., Advances in Neural Information Processing Systems 7 (MIT Press, 1995) 231--238.


Learning with Ensembles: How over-fitting can be useful - Sollich (1996)   (28 citations)  Self-citation (Krogh Ensembles)

....for the same task. A combination of many different predictors can often improve predictions, and in statistics this idea has been investigated extensively, see e.g. [1, 2, 3]. In the neural networks community, ensembles of neural networks have been investigated by several groups, see for instance [4, 5, 6, 7]. Usually the networks in the ensemble are trained independently and then their predictions are combined. In this paper we study an ensemble of linear networks trained on different but overlapping training sets. The limit in which all the networks are trained on the full data set and the one ....

.... the error of the ensemble $\epsilon(x)$, the error of the $k$th predictor $\epsilon_k(x)$, and its ambiguity $a_k(x)$: $\epsilon(x) = (y(x) - f(x))^2$ (2), $\epsilon_k(x) = (y(x) - f_k(x))^2$ (3), $a_k(x) = (f_k(x) - f(x))^2$ (4). The ensemble error can be written as $\epsilon(x) = \bar{\epsilon}(x) - \bar{a}(x)$ [7], where $\bar{\epsilon}(x) = \sum_k \omega_k \epsilon_k(x)$ is the average of the errors of the individual predictors and $\bar{a}(x) = \sum_k \omega_k a_k(x)$ is the average of their ambiguities, which is simply the variance of the output over the ensemble. By averaging over the input distribution $P(x)$ and implicitly over the ....
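The ambiguity decomposition $\epsilon = \bar{\epsilon} - \bar{a}$ can be verified at a single input. The sketch assumes a weighted ensemble output $f = \sum_k \omega_k f_k$ with non-negative weights summing to one; the function name `ambiguity_decomposition` and the sample numbers are hypothetical.

```python
def ambiguity_decomposition(
    y: float, outputs: list[float], weights: list[float]
) -> tuple[float, float, float]:
    """Return (ensemble error, weighted mean member error, weighted mean ambiguity).

    The ensemble output is f = sum_k w_k f_k, with weights that are
    non-negative and sum to one.
    """
    f = sum(w * fk for w, fk in zip(weights, outputs))
    eps = (y - f) ** 2  # ensemble error, eq. (2)
    eps_bar = sum(w * (y - fk) ** 2 for w, fk in zip(weights, outputs))  # eq. (3) averaged
    a_bar = sum(w * (fk - f) ** 2 for w, fk in zip(weights, outputs))  # eq. (4) averaged
    return eps, eps_bar, a_bar


if __name__ == "__main__":
    eps, eps_bar, a_bar = ambiguity_decomposition(1.0, [0.5, 1.5, 2.0], [1 / 3] * 3)
    print(eps, eps_bar - a_bar)  # both equal 1/9 up to rounding
```

Because $\bar{a} \geq 0$, the ensemble error never exceeds the weighted average member error, and the gap is exactly the members' disagreement. With uniform weights the same quantity is the bagged-ensemble ambiguity in the Carney and Cunningham excerpt.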

[Article contains additional citation context not shown here]

A. Krogh and J. Vedelsby: Neural Network Ensembles, Cross Validation and Active Learning. In NIPS 7 (MIT Press, Cambridge MA, 1995).


On Voting Ensembles of Classifiers (Extended Abstract) - Matan   Self-citation (Ensembles)

....estimators has been studied both theoretically and experimentally. In [3] the performance of an ensemble of classifiers is studied under a probabilistic modeling of the individual votes. Much of the work in combining ensembles has been done with combining regression classifiers. For example, in [5, 4] the model considered is one where the ensemble's individual results are averaged: $h_{\mathrm{ens}}(x) = \frac{1}{n}\left[h_1(x) + h_2(x) + \dots + h_n(x)\right]$ (15). The most common performance measure used for regressors is the Mean Square Error (MSE) measure: $\mathrm{MSE}(h) = \int_D (h(x) - c(x))^2 \, dx$ (16). It is well ....

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D. Touretzky, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 7. MIT Press, 1995.


Committee Formation for Reliable and Accurate Neural.. - Edwards, Murray (2000)

No context found.

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Proc. Neural Information Processing Systems (NIPS) Conference, pages 231--238. MIT Press, 1995.


Linear and Order Statistics Combiners for Pattern Classification - Tumer, Ghosh (1999)   (21 citations)

No context found.

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems-7, pages 231--238. M.I.T. Press, 1995.


Classifier Combining through Trimmed Means and Order Statistics - Tumer, Ghosh (1998)

No context found.

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems-7, pages 231--238. M.I.T. Press, 1995.



CiteSeer.IST - Copyright Penn State and NEC