Bagging Equalizes Influence
2002
Cited by 9 (1 self)
Bagging constructs an estimator by averaging predictors trained on bootstrap samples. Bagged estimates almost consistently improve on the original predictor. It is thus important to understand the reasons for this success, and also for the occasional failures. It is widely believed that bagging is effective thanks to the variance reduction stemming from averaging predictors. However, seven years from its introduction, bagging is still not fully understood. This paper provides experimental evidence supporting the hypothesis that bagging stabilizes prediction by equalizing the influence of training examples. This effect is detailed in two different frameworks: estimation on the real line and regression. Bagging's improvements/deteriorations are explained by the goodness/badness of highly influential examples, in situations where the usual variance reduction argument is at best questionable. Finally, reasons for the equalization effect are advanced. They support that other resampling strategies such as half-sampling should provide qualitatively identical effects while being computationally less demanding than bootstrap sampling.
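The two resampling schemes named in this abstract can be sketched for estimation on the real line as follows. This is a minimal illustration, not the paper's experimental setup: the sample median as base estimator, the sample values, the number of resampling rounds, and the function names are all assumptions made for the example.

```python
import random
import statistics


def bagged_estimate(sample, estimator, n_rounds=200, rng=None):
    """Average `estimator` over bootstrap resamples (size n, drawn with replacement)."""
    rng = rng or random.Random(0)
    n = len(sample)
    estimates = [estimator(rng.choices(sample, k=n)) for _ in range(n_rounds)]
    return statistics.fmean(estimates)


def half_sampled_estimate(sample, estimator, n_rounds=200, rng=None):
    """Average `estimator` over subsamples of size n // 2 drawn without replacement."""
    rng = rng or random.Random(0)
    half = max(1, len(sample) // 2)
    estimates = [estimator(rng.sample(sample, half)) for _ in range(n_rounds)]
    return statistics.fmean(estimates)


# Hypothetical sample on the real line with one highly influential point.
data = [0.1, 0.4, 0.5, 0.7, 0.9, 1.2, 1.3, 25.0]

plain = statistics.median(data)                          # original estimator
bagged = bagged_estimate(data, statistics.median)        # bootstrap average
halved = half_sampled_estimate(data, statistics.median)  # half-sampling average
```

In this sketch each half-sampling round applies the estimator to n // 2 points rather than n, which is where the lower computational cost mentioned in the abstract would come from.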
Combining Predictors: Comparison of Five Meta Machine Learning Methods
- Information Science, an International Journal
1999
Cited by 8 (1 self)
Keywords: Machine Learning, Combining Predictors, Mixtures of Experts, Ensemble, Neural Network. For some years there has been a trend towards combining predictors, and away from monolithic predictors. Two groups of meta machine learning methods are the ensemble methods and the mixtures of experts methods. In this article, five different representatives are presented, discussed, and compared: three from the ensemble group and two from the mixtures of experts group. The selected methods are Simple ensemble, AdaBoost, Bagging, Hierarchical Mixtures of Experts, and a variation on the mixtures of experts method found in [7], called DynCo. DynCo can use any type of predictor that can be trained with gradient descent, it has a powerful combination method, and it encourages cooperation among the experts. DynCo compares favorably with the other four methods. 1 Introduction The aim of machine learning is to make a good predictor. This means finding a machine learning method that ...
Analysis and Synthesis of Agents that Learn from Distributed Dynamic Data Sources. Invited chapter
- Emergent Neural Computational Architectures based on Neuroscience (this volume)
2001
Cited by 7 (5 self)
Abstract. We propose a theoretical framework for specification and analysis of a class of learning problems that arise in open-ended environments that contain multiple, distributed, dynamic data and knowledge sources. We introduce a family of learning operators for precise specification of some existing solutions and to facilitate the design and analysis of new algorithms for this class of problems. We state some properties of instance and hypothesis representations, and learning operators that make exact learning possible in some settings. We also explore some relationships between models of learning using different subsets of the proposed operators under certain assumptions. 1 Learning from Distributed Dynamic Data Many practical knowledge discovery tasks (e.g., learning the behavior of complex computer systems from observations, computer-aided scientific discovery in bioinformatics) present several new challenges in machine learning. The data repositories in such applications tend to be very large, physically distributed,
On the Bayes-risk consistency of boosting methods
2001
Cited by 7 (0 self)
This paper attempts to fill in the theoretical vacuum about consistency of boosting methods, and proposes elements of explanation for their efficiency in practice. We combine known techniques for deriving margin-based bounds with some new results to explain under which conditions minimizing a cost functional conducts to the Bayes risk, at least asymptotically. For previous work on Bayes risk consistency, we refer to Breiman [9], Bühlmann and Yu [11], Jiang [22], [23], Mannor and Meir [26], Mannor, Meir and Mendelson [27].
Bias-Variance Analysis and Ensembles of SVM
2002
Cited by 6 (2 self)
Accuracy, diversity, and learning characteristics of base learners critically influence the effectiveness of ensemble methods. Bias-variance decomposition of the error can be used as a tool to gain insights into the behavior of learning algorithms, in order to properly design ensemble methods well-tuned to the properties of a specific base learner. In this work we analyse bias-variance decomposition of the error in Support Vector Machines (SVM), characterizing it with respect to the kernel and its parameters. We show that the bias-variance decomposition offers a rationale to develop ensemble methods using SVMs as base learners, and we outline two directions for developing SVM ensembles, exploiting the SVM bias characteristics and the bias-variance dependence on the kernel parameters.
Consistency for L2Boosting and Matching Pursuit with Trees and Tree-type Basis Functions
2002
Cited by 6 (1 self)
We present new consistency results in regression and classification for L2Boosting, a powerful variant of boosting with the squared error loss function. For any dimension of the predictor, a square-integrable regression or an arbitrary conditional probability function, potentially discontinuous, can be consistently estimated with L2Boosting using tree-type learners. We also discuss close connections to matching pursuits for basis functions in signal processing and demonstrate differences between tree and rectangle indicator basis functions. Depending on the signal to noise ratio, one of them will be better than the other and we thus get additional flexibility to tune boosting to high or low noise problems.
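As a rough sketch of boosting with the squared error loss, the code below repeatedly fits a regression stump (the simplest tree-type learner) to the current residuals of a one-dimensional regression problem. It is not the paper's procedure: the shrinkage factor `step`, the number of rounds, the toy data, and all function names are assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Least-squares stump on 1-D inputs: returns (threshold, left_mean, right_mean)."""
    best = None
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct inputs,
    # so both sides of every split are non-empty.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def l2_boost(xs, ys, n_rounds=50, step=0.5):
    """Fit stumps to residuals under squared error loss; returns a predictor."""
    offset = sum(ys) / len(ys)       # start from the constant mean fit
    fitted = [offset] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - f for y, f in zip(ys, fitted)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        fitted = [f + step * (lm if x <= t else rm) for x, f in zip(xs, fitted)]

    def predict(x):
        return offset + step * sum(lm if x <= t else rm for t, lm, rm in stumps)

    return predict


# Hypothetical data: a discontinuous target plus a small deterministic perturbation.
xs = [i / 10 for i in range(20)]
ys = [(0.0 if x < 1.0 else 1.0) + 0.1 * ((i % 3) - 1) for i, x in enumerate(xs)]

predict = l2_boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_const = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boost = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Here `mse_boost` is the training error only; the consistency results in the abstract concern behaviour as the sample grows, which this toy run does not demonstrate.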
A Consistent Strategy for Boosting Algorithms
Cited by 3 (0 self)
The probability of error of classification methods based on convex combinations of simple base classifiers by "boosting" algorithms is investigated. The main result of the paper is that certain regularized boosting algorithms provide Bayes-risk consistent classifiers under the only assumption that the Bayes classifier may be approximated by a convex combination of the base classifiers. Non-asymptotic distribution-free bounds are also developed which offer interesting new insight into how boosting works and help explain their success in practical classification problems.
Examining the relationship between majority vote accuracy and diversity in bagging and boosting
2000
Cited by 3 (0 self)
Much current research is undertaken into combining classifiers to increase the classification accuracy. We show, by means of an enumerative example, how combining classifiers can lead to much greater or lesser accuracy than each individual classifier. Measures of diversity among the classifiers taken from the literature are shown to only exhibit a weak relationship with majority vote accuracy. Two commonly used methods of designing classifier ensembles, Bagging and Boosting, are examined on benchmark datasets. Bagging is shown to produce teams with little diversity or improvement in accuracy, while Boosting tends to produce more diverse classifier teams showing an improvement in accuracy.
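The kind of enumerative example this abstract refers to can be illustrated with a small sketch. The numbers below are not taken from the paper: three hypothetical classifiers, each correct on 6 of 10 examples, are arranged in two ways, assuming a two-class problem so that the vote is correct exactly when a strict majority of classifiers is correct.

```python
def majority_vote_accuracy(correct):
    """`correct[j][i]` is 1 if classifier j labels example i correctly, else 0."""
    n_classifiers = len(correct)
    n_examples = len(correct[0])
    wins = sum(
        1
        for i in range(n_examples)
        if sum(row[i] for row in correct) > n_classifiers / 2
    )
    return wins / n_examples


# Every classifier below has individual accuracy 6/10.

# Favourable pattern: correct votes are spread so that nine examples
# get exactly two correct votes each.
favourable = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
]

# Unfavourable pattern: correct votes are either unanimous (four examples)
# or isolated (one correct vote on each of the other six).
unfavourable = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
]

best = majority_vote_accuracy(favourable)
worst = majority_vote_accuracy(unfavourable)
```

With identical individual accuracies, the vote reaches 0.9 in the first arrangement and 0.4 in the second, so how the errors overlap matters as much as how many there are.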
A note on the richness of convex hulls of VC classes
2003
Cited by 2 (2 self)
We prove the existence of a class $\mathcal{A}$ of subsets of $\mathbb{R}^d$ of VC dimension 1 such that the symmetric convex hull $\mathcal{F}$ of the class of characteristic functions of sets in $\mathcal{A}$ is rich in the following sense. For any absolutely continuous probability measure $\mu$ on $\mathbb{R}^d$, measurable set $B \subset \mathbb{R}^d$ and $\epsilon > 0$, there exists a function $f \in \mathcal{F}$ such that the measure of the symmetric difference of $B$ and the set where $f$ is positive is less than $\epsilon$. The question was motivated by the investigation of the theoretical properties of certain algorithms in machine learning. Let $\mathcal{A}$ be a class of sets in $\mathbb{R}^d$ and define the symmetric convex hull of $\mathcal{A}$ as the class of functions
$$\operatorname{absconv}(\mathcal{A}) = \left\{ \sum_{i=1}^{k} a_i \mathbb{1}_{A_i}(x) : k > 0,\ a_i \in \mathbb{R},\ \sum_{i=1}^{k} |a_i| = 1,\ A_i \in \mathcal{A} \right\}$$
where $\mathbb{1}_{A}(x)$ denotes the indicator function of $A$. For every $f \in \operatorname{absconv}(\mathcal{A})$, define the set

