
## Bootstrap-Inspired Techniques in Computational Intelligence (2007)

Venue: IEEE Signal Processing Magazine

Citations: 24 (5 self)

### Citations

3648 | Bagging predictors
- Breiman
- 1996
Citation Context: ...is strategically altered for each new classifier, giving boosting its unique attributes. Two more recent competitors to boosting are Breiman’s bagging (bootstrap aggregation) used for small datasets [12], and pasting small votes used for large datasets [13], both of which follow a standard bootstrap resampling process. Other ensemble architectures that use the cross validation/jackknife approach for ...

3495 | A decision-theoretic generalization of on-line learning and an application to boosting
- Freund, Schapire
- 1997
Citation Context: ...ength of weak learnability [10], Schapire introduced boosting, an elegant approach to generate a strong classifier by combining weaker ones. The boosting algorithm, and its popular successor AdaBoost [11], generate an ensemble of classifiers, each trained with a subset of the training data resampled from the original training dataset; hence, a bootstrap approach. Unlike the standard bootstrap, however...
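The distribution-guided resampling described in this excerpt can be sketched as a single boosting round. This is a minimal illustration of the idea, not the paper's implementation; `adaboost_round` and the `train_weak` callable (which fits a sample and returns a predictor) are hypothetical names introduced here.

```python
import math
import random

def adaboost_round(data, labels, dist, train_weak, rng):
    """One boosting round (sketch): draw a bootstrap sample according to
    the current distribution, train a weak learner on it, then raise the
    weight of misclassified instances so the next bootstrap sample
    focuses on the harder ones."""
    n = len(data)
    idx = rng.choices(range(n), weights=dist, k=n)  # distribution-guided bootstrap
    h = train_weak([data[i] for i in idx], [labels[i] for i in idx])
    err = sum(d for x, y, d in zip(data, labels, dist) if h(x) != y)
    alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))  # classifier vote weight
    new = [d * math.exp(-alpha if h(x) == y else alpha)
           for x, y, d in zip(data, labels, dist)]
    z = sum(new)  # normalize so the weights remain a distribution
    return h, alpha, [d / z for d in new]
```

Unlike the standard bootstrap's uniform draw, the distribution here is deliberately non-uniform after the first round, which is exactly what gives boosting its focus on difficult instances.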

1896 | Bootstrap methods: another look at the jackknife
- Efron
- 1979
Citation Context: ...g power. His approach was simple and elegant, if not controversial at first: treat the available dataset as if it were the entire population, and take repeated samples from this (pseudo) distribution [1]. He called each such sample a bootstrap sample, from which the statistic of interest is estimated. Repeating this process many times, we can simulate having many samples from the original distributio...

1419 | Combining classifiers
- Kittler, Duin
- 1996
Citation Context: ...ass-specific outputs can also be used, where the class receiving the highest combined support is then chosen by the ensemble. Theoretical analyses of these and other combination rules can be found in [19], [20]. BAGGING Let S be the original training dataset of n instances, and S*_b, b = 1, ..., B, be the bth bootstrap sample of size n drawn from S. One classifier is trained with each S*_b. This resam...
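The bagging procedure described in this excerpt (train one classifier per bootstrap sample S*_b, then combine by majority vote) can be sketched in a few lines. This is a toy illustration; `bagging_predict` and the `train_classifier` callable are names introduced here, and the 1-nearest-neighbor base learner is only an example.

```python
import random
from collections import Counter

def bagging_predict(S, x, train_classifier, B=25, seed=0):
    """Bagging sketch: draw B bootstrap samples S*_b of size n from S
    (with replacement), train one classifier per sample, and combine
    their predictions on x by simple majority vote."""
    rng = random.Random(seed)
    n = len(S)
    votes = []
    for _ in range(B):
        sample = [S[rng.randrange(n)] for _ in range(n)]  # S*_b, size n
        clf = train_classifier(sample)
        votes.append(clf(x))
    return Counter(votes).most_common(1)[0][0]

# Toy 1-nearest-neighbor base learner on (value, label) pairs
nn = lambda sample: (lambda x: min(sample, key=lambda p: abs(p[0] - x))[1])
S = [(0.0, "a"), (0.2, "a"), (0.9, "b"), (1.1, "b")]
print(bagging_predict(S, 1.0, nn))  # prints "b"
```

Because each S*_b omits roughly 37% of S on average, the base classifiers disagree on borderline points, and the vote smooths those disagreements out.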

1282 | A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
- Kohavi
- 1995
Citation Context: ...4] detects when and how much the classifier is overfitting and proportionately reduces the weight of ÊrrR. The comparison of these different estimates has been well researched and can be found in [3]–[7]. BOOTSTRAP-INSPIRED TECHNIQUES IN ENSEMBLE SYSTEMS: AN OVERVIEW Beyond error estimation, bootstrap-based ideas have also been used in recent development of many ensemble-based algorithms. These algor...

976 | Adaptive mixtures of local experts
- Jacobs, Jordan, et al.
- 1991
Citation Context: ...which form different decision boundaries. These boundaries can be averaged to obtain a more accurate decision boundary. Examples include Wolpert’s stacked generalization [14] and Jacobs and Jordan’s mixture of experts [15], [16]. The key enabling concept in all ensemble based systems is diversity. Clearly, there is no advantage in combining classifiers that provide identical outputs. An ensemble system is most benefici...

895 | Boosting the margin: A new explanation for the effectiveness of voting methods
- Schapire, Freund, et al.
- 1997
Citation Context: ...ly lead to overfitting of the data. One of the most celebrated features of AdaBoost, however, is its surprising resistance to overfitting, a phenomenon whose explanation is based on the margin theory [21]. STACKED GENERALIZATION AND MIXTURE OF EXPERTS The idea in stacked generalization is to learn whether training data have been properly learned. Bootstrap-inspired resampling provides a natural mechan...

883 | Hierarchical mixtures of experts and the EM algorithm
- Jordan, Jacobs
- 1994
Citation Context: ...form different decision boundaries. These boundaries can be averaged to obtain a more accurate decision boundary. Examples include Wolpert’s stacked generalization [14] and Jacobs and Jordan’s mixture of experts [15], [16]. The key enabling concept in all ensemble based systems is diversity. Clearly, there is no advantage in combining classifiers that provide identical outputs. An ensemble system is most beneficial if ...

869 | The strength of weak learnability
- Schapire
- 1990
Citation Context: ...le systems can be traced back to some earlier studies such as [8], [9], it is Schapire’s 1990 paper that is widely recognized as the seminal work on ensemble systems. In strength of weak learnability [10], Schapire introduced boosting, an elegant approach to generate a strong classifier by combining weaker ones. The boosting algorithm, and its popular successor AdaBoost [11], generate an ensemble of c...

731 | Stacked generalization.
- Wolpert
- 1992
Citation Context: ...bsets are used to train different classifiers, which form different decision boundaries. These boundaries can be averaged to obtain a more accurate decision boundary. Examples include Wolpert’s stacked generalization [14] and Jacobs and Jordan’s mixture of experts [15], [16]. The key enabling concept in all ensemble based systems is diversity. Clearly, there is no advantage in combining classifiers that provide identic...

687 | Neural network ensembles
- Hansen, Salamon
- 1990
Citation Context: ...tiple classifiers, each trained on a different subset of the data, obtained through bootstrap resampling. While the history of ensemble systems can be traced back to some earlier studies such as [8], [9], it is Schapire’s 1990 paper that is widely recognized as the seminal work on ensemble systems. In strength of weak learnability [10], Schapire introduced boosting, an elegant approach to generate a ...

590 | The random subspace method for constructing decision forests
- Ho
- 1998
Citation Context: ...out the individual errors (Figure 1). Diversity among classifiers can be achieved in many ways, such as training classifiers with different subsets of the features (so-called random subspace methods [17]). However, using different training data subsets obtained by resampling of the original training data is most commonly used and constitutes the link between ensemble systems and bootstrap techniques...
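The random subspace idea mentioned in this excerpt, training each ensemble member on a random subset of the feature indices rather than a resample of the instances, can be sketched as follows. `random_subspace_predict` is a name introduced here, and the 1-nearest-neighbor base rule is only an illustrative choice.

```python
import random
from collections import Counter

def random_subspace_predict(X, y, x_query, k_features, T=15, seed=0):
    """Random subspace sketch: each of T base classifiers (here a plain
    1-nearest-neighbor rule) sees only a random subset of k_features
    feature indices; the ensemble majority-votes over their outputs."""
    rng = random.Random(seed)
    d = len(X[0])
    votes = []
    for _ in range(T):
        feats = rng.sample(range(d), k_features)  # this member's feature subset
        dist = lambda a: sum((a[j] - x_query[j]) ** 2 for j in feats)
        i_best = min(range(len(X)), key=lambda i: dist(X[i]))
        votes.append(y[i_best])
    return Counter(votes).most_common(1)[0][0]

X = [(0, 0, 5), (0, 1, 5), (5, 5, 0), (5, 4, 0)]
y = ["a", "a", "b", "b"]
print(random_subspace_predict(X, y, (5, 5, 1), k_features=2))  # prints "b"
```

Diversity here comes from the feature axis instead of the instance axis, which is useful when instances are scarce but features are plentiful.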

407 | Estimating the error rate of a prediction rule: improvement on cross-validation
- Efron
- 1983
Citation Context: ...n^b_TE, on which the bootstrap error estimate ε_b is computed. Repeating this process B times gives B such error estimates. The final error estimate is then the mean of the individual bootstrap errors [3]: Êrr* = (1/B) Σ_{b=1}^{B} ε_b = (1/B) Σ_{b=1}^{B} [ (1/n^b_TE) Σ_{i=1}^{n^b_TE} Q(x_i^b) ], x^b ∈ S*^b_TE. (5) Choosing a sufficiently large B, the large variance among individual estimates can be reduced thanks ...
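Equation (5), averaging per-replicate error rates ε_b over B bootstrap draws, can be sketched as below. `bootstrap_error` and the `train_and_eval` callable (train on the drawn sample, return the mean 0/1 loss Q on the held-out points) are names introduced here for illustration.

```python
import random
from collections import Counter

def bootstrap_error(data, train_and_eval, B=50, seed=0):
    """Sketch of Eq. (5): for each bootstrap sample, train on the n drawn
    instances and evaluate the error eps_b on the instances never drawn
    (the test set S*^b_TE); return the mean of the eps_b."""
    rng = random.Random(seed)
    n = len(data)
    errors = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        chosen = set(idx)
        train = [data[i] for i in idx]
        test = [data[i] for i in range(n) if i not in chosen]
        if test:  # a bootstrap sample can, rarely, cover every instance
            errors.append(train_and_eval(train, test))
    return sum(errors) / len(errors)

# Toy evaluator: predict the majority training label, score 0/1 loss
def majority_eval(train, test):
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return sum(1 for _, lbl in test if lbl != label) / len(test)

data = [(i, "a" if i < 7 else "b") for i in range(10)]
print(bootstrap_error(data, majority_eval))
```

Since the held-out points never appear in the training draw, each ε_b is a nearly unbiased error estimate, and averaging over B replicates shrinks its variance.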

262 | Nonlinear neural networks: Principles, mechanisms, and architectures.
- Grossberg
- 1988
Citation Context: ...seen thus far but depends only on previous hypotheses and the current data. This definition raises the stability–plasticity dilemma: some information will inevitably be lost to learn new information [22]. This is because stability is the ability of a classifier to retain its knowledge, whereas plasticity is the ability to learn new knowledge; and the two phenomena typically contradict each other. Man...

210 | Improvement on Cross-validation: the .632+ bootstrap method
- Efron, Tibshirani
- 1997
Citation Context: ...0.368 ÊrrR). (6) While this estimate further reduces the variance, it may increase the bias if the classifier memorizes the training data. The more recently proposed 0.632+ bootstrap estimator [4] detects when and how much the classifier is overfitting and proportionately reduces the weight of ÊrrR. The comparison of these different estimates has been well researched and can be found in [3]–[7...
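The 0.632 estimator referenced by Eq. (6) blends the bootstrap error (pessimistic, since each classifier trains on only about 63.2% of the distinct instances) with the resubstitution error ÊrrR (optimistic, since it is measured on the training data itself). A one-line sketch, with `err_632` as a name introduced here:

```python
def err_632(bootstrap_err, resub_err):
    """Sketch of the 0.632 blend in Eq. (6): weight the pessimistic
    bootstrap error by 0.632 and the optimistic resubstitution error
    ErrR by 0.368."""
    return 0.632 * bootstrap_err + 0.368 * resub_err

# A classifier with 30% bootstrap error but only 5% resubstitution error
print(round(err_632(0.30, 0.05), 4))  # prints 0.208
```

The fixed 0.632/0.368 weights are what the later 0.632+ variant [4] adapts when it detects overfitting.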

141 | A Theoretical Study on Six Classifier Fusion Strategies
- Kuncheva
- 2002
Citation Context: ...ecific outputs can also be used, where the class receiving the highest combined support is then chosen by the ensemble. Theoretical analyses of these and other combination rules can be found in [19], [20]. BAGGING Let S be the original training dataset of n instances, and S*_b, b = 1, ..., B, be the bth bootstrap sample of size n drawn from S. One classifier is trained with each S*_b. This resampling ...

114 | Learn++: An incremental learning algorithm for supervised neural networks.
- Polikar
- 2001
Citation Context: ...lar approach can still be used to learn incrementally under these scenarios. Learn++ is such an algorithm, shown to learn incrementally from new data, even when such data introduce new classes [23], [24]. Recall that the distribution update rule in AdaBoost is designed to focus its bootstrap samples on increasingly difficult instances, determined accordi...

55 | Classifier ensembles for changing environments, in:
- Kuncheva
- 2004
Citation Context: ...mation, assuming that all previous data still carry relevant information. If previous information is no longer relevant, then a forgetting mechanism can be introduced to remove irrelevant classifiers [25], [26]. The block diagram of the entire algorithm is illustrated in Figure 4. We now look at two real-world applications where Learn++ can be used to learn incrementally from new data that subsequently ...

51 | Bootstrap techniques for error estimation
- Jain, Dubes, et al.
- 1987

44 | Pasting small votes for classification in large databases and on-line,
- Breiman
- 1999
Citation Context: ...ing boosting its unique attributes. Two more recent competitors to boosting are Breiman’s bagging (bootstrap aggregation) used for small datasets [12], and pasting small votes used for large datasets [13], both of which follow a standard bootstrap resampling process. Other ensemble architectures that use the cross validation/jackknife approach for splitting training data include ...

39 | Majority systems and the Condorcet jury theorem
- Boland
- 1989
Citation Context: ...e correct class with a probability of one half or higher, the correct classification performance of the ensemble approaches “one” as the number of classifiers increases (Condorcet Jury Theorem (1786) [18]). In weighted majority voting, each classifier is given a voting weight inversely proportional to its resubstitution error. The class with the largest total vote is then declared the winner. Algebrai...
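The Condorcet claim in this excerpt can be checked numerically: for T independent classifiers, each correct with probability p > 0.5, the probability that a simple majority is correct grows toward one with T. A small sketch (`majority_accuracy` is a name introduced here; T is kept odd to avoid ties):

```python
import math

def majority_accuracy(p, T):
    """P(majority vote of T independent classifiers is correct) when each
    is correct with probability p; per the Condorcet Jury Theorem this
    approaches 1 as T grows whenever p > 0.5."""
    return sum(math.comb(T, t) * p ** t * (1 - p) ** (T - t)
               for t in range(T // 2 + 1, T + 1))

print(majority_accuracy(0.6, 1))          # 0.6: a single classifier
print(majority_accuracy(0.6, 11) > 0.6)   # True: the ensemble already helps
print(majority_accuracy(0.6, 101) > 0.9)  # True: large T approaches 1
```

The independence assumption is the catch: real ensemble members correlate, which is why diversity, rather than sheer ensemble size, is stressed throughout the article.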

35 | Composite classifier system design: Concepts and methodology
- Dasarathy, Sheela
- 1979
Citation Context: ...e multiple classifiers, each trained on a different subset of the data, obtained through bootstrap resampling. While the history of ensemble systems can be traced back to some earlier studies such as [8], [9], it is Schapire’s 1990 paper that is widely recognized as the seminal work on ensemble systems. In strength of weak learnability [10], Schapire introduced boosting, an elegant approach to genera...

25 | Bootstrap Techniques for Signal Processing
- Iskander
- 2004
Citation Context: ...strap methods are widely used for signal detection and spectral estimation, details and many applications of which can be found elsewhere in this issue as well as in Zoubir and Iskander’s recent text [2]. In computational intelligence, however, the statistic of particular interest is often the true generalization error of a classifier that is trained on a finite-size training dataset. Given such a cla...

3 | Can AdaBoost.M1 learn incrementally? A comparison to Learn++ under different combination rules
- Mohammed, Leander, et al.
- 2006
Citation Context: ...an ensemble of ensembles. A suitably modified version of AdaBoost, run iteratively for each new dataset, can learn incrementally if the data distribution does not change in time (stationary learning) [23]. However, a more interesting—and arguably more challenging—problem is the introduction of new classes, or different number of classes being represented in each new dataset. By making strategic modifi...

1 | Estimation of error rate for linear discriminant functions by resampling: Non-Gaussian populations
- Chernick, Murthy, et al.
- 1988