Research Article An Efficient Ensemble Learning Method for
Citations
3495 | A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1)
- Freund, Schapire
- 1997
Citation Context ...n of axes may build a completely different tree, the diversity of the ensemble system can be guaranteed by the transformation [22]. Compared with the other proposed ensemble approaches, such as AdaBoost [23], Bagging [24], and Random Forest [25], Rotation Forest is more robust because it can always enhance the generalization ability of the individual classifiers and the diversity in the ensemble at the s...
1088 | Using Bayesian networks to analyze expression data
- Friedman, Linial, et al.
- 2000
Citation Context ...ny of them have been employed for both steps, including the techniques of feature selection [7], classification techniques, for example, K-NN [8], support vector machines [9, 10], and neural networks [11]. Most of the existing research works attempt to choose an optimal subset of genes and then generalize an accurate classification model based on the selected genes. The microarray data measures the ex...
687 | Neural network ensembles
- Hansen, Salamon
- 1990
Citation Context ...fying data under uncertainties [15, 17]. However, a necessary and sufficient condition for an ensemble to outperform its individual members is that the base classifiers should be accurate and diverse [18]. An accurate classifier is one that has an error rate better than randomly guessing classes for new unseen samples. On the other hand, two classifiers are said to be diverse if their decisions are...
625 | Ensemble methods in machine learning
- Dietterich
- 2000
Citation Context ...uncertainties of gene expression data. Ensemble methodology is an efficient technique that has increasingly been adopted to combine multiple learning algorithms to improve overall prediction accuracy [15]. These ensemble techniques have the advantage of alleviating the small sample size problem by averaging and incorporating over multiple classification models to reduce the potential for overfitting the ...
610 | An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization.
- Dietterich
- 2000
Citation Context ...e techniques have the advantage of alleviating the small sample size problem by averaging and incorporating over multiple classification models to reduce the potential for overfitting the training data [16]. In this way the training data set may be used in a more efficient way, which is critical to many bioinformatics applications with small sample size. Much research has shown the promise of ensemble l...
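The averaging idea in this context can be sketched in a few lines. Everything here is illustrative, not the cited work's setup: the base learner is a 1-nearest-neighbour rule and `bagged_predict` is a hypothetical name.

```python
import random

def bagged_predict(train, test_point, n_models=25, seed=0):
    """Bagging sketch: fit one base model per bootstrap resample of the
    training data, then combine the models' predictions by majority vote.
    `train` is a list of (feature_vector, label) pairs."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) points with replacement.
        sample = [train[rng.randrange(len(train))] for _ in train]
        # Base learner here is 1-NN on squared Euclidean distance.
        nearest = min(sample, key=lambda fx: sum((a - b) ** 2
                                                 for a, b in zip(fx[0], test_point)))
        votes.append(nearest[1])
    # Averaging over many resampled models damps the variance of any one fit.
    return max(set(votes), key=votes.count)
```

On a toy two-class set, the aggregated vote recovers the obvious label even though each bootstrap model sees a distorted sample.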
568 | Support vector machine classification and validation of cancer tissue samples using microarray expression data
- Furey, Cristianini, et al.
- 2000
Citation Context ... have been introduced, and many of them have been employed for both steps, including the techniques of feature selection [7], classification techniques, for example, K-NN [8], support vector machines [9, 10], and neural networks [11]. Most of the existing research works attempt to choose an optimal subset of genes and then generalize an accurate classification model based on the selected genes. The micro...
530 | A practical approach to feature selection
- Kira, Rendell
- 1992
Citation Context ...ed filter selection (CFS), minimum redundancy maximum relevance (mRMR), and general signal to noise ratio (GSNR) in comparison with FCBF. ReliefF [33] is an extension of the original Relief algorithm [34] that adds the ability of dealing with multiclass problems, and it is more robust and capable of dealing with incomplete and noisy data. The Relief family methods are especially attractive because they may...
474 | Estimating Attributes: Analysis and Extensions of RELIEF
- Kononenko
- 1994
Citation Context ...election algorithms, that is, ReliefF, correlation-based filter selection (CFS), minimum redundancy maximum relevance (mRMR), and general signal to noise ratio (GSNR) in comparison with FCBF. ReliefF [33] is an extension of the original Relief algorithm [34] that adds the ability of dealing with multiclass problems, and it is more robust and capable of dealing with incomplete and noisy data. The Relief f...
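The core Relief weight update the context refers to can be sketched for the original two-class algorithm: each feature is rewarded when it separates an instance from its nearest miss and penalised when it separates it from its nearest hit. This is a simplified sketch (no sampling, no ReliefF k-neighbour averaging); `relief_weights` is an illustrative name.

```python
def relief_weights(X, y):
    """Two-class Relief sketch: accumulate, per feature, the difference
    between the distance to the nearest miss and to the nearest hit."""
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest same-class
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest other-class
        for f in range(d):
            w[f] += abs(X[i][f] - X[m][f]) - abs(X[i][f] - X[h][f])
    return w
```

On data where feature 0 separates the classes and feature 1 is noise, feature 0 ends up with a large positive weight and feature 1 a negative one.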
427 | Exploring the new world of the genome with DNA microarrays
- Brown, Botstein
- 1999
Citation Context ...gene microarray data analysis is a powerful and revolutionary tool for biological and medical research by allowing the simultaneous monitoring of the expression levels of tens of thousands of genes [1]. This is done by measuring the signal intensity of fluorescing molecules attached to DNA species that are bound to complementary strands of DNA localized to the surface of the microarray. Usually a r...
419 | Microarray data normalization and transformation
- Quackenbush
- 2002
Citation Context ...labeled populations of reverse-transcribed mRNA. Having captured the spot intensities, the obtained intensities undergo a normalization preprocessing stage to remove systematic errors within the data [2]. Early application of microarrays to the study of human disease conditions rapidly revealed their potential as a medical diagnostic tool [3, 4]. This is a class prediction problem to which supervised...
348 | Face recognition by independent component analysis
- Bartlett
- 2002
Citation Context ...ot necessarily orthogonal basis, which may reconstruct the data better than PCA in the presence of noise. Finally, it is sensitive to high-order statistics in the data, not just the covariance matrix [27]. Here, for the sake of comparison, we experiment with both PCA and ICA transformation methods and will report on their efficiency for our gene microarray classification task later on. 2.3. Gene Selec...
333 | Selection bias in gene extraction on the basis of microarray gene-expression data
- Ambroise, McLachlan
- 2002
Citation Context ... is assessed from the unseen test set [31]. However, due to the small number of instances in gene microarray datasets, such an approach can lead to unreliable results. Instead, Ambroise and McLachlan [38] suggested splitting the data using 10-fold cross-validation or 0.632+ bootstrap. A comparative study of several different error estimation techniques on microarray classification shows that 0.632+ ...
239 | Minimum redundancy feature selection from microarray gene expression data.
- Ding, Peng
- 2005
Citation Context ...mutual information (MI) between pairs of features whereas relevance is measured by the MI between each feature and the class labels. The mRMR method has also been applied successfully to microarray data [36]. The GSNR is a measure of the ratio between intergroup and intragroup variations. Higher GSNR values indicate higher discrimination power for the gene. GSNR selects ...
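The inter/intra-group ratio described for GSNR can be sketched as below. This is a hedged illustration of the idea, not the cited work's exact formula, which may differ in how the variations are weighted; `snr_score` is an illustrative name.

```python
from statistics import mean, pstdev

def snr_score(values, labels):
    """Signal-to-noise-style gene score: spread of the per-class means
    (intergroup variation) divided by the summed within-class spread
    (intragroup variation). Higher means better discrimination."""
    classes = sorted(set(labels))
    groups = {c: [v for v, l in zip(values, labels) if l == c] for c in classes}
    between = pstdev([mean(groups[c]) for c in classes])   # intergroup
    within = sum(pstdev(groups[c]) for c in classes)       # intragroup
    return between / within if within else float("inf")
```

A gene whose expression cleanly separates two classes scores far higher than one whose values are interleaved across classes.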
209 | Efficient feature selection via analysis of relevance and redundancy
- Yu, Liu
- 2004
Citation Context ...]. When the number of features becomes very large, the filter model is usually chosen due to its computational efficiency. Here, we utilize fast correlation-based filter (FCBF) as previous experiments [30] suggest that FCBF is an efficient and fast feature selection algorithm for classification of high dimensional data. FCBF model uses interdependence of features together with the dependence to the cla...
163 | Unsupervised feature selection using feature similarity
- Mitra, Murthy, et al.
- 2002
Citation Context ...of features, removes irrelevant, noisy, and redundant data, and results in acceptable classification accuracy. There are two broad categories for feature selection algorithms, filter model or wrapper [28]. The filter model relies on general characteristics of the training data to choose best features without involving any learning algorithm. ...
141 | A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression
- Li, Zhang, et al.
Citation Context ... remaining feature subset thus contains the predominant features with zero redundant features in terms of ...
135 | Is Cross-Validation Valid for Small-Sample Microarray
- Braga-Neto, Dougherty
- 2004
Citation Context ...on techniques on microarray classification shows that 0.632+ bootstrap can be more appropriate than other estimators including resubstitution estimator, cross-validation, and leave-one-out estimation [39]. Therefore, in this work, we deployed a balanced 0.632+ bootstrap technique to evaluate the performance of the gene selection algorithm considered in this study. The 0.632+ bootstrap requires sampl...
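The 0.632+ combination step referred to here can be sketched from its published form: it blends the optimistic resubstitution error with the pessimistic leave-one-out bootstrap error, with a weight driven by the relative overfitting rate. The clipping details are a simplification of Efron and Tibshirani's estimator; `err632plus` is an illustrative name.

```python
def err632plus(err_resub, err_boot1, gamma):
    """0.632+ bootstrap error sketch.
    err_resub  -- training (resubstitution) error, optimistic
    err_boot1  -- leave-one-out bootstrap error, pessimistic
    gamma      -- no-information error rate of the classifier"""
    # Relative overfitting rate R in [0, 1].
    denom = gamma - err_resub
    R = (err_boot1 - err_resub) / denom if denom > 0 and err_boot1 > err_resub else 0.0
    R = min(max(R, 0.0), 1.0)
    # Weight grows from 0.632 toward 1 as overfitting increases.
    w = 0.632 / (1.0 - 0.368 * R)
    return (1.0 - w) * err_resub + w * err_boot1
```

With no overfitting the estimate collapses to the common error rate; with heavy overfitting it moves from the plain 0.632 blend toward the bootstrap error.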
131 | Pruning adaptive boosting
- Margineantu, Dietterich
- 1997
Citation Context ... expressive dependency. Here, to investigate the ability of the proposed ICA-based RotBoost ensemble to build accurate and diverse base learners efficiently, the pairwise diversity measure is utilized [40]. This diversity measure evaluates the level of agreement between a pair of base learners while correcting for chance, which is known as the Kappa statistic. For ...
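The chance-corrected agreement measure named here is straightforward to write down; `pairwise_kappa` is an illustrative name for the standard Kappa statistic between two classifiers' predictions.

```python
def pairwise_kappa(preds_a, preds_b, classes):
    """Kappa statistic for a pair of base learners: observed agreement
    corrected for the agreement expected if both predicted independently.
    1 = identical decisions, 0 = chance-level agreement, negative = diverse."""
    n = len(preds_a)
    observed = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    # Chance agreement: both learners independently pick class c.
    chance = sum((preds_a.count(c) / n) * (preds_b.count(c) / n) for c in classes)
    return (observed - chance) / (1.0 - chance)  # undefined if chance == 1
```

Identical prediction vectors give Kappa 1, while two learners that always disagree on a balanced problem give -1, the fully diverse extreme.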
126 | Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS
- Alon, Barkai, et al.
- 1999
Citation Context ... problem to which supervised learning techniques are ideally suited. Some studies have been reported on the application of microarray gene expression data analysis for molecular cancer classification [5, 6]. In fact, microarray analysis has demonstrated that accurate cancer diagnosis can be achieved by performing microarray data classification, that is, by constructing classifiers to compare the gene ex...
118 | Quantitative quality control in microarray image processing and data acquisition
- Wang, Ghosh, et al.
- 2001
Citation Context ...associated with various uncertainties, such as the microarray data gathering process, which includes fabrication, hybridization, and image processing. These uncertainties always add various sources of noise [13]. Because of the impact of different uncertainties together with the lack of labeled training samples, the conventional machine learning techniques face complicated challenges to develop reliable clas...
103 | Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 2001;7:673
- Khan, Wei
Citation Context ...ocessing stage to remove systematic errors within the data [2]. Early application of microarrays to the study of human disease conditions rapidly revealed their potential as a medical diagnostic tool [3, 4]. This is a class prediction problem to which supervised learning techniques are ideally suited. Some studies have been reported on the application of microarray gene expression data analysis for mole...
80 | Random forests. Machine Learning
- Breiman
- 2001
Citation Context ...nt tree, the diversity of the ensemble system can be guaranteed by the transformation [22]. Compared with the other proposed ensemble approaches, such as AdaBoost [23], Bagging [24], and Random Forest [25], Rotation Forest is more robust because it can always enhance the generalization ability of the individual classifiers and the diversity in the ensemble at the same time. C. Zhang and J. Zhang [19] p...
71 | Applications of independent component analysis to microarrays. Genome Biology 4:R76
- Lee, Batzoglou
- 2003
Citation Context ...n be guaranteed by the selected transformation. There are largely two kinds of transformation methods, that is, PCA and ICA. PCA projects the data into a new space spanned by the principal components [26]. In contrast to PCA, ICA decomposes an input dataset into components so that each component is statistically as independent from the others as possible. It appears that ICA has a greater advantage ov...
56 | Knowledge-based analysis of microarray gene expression data using support vector machines
- Brown
- 2000
Citation Context ... have been introduced, and many of them have been employed for both steps, including the techniques of feature selection [7], classification techniques, for example, K-NN [8], support vector machines [9, 10], and neural networks [11]. Most of the existing research works attempt to choose an optimal subset of genes and then generalize an accurate classification model based on the selected genes. The micro...
40 | Machine Learning in DNA Microarray Analysis for Cancer Classification
- Cho, Won
- 2003
Citation Context ...y machine learning algorithms have been introduced, and many of them have been employed for both steps, including the techniques of feature selection [7], classification techniques, for example, K-NN [8], support vector machines [9, 10], and neural networks [11]. Most of the existing research works attempt to choose an optimal subset of genes and then generalize an accurate classification model based...
36 | Gene expression profiling predicts clinical outcome of breast cancer. Nature
- van 't Veer, Dai, et al.
Citation Context ... problem to which supervised learning techniques are ideally suited. Some studies have been reported on the application of microarray gene expression data analysis for molecular cancer classification [5, 6]. In fact, microarray analysis has demonstrated that accurate cancer diagnosis can be achieved by performing microarray data classification, that is, by constructing classifiers to compare the gene ex...
33 | Bagging predictors. Machine Learning
- Breiman
- 1996
Citation Context ...build a completely different tree, the diversity of the ensemble system can be guaranteed by the transformation [22]. Compared with the other proposed ensemble approaches, such as AdaBoost [23], Bagging [24], and Random Forest [25], Rotation Forest is more robust because it can always enhance the generalization ability of the individual classifiers and the diversity in the ensemble at the same time. C. Z...
22 | Markov blanket-embedded genetic algorithm for gene selection
- Zhu
- 2007
Citation Context ... remaining feature subset thus contains the predominant features with zero redundant features in terms of ...
17 | Translation of Microarray Data into Clinically Relevant Cancer Diagnostic Tests Using Gene Expression Ratios
- Gordon
- 2002
Citation Context ... much irrelevant information. This dimensionality degrades classification performance. Moreover, datasets typically contain few samples for training (e.g., lung dataset [12] contains 12535 genes and only 181 samples), leading to the curse of dimensionality problem. It is essential, therefore, to find efficient methods for reducing the size of the feature set. To avoid the...
16 | A novel ensemble machine learning for robust microarray data classification
- Peng
- 2005
Citation Context ...l machine learning techniques face complicated challenges to develop reliable classification models. Quite often selecting only a few genes can discriminate a majority of training instances correctly [14]. However, the generalization ability of such classifier model based on a few principal genes and a limited number of labeled training instances cannot be guaranteed. It is therefore essential to deve...
14 | RotBoost: A technique for combining Rotation Forest and AdaBoost. Pattern Recognition Letters 29
- Zhang, Zhang
- 2008
Citation Context ...le methods apply a base classification algorithm to differently permuted training sets. Examples of these techniques include AdaBoost, Bagging, Random Subspace, Random Forest, and Rotation Forest [19]. AdaBoost has become a very popular choice for its simplicity and adaptability [20]. This algorithm builds an ensemble of classifiers by utilizing a specified base learning algorithm to successive ob...
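The successive reweighting that AdaBoost applies between rounds can be sketched in isolation. This shows the standard discrete-AdaBoost update, not any particular implementation from the cited works; `adaboost_reweight` is an illustrative name.

```python
import math

def adaboost_reweight(weights, errors):
    """One AdaBoost round: given per-sample weights and a 0/1 vector marking
    which samples the current base learner misclassified, return the
    learner's vote weight (alpha) and the renormalised sample weights.
    Assumes the learner beats chance (0 < weighted error < 0.5)."""
    eps = sum(w for w, e in zip(weights, errors) if e)   # weighted error
    alpha = 0.5 * math.log((1.0 - eps) / eps)            # learner's say
    # Misclassified samples gain weight; correct ones lose weight.
    new_w = [w * math.exp(alpha if e else -alpha) for w, e in zip(weights, errors)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]
```

After the update, the misclassified samples carry exactly half the total weight, which is what forces the next base learner to focus on them.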
10 | Machine learning in bioinformatics: A brief survey and recommendations for practitioners
- Bhaskar, Hoyle, et al.
- 2006
Citation Context ...ocessing stage to remove systematic errors within the data [2]. Early application of microarrays to the study of human disease conditions rapidly revealed their potential as a medical diagnostic tool [3, 4]. This is a class prediction problem to which supervised learning techniques are ideally suited. Some studies have been reported on the application of microarray gene expression data analysis for mole...
10 | An experimental study on rotation forest ensembles
- Kuncheva, Rodríguez
- 2007
Citation Context ...under different conditions vary slightly and none of the values take obvious advantage. So there was no consistent relationship between the classification accuracy and... |
8 | A review of ensemble methods in bioinformatics
- Yang, Yang, et al.
- 2010
Citation Context ...ich is critical to many bioinformatics applications with small sample size. Much research has shown the promise of ensemble learning for improving the accuracy in classifying data under uncertainties [15, 17]. However, a necessary and sufficient condition for an ensemble to outperform its individual members is that the base classifiers should be accurate and diverse [18]. An accurate classifier is one tha...
6 | IG-GA: A hybrid filter/wrapper method for feature selection of microarray data
- Yang, Chuang, et al.
- 2009
Citation Context ...lthough these models tend to find features better suited to the learning algorithm resulting in superior learning performance, they also tend to be more computationally expensive than the filter model [29]. When the number of features becomes very large, the filter model is usually chosen due to its computational efficiency. Here, we utilize fast correlation-based filter (FCBF) as previous experiments [...
6 | An ensemble of filters and classifiers for microarray data classification
- Bolon-Canedo, Sanchez-Marono, et al.
- 2012
Citation Context ...ief family methods are especially attractive because they may be applied in all situations, have low bias, include interaction among features, and may capture local dependencies that other methods miss [35]. The CFS method is based on test theory concepts and relies on a set of heuristics to assess the adequacy of subsets of features. These heuristics take into account both the usefulness of individual ...
5 | Cancer classification using rotation forest
- Liu, Huang
- 2008
Citation Context ...into some subsets, which are transformed individually. Since a small rotation of axes may build a completely different tree, the diversity of the ensemble system can be guaranteed by the transformation [22]. Compared with the other proposed ensemble approaches, such as AdaBoost [23], Bagging [24], and Random Forest [25], Rotation Forest is more robust because it can always enhance the generalization abil...
1 | Gene selection for cancer classification using wrapper approaches
- Sierra
- 2004
Citation Context ... feature selection and classification. So far, many machine learning algorithms have been introduced, and many of them have been employed for both steps, including the techniques of feature selection [7], classification techniques, for example, K-NN [8], support vector machines [9, 10], and neural networks [11]. Most of the existing research works attempt to choose an optimal subset of genes and then...
1 | A variant of rotation forest for constructing ensemble classifiers
- Zhang, Zhang
- 2010
Citation Context ...g sets. Examples of these techniques include AdaBoost, Bagging, Random Subspace, Random Forest, and Rotation Forest [19]. AdaBoost has become a very popular choice for its simplicity and adaptability [20]. This algorithm builds an ensemble of classifiers by utilizing a specified base learning algorithm to successive obtained training sets that are formed by either resampling from the original training...
1 | Ensemble classification based on generalized additive models
- De Bock, Van den Poel
- 2010
Citation Context ...a. Ensemble classification is achieved by means of majority voting, where an unlabeled unseen sample is assigned the class with the highest number of votes among the individual classifiers’ predictions [21]. A successful variation upon Bagging is the Rotation Forest. Rotation Forest is an ensemble classification approach which is built with a set of decision trees. For each tree, the bootstrap samples e...
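The majority-voting rule described here is a one-liner in practice; `majority_vote` is an illustrative name, and tie-breaking by first-encountered class is an assumption of this sketch rather than anything specified in the cited work.

```python
from collections import Counter

def majority_vote(predictions):
    """Assign an unseen sample the class with the most votes among the
    individual classifiers' predictions. Ties go to the class whose vote
    appeared first (Counter preserves insertion order)."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, five base classifiers voting ["A", "B", "A", "A", "C"] label the sample "A".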
1 | Combining Pattern Classifiers. John Wiley
- Kuncheva
- 2004
Citation Context ...ween ... |