Random forests
Machine Learning, 2001
Cited by 785 (2 self)
Abstract. Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to AdaBoost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, ∗∗∗, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
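The mechanism the abstract highlights, choosing each node's split from a random selection of features, can be sketched at the level of a single node. This is a minimal illustration, assuming numeric features, binary threshold splits, and Gini impurity as the split criterion; the names `gini` and `best_split_random_features` and the parameter `n_features_to_try` are hypothetical. A full forest would grow many such trees and combine their votes.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_random_features(X, y, n_features_to_try, rng):
    """Search only a random subset of features for the split with the lowest
    weighted Gini impurity. Returns (score, feature, threshold), or None if
    none of the sampled features takes more than one value in this node."""
    candidates = rng.sample(range(len(X[0])), n_features_to_try)
    best = None
    for f in candidates:
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best
```

With `n_features_to_try` equal to the total number of features this reduces to ordinary exhaustive split search; smaller values make the trees less alike at some cost in the strength of each tree, which is the strength/correlation trade-off the abstract refers to.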
Statistical pattern recognition: A review
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
Cited by 487 (20 self)
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and to identify research topics and applications that are at the forefront of this exciting and challenging field.
Extremely Randomized Trees
Machine Learning, 2003
Cited by 88 (30 self)
This paper presents a new learning algorithm based on decision tree ensembles. In opposition to the classical decision tree induction method, the trees of the ensemble are built by selecting the tests during their induction fully at random. This extreme ...
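Because the abstract is cut off, the sketch below assumes the simplest reading of "selecting the tests fully at random": at each node both the attribute and the cut-point are drawn uniformly at random, with no impurity score consulted, and an ensemble of such trees predicts by majority vote. The published algorithm may add a selection step among several random tests; that is not reproduced here. All function names are hypothetical, and labels are assumed to be plain values such as ints or strings (not tuples).

```python
import random


def random_test(X, rng):
    """Draw a node test fully at random: a feature chosen uniformly among the
    non-constant ones, and a cut-point drawn uniformly between that feature's
    minimum and maximum in the node. Returns (feature, cut) or None."""
    usable = [f for f in range(len(X[0])) if len({row[f] for row in X}) > 1]
    if not usable:
        return None  # every feature is constant here, so the node becomes a leaf
    f = rng.choice(usable)
    lo = min(row[f] for row in X)
    hi = max(row[f] for row in X)
    return f, rng.uniform(lo, hi)


def build_random_tree(X, y, rng, min_size=2):
    """Grow one totally randomized tree; a leaf stores the majority label."""
    splittable = len(X) >= min_size and len(set(y)) > 1
    test = random_test(X, rng) if splittable else None
    if test is None:
        return max(set(y), key=y.count)
    f, cut = test
    left = [i for i, row in enumerate(X) if row[f] <= cut]
    right = [i for i, row in enumerate(X) if row[f] > cut]
    if not left or not right:  # guard against a degenerate (rounding) draw
        return max(set(y), key=y.count)
    return (f, cut,
            build_random_tree([X[i] for i in left], [y[i] for i in left], rng, min_size),
            build_random_tree([X[i] for i in right], [y[i] for i in right], rng, min_size))


def predict_tree(node, x):
    """Follow internal (feature, cut, left, right) tuples down to a leaf label."""
    while isinstance(node, tuple):
        f, cut, left, right = node
        node = left if x[f] <= cut else right
    return node


def predict_forest(trees, x):
    """Majority vote over the trees of the ensemble."""
    votes = [predict_tree(t, x) for t in trees]
    return max(set(votes), key=votes.count)
```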
The Combining Classifier: To Train or Not to Train?
Cited by 57 (4 self)
When more than a single classifier has been trained for the same recognition problem, the question arises of how this set of classifiers may be combined into a final decision rule. Several fixed combining rules are used that depend only on the output values of the base classifiers. They are almost always suboptimal.
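Fixed combining rules have no trainable parameters; they are applied directly to the base classifiers' outputs. A minimal sketch, assuming each base classifier emits a per-class score vector (for example posterior estimates); the function name `combine` and the rule names are illustrative, and this set of rules is not claimed to be the one studied in the paper.

```python
import math
from statistics import mean, median

# Fixed rules that aggregate one class's scores across the base classifiers.
SCORE_RULES = {
    "mean": mean,
    "median": median,
    "max": max,
    "min": min,
    "product": math.prod,
}


def combine(posteriors, rule):
    """Apply a fixed (untrained) combining rule to one sample.

    posteriors[i][c] is base classifier i's output for class c.
    Returns the index of the winning class (ties go to the lowest index)."""
    n_classes = len(posteriors[0])
    if rule == "vote":
        # Majority vote over each classifier's top-scoring class.
        scores = [0] * n_classes
        for p in posteriors:
            scores[max(range(n_classes), key=p.__getitem__)] += 1
    else:
        agg = SCORE_RULES[rule]
        scores = [agg([p[c] for p in posteriors]) for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)
```

On the outputs `[[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]`, the vote and median rules pick class 1 while the mean, max, min, and product rules pick class 0: the same base outputs yield different decisions depending on the fixed rule, which is the motivation for asking whether the combiner should be trained instead.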
Diversity versus Quality in Classification Ensembles based on Feature Selection
In 11th European Conference on Machine Learning, 2000
Cited by 43 (12 self)
Feature subset selection has emerged as a useful technique for creating diversity in ensembles, particularly in classification ensembles. In this paper we argue that this diversity needs to be monitored in the creation of the ensemble. We propose an entropy measure of the outputs of the ensemble members as a useful measure of ensemble diversity. Further, we show that using the associated conditional entropy as a loss function (error measure) works well, and that the entropy in the ensemble is a good predictor of the reduction in error due to the ensemble. These measures are evaluated on a medical prediction problem and are shown to predict the performance of the ensemble well. We also show that the entropy measure of diversity has the added advantage that it seems to model the change in diversity with the size of the ensemble.
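One plausible reading of "an entropy measure of the outputs of the ensemble members" is sketched below: for each sample, take the Shannon entropy (in bits) of the distribution of labels the members predict, then average over samples. The exact definition and any normalization used in the paper may differ; `vote_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def vote_entropy(member_predictions):
    """Average per-sample entropy (bits) of the members' predicted labels.

    member_predictions[i][j] is the label predicted by member i for sample j.
    0.0 means all members always agree; higher values mean more disagreement."""
    n_members = len(member_predictions)
    n_samples = len(member_predictions[0])
    total = 0.0
    for j in range(n_samples):
        counts = Counter(preds[j] for preds in member_predictions)
        total -= sum((c / n_members) * math.log2(c / n_members)
                     for c in counts.values())
    return total / n_samples
```

Two members that agree everywhere score 0.0; two members that disagree on every sample of a two-class problem score 1.0 bit, the maximum for two members.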
Learning a Rare Event Detection Cascade by Direct Feature Selection
In NIPS, 2003
Cited by 40 (2 self)
Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than non-targets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of similar quality. This faster method could be used for more demanding classification tasks, such as on-line learning or searching the space of classifier structures. Our experimental results highlight the dominant role of the feature set in the success of the cascade approach.
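Greedy forward feature selection in its simplest form is sketched below. It assumes weak classifiers whose 0/1 outputs on the training set are computed once up front and then reused at every selection step, so no weak classifier is retrained while the ensemble grows. The selection criterion here (training accuracy of an unweighted majority vote) is a stand-in chosen for illustration; the paper's cascade-node objective and its detection/false-positive threshold tuning are not reproduced, and `forward_select` is a hypothetical name.

```python
def forward_select(weak_outputs, labels, n_select):
    """Greedy forward selection of weak classifiers.

    weak_outputs[k][i] in {0, 1} is weak classifier k's precomputed output on
    example i. At each step, add the unchosen classifier whose inclusion gives
    the highest training accuracy for a strict-majority vote of the chosen set.
    Returns the indices of the chosen classifiers in selection order."""
    n = len(labels)
    n_select = min(n_select, len(weak_outputs))
    chosen = []
    vote_sums = [0] * n  # running count of positive votes per example
    for _ in range(n_select):
        best_k, best_acc = None, -1.0
        m = len(chosen) + 1  # ensemble size if a candidate is added
        for k, out in enumerate(weak_outputs):
            if k in chosen:
                continue
            # Predict positive when strictly more than half the members say so.
            correct = sum((2 * (vote_sums[i] + out[i]) > m) == bool(labels[i])
                          for i in range(n))
            acc = correct / n
            if acc > best_acc:  # ties keep the lower index
                best_k, best_acc = k, acc
        chosen.append(best_k)
        vote_sums = [s + o for s, o in zip(vote_sums, weak_outputs[best_k])]
    return chosen
```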
Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions
Bioinformatics, 2003
Cited by 37 (1 self)
Motivation: Two practical realities constrain the analysis of microarray data, mass spectra from proteomics, and biomedical infrared or magnetic resonance spectra. One is the ‘curse of dimensionality’: the number of features characterizing these data is in the thousands or tens of thousands. The other is the ‘curse of dataset sparsity’: the number of samples is limited. The consequences of these two curses are far-reaching when such data are used to classify the presence or absence of disease. Results: Using very simple classifiers, we show for several publicly available microarray and proteomics datasets how these curses influence classification outcomes. In particular, even if feature extraction or reduction methods raise the sample-to-feature ratio to the recommended 5–10, dataset sparsity can render any classification result statistically suspect. In addition, several ‘optimal’ feature sets are typically identifiable for sparse datasets, all producing perfect classification results on both the training and the independent validation sets. This non-uniqueness leads to interpretational difficulties and casts doubt on the biological relevance of any of these ‘optimal’ feature sets. We suggest an approach to assess the relative quality of apparently equally good classifiers.
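Why many "perfect" feature sets can appear in sparse data follows from simple counting under a toy null model, made concrete below. The assumptions are not from the paper: labels and features are independent fair coin flips, and a feature "perfectly classifies" the samples if it equals the labels or their complement. The function names are hypothetical.

```python
import random


def expected_perfect_features(n_samples, n_features):
    """Expected number of pure-noise binary features that match the labels
    (or their complement) on every one of n_samples samples."""
    p_single = 2 * 0.5 ** n_samples  # exact match or exact complement
    return n_features * p_single


def simulate_perfect_features(n_samples, n_features, seed=0):
    """Monte Carlo check of the same quantity with seeded random data."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    complement = [1 - v for v in labels]
    count = 0
    for _ in range(n_features):
        feature = [rng.randint(0, 1) for _ in range(n_samples)]
        if feature == labels or feature == complement:
            count += 1
    return count
```

Under this model, with 10 samples and 5,000 features about 9.77 noise features are expected to classify every sample perfectly; doubling to 20 samples drops the expectation to roughly 0.0095.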
Relationships Between Combination Methods and Measures of Diversity in Combining Classifiers
Information Fusion
Cited by 31 (2 self)
Diversity, negative dependence (or independence), orthogonality, and complementarity are intuitively desirable characteristics of a classifier team. It has been proved theoretically that a group of independent classifiers improves upon the single best classifier when majority vote combination is used. A dependent set of classifiers may be either better or worse. It is assumed that this holds for other combination methods. It is therefore hoped that using measures of diversity will allow the identification of classifiers that will produce good results on combination. This study looks at the relationships between different methods of classifier combination and measures of diversity. We considered ten combination methods and ten measures of diversity.
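The majority-vote result for independent classifiers can be checked with a binomial sum. A minimal sketch, assuming a two-class problem, an odd number of classifiers, and that every classifier has the same accuracy `p` and errs independently of the others; `majority_vote_accuracy` is a hypothetical name.

```python
from math import comb


def majority_vote_accuracy(n_classifiers, p):
    """Probability that a majority of n_classifiers independent classifiers,
    each correct with probability p, gives the correct two-class decision."""
    if n_classifiers % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    need = n_classifiers // 2 + 1  # smallest winning number of correct votes
    return sum(comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
               for k in range(need, n_classifiers + 1))
```

Under these assumptions, three classifiers at 70% accuracy reach 78.4% by majority vote and five reach about 83.7%, while with `p` below 0.5 voting makes the team worse than one member. Dependence between members breaks the independence assumption, which is why a dependent set may end up either better or worse.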
Nearest neighbor classification from multiple feature subsets
Intelligent Data Analysis, 1999
Cited by 29 (1 self)
Combining multiple classifiers is an effective technique for improving accuracy. There are many general combining algorithms, such as Bagging, Boosting, and Error Correcting Output Coding, that significantly improve classifiers like decision trees, rule learners, or neural networks. Unfortunately, these combining methods do not improve the nearest neighbor classifier. In this paper, we present MFS, a combining algorithm designed to improve the accuracy of the nearest neighbor (NN) classifier. MFS combines multiple NN classifiers, each using only a random subset of features. The experimental results are encouraging: on 25 datasets from the UCI Repository, MFS significantly outperformed several standard NN variants and was competitive with boosted decision trees. In additional experiments, we show that MFS is robust to irrelevant features and is able to reduce both the bias and variance components of error.
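The MFS idea, several nearest neighbor classifiers each restricted to a random feature subset and combined by voting, can be sketched as follows. This assumes 1-NN with squared Euclidean distance, feature subsets drawn without replacement and independently for each classifier, and plain plurality voting; the function names and the `n_classifiers` and `subset_size` parameters are hypothetical, and the paper's sampling scheme and parameter settings are not reproduced.

```python
import random
from collections import Counter


def nn_predict(train_X, train_y, x, features):
    """1-NN label for x, measuring squared Euclidean distance on `features` only."""
    def dist(row):
        return sum((row[f] - x[f]) ** 2 for f in features)
    nearest = min(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return train_y[nearest]


def mfs_predict(train_X, train_y, x, n_classifiers, subset_size, seed=0):
    """Plurality vote of n_classifiers 1-NN classifiers, each on its own
    random subset of subset_size features."""
    rng = random.Random(seed)
    n_features = len(x)
    votes = Counter()
    for _ in range(n_classifiers):
        features = rng.sample(range(n_features), subset_size)
        votes[nn_predict(train_X, train_y, x, features)] += 1
    return votes.most_common(1)[0][0]
```

Restricting each member to different features is what makes the NN members disagree; plain bagging changes the training set, to which a nearest neighbor classifier is comparatively insensitive.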
Loudness predicts prominence: Fundamental frequency lends little
Journal of the Acoustical Society of America, 2005

