Results 1 - 10
of
106
Wrappers for feature subset selection
- ARTIFICIAL INTELLIGENCE
, 1997
"... In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a ..."
Abstract
-
Cited by 775 (3 self)
- Add to MetaCart
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and
Irrelevant Features and the Subset Selection Problem
- MACHINE LEARNING: PROCEEDINGS OF THE ELEVENTH INTERNATIONAL
, 1994
"... We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small high-accuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features ..."
Abstract
-
Cited by 515 (22 self)
- Add to MetaCart
We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small high-accuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features into useful categories of relevance. We present definitions for irrelevance and for two degrees of relevance. These definitions improve our understanding of the behavior of previous subset selection algorithms, and help define the subset of features that should be sought. The features selected should depend not only on the features and the target concept, but also on the induction algorithm. We describe a method for feature subset selection using cross-validation that is applicable to any induction algorithm, and discuss experiments conducted with ID3 and C4.5 on artificial and real datasets.
Feature selection: Evaluation, application, and small sample performance
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
"... Abstract—A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection (SFFS) algorithm, proposed by Pudil et al., dominates the other algorithms tested. We study the problem of choosing an optimal feature s ..."
Abstract
-
Cited by 238 (9 self)
- Add to MetaCart
Abstract—A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection (SFFS) algorithm, proposed by Pudil et al., dominates the other algorithms tested. We study the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models. Pooling features derived from different texture models, followed by a feature selection results in a substantial improvement in the classification accuracy. We also illustrate the dangers of using feature selection in small sample size situations. Index Terms—Feature selection, curse of dimensionality, genetic algorithm, node pruning, texture models, SAR image classification. 1
Feature Subset Selection Using A Genetic Algorithm
, 1997
"... : Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This is due to the fact that the performance of the classifier (usually induced by some learning algorithm) ..."
Abstract
-
Cited by 148 (9 self)
- Add to MetaCart
: Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This is due to the fact that the performance of the classifier (usually induced by some learning algorithm) and the cost of classification are sensitive to the choice of the features used to construct the classifier. Exhaustive evaluation of possible feature subsets is usually infeasible in practice because of the large amount of computational effort required. Genetic algorithms, which belong to a class of randomized heuristic search techniques, offer an attractive approach to find near-optimal solutions to such optimization problems. This paper presents an approach to feature subset selection using a genetic algorithm. Some advantages of this approach include the ability to accommodate multiple criteria such as accuracy and cost of classification into the feature selection process and to find fe...
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
- Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract
-
Cited by 122 (1 self)
- Add to MetaCart
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Image classification for content-based indexing
- IEEE Transactions on Image Processing
, 2001
"... Abstract—Grouping images into (semantically) meaningful categories using low-level visual features is a challenging and important problem in content-based image retrieval. Using binary Bayesian classifiers, we attempt to capture high-level concepts from low-level image features under the constraint ..."
Abstract
-
Cited by 118 (2 self)
- Add to MetaCart
Abstract—Grouping images into (semantically) meaningful categories using low-level visual features is a challenging and important problem in content-based image retrieval. Using binary Bayesian classifiers, we attempt to capture high-level concepts from low-level image features under the constraint that the test image does belong to one of the classes. Specifically, we consider the hierarchical classification of vacation images; at the highest level, images are classified as indoor or outdoor; outdoor images are further classified as city or landscape; finally, a subset of landscape images is classified into sunset, forest, and mountain classes. We demonstrate that a small vector quantizer (whose optimal size is selected using a modified MDL criterion) can be used to model the class-conditional densities of the features, required by the Bayesian methodology. The classifiers have been designed and evaluated on a database of 6931 vacation photographs. Our system achieved a classification accuracy of 90.5 % for indoor/outdoor, 95.3 % for city/landscape, 96.6 % for sunset/forest & mountain, and 96 % for forest/mountain classification problems. We further develop a learning method to incrementally train the classifiers as additional data become available. We also show preliminary results for feature reduction using clustering techniques. Our goal is to combine multiple two-class classifiers into a single hierarchical classifier. Index Terms—Bayesian methods, content-based retrieval, digital libraries, image content analysis, minimum description length, semantic
Comparison of Algorithms that Select Features for Pattern Classifiers
, 2000
"... A comparative study of algorithms for large-scale feature selection (where the number of features is over 50) is carried out. In the study, the goodness of a feature subset is measured by leave-one-out correct-classification rate of a nearest-neighbor (1-NN) classifier and many practical problems ar ..."
Abstract
-
Cited by 96 (0 self)
- Add to MetaCart
A comparative study of algorithms for large-scale feature selection (where the number of features is over 50) is carried out. In the study, the goodness of a feature subset is measured by leave-one-out correct-classification rate of a nearest-neighbor (1-NN) classifier and many practical problems are used. A unified way is given to compare algorithms having dissimilar objectives. Based on results of many experiments, we give guidelines for the use of feature selection algorithms. Especially, it is shown that sequential floating search methods are suitable for small- and mediumscale problems and genetic algorithms are suitable for large-scale problems. Keywords: Feature selection, Monotonicity, Genetic algorithms, Leave-one-out method, k-nearest neighbor method 1 Introduction Feature selection in the design of pattern classifiers has three goals : (1) to reduce the cost of extracting features, (2) to improve the classification accuracy, and (3) to improve the reliability of the estimat...
A Probabilistic Approach to Feature Selection - A Filter Solution
"... Feature selection can be defined as a problem of finding a minimum set of M relevant attributes that describes the dataset as well as the original N attributes do, where M N . After examining the problems with both the exhaustive and the heuristic approach to feature selection, this paper pro ..."
Abstract
-
Cited by 86 (11 self)
- Add to MetaCart
Feature selection can be defined as a problem of finding a minimum set of M relevant attributes that describes the dataset as well as the original N attributes do, where M N . After examining the problems with both the exhaustive and the heuristic approach to feature selection, this paper proposes a probabilistic approach. The theoretic analysis and the experimental study show that the proposed approach is simple to implement and guaranteed to find the optimal if resources permit. It is also fast in obtaining results and effective in selecting features that improve the performance of a learning algorithm. An on-site application involving huge datasets has been conducted independently. It proves the effectiveness and scalability of the proposed algorithm. Discussed also are various aspects and applications of this feature selection algorithm. 1 Introduction The problem of feature selection can be defined as finding M relevant attributes among the N original attrib...
Correlation-based feature selection for machine learning
, 1998
"... A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that ..."
Abstract
-
Cited by 86 (3 self)
- Add to MetaCart
A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set.
Learning Boolean Concepts in the Presence of Many Irrelevant Features
- Artificial Intelligence
, 1994
"... In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias in Boolean domains. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requ ..."
Abstract
-
Cited by 80 (0 self)
- Add to MetaCart
In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias in Boolean domains. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires \Theta( 1 ffl ln 1 ffi + 1 ffl [2 p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. For implementing the MIN-FEATURES bias, the paper presents five algorithms that identify a subset of features sufficient to construct a hypothesis consistent with the training examples. FOCUS-1 is a straightforward algorithm that returns a minimal and sufficient subset of features in quasi-polynomial time. FOCUS-2 does the same task as FOCUS-1 but is empirically shown to be substantially faster than FOCUS-1. Finally, the Simple-Greedy, Mutual-Information-G...

