Abstract:
High sensitivity to irrelevant features is arguably the main shortcoming of simple lazy learners. In response, many feature selection methods have been proposed, including forward sequential selection (FSS) and backward sequential selection (BSS). Although they often produce substantial improvements in accuracy, these methods select the same set of relevant features everywhere in the instance space, and thus represent only a partial solution to the problem. In general, some features will be relevant only in some parts of the space; deleting them may hurt accuracy in those parts, but selecting them will have the same effect in parts where they are irrelevant. This article introduces RC, a new feature selection algorithm that uses a clustering-like approach to select sets of locally relevant features (i.e., the features it selects may vary from one instance to another). Experiments in a large number of domains from the UCI repository show that RC almost always improves accuracy with respect to FSS and BSS, often with high significance. A study using artificial domains confirms the hypothesis that this difference in performance is due to RC's context sensitivity, and also suggests conditions where this sensitivity will and will not be an advantage. RC is also faster than FSS and BSS, often by an order of magnitude or more.
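For readers unfamiliar with the baseline, the following is a minimal sketch of forward sequential selection wrapped around a 1-nearest-neighbor learner. It is not the article's implementation and it does not implement RC: the function names, the squared Euclidean distance, the leave-one-out accuracy estimate, and the toy data are all assumptions chosen for illustration. It shows the property the abstract criticizes, namely that one global feature subset is chosen for the whole instance space.

```python
from typing import List, Sequence, Tuple


def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                 features: List[int]) -> float:
    """Leave-one-out accuracy of 1-NN restricted to the given feature indices."""
    if not features:
        return 0.0  # assumption: an empty feature set scores as the worst case
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for j in range(len(X)):
            if i == j:
                continue
            # Squared Euclidean distance over the selected features only.
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_dist is None or d < best_dist:
                best_dist, best_label = d, y[j]
        correct += int(best_label == y[i])
    return correct / len(X)


def forward_sequential_selection(X: Sequence[Sequence[float]],
                                 y: Sequence[int]) -> Tuple[List[int], float]:
    """Greedily add the single feature that most improves accuracy; stop when none does."""
    selected: List[int] = []
    best_acc = 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        # Score every candidate extension; ties go to the lower feature index.
        scored = [(loo_accuracy(X, y, selected + [f]), -f, f) for f in remaining]
        acc, _, f = max(scored)
        if acc <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc


# Toy data (hypothetical): feature 0 determines the class, feature 1 is noise.
X_toy = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
         [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
y_toy = [0, 0, 0, 1, 1, 1]
selected_toy, acc_toy = forward_sequential_selection(X_toy, y_toy)
```

On the toy data the search keeps only feature 0, since adding the noisy feature lowers the leave-one-out accuracy. Backward sequential selection would run the same loop in reverse, starting from all features and removing one at a time. In both cases the chosen subset is applied to every query instance, which is the limitation that a locally sensitive method such as RC is meant to address.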