  Context-sensitive feature selection for lazy learners (1997) [55 citations — 1 self]

by Pedro Domingos
Artificial Intelligence Review
http://www.cs.washington.edu/homes/pedrod/air95.ps.gz

Abstract:

High sensitivity to irrelevant features is arguably the main shortcoming of simple lazy learners. In response to it, many feature selection methods have been proposed, including forward sequential selection (FSS) and backward sequential selection (BSS). Although they often produce substantial improvements in accuracy, these methods select the same set of relevant features everywhere in the instance space, and thus represent only a partial solution to the problem. In general, some features will be relevant only in some parts of the space; deleting them may hurt accuracy in those parts, but selecting them will have the same effect in parts where they are irrelevant. This article introduces RC, a new feature selection algorithm that uses a clustering-like approach to select sets of locally relevant features (i.e., the features it selects may vary from one instance to another). Experiments in a large number of domains from the UCI repository show that RC almost always improves accuracy with respect to FSS and BSS, often with high significance. A study using artificial domains confirms the hypothesis that this difference in performance is due to RC's context sensitivity, and also suggests conditions where this sensitivity will and will not be an advantage. Another feature of RC is that it is faster than FSS and BSS, often by an order of magnitude or more.
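
For readers unfamiliar with the baseline methods named in the abstract, the following is a minimal sketch of forward sequential selection (FSS) wrapped around a 1-nearest-neighbor lazy learner. It is an illustration under stated assumptions, not the article's implementation: the leave-one-out scoring, the squared Euclidean distance, the stopping rule, and the names `loo_accuracy`, `forward_sequential_selection`, and `toy` are all hypothetical choices made here. RC itself is not sketched because the abstract does not specify its steps. Note how the result is a single global feature subset, which is the limitation the abstract attributes to FSS and BSS.

```python
from typing import List, Sequence, Tuple

Instance = Tuple[Sequence[float], str]  # (feature vector, class label)


def loo_accuracy(data: List[Instance], features: List[int]) -> float:
    """Leave-one-out accuracy of 1-NN using only the given feature indices."""
    correct = 0
    for i, (xi, yi) in enumerate(data):
        best_dist, best_label = float("inf"), None
        for j, (xj, yj) in enumerate(data):
            if i == j:
                continue  # never match an instance against itself
            # Squared Euclidean distance restricted to the selected features.
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, yj
        correct += best_label == yi
    return correct / len(data)


def forward_sequential_selection(data: List[Instance]) -> List[int]:
    """Greedily add the single feature that most improves accuracy.

    Stops as soon as no remaining feature gives a strict improvement
    (an assumed stopping rule). The same subset is used for every query.
    """
    n_features = len(data[0][0])
    selected: List[int] = []
    best_acc = loo_accuracy(data, selected)
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(loo_accuracy(data, selected + [f]), f) for f in candidates]
        # Highest accuracy wins; ties go to the lowest feature index.
        acc, f = max(scored, key=lambda s: (s[0], -s[1]))
        if acc <= best_acc:
            break
        selected.append(f)
        best_acc = acc
    return selected


# Hypothetical toy data: feature 0 separates the classes, feature 1 misleads.
toy: List[Instance] = [
    ((0.0, 5.0), "a"),
    ((0.1, 1.0), "a"),
    ((1.0, 5.1), "b"),
    ((1.1, 0.9), "b"),
]
selected_features = forward_sequential_selection(toy)
```

On this toy data the search keeps feature 0 and rejects feature 1, since adding feature 1 lowers the leave-one-out accuracy. Backward sequential selection would run the same loop in reverse, starting from all features and removing one at a time.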

Citations

3215 C4.5: Programs for Machine Learning – Quinlan - 1993
930 Case-based reasoning – Kolodner - 1993
792 Instance-Based Learning Algorithms – Kibler - 1991
655 UCI Repository of Machine Learning Databases [machine-readable data repository] – Murphy, Aha - 1992
619 The CN2 Induction Algorithm – Clark, Niblett - 1989
490 Irrelevant features and the subset selection problem – John, Kohavi - 1994
456 Pattern Recognition: A Statistical Approach – Devijver, Kittler - 1982
414 Toward memory-based reasoning – Stanfill, C, et al. - 1986
335 Very simple classification rules perform well on most commonly used data sets – Holte - 1993
242 A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features – Cost, Scott, et al. - 1993
220 A practical approach to feature selection – Rendell, Kira - 1992
174 Learning With Many Irrelevant Features – Almuallim, Dietterich - 1991
165 Greedy Attribute Selection – Caruana, Freitag - 1994
153 A Nearest Hyperrectangle Learning Method – Salzberg - 1991
120 Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms – Skalak - 1994
88 Generalizing from case studies: a case study – Aha - 1992
86 Using Decision Trees to Improve Case-Based Learning – Cardie - 1993
72 Unifying instance-based and rule-based induction – Domingos - 1996
66 Constructing decision trees in noisy domains – Niblett - 1987
64 Trading MIPS and Memory for Knowledge Engineering – Creecy, Masand, et al. - 1992
63 Feature Selection and Extraction – Kittler - 1986
61 Efficiently Inducing Determinations: A Complete and Systematic Search Algorithm that Uses Optimal Pruning – Schlimmer - 1993
60 Feature selection for case-based classification of cloud types: an empirical comparison – Aha, Bankert - 1994
58 Learning Representative Exemplars of Concepts: An Initial Case Study – Aha, Kibler - 1987
45 Incremental, instance-based learning of independent and graded concept descriptions – Aha - 1989
45 Using domain knowledge to influence similarity judgement – Cain, Pazzani, et al. - 1991
43 Concept Learning and Flexible Weighting – Aha, Goldstone - 1992
36 A hybrid genetic algorithm for classification – Kelly, Davis - 1991
33 Oblivious decision trees and abstract cases – Langley, Sage - 1994
24 An Optimal Weighting Criterion of Case Indexing for Both Numeric and Symbolic Attributes – Mohri, Tanaka
23 Learning a local similarity metric for case-based reasoning – Ricci, Avesani - 1995
11 Representing cases as knowledge sources that apply local similarity metrics – Skalak - 1992
10 Probability and Statistics (Second Edition) – DeGroot - 1986
8 The RISE 2.0 system: A case study in multistrategy learning – Domingos - 1995
5 An Instance-Based Learning Method for Databases: An Information Theoretic Approach – Lee - 1994
2 Analysis of Artificial Data Sets – Schaffer - 1989