Results 1–10 of 43
Correlation-based feature selection for machine learning
, 1998
Abstract

Cited by 297 (3 self)
A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation-based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation-based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance-based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set.
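The evaluation formula from test theory mentioned above rewards subsets whose features correlate with the class but not with one another. A minimal sketch of a commonly cited form of the CFS merit; the function name and the sample correlation values are illustrative, not taken from the thesis:

```python
from math import sqrt

def cfs_merit(class_corrs, pairwise_corrs):
    """Merit of a feature subset: mean feature-class correlation in the
    numerator, mean feature-feature inter-correlation in the denominator."""
    k = len(class_corrs)
    r_cf = sum(class_corrs) / k                       # mean feature-class correlation
    if k == 1:
        return r_cf
    r_ff = sum(pairwise_corrs) / len(pairwise_corrs)  # mean pairwise inter-correlation
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

# Two features each correlated with the class: merit drops when they are
# also highly correlated with each other (i.e. redundant).
print(cfs_merit([0.8, 0.7], [0.1]))   # nearly independent pair scores higher
print(cfs_merit([0.8, 0.7], [0.9]))   # redundant pair scores lower
```

The denominator grows with the average inter-correlation, which is what penalises redundant subsets during the heuristic search.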
Learning Boolean Concepts in the Presence of Many Irrelevant Features
 Artificial Intelligence
, 1994
Abstract

Cited by 124 (0 self)
In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias in Boolean domains. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((1/ε) ln(1/δ) + (1/ε)[2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. For implementing the MIN-FEATURES bias, the paper presents five algorithms that identify a subset of features sufficient to construct a hypothesis consistent with the training examples. FOCUS1 is a straightforward algorithm that returns a minimal and sufficient subset of features in quasi-polynomial time. FOCUS2 does the same task as FOCUS1 but is empirically shown to be substantially faster than FOCUS1. Finally, the SimpleGreedy, MutualInformationG...
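The MIN-FEATURES bias can be implemented naively by searching feature subsets smallest-first, in the spirit of FOCUS1. A toy sketch over 0/1 example tuples; function names and the example data are mine, not from the paper:

```python
from itertools import combinations

def sufficient(subset, pos, neg):
    """True if projecting onto `subset` still separates positives from negatives."""
    proj = lambda x: tuple(x[i] for i in subset)
    return not set(map(proj, pos)) & set(map(proj, neg))

def focus(pos, neg):
    """Smallest-first search for a minimal sufficient feature subset.
    Cost is exponential in the answer size, but only the sufficiency check
    depends on the total number of (possibly irrelevant) features."""
    n = len(pos[0])
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if sufficient(subset, pos, neg):
                return subset
    return None

# Target concept is x0 AND x1; features 2 and 3 are irrelevant noise.
pos = [(1, 1, 0, 1), (1, 1, 1, 0)]
neg = [(0, 1, 0, 1), (1, 0, 1, 1)]
print(focus(pos, neg))  # the two relevant features
```

Because subsets are enumerated in order of size, the first sufficient subset found is guaranteed minimal.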
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
Abstract

Cited by 122 (7 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
Feature Selection for Machine Learning: Comparing a Correlation-based Filter Approach to the Wrapper
, 1999
Abstract

Cited by 50 (0 self)
Feature selection is often an essential data processing step prior to applying a learning algorithm. The removal of irrelevant and redundant information often improves the performance of machine learning algorithms. There are two common approaches: a wrapper uses the intended learning algorithm itself to evaluate the usefulness of features, while a filter evaluates features according to heuristics based on general characteristics of the data. The wrapper approach is generally considered to produce better feature subsets but runs much more slowly than a filter. This paper describes a new filter approach to feature selection that uses a correlation-based heuristic to evaluate the worth of feature subsets. When applied as a data preprocessing step for two common machine learning algorithms, the new method compares favourably with the wrapper but requires much less computation. Introduction Many factors affect the success of machine learning on a given task. The quality of the data is one...
Parcel: Feature Subset Selection in Variable Cost Domains
, 1998
Abstract

Cited by 26 (1 self)
The vast majority of classification systems are designed with a single set of features, and optimised to a single specified cost. However, in examples such as medical and financial risk modelling, costs are known to vary subsequent to system design. In this paper, we present a design method for feature selection in the presence of varying costs. Starting from the Wilcoxon non-parametric statistic for the performance of a classification system, we introduce a concept called the maximum realisable receiver operating characteristic (MRROC), and prove a related theorem. A novel criterion for feature selection, based on the area under the MRROC curve, is then introduced. This leads to a framework which we call Parcel. This has the flexibility to use different combinations of features at different operating points on the resulting MRROC curve. Empirical support for each stage in our approach is provided by experiments on real-world problems, with Parcel achieving superior results.
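The MRROC idea can be illustrated with a convex-hull construction over ROC operating points: a classifier strictly below the hull is never optimal at any cost, and points along a hull edge are realisable by randomly interleaving the two classifiers at its endpoints. A sketch under those assumptions; the function names and sample operating points are mine, not the paper's:

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (turn direction)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def mrroc(points):
    """Upper convex hull of (FPR, TPR) operating points, including the two
    trivial classifiers (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Pop points that fall on or below the chord to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def auc(curve):
    """Trapezoidal area under a piecewise-linear ROC curve (the selection criterion)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(curve, curve[1:]))

# (0.4, 0.6) is dominated by the hull through (0.1, 0.7) and (0.5, 0.9).
hull = mrroc([(0.1, 0.7), (0.4, 0.6), (0.5, 0.9)])
print(hull, auc(hull))
```

Feature subsets can then be compared by the area under their hulls rather than by accuracy at one fixed cost.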
Information-theoretic algorithm for feature selection
 Pattern Recognition Letters
, 2001
Abstract

Cited by 24 (7 self)
Feature selection is used to improve efficiency of learning algorithms by finding an optimal subset of features. However, most feature selection techniques can handle only certain types of data. Additional limitations of existing methods include intensive computational requirements and inability to identify redundant variables. In this paper, we present a novel, information-theoretic algorithm for feature selection, which finds an optimal set of attributes by removing both irrelevant and redundant features. The algorithm has a polynomial computational complexity and it is applicable to datasets of mixed nature. The method's performance is evaluated on several benchmark datasets by using a standard classifier (C4.5).
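A minimal information-theoretic filter in the spirit of the abstract ranks attributes by their mutual information with the class; this generic sketch is not the authors' exact algorithm, and the toy dataset is invented for illustration:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Rows are (f0, f1, class): f0 determines the class, f1 is independent noise.
data = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
cls = [r[2] for r in data]
mi_f0 = mutual_information([r[0] for r in data], cls)  # 1 bit: fully informative
mi_f1 = mutual_information([r[1] for r in data], cls)  # 0 bits: irrelevant
```

Detecting redundancy (the paper's second goal) additionally requires inter-feature information, e.g. discarding a feature whose information about the class is already carried by one retained.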
Constructing X-of-N Attributes for Decision Tree Learning
 Machine Learning
, 1998
Abstract

Cited by 20 (0 self)
While many constructive induction algorithms focus on generating new binary attributes, this paper explores novel methods of constructing nominal and numeric attributes. We propose a new constructive operator, X-of-N. An X-of-N representation is a set containing one or more attribute-value pairs. For a given instance, the value of an X-of-N representation corresponds to the number of its attribute-value pairs that are true of the instance. A single X-of-N representation can directly and simply represent any concept that can be represented by a single conjunctive, a single disjunctive, or a single M-of-N representation commonly used for constructive induction, and the reverse is not true. In this paper, we describe a constructive decision tree learning algorithm, called XofN. When building decision trees, this algorithm creates one X-of-N representation, either as a nominal attribute or as a numeric attribute, at each decision node. The construction of X-of-N representations is carrie...
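The value of an X-of-N attribute is simply a count over its attribute-value pairs. A sketch with dictionaries standing in for instances; the attribute names are hypothetical:

```python
def x_of_n_value(x_of_n, instance):
    """Value of an X-of-N attribute: how many of its attribute-value
    pairs hold for the given instance."""
    return sum(1 for attr, val in x_of_n if instance.get(attr) == val)

rep = [("colour", "red"), ("size", "big"), ("shape", "round")]
# Two of the three pairs match this instance, so its X-of-N value is 2.
print(x_of_n_value(rep, {"colour": "red", "size": "small", "shape": "round"}))
```

This shows why the operator subsumes the others: thresholding the count at N gives a conjunction, at 1 a disjunction, and at M an M-of-N test.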
Self-Optimising CBR Retrieval
 In: Proceedings 12th IEEE International Conference on Tools with Artificial Intelligence
, 2000
Abstract

Cited by 12 (1 self)
One reason why Case-Based Reasoning (CBR) has become popular is that it reduces development cost compared to rule-based expert systems. Still, the knowledge engineering effort may be demanding. In this paper we present a tool which helps to reduce the knowledge acquisition effort for building a typical CBR retrieval stage consisting of a decision-tree index and similarity measure. We use Genetic Algorithms to determine the relevance/importance of case features and to find optimal retrieval parameters. The optimisation is done using the data contained in the case-base. Because no (or little) other knowledge is needed this results in a self-optimising CBR retrieval. To illustrate this we present how the tool has been applied to optimise retrieval for a tablet formulation problem. 1. Introduction Case-based reasoning (CBR) is a problem-solving methodology that finds solutions to new problems by analysing previously solved problems [12]. At the centre of a CBR system is a collection of...
Genetic algorithms for feature selection and weighting
 In Proceedings of the IJCAI'99 Workshop on Automating the Construction of Case-Based Reasoners
, 1999
Abstract

Cited by 10 (0 self)
Automated techniques to optimise the retrieval of relevant cases in a CBR system are desirable as a way to reduce the expensive knowledge acquisition phase. This paper concentrates on feature selection methods that assist in indexing the case-base, and feature weighting methods that improve the similarity-based selection of relevant cases. Two main types of method are presented: filter methods use no feedback from the learning algorithm that will be applied; wrapper methods incorporate feedback and hence take account of learning bias. Wrapper methods based on Genetic Algorithms have been found to deliver the best results with a tablet design application, but these generic methods are flexible
Finding essential attributes from binary data
, 2002
Abstract

Cited by 9 (1 self)
We consider data sets that consist of n-dimensional binary vectors representing positive and negative examples for some (possibly unknown) phenomenon. A subset S of the attributes (or variables) of such a data set is called a support set if the positive and negative examples can be distinguished by using only the attributes in S. In this paper we study the problem of finding small support sets, a frequently arising task in various fields, including knowledge discovery, data mining, learning theory, logical analysis of data, etc. We study the distribution of support sets in randomly generated data, and discuss why finding small support sets is important. We propose several measures of separation (real-valued set functions over the subsets of attributes), formulate optimization models for finding the smallest subsets maximizing these measures, and devise efficient heuristic algorithms to solve these (typically NP-hard) optimization problems. We prove that several of the proposed heuristics have a guaranteed constant approximation ratio, and we report on computational experience comparing these heuristics with some others from the literature both on randomly generated and on real-world data sets.
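Finding a smallest support set is NP-hard in general; one classical heuristic (not necessarily one of the paper's) treats it as set cover: every (positive, negative) pair of vectors must be separated by at least one chosen attribute. A sketch with invented example data:

```python
def greedy_support_set(positives, negatives):
    """Greedy set-cover heuristic for a small support set: repeatedly pick
    the attribute that separates the most still-unseparated
    (positive, negative) pairs."""
    n = len(positives[0])
    pairs = {(p, q) for p in positives for q in negatives}
    chosen = []
    while pairs:
        best = max(range(n), key=lambda i: sum(p[i] != q[i] for p, q in pairs))
        chosen.append(best)
        pairs = {(p, q) for p, q in pairs if p[best] == q[best]}
    return sorted(chosen)

pos = [(1, 0, 1), (1, 1, 1)]
neg = [(0, 0, 1), (1, 0, 0)]
S = greedy_support_set(pos, neg)
# Projecting onto S distinguishes every positive from every negative vector.
proj = lambda v: tuple(v[i] for i in S)
print(S, set(map(proj, pos)).isdisjoint(map(proj, neg)))
```

The loop terminates because, for consistent data, every remaining pair differs somewhere; greedy set cover of this kind carries the standard logarithmic approximation guarantee.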