Results 1-10 of 1,249
Correlation-based feature selection for machine learning
, 1998
Abstract

Cited by 297 (3 self)
A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation-based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation-based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance-based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set.
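The feature evaluation formula the abstract refers to is, in published accounts of CFS, a merit heuristic that trades average feature-class correlation against average feature-feature inter-correlation. A minimal sketch (the function name and the illustrative correlation values are ours, not the thesis's):

```python
from math import sqrt

def cfs_merit(k, avg_feat_class_corr, avg_feat_feat_corr):
    """CFS-style merit of a k-feature subset: high when the features
    correlate strongly with the class but weakly with each other."""
    return (k * avg_feat_class_corr) / sqrt(k + k * (k - 1) * avg_feat_feat_corr)

# A mutually redundant subset scores lower than an equally
# class-correlated but mutually uncorrelated subset of the same size.
good = cfs_merit(5, 0.8, 0.1)
redundant = cfs_merit(5, 0.8, 0.9)
```

A heuristic search (e.g. greedy forward selection) would then add or drop features so as to maximise this merit, which is how CFS screens redundant features without ever training a classifier.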
Feature Subset Selection Using A Genetic Algorithm
, 1997
Abstract

Cited by 273 (7 self)
Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This is because the performance of the classifier (usually induced by some learning algorithm) and the cost of classification are sensitive to the choice of the features used to construct the classifier. Exhaustive evaluation of possible feature subsets is usually infeasible in practice because of the large amount of computational effort required. Genetic algorithms, which belong to a class of randomized heuristic search techniques, offer an attractive approach to finding near-optimal solutions to such optimization problems. This paper presents an approach to feature subset selection using a genetic algorithm. Some advantages of this approach include the ability to accommodate multiple criteria such as accuracy and cost of classification into the feature selection process and to find fe...
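The genetic search over feature subsets described above can be sketched with bitmask chromosomes. This toy version (all names, operators, and parameter values are illustrative assumptions, not the paper's implementation) shows how a cost term folds into the fitness alongside an accuracy term:

```python
import random

def ga_feature_select(fitness, n_features, pop_size=20, generations=30,
                      p_mut=0.05, seed=0):
    """Toy GA over feature-subset bitmasks (1 = feature included).
    `fitness` scores a mask; higher is better."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection with elitism: keep the best half unchanged.
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)          # one-point crossover
            child = [g ^ (rng.random() < p_mut)         # bit-flip mutation
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Multi-criteria fitness: reward features 0 and 2 (a stand-in for
# accuracy), penalise subset size (a stand-in for classification cost).
target = lambda mask: 2 * mask[0] + 2 * mask[2] - 0.5 * sum(mask)
best = ga_feature_select(target, n_features=6)
```

Because the fitness function is arbitrary, an accuracy estimate and a feature-cost penalty can simply be weighted together inside it, which is the multi-criteria flexibility the abstract highlights.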
Interval propagation to reason about sets: definition and implementation of a practical language
 CONSTRAINTS
, 1997
Abstract

Cited by 121 (8 self)
Local consistency techniques have been introduced in logic programming in order to extend the application domain of logic programming languages. The existing languages based on these techniques consider arithmetic constraints applied to variables ranging over finite integer domains. This makes it difficult to model naturally and concisely, and to solve efficiently, a class of NP-complete combinatorial search problems dealing with sets. To overcome these problems, we propose a solution which consists in extending the notion of integer domains to that of set domains (sets of sets). We specify a set domain by an interval whose lower and upper bounds are known sets, ordered by set inclusion. We define the formal and practical framework of a new constraint logic programming language over set domains, called Conjunto. Conjunto comprises the usual set operation symbols (∪, ∩, \), and the set inclusion relation (⊆). Set expressions built using the operation symbols are interpreted as relations (s ∪ s1 = s2, ...). In addition, Conjunto provides us with a set of constraints called graduated constraints (e.g. the set cardinality) which map sets onto arithmetic terms. This allows us to handle optimization problems by applying a cost function to the quantifiable, i.e., arithmetic, terms which are associated to set terms. The constraint solving in Conjunto is based on local consistency techniques using interval reasoning which are extended to handle set constraints. The main contribution of this paper concerns the formal definition of the language and its design and implementation as a practical language.
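The set-interval idea above, a domain bounded below and above by known sets and narrowed by propagation, can be illustrated for the inclusion constraint x ⊆ y. The class and function names here are ours, not Conjunto's API:

```python
class SetDomain:
    """Set interval [lb, ub]: the value is any set S with lb ⊆ S ⊆ ub.
    lb holds the definite elements, ub the possible ones."""
    def __init__(self, lb, ub):
        self.lb, self.ub = set(lb), set(ub)
        assert self.lb <= self.ub, "lower bound must be a subset of upper bound"

def propagate_subset(x, y):
    """Narrow the intervals of x and y to enforce x ⊆ y."""
    y.lb |= x.lb          # definite elements of x become definite in y
    x.ub &= y.ub          # x may only contain elements y allows
    if not (x.lb <= x.ub and y.lb <= y.ub):
        raise ValueError("inconsistent set constraint")

x = SetDomain({1}, {1, 2, 3, 4})
y = SetDomain(set(), {1, 2, 3})
propagate_subset(x, y)   # x.ub shrinks, y.lb grows
```

A graduated constraint such as cardinality would, in the same style, tie an arithmetic variable to the range [len(lb), len(ub)], which is what lets a cost function reach into set terms.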
Data Mining in Soft Computing Framework: A Survey
 IEEE Transactions on Neural Networks
, 2001
Abstract

Cited by 105 (3 self)
The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.
Rough sets: some extensions
 Information Sciences 177
, 2007
Abstract

Cited by 80 (6 self)
Rough Mereology: A New Paradigm For Approximate Reasoning
, 1996
Abstract

Cited by 78 (33 self)
We are concerned with formal models of reasoning under uncertainty. Many approaches to this problem are known in the literature, e.g. Dempster-Shafer theory, Bayesian-based reasoning, belief networks, fuzzy logics, etc. We propose rough mereology as a foundation for approximate reasoning about complex objects. Our notion of a complex object includes approximate proofs understood as schemes constructed to support our assertions about the world on the basis of our incomplete or uncertain knowledge.
1 Introduction
We present a formal model of approximate reasoning about processes of synthesis of complex systems. First ideas of this approach have been presented in [15], [24], [25], [27], [28], [29], [30], [31]. Our research has been stimulated by the demand for solutions of the following groups of problems, estimated in [1] to be crucial for progress in the area of automated design and manufacturing. These groups of problems are concerned with the treatment of: Group 1. Poorly defined...
Perspectives of granular computing
 Proceedings of 2005 IEEE International Conference on Granular Computing
, 2005
Abstract

Cited by 72 (20 self)
As an emerging field of study, granular computing has received much attention. Many models, frameworks, methods and techniques have been proposed and studied. It is perhaps time to seek a general and unified view so that fundamental issues can be examined and clarified. This paper examines granular computing from three perspectives. By viewing granular computing as a way of structured thinking, we focus on its philosophical foundations in modeling human perception of reality. By viewing granular computing as a method of structured problem solving, we examine its theoretical and methodological foundations in solving a wide range of real-world problems. By viewing granular computing as a paradigm of information processing, we turn our attention to its more concrete techniques. The three perspectives together offer a holistic view of granular computing.
Dynamic Reducts as a Tool for Extracting Laws from Decision Tables
, 1994
Abstract

Cited by 71 (13 self)
We apply rough set methods and Boolean reasoning for knowledge discovery from decision tables. It is not always possible to extract general laws from experimental data by first computing all reducts [12] of a decision table and then deriving decision rules on the basis of these reducts. We investigate how information about changes in the reduct set under random sampling of a given decision table could be used to generate these laws. The reducts stable in the process of decision table sampling are called dynamic reducts. Dynamic reducts define the set of attributes called the dynamic core: the set of attributes included in all dynamic reducts. The set of decision rules can be computed from the dynamic core or from the best dynamic reducts. We report the results of experiments with different data sets, e.g. market data, medical data, textures and handwritten digits. The results show that dynamic reducts can help to extract laws from decision tables. Key words: evol...
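The stability idea can be sketched for a tiny decision table: compute reducts, then keep those that reappear as reducts in most random subtables. The function names, the stability threshold, and the brute-force reduct search are illustrative simplifications; realistic tables require the Boolean reasoning the abstract mentions.

```python
from itertools import combinations
import random

def is_consistent(rows, attrs):
    """True if no two rows agree on attribute subset `attrs`
    but disagree on the decision (last tuple element)."""
    seen = {}
    for *conds, decision in rows:
        key = tuple(conds[a] for a in attrs)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

def reducts(rows, n_attrs):
    """Brute-force reducts: minimal consistent attribute subsets."""
    found = []
    for size in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            if is_consistent(rows, attrs) and \
               not any(set(r) <= set(attrs) for r in found):
                found.append(attrs)
    return found

def dynamic_reducts(rows, n_attrs, samples=30, frac=0.7, threshold=0.8, seed=0):
    """Reducts of the full table that stay reducts in at least
    `threshold` of the randomly sampled subtables."""
    rng = random.Random(seed)
    base = reducts(rows, n_attrs)
    counts = {r: 0 for r in base}
    for _ in range(samples):
        sub = rng.sample(rows, int(frac * len(rows)))
        sub_reducts = set(reducts(sub, n_attrs))
        for r in base:
            counts[r] += r in sub_reducts
    return [r for r in base if counts[r] / samples >= threshold]

# Toy table (a0, a1, decision): attribute 0 alone determines the decision.
table = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
stable = dynamic_reducts(table, 2)
```

The union of attributes appearing in every returned reduct would then give the dynamic core from which decision rules are generated.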
Current Approaches to Handling Imperfect Information in Data and Knowledge Bases
, 1996
Abstract

Cited by 69 (1 self)
This paper surveys methods for representing and reasoning with imperfect information. It opens with an attempt to classify the different types of imperfection that may pervade data, and a discussion of the sources of such imperfections. The classification is then used as a framework for considering work that explicitly concerns the representation of imperfect information, and related work on how imperfect information may be used as a basis for reasoning. The work that is surveyed is drawn from both the field of databases and the field of artificial intelligence. Both of these areas have long been concerned with the problems caused by imperfect information, and this paper stresses the relationships between the approaches developed in each.
Semantics-Preserving Dimensionality Reduction: Rough and Fuzzy-Rough Based Approaches
 IEEE Transactions on Knowledge and Data Engineering
, 2004
Abstract

Cited by 68 (11 self)
Semantics-preserving dimensionality reduction refers to the problem of selecting those input features that are most predictive of a given outcome; a problem encountered in many areas such as machine learning, pattern recognition, and signal processing. This has found successful application in tasks that involve data sets containing huge numbers of features (in the order of tens of thousands), which would be impossible to process further. Recent examples include text processing and Web content classification. One of the many successful applications of rough set theory has been to this feature selection area. This paper reviews those techniques that preserve the underlying semantics of the data, using crisp and fuzzy rough set-based methodologies. Several approaches to feature selection based on rough set theory are experimentally compared. Additionally, a new area in feature selection, feature grouping, is highlighted and a rough set-based feature grouping technique is detailed. Index Terms—Dimensionality reduction, feature selection, feature transformation, rough selection, fuzzy-rough selection.
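One rough set selection scheme commonly reviewed in this literature is greedy forward selection by dependency degree, in the style of the QuickReduct algorithm. A small self-contained sketch, with function names and the toy decision table of our own devising (rows are condition attribute values followed by a decision value):

```python
def dependency(rows, attrs):
    """Rough set dependency degree: the fraction of objects whose
    decision is fully determined by the equivalence classes of `attrs`."""
    classes = {}
    for *conds, decision in rows:
        classes.setdefault(tuple(conds[a] for a in attrs), set()).add(decision)
    determined = sum(
        1 for *conds, _ in rows
        if len(classes[tuple(conds[a] for a in attrs)]) == 1
    )
    return determined / len(rows)

def quickreduct(rows, n_attrs):
    """Greedy forward selection: repeatedly add the attribute that most
    increases dependency, until it matches the full attribute set's."""
    full = dependency(rows, tuple(range(n_attrs)))
    selected = ()
    while dependency(rows, selected) < full:
        best = max(
            (a for a in range(n_attrs) if a not in selected),
            key=lambda a: dependency(rows, selected + (a,)),
        )
        selected += (best,)
    return selected

# Attribute 0 alone determines the decision, so it is selected alone.
table = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
chosen = quickreduct(table, 2)
```

Because selection stops exactly when the reduced attribute set reproduces the full set's dependency, the data's semantics (its original attributes) are preserved rather than transformed, which is the distinction the abstract draws.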