Results 1  10
of
883
Instancebased learning algorithms
 Machine Learning
, 1991
"... Abstract. Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to ..."
Abstract

Cited by 1359 (18 self)
 Add to MetaCart
Abstract. Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instancebased learning, that generates classification predictions using only specific instances. Instancebased learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storagereducing algorithm performs well on several realworld databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noisetolerant decision tree algorithm.
Learning logical definitions from relations
 MACHINE LEARNING
, 1990
"... This paper describes FOIL, a system that learns Horn clauses from data expressed as relations. FOIL is based on ideas that have proved effective in attributevalue learning systems, but extends them to a firstorder formalism. This new system has been applied successfully to several tasks taken fro ..."
Abstract

Cited by 930 (8 self)
 Add to MetaCart
This paper describes FOIL, a system that learns Horn clauses from data expressed as relations. FOIL is based on ideas that have proved effective in attributevalue learning systems, but extends them to a firstorder formalism. This new system has been applied successfully to several tasks taken from the machine learning literature.
On the optimality of the simple Bayesian classifier under zeroone loss
 MACHINE LEARNING
, 1997
"... The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containin ..."
Abstract

Cited by 805 (26 self)
 Add to MetaCart
The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier’s probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zeroone loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadraticloss optimality of the Bayesian classifier is in fact a secondorder infinitesimal fraction of the region of zeroone optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article’s results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
Selection of relevant features and examples in machine learning
 ARTIFICIAL INTELLIGENCE
, 1997
"... In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been mad ..."
Abstract

Cited by 590 (2 self)
 Add to MetaCart
(Show Context)
In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.
Estimating Continuous Distributions in Bayesian Classifiers
 In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence
, 1995
"... When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality ..."
Abstract

Cited by 489 (2 self)
 Add to MetaCart
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, San Mateo, 1995 1 Introduction In rec...
An analysis of Bayesian classifiers
 IN PROCEEDINGS OF THE TENTH NATIONAL CONFERENCE ON ARTI CIAL INTELLIGENCE
, 1992
"... In this paper we present anaveragecase analysis of the Bayesian classifier, a simple induction algorithm that fares remarkably well on many learning tasks. Our analysis assumes a monotone conjunctive target concept, and independent, noisefree Boolean attributes. We calculate the probability that t ..."
Abstract

Cited by 432 (17 self)
 Add to MetaCart
In this paper we present anaveragecase analysis of the Bayesian classifier, a simple induction algorithm that fares remarkably well on many learning tasks. Our analysis assumes a monotone conjunctive target concept, and independent, noisefree Boolean attributes. We calculate the probability that the algorithm will induce an arbitrary pair of concept descriptions and then use this to compute the probability of correct classification over the instance space. The analysis takes into account the number of training instances, the number of attributes, the distribution of these attributes, and the level of class noise. We also explore the behavioral implications of the analysis by presenting
Rule Induction with CN2: Some Recent Improvements
, 1991
"... The CN2 algorithm induces an ordered list of classification rules from examples using entropy as its search heuristic. In this short paper, we describe two improvements to this algorithm. Firstly, we present the use of the Laplacian error estimate as an alternative evaluation function and secondly, ..."
Abstract

Cited by 381 (2 self)
 Add to MetaCart
The CN2 algorithm induces an ordered list of classification rules from examples using entropy as its search heuristic. In this short paper, we describe two improvements to this algorithm. Firstly, we present the use of the Laplacian error estimate as an alternative evaluation function and secondly, we show how unordered as well as ordered rules can be generated. We experimentally demonstrate significantly improved performances resulting from these changes, thus enhancing the usefulness of CN2 as an inductive tool. Comparisons with Quinlan's C4.5 are also made. Keywords: learning, rule induction, CN2, Laplace, noise 1 Introduction Rule induction from examples has established itself as a basic component of many machine learning systems, and has been the first ML technology to deliver commercially successful applications (eg. the systems GASOIL [Slocombe et al., 1986], BMT [HayesMichie, 1990], and in process control [Leech, 1986]). The continuing development of inductive techniques is t...
Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier
"... The simple Bayesian classifier (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences. No explanation for this has been proposed ..."
Abstract

Cited by 353 (8 self)
 Add to MetaCart
The simple Bayesian classifier (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences. No explanation for this has been proposed so far. In this paper we show that the SBC does not in fact assume attribute independence, and can be optimal even when this assumption is violated by a wide margin. The key to this finding lies in the distinction between classification and probability estimation: correct classification can be achieved even when the probability estimates used contain large errors. We show that the previouslyassumed region of optimality of the SBC is a secondorder infinitesimal fraction of the actual one. This is followed by the derivation of several necessary and several sufficient conditions for the optimality of the SBC. For example, the SBC is optimal for learning arbitrary conjunctions and disjunctions, even though they violate the independence assumption. The paper also reports empirical evidence of the SBC's competitive performance in domains containing substantial degrees of attribute dependence.
Correlationbased feature selection for machine learning
, 1998
"... A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that ..."
Abstract

Cited by 297 (3 self)
 Add to MetaCart
A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set.
Feature Selection for Classification
 Intelligent Data Analysis
, 1997
"... Feature selection has been the focus of interest for quite some time and much work has been done. With the creation of huge databases and the consequent requirements for good machine learning techniques, new problems arise and novel approaches to feature selection are in demand. This survey is a com ..."
Abstract

Cited by 285 (9 self)
 Add to MetaCart
(Show Context)
Feature selection has been the focus of interest for quite some time and much work has been done. With the creation of huge databases and the consequent requirements for good machine learning techniques, new problems arise and novel approaches to feature selection are in demand. This survey is a comprehensive overview of many existing methods from the 1970's to the present. It identifies four steps of a typical feature selection method, and categorizes the different existing methods in terms of generation procedures and evaluation functions, and reveals hitherto unattempted combinations of generation procedures and evaluation functions. Representative methods are chosen from each category for detailed explanation and discussion via example. Benchmark datasets with different characteristics are used for comparative study. The strengths and weaknesses of different methods are explained. Guidelines for applying feature selection methods are given based on data types and domain characteris...