by Zijian Zheng, Geoffrey I. Webb
Machine Learning
Download: http://www3.cm.deakin.edu.au/~zijian/Papers/zwlbrmlj-final.ps.gz
Abstract:
The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree and generates a local naive Bayesian classifier at each leaf. The tests leading to a leaf can alleviate attribute inter-dependencies for the local naive Bayesian classifier. However, Bayesian tree learning still suffers from the small disjunct problem of tree learning. While inferred Bayesian trees demonstrate low average prediction error rates, there is reason to believe that error rates will be higher for those leaves with few training examples. This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, called LBR. This algorithm can be justified by a variant of Bayes' theorem which supports a weaker conditional attribute independence assumption than is required by naive Bayes. For each test example, it builds a most appropriate rule with a local naive Bayesian classifier as its consequent. It is demonstrated that the computational requirements of LBR are reasonable in a wide cross-section of natural domains. Experiments with these domains show that, on average, this new algorithm obtains lower error rates significantly more often than the reverse in comparison to a naive Bayesian classifier, C4.5, a Bayesian tree learning algorithm, a constructive Bayesian classifier that eliminates attributes and constructs new attributes using Cartesian products of existing nominal attributes, and a lazy decision tree learning algorithm. It also outperforms a selective naive Bayesian classifier, although that result is not statistically significant.
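The idea of a rule whose consequent is a local naive Bayesian classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `local_naive_bayes`, the dictionary form of the antecedent, and the use of Laplace smoothing are all assumptions made for illustration, and the sketch takes the antecedent as given rather than selecting it per test example as the abstract describes. It only shows how conditioning on the antecedent restricts the training examples before naive Bayes is applied to the remaining attributes.

```python
from collections import Counter
from math import prod


def local_naive_bayes(train, labels, antecedent, query):
    """Classify `query` with naive Bayes fit only on the training examples
    that satisfy `antecedent` (hypothetical format: attribute index -> value)."""
    # Keep the training examples covered by the rule's antecedent.
    covered = [(x, y) for x, y in zip(train, labels)
               if all(x[a] == v for a, v in antecedent.items())]
    classes = sorted(set(labels))
    # Attributes not tested in the antecedent feed the local classifier.
    free = [a for a in range(len(query)) if a not in antecedent]
    class_counts = Counter(y for _, y in covered)
    scores = {}
    for c in classes:
        # Laplace-smoothed local prior (smoothing is an assumption here).
        prior = (class_counts[c] + 1) / (len(covered) + len(classes))
        likelihoods = []
        for a in free:
            n_values = len({x[a] for x in train})
            match = sum(1 for x, y in covered if y == c and x[a] == query[a])
            # Laplace-smoothed estimate of P(query[a] | c, antecedent).
            likelihoods.append((match + 1) / (class_counts[c] + n_values))
        scores[c] = prior * prod(likelihoods)
    return max(scores, key=scores.get), scores


# Toy data in which the class is the XOR of two attributes, so the
# attributes are strongly inter-dependent given the class.
train = [(0, 0), (0, 1), (1, 0), (1, 1)] * 3
labels = [0, 1, 1, 0] * 3

# Empty antecedent: ordinary naive Bayes over all training examples.
print(local_naive_bayes(train, labels, {}, (1, 0)))
# Antecedent "attribute 0 == 1": naive Bayes local to the covered examples.
print(local_naive_bayes(train, labels, {0: 1}, (1, 0)))
```

On this toy data the global classifier cannot separate the two classes, while the classifier local to the antecedent can, which is the kind of inter-dependency the tests in a rule or tree path are meant to absorb.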