In this paper we re-investigate windowing for rule learning algorithms. We show that, contrary to previous results for decision tree learning, windowing can achieve significant run-time gains in noise-free domains, and we explain the different behavior of rule learning algorithms by the fact that they learn each rule independently. The main contribution of this paper is integrative windowing, a new type of algorithm that further exploits this property by integrating good rules into the final theory right after they have been discovered, thereby avoiding re-learning these rules in subsequent iterations of the windowing process. Experimental evidence in a variety of noise-free domains shows that integrative windowing achieves substantial run-time gains. Furthermore, we discuss the problem of noise in windowing and present an algorithm that achieves run-time gains in a set of experiments in a simple domain with artificial noise.
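
To make the idea concrete, the following is a minimal Python sketch of how integrating good rules early might look. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the names `covers`, `learn_rules`, `is_consistent`, and `integrative_windowing` are hypothetical, the toy separate-and-conquer learner merely stands in for a real rule learner, the data are assumed to be noise-free with nominal attributes, and no limit is placed on how many misclassified examples are added to the window per iteration.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[Tuple[str, ...], bool]   # (attribute values, class label)
Rule = Dict[int, str]                    # attribute index -> required value


def covers(rule: Rule, attrs: Tuple[str, ...]) -> bool:
    """A rule covers an example if all of its equality tests hold."""
    return all(attrs[i] == v for i, v in rule.items())


def is_consistent(rule: Rule, examples: List[Example]) -> bool:
    """True if the rule covers no negative example."""
    return not any(covers(rule, a) for a, label in examples if not label)


def learn_rules(window: List[Example]) -> List[Rule]:
    """Toy separate-and-conquer learner (a stand-in, not the paper's learner)."""
    rules: List[Rule] = []
    remaining = [e for e in window if e[1]]       # positives not yet covered
    negatives = [e for e in window if not e[1]]
    while remaining:
        rule: Rule = {}
        pos, neg = remaining, negatives
        while neg:
            seed = pos[0][0]                      # take tests from one positive
            candidates = [(i, seed[i]) for i in range(len(seed)) if i not in rule]
            if not candidates:
                break                             # cannot specialise any further
            # Greedily pick the test that keeps the fewest negatives covered.
            i, v = min(candidates,
                       key=lambda c: sum(e[0][c[0]] == c[1] for e in neg))
            rule[i] = v
            pos = [e for e in pos if e[0][i] == v]
            neg = [e for e in neg if e[0][i] == v]
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e[0])]
    return rules


def integrative_windowing(data: List[Example], init_size: int = 4,
                          seed: int = 0) -> List[Rule]:
    """Learn from a growing window; keep rules that hold on all data."""
    pool = list(data)
    random.Random(seed).shuffle(pool)
    window, rest = pool[:init_size], pool[init_size:]
    theory: List[Rule] = []
    while True:
        rules = learn_rules(window)
        # Rules with no counterexample anywhere go straight into the theory.
        theory.extend(r for r in rules if is_consistent(r, window + rest))

        def explained(e: Example) -> bool:
            return e[1] and any(covers(r, e[0]) for r in theory)

        # Positives covered by the theory need not be learned again.
        window = [e for e in window if not explained(e)]
        rest = [e for e in rest if not explained(e)]

        def predicted(e: Example) -> bool:
            return any(covers(r, e[0]) for r in rules)

        wrong = [e for e in rest if predicted(e) != e[1]]
        if not wrong:
            return theory
        # Move the misclassified examples into the window and iterate.
        rest = [e for e in rest if predicted(e) == e[1]]
        window = window + wrong
```

Calling `integrative_windowing(examples)` on a list of `(attributes, label)` pairs returns a list of rules. Plain windowing would instead discard all rules after each iteration and re-learn the whole theory from the enlarged window; the sketch differs only in the step that moves consistent rules into `theory` and drops the positives they cover.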