Abstract: One of the defining challenges for the KDD research community is to enable inductive
learning algorithms to mine very large databases. This paper summarizes, categorizes, and
compares existing work on scaling up inductive algorithms. We concentrate on algorithms that
build decision trees and rule sets, in order to provide focus and specific details; the issues and
techniques generalize to other types of data mining. We begin with a discussion of important
issues related to scaling up. We...
Cited by:
Algorithms and Software for Collaborative.. - Caragea, Zhang..
Shared Memory Parallelization of Data Mining Algorithms.. - Jin, Yang, Agrawal (2004)
Prototype-based Mining of Numeric Data Streams - Francisco Ferrer-Troyano..
Similar documents (at the sentence level):
71.7%: A Survey of Methods for Scaling Up Inductive Algorithms - Provost, Kolluri (1999)
18.9%: A Survey of Methods for Scaling Up Inductive Learning Algorithms - Provost, Kolluri (1997)
6.4%: Distributed Data Mining: Scaling up and beyond - Provost (1999)
Active bibliography (related documents):
1.0: Scaling Up Inductive Learning with Massive Parallelism - Provost, Aronis
0.9: Rule-space Search for Knowledge-based Discovery - Provost et al. (1999)
0.8: Free Parallel Data Mining - Li (1998)
Similar documents based on text:
0.3: Scaling Up Inductive Algorithms: An Overview - Provost, Kolluri (1997)
0.2: Well-Trained PETs: Improving Probability Estimation Trees - Provost, Domingos (2000)
0.2: The WoRLD: Knowledge Discovery from Multiple Distributed.. - Aronis, Provost, Buchanan (1997)
Related documents from co-citation:
19: Programs for machine learning - Quinlan - 1993
7: UCI Repository of machine learning databases - Blake, Keogh et al. - 1998
7: Megainduction: A test flight - Catlett - 1991
BibTeX entry:
Provost, F., & Kolluri, V. (1999). A survey of methods for scaling up inductive algorithms. http://citeseer.ist.psu.edu/provost99survey.html
@article{ provost99survey,
author = "Foster J. Provost and Venkateswarlu Kolluri",
title = "A Survey of Methods for Scaling Up Inductive Algorithms",
journal = "Data Mining and Knowledge Discovery",
volume = "3",
number = "2",
pages = "131-169",
year = "1999",
url = "citeseer.ist.psu.edu/provost99survey.html" }
Documents on the same site (http://www.lans.ece.utexas.edu/course/ee380l/1999fall/papers/list2/index.shtml):
The Variable Selection Problem - George (1999)
Detecting Change in Categorical Data: Mining Contrast Sets - Bay, Pazzani (1999)
Interactive Data Analysis: The Control Project - Ata Analysis Is (1999)