A Survey of Methods for Scaling Up Inductive Algorithms (1999)  (35 citations)
Foster Provost, Venkateswarlu Kolluri
Data Mining and Knowledge Discovery



View or download:
utexas.edu/course/...alingsurvey.ps.gz

From:  utexas.edu/course/e...index.shtml

Abstract: One of the defining challenges for the KDD research community is to enable inductive learning algorithms to mine very large databases. This paper summarizes, categorizes, and compares existing work on scaling up inductive algorithms. We concentrate on algorithms that build decision trees and rule sets, in order to provide focus and specific details; the issues and techniques generalize to other types of data mining. We begin with a discussion of important issues related to scaling up. We...

Cited by:
Algorithms and Software for Collaborative.. - Caragea, Zhang..
Shared Memory Parallelization of Data Mining Algorithms.. - Jin, Yang, Agrawal (2004)
Prototype-based Mining of Numeric Data Streams - Francisco Ferrer-Troyano..

Similar documents (at the sentence level):
71.7%:   A Survey of Methods for Scaling Up Inductive Algorithms - Provost, Kolluri (1999)
18.9%:   A Survey of Methods for Scaling Up Inductive Learning Algorithms - Provost, Kolluri (1997)
6.4%:   Distributed Data Mining: Scaling up and beyond - Provost (1999)

Active bibliography (related documents):
1.0:   Scaling Up Inductive Learning with Massive Parallelism - Provost, Aronis
0.9:   Rule-space Search for Knowledge-based Discovery - Provost et al. (1999)
0.8:   Free Parallel Data Mining - Li (1998)

Similar documents based on text:
0.3:   Scaling Up Inductive Algorithms: An Overview - Provost, Kolluri (1997)
0.2:   Well-Trained PETs: Improving Probability Estimation Trees - Provost, Domingos (2000)
0.2:   The WoRLD: Knowledge Discovery from Multiple Distributed.. - Aronis, Provost, Buchanan (1997)

Related documents from co-citation:
19:   Programs for machine learning - Quinlan - 1993
7:   UCI Repository of machine learning databases - Blake, Keogh et al. - 1998
7:   Megainduction: A test flight - Catlett - 1991

BibTeX entry:

Provost, F., & Kolluri, V. (1999). A survey of methods for scaling up inductive algorithms. Data Mining and Knowledge Discovery, 3(2), 131-169. http://citeseer.ist.psu.edu/provost99survey.html

@article{ provost99survey,
    author = "Foster J. Provost and Venkateswarlu Kolluri",
    title = "A Survey of Methods for Scaling Up Inductive Algorithms",
    journal = "Data Mining and Knowledge Discovery",
    volume = "3",
    number = "2",
    pages = "131-169",
    year = "1999",
    url = "citeseer.ist.psu.edu/provost99survey.html" }
Citations (may not include all citations; the number before each entry is how many times CiteSeer has seen that work cited):
2133   Pattern Classification and Scene Analysis - Duda, Hart - 1973
1359   Induction of decision trees - Quinlan - 1986
1262   Classification and Regression Trees - Breiman, Friedman et al. - 1984
910   Fast algorithms for mining association rules - Agrawal, Srikant - 1994
667   UCI repository of machine learning databases - Merz, Murphy - 1997
388   Inductive Logic Programming - Muggleton - 1992
342   Wrappers for feature subset selection - Kohavi, John - 1997
317   Learning quickly when irrelevant attributes abound: A new li.. - Littlestone - 1988
274   Generalization as search - Mitchell - 1982
262   From data mining to knowledge discovery: An overview - Fayyad, Piatetsky-Shapiro et al.
248   Fast effective rule induction - Cohen - 1995
216   Very simple classification rules perform well on most common.. - Holte - 1993
209   Mining quantitative association rules in large relational ta.. - Srikant, Agrawal - 1996
189   Sampling large databases for association rules - Toivonen - 1996
164   An efficient algorithm for mining association rules in large.. - Savasere, Omiecinski et al. - 1995
163   Learning efficient classification procedures and their appli.. - Quinlan - 1983
162   Simplifying decision trees - Quinlan - 1987
147   Boolean feature discovery in empirical learning - Pagallo, Haussler - 1990
145   SPRINT: A scalable parallel classifier for data mining - Shafer, Agrawal et al. - 1996
131   Data mining: An overview from database perspective - Chen, Han et al. - 1997
111   SLIQ: A fast scalable classifier for data mining - Mehta, Agrawal et al. - 1996
107   Efficient noise-tolerant learning from statistical queries - Kearns - 1993
99   Concept learning and the problem of small disjuncts - Holte, Acker et al. - 1989
96   The need for biases in learning generalizations - Mitchell - 1980
95   Estimating attributes: Analysis and extensions of Relief - Kononenko - 1994
87   Subset selection in regression - Miller - 1990
86   JAM: Java agents for meta-learning over distributed database.. - Stolfo, Prodromidis et al. - 1997
83   Incremental induction of decision trees - Utgoff - 1989
79   Error reduction through learning multiple descriptions - Ali, Pazzani - 1996
62   Megainduction: Machine learning on very large databases - Catlett
60   A theory of learning classification rules - Buntine - 1991
59   Toward parallel and distributed learning by meta-learning - Chan, Stolfo - 1993
57   DBMiner: A system for mining knowledge in large relational d.. - Han, Fu et al. - 1996
55   Integrating association rule mining with relational database.. - Sarawagi, Thomas et al. - 1998
53   Knowledge discovery and data mining: Towards a unifying fram.. - Fayyad, Piatetsky-Shapiro et al.
51   Wrappers for Performance Enhancement and Oblivious Decision .. - Kohavi - 1996
50   An experimental comparison of symbolic and connectionist lea.. - Shavlik, Mooney et al. - 1991
49   Adaptive fraud detection - Fawcett, Provost - 1997
47   Parallel depth-first search - Kumar, Rao - 1987
47   Theory and applications of agnostic pac-learning with small .. - Auer, Holte et al. - 1995
47   Megainduction: A test flight - Catlett
45   An experimental comparison of the nearest-neighbor and neares.. - Wettschereck, Dietterich - 1995
42   Mining Very Large Databases with Parallel Processing - Freitas, Lavington - 1997
42   An information theoretic approach to rule induction from dat.. - Smyth, Goodman - 1992
42   Efficiently inducing determinations: A complete and systemat.. - Schlimmer - 1993
42   The logic of frames - Hayes - 1979
40   Extracting Comprehensible Models from Trained Neural Network.. - Craven - 1996
37   the accuracy of meta-learning for scalable data mining - Chan, Stolfo - 1997
34   Developing tightly-coupled data mining applications on a rel.. - Agrawal, Shim - 1996
33   Incremental reduced error pruning - Furnkranz, Widmer - 1994
32   Static versus dynamic sampling for data mining - John, Langley - 1996
32   Small disjuncts in action: Learning to diagnose errors in th.. - Danyluk, Provost - 1993
31   An SE-tree based characterization of the induction problem - Rymon - 1993
30   Representation design and brute-force induction in a boeing .. - Riddle, Segal et al. - 1994
30   Knowledge acquisition from examples via multiple models - Domingos - 1997
29   Credit card fraud detection using meta-learning: Issues and .. - Stolfo, Fan et al. - 1997
28   Data Surveyor: Searching the nuggets in parallel - Holsheimer, Kersten et al. - 1996
28   Decision theoretic subsampling for induction on large databa.. - Musick, Catlett et al. - 1993
28   Inductive policy: The pragmatics of bias selection - Provost, Buchanan - 1995
28   Efficient pruning methods for separate-and-conquer rule lear.. - Cohen - 1993
27   Scaling up: Distributed machine learning with cooperation - Provost, Hennessy - 1996
27   Evaluation of sampling for data mining of association rules - Zaki, Parthasarathy et al. - 1997
25   Feature subset selection using the wrapper model: Overfittin.. - Kohavi, Sommerfield - 1995
24   Learning decision lists using homogeneous rules - Segal, Etzioni
23   A review and comparative evaluation of feature weighting met.. - Wettschereck, Aha et al. - 1997
23   The effects of training set size on decision tree complexity - Oates, Jensen - 1997
23   Scaling up inductive learning with massive parallelism - Provost, Aronis - 1996
22   Induction of one-level decision trees - Iba, Langley - 1992
22   KDD for science data analysis: Issues and examples - Fayyad, Haussler et al. - 1996
22   Scaling up inductive logic programming by learning from inte.. - Blockeel, De Raedt et al. - 1999
22   Large data sets lead to overly complex models: an explanatio.. - Oates, Jensen - 1998
22   An ounce of knowledge is worth a ton of data: Quantitative s.. - Gaines - 1989
21   Maximizing the predictive value of production rules - Weiss, Galen et al. - 1990
21   Multiple comparisons in induction algorithms - Jensen, Cohen - 1999
20   RL4: A tool for knowledge-based induction - Clearwater, Provost - 1990
20   SKICAT: A machine learning system for automated cataloging o.. - Fayyad, Weir et al. - 1993
19   Knowledge discovery from multiple databases - Ribeiro, Kaufmann et al. - 1995
18   Efficient progressive sampling - Provost, Jensen et al. - 1999
18   Distributed machine learning: scaling up with coarse-grained .. - Provost, Hennessy - 1994
18   Communications of the ACM - Fox, Akscyn et al. - 1995
17   An overview of issues in developing industrial data mining a.. - Piatetsky-Shapiro, Brachman et al. - 1996
17   Pruning meta-classifiers in a distributed data mining system - Prodromidis, Stolfo - 1998
15   Incremental batch learning - Clearwater, Cheng et al. - 1989
15   Overcoming the myopia of inductive learning algorithms with .. - Kononenko, Simec et al. - 1997
15   The WoRLD: Knowledge discovery from multiple distributed dat.. - Aronis, Kolluri et al. - 1997
14   DENDRAL and META-DENDRAL: Their applications dimensions - Buchanan, Feigenbaum - 1978
14   Problem solving and rule induction: A unified view - Simon, Lea - 1973
13   Exploiting background knowledge in automated discovery - Aronis, Provost et al. - 1996
13   Efficiently constructing relational features from background.. - Aronis, Provost - 1994
13   Linear time rule induction - Domingos
12   Knowledge representation in the large - Karp, Paley - 1995
10   Scaling Up inductive algorithms: An overview - Provost, Kolluri
10   PARKA: A System for Massively Parallel Knowledge Representat.. - Evett - 1994
9   Developing tightly-coupled applications on IBM DB2/CS relati.. - Agrawal, Shim - 1995
9   Knowledge probing in distributed data mining - Guo, Sutiwaraphun - 1998
9   SIPping from the data firehose - John, Lent - 1997
9   Scalable Data Mining for Rules - Zaki - 1998
9   Data Mining and Knowledge Discovery - Fayyad - 1997
9   Massively parallel matching of knowledge structures - Andersen, Hendler et al. - 1994
8   Accelerated learning on the connection machine - Cook, Holder - 1990
8   Using SQL primitives and parallel DB servers to speed up kno.. - Freitas, Lavington - 1996
8   Efficient specific-to-general rule induction - Domingos
7   A method for reasoning with structured and continuous attrib.. - Kaufman, Michalski - 1996
7   Scalable parallel classification for data mining on shared m.. - Zaki, Ho et al. - 1999
7   Applications of artificial intelligence for chemical inferen.. - Buchanan, Smith et al. - 1976
6   the efficient gathering of sufficient statistics for classif.. - Graefe, Fayyad et al. - 1998
6   Policies for the Selection of Bias in Inductive Machine Lear.. - Provost - 1992
6   Modeling decision tree performance with the power law - Frey, Fisher - 1999
6   Increasing the efficiency of data mining algorithms with bre.. - Aronis, Provost - 1997
6   Special issue on bias evaluation and selection - DesJardins, Gordon - 1995
5   Inducing and Combining Multiple Decision Trees - Williams - 1990
5   Multi-layer incremental induction - Wu, Lo - 1998
5   Private communication - Jensen - 1998
4   Direct access of an ILP algorithm to a database management s.. - Brockhausen, Morik - 1996
4   Using multi-attribute predicates for mining classification r.. - Chen - 1995
4   A storage system for scalable knowledge representation - Karp, Paley et al. - 1994
4   ARIEL: A massively parallel symbolic learning assistant for .. - Lathrop, Webster et al. - 1990
4   Generating C4 - Kufrin - 1997
4   Sample size and misclassification: Is more always better - Harris-Jones, Haines - 1997
3   A comparison of prediction accuracy - Lim, Loh - 1999
3   Data mining and statistics: What's the connection - Friedman - 1997
3   Integrative windowing - Furnkranz - 1998
3   On handling tree-structure attributes in decision tree learn.. - Almuallim, Akiba et al. - 1995
3   Private communication - Haines - 1998
3   Learning decision lists using homogeneous rules - Segal, Etzioni
2   A survey of methods for scaling up inductive learning - Provost, Kolluri
2   A tutorial introduction to high performance data mining - Grossman, Bailey - 1998
2   Rainforest - a framework for fast decision tree construction.. - Gehrke, Ramakrishnan et al. - 1998
2   Exploiting parallelism in a scientific discovery system to i.. - Galal, Cook et al. - 1999
2   Free Parallel Data Mining - Li - 1998
2   Quantifying inductive bias: AI learning algorithms and Vali.. - Haussler - 1988
2   Combining decision trees learned in parallel - Hall, Chawla et al. - 1998
1   Crossing the chasm: From academic machine learning to commer.. - Kohavi - 1998
1   KDD-98 Workshop on Distributed Data Mining - Kargupta, Chan - 1998
1   From large to huge: A statistician's reaction to KDD and DM - Huber - 1997
1   Supporting large-scale computational science - Musick - 1998
1   Towards scalable learning with non-uniform class and cost di.. - Chan et al. - 1998
1   Iterative weakening: Optimal and near-optimal policies for t.. - Provost - 1993



Documents on the same site (http://www.lans.ece.utexas.edu/course/ee380l/1999fall/papers/list2/index.shtml):
The Variable Selection Problem - George (1999)
Detecting Change in Categorical Data: Mining Contrast Sets - Bay, Pazzani (1999)
Interactive Data Analysis: The Control Project (1999)

CiteSeer.IST - Copyright Penn State and NEC