Abstract: This paper establishes common ground for researchers addressing the challenge of scaling up inductive data mining algorithms to very large databases, and for practitioners who want to understand the state of the art. We begin with a discussion of important, but often tacit, issues related to scaling up. We then overview existing methods, categorizing them into three main approaches. Finally, we use the overview to recommend how to proceed when dealing with a large problem and where future...
Context of citations to this paper:
...on very large data sets. In fact, there is a large body of literature on attempts to scale up algorithms to handle large data sets [1, 2, 3]. This body of work primarily addresses the issue of how to reduce the high computational costs of traditional learning algorithms so...
...approaches are surveyed in more detail in [Freitas Lavington 98]. A related survey of approaches for scaling up data mining: [Provost Kolluri 97,98] Divides scalable data mining into three main approaches: 1) Data Partitioning instance sampling attribute sampling...
Cited by:
A Hybrid Model for Delivering Internet-based Distributed Data.. - Krishnaswamy (2002)
Distributed Data Mining Systems - Prodromidis (1999)
Effective and Efficient Pruning of Meta-Classifiers in a .. - Prodromidis, Stolfo.. (1998)
Similar documents (at the sentence level):
43.6%: A Survey of Methods for Scaling Up Inductive Learning Algorithms - Provost, Kolluri (1997)
16.3%: A Survey of Methods for Scaling Up Inductive Algorithms - Provost, Kolluri (1999)
Active bibliography (related documents):
0.1: Collective Data Mining: A New Perspective Toward Distributed.. - Kargupta et al. (1999)
0.1: PKDD'98 Tutorial on Scalable, High-Performance Data Mining with.. - Freitas (1998)
0.0: A Study of Support Vectors on Model Independent Example Selection - Syed, Li, Sung (1999)
Similar documents based on text:
0.2: Distributed Data Mining: Scaling up and beyond - Provost (1999)
0.2: Scaling up and Evaluation - Esther Duflo Paper
0.1: The WoRLD: Knowledge Discovery from Multiple Distributed.. - Aronis, Provost, Buchanan (1997)
Related documents from co-citation:
7: Programs for machine learning - Quinlan - 1993
6: Jam: Java agents for meta-learning over distributed databases - Stolfo, Prodromidis et al. - 1997
6: Classification and Regression Trees - Breiman, Friedman et al. - 1984
BibTeX entry:
F. Provost and V. Kolluri. Scaling up inductive algorithms: An overview. In Proc. Third Intl. Conf. Knowledge Discovery and Data Mining, pages 239--242, 1997. http://citeseer.ist.psu.edu/provost97scaling.html
@inproceedings{ provost97scaling,
author = "Foster J. Provost and Venkateswarlu Kolluri",
title = "Scaling Up Inductive Algorithms: An Overview",
booktitle = "Knowledge Discovery and Data Mining",
pages = "239--242",
year = "1997",
url = "citeseer.ist.psu.edu/provost97scaling.html" }
Citations (may not include all citations):
274: Generalization as search - Mitchell - 1982
216: Very Simple Classification Rules Perform Well on Most Common.. - Holte - 1993
209: Mining Quantitative Association Rules in Large Relational Ta.. - Srikant, Agrawal - 1996
149: Quantifying inductive bias: AI learning algorithms and Valia.. - Haussler - 1988
62: Megainduction: machine learning on very large databases - Catlett - 1991
60: A theory of learning classification rules - Buntine - 1991
53: Knowledge Discovery and Data Mining: Towards a Unifying Fram.. - Fayyad, Piatetsky-Shapiro et al. - 1996
28: Inductive Policy: The pragmatics of bias selection - Provost, Buchanan - 1995
25: Feature subset selection using wrapper model: Overfitting an.. - Kohavi, Sommerfield - 1995
23: Scaling up inductive learning with massive parallelism - Provost, Aronis - 1996
14: Problem solving and rule induction: A unified view - Simon, Lea - 1973
13: A Survey of Methods for Scaling Up Inductive Learning Algori.. - Provost, Kolluri - 1997
Documents on the same site (http://www.stern.nyu.edu/~fprovost):
On Applied Research in Machine Learning - Provost (1998)
Well-Trained PETs: Improving Probability Estimation Trees - Provost, Domingos (2000)
Machine Learning from Imbalanced Data Sets 101 (Extended Abstract) - Provost
CiteSeer.IST - Copyright Penn State and NEC