Abstract: One of the defining challenges for the KDD (Knowledge Discovery in Databases)
research community is to enable inductive learning algorithms to mine very large
databases. We discuss various issues related to scaling up and provide an overview of
the methods for scaling up data mining. Specifically, we discuss sampling as an
efficient, simple and robust technique to scale up data mining. We also discuss
estimating the minimum sample size required for the data mining algorithm, and the...
BibTeX entry:
@misc{ addala-sampling,
author = "Raju Addala",
title = "Sampling: An efficient, simple and robust technique for scaling up data mining",
url = "citeseer.ist.psu.edu/394238.html" }
Citations (may not include all citations):
203: What size net gives valid generalization - Baum, Haussler - 1989
189: Sampling large databases for association rules - Toivonen - 1996
139: Knowledge discovery in databases: an overview - Frawley, Piatetsky-Shapiro et al. - 1991
35: A survey of methods for scaling up inductive algorithms - Provost, Kolluri - 1999
32: Static versus dynamic sampling for data mining - John, Langley - 1996
29: The power of sampling in knowledge discovery - Kivinen, Mannila - 1993
23: The effects of training set size on decision tree complexity - Oates, Jensen - 1997
22: Large datasets lead to overly complex models: an explanation.. - Oates, Jensen - 1998
21: A study of two sampling methods for analysing large datasets.. - Srinivasan - 1999
21: Department of Computing and Information Science - Skillicorn, parallel et al. - 1999
18: Efficient progressive sampling - Provost, Jensen et al. - 1999
10: Effects of sample size in classifier design - Fukunaga, Hayes - 1985
8: Evaluation of sampling for data mining of association rules - Zaki, Parthasarathy et al. - 1996
6: Modeling decision tree performance with the power law - Frey, Jr - 1999
5: Is sampling useful in data mining - Lee, Cheung et al. - 1998
[Article contains additional citations not shown here]
Documents on the same site (http://www.qucis.queensu.ca/home/addala/publish.html):
Using Data Mining for Crop Genebank Management - Addala, al.