Sampling: An efficient, simple and robust technique for scaling up data mining
Raju Addala



Abstract: One of the defining challenges for the KDD (Knowledge Discovery in Databases) research community is to enable inductive learning algorithms to mine very large databases. We discuss various issues related to scaling up and provide an overview of the methods available for scaling up data mining. Specifically, we discuss sampling as an efficient, simple and robust technique for scaling up data mining. We also discuss estimating the minimum sample size required for the data mining algorithm, and the...


BibTeX entry:

@misc{ addala-sampling,
  author = "Raju Addala",
  title = "Sampling: An efficient, simple and robust technique for scaling up data
    mining",
  url = "citeseer.ist.psu.edu/394238.html" }

Citations (may not include all citations):
203   What size net gives valid generalization - Baum, Haussler - 1989
189   Sampling large databases for association rules - Toivonen - 1996
139   Knowledge discovery in databases: an overview - Frawley, Piatetsky-Shapiro et al. - 1991
35   A survey of methods for scaling up inductive algorithms - Provost, Kolluri - 1999
32   Static versus dynamic sampling for data mining - John, Langley - 1996
29   The power of sampling in knowledge discovery - Kivinen, Mannila - 1993
23   The effects of training set size on decision tree complexity - Oates, Jensen - 1997
22   Large datasets lead to overly complex models: an explanation.. - Oates, Jensen - 1998
21   A study of two sampling methods for analysing large datasets.. - Srinivasan - 1999
21   Department of Computing and Information Science - Skillicorn, parallel et al. - 1999
18   Efficient progressive sampling - Provost, Jensen et al. - 1999
10   Effects of sample size in classifier design - Fukunaga, Hayes - 1985
8   Evaluation of sampling for data mining of association rules - Zaki, Parthasarathy et al. - 1996
6   Modeling decision tree performance with the power law - Frey, Jr - 1999
5   Is sampling useful in data mining - Lee, Cheung et al. - 1998

[Article contains additional citations not shown here]

