
Sampling: An efficient, simple and robust technique for scaling up data mining
Raju Addala

From: qucis.queensu.ca/home/a...publish

Abstract: One of the defining challenges for the KDD (Knowledge Discovery in Databases) research community is to enable inductive learning algorithms to mine very large databases. We discuss various issues related to scaling up and provide an overview of methods for scaling up data mining. Specifically, we discuss sampling as an efficient, simple and robust technique for scaling up data mining. We also discuss estimating the minimum sample size required for the data mining algorithm, and the...
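
The abstract does not give the paper's sampling procedure or its sample-size estimator. As an illustration only, the Python sketch below pairs two standard techniques that are not necessarily the ones the author uses: a sample-size estimate from Hoeffding's inequality, n ≥ ln(2/δ) / (2ε²), which guarantees that an observed proportion (for example a classifier's accuracy on the sample) is within ε of its true value with probability at least 1 − δ; and one-pass reservoir sampling (Algorithm R) for drawing a uniform random sample of that size from a table too large to hold in memory. The function names, parameters and the ε/δ accuracy target are hypothetical.

```python
import math
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n for which the two-sided Hoeffding bound guarantees that the
    mean of n i.i.d. values in [0, 1] (e.g. 0/1 "classified correctly"
    indicators) lies within epsilon of the true mean with probability at
    least 1 - delta, i.e. 2 * exp(-2 * n * epsilon**2) <= delta.
    Assumption: the quantity being estimated is bounded in [0, 1]."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniform random sample of k records from a stream of unknown length,
    read exactly once (Algorithm R). Returns every record if the stream
    holds fewer than k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep record i with probability k / (i + 1), evicting a
            # uniformly chosen current member of the reservoir.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    n = hoeffding_sample_size(epsilon=0.01, delta=0.05)
    print(n)  # 18445
    sample = reservoir_sample(range(100_000), n, seed=42)
    print(len(sample))  # 18445
```

Because the Hoeffding bound is distribution-free, the size it yields is conservative; adaptive schemes such as the progressive sampling of Provost, Jensen and Oates (see the bibliography below) instead grow the sample on a schedule until model accuracy stops improving.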

Similar documents (at the sentence level):
6.5%:   Using Data Mining for Crop Genebank Management - Addala et al.

Active bibliography (related documents):
0.5:   Knowledge Refinement to Debug and Maintain a Tablet.. - Craw, Boswell, Rowe (1997)
0.3:   A Survey of Methods for Scaling Up Inductive Algorithms - Provost, Kolluri (1999)
0.3:   Efficient Progressive Sampling - Provost, Jensen, Oates (1999)

Similar documents based on text:
0.2:   Scaling up and Evaluation - Esther Duflo Paper
0.0:   Vision-Aided Outdoor Navigation of an Autonomous.. - Southall, Hague.. (1999)
0.0:   A Stochastic Dominance Analysis Of Alternative.. - Gebremedhin..

BibTeX entry:

@misc{ addala-sampling,
  author = "Raju Addala",
  title = "Sampling: An efficient, simple and robust technique for scaling up data
    mining",
  url = "citeseer.ist.psu.edu/394238.html" }

Citations (may not include all citations; the leading number is how many CiteSeer documents cite each work):
203   What size net gives valid generalization - Baum, Haussler - 1989
189   Sampling large databases for association rules - Toivonen - 1996
139   Knowledge discovery in databases: an overview - Frawley, Piatetsky-Shapiro et al. - 1991
35   A survey of methods for scaling up inductive algorithms - Provost, Kolluri - 1999
32   Static versus dynamic sampling for data mining - John, Langley - 1996
29   The power of sampling in knowledge discovery - Kivinen, Mannila - 1993
23   The effects of training set size on decision tree complexity - Oates, Jensen - 1997
22   Large datasets lead to overly complex models: an explanation.. - Oates, Jensen - 1998
21   A study of two sampling methods for analysing large datasets.. - Srinivasan - 1999
21   Department of Computing and Information Science - Skillicorn, parallel et al. - 1999
18   Efficient progressive sampling - Provost, Jensen et al. - 1999
10   Effects of sample size in classifier design - Fukunaga, Hayes - 1985
8   Evaluation of sampling for data mining of association rules - Zaki, Parthasarathy et al. - 1996
6   Modeling decision tree performance with the power law - Frey, Jr - 1999
5   Is sampling useful in data mining - Lee, Cheung et al. - 1998
4   Mining very large databases with parallel processing - Freitas, Lavington - 1998
2   Neutral genetic markers and conservation genetics: Simulated.. - Bataillon, David et al. - 1996
2   Core collections: A practical approach to genetic resources .. - Brown - 1989
2   The case for core collections - Brown - 1989
2   Genetic perspectives of germplasm collections - Frankel - 1984
2   Maximizing genetic diversity in core collections of wild rel.. - Schoen, Brown - 1995
2   The selection of training cases for automated knowledge refi.. - Palmer, Craw - 1997
2   Outcrossing rate and core collection formation in lentil ger.. - Erskine, Muelbauer et al. - 1991
2   Choosing rice germplasm for evaluation - Vaughan - 1991
2   Methods of developing core collection of annual Medicago spe.. - Diwan, McIntosh et al. - 1995
1   Decision theoretic subsampling for induction on large databas.. - Musick, Catlett et al. - 1993
1   Evaluation of five strategies for obtaining a core subset fr.. - Zeuli - 1993
1   It's sink or swim as a tidal wave of data approaches - Reichhardt - 1999
1   Artificial Intelligence Research - Fürnkranz, windowing - 1998

Documents on the same site (http://www.qucis.queensu.ca/home/addala/publish.html):
Using Data Mining for Crop Genebank Management - Addala et al.


CiteSeer.IST - Copyright Penn State and NEC