
Scaling Clustering Algorithms to Large Databases (1998)  (111 citations)
P.S. Bradley, Usama Fayyad, Cory Reina
Knowledge Discovery and Data Mining



View or download:
nwu.edu/~harsha/Clustering...scaleKM.ps
toronto.edu/~miller/csc2525...bfr98.pdf
microsoft.com/pub/tr/tr9837.ps

From:  nwu.edu/~harsha/Clustering...clus
From:  microsoft.com/scripts/pub...trpub

Abstract: Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clustering framework applicable to a wide class of iterative clustering. We require at most one scan of the database. In this work, the framework is instantiated and numerically justified with the popular K-Means clustering algorithm. The method is based on identifying regions of the data that are compressible, regions that...
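
The single-scan idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract is truncated, so the rule used here (compress every point of a finished chunk into its nearest cluster) is an assumption made for illustration only. The names `single_scan_kmeans`, `nearest`, `chunks`, and `inner_iters` are hypothetical. The sketch reads the data once, chunk by chunk, runs a few K-Means iterations over the current chunk together with per-cluster summaries of earlier chunks, and then keeps only each cluster's count and coordinate sum.

```python
def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def single_scan_kmeans(chunks, init_centers, inner_iters=5):
    """One pass over `chunks`; each chunk is a list of points that fits in memory.

    Assumption (not from the source): once a chunk is processed, all of its
    points are replaced by per-cluster sufficient statistics, namely
    counts[j] and sums[j] (coordinate-wise sums). Requires inner_iters >= 1.
    """
    k = len(init_centers)
    dim = len(init_centers[0])
    centers = [list(c) for c in init_centers]
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]

    for chunk in chunks:
        for _ in range(inner_iters):
            # Start each iteration from the compressed summary of earlier chunks.
            new_counts = list(counts)
            new_sums = [list(s) for s in sums]
            for x in chunk:
                j = nearest(x, centers)
                new_counts[j] += 1
                for d in range(dim):
                    new_sums[j][d] += x[d]
            # Recompute each non-empty cluster's center as its mean.
            for j in range(k):
                if new_counts[j] > 0:
                    centers[j] = [s / new_counts[j] for s in new_sums[j]]
        # Compress: discard the chunk, keep only counts and sums.
        counts, sums = new_counts, new_sums

    return centers, counts


# Toy data: two well-separated groups, delivered in two chunks.
chunks = [
    [(0, 0), (1, 0), (10, 10), (11, 10)],
    [(0, 1), (10, 11)],
]
centers, counts = single_scan_kmeans(chunks, init_centers=[(0, 0), (10, 10)])
print(centers, counts)
```

Memory use in this sketch depends on the chunk size and the number of clusters, not on the size of the database, which is the property the abstract's "at most one scan" requirement is aiming for.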

Cited by:
Mining Evolving Web Clickstreams with Explicit Retrieval .. - Nasraoui, Cardona, Rojas (2004)
Using Context to Assist in Personal File Retrieval - Soules (2006)
TECNO-STREAMS: Tracking Evolving Clusters in Noisy.. - Nasraoui, Uribe.. (2003)

Similar documents (at the sentence level):
7.6%:   Refining Initial Points for K-Means Clustering - Bradley, Fayyad (1998)
6.0%:   Initialization of Iterative Refinement Clustering Algorithms - Fayyad, Reina, Bradley (1998)

Active bibliography (related documents):
1.4:   Scaling EM (Expectation-Maximization) Clustering to Large.. - Bradley, Fayyad, Reina (1999)
0.5:   Mathematical Programming for Data Mining: Formulations.. - Bradley, Fayyad.. (1998)
0.3:   Knowledge Discovery From Distributed And Textual Data - Cho (1999)

Similar documents based on text:
0.3:   Compressed Data Cubes for OLAP Aggregate Query.. - Jayavel.. (1988)
0.3:   Taming the Giants and the Monsters: Mining Large Databases for.. - Fayyad (1998)
0.2:   KDD for Science Data Analysis: Issues and Examples - Fayyad, Haussler, Stolorz (1996)

Related documents from co-citation:
36:   Efficient and Effective Clustering Methods for Spatial Data Mining - Ng, Han - 1994
35:   BIRCH: An Efficient Data Clustering Method for Very Large Databases - Zhang, Ramakrishnan et al. - 1996
33:   Automatic subspace clustering of high dimensional data for data mining applicati.. - Agrawal, Gehrke et al. - 1998

BibTeX entry:

P. S. Bradley, U. Fayyad, and C. Reina, "Scaling Clustering Algorithms to Large Databases", To appear, Proc. 4th International Conf. on Knowledge Discovery and Data Mining (KDD-98). AAAI Press, Aug. 1998. http://citeseer.ist.psu.edu/bradley98scaling.html

@inproceedings{ bradley98scaling,
    author = "Paul S. Bradley and Usama M. Fayyad and Cory Reina",
    title = "Scaling Clustering Algorithms to Large Databases",
    booktitle = "Knowledge Discovery and Data Mining",
    pages = "9-15",
    year = "1998",
    url = "citeseer.ist.psu.edu/bradley98scaling.html" }
Citations (may not include all citations):
2133   Pattern Classification and Scene Analysis - Duda, Hart - 1973
1662   Neural Networks for Pattern Recognition - Bishop - 1995
897   Introduction to Statistical Pattern Recognition - Fukunaga - 1990
512   Density Estimation for Statistics and Data Analysis - Silverman - 1986
474   Advances in Knowledge Discovery and Data Mining - Fayyad, Piatetsky-Shapiro et al. - 1996
349   Knowledge Acquisition via Incremental Conceptual Clustering - Fisher - 1987
283   Some methods for classification and analysis of multivariate.. - MacQueen - 1967
242   Efficient and effective clustering methods for spatial data .. - Ng, Han - 1994
149   Multivariate Density Estimation - Scott - 1992
118   Model-based Gaussian and non-Gaussian Clustering - Banfield, Raftery - 1993
111   Scaling Clustering Algorithms to Large Databases - Bradley, Fayyad et al. - 1998
77   Finding Groups in Data - Kaufman, Rousseeuw - 1989
70   Clustering Algorithms - Rasmussen - 1992
66   Multivariate Observations - Seber - 1984
50   Refining Initial Points for K-Means Clustering - Bradley, Fayyad - 1998
46   A database interface for clustering in large spatial databas.. - Ester, Kriegel et al. - 1995
42   BIRCH: A new data clustering algorithm and its applications - Zhang, Ramakrishnan et al. - 1997
36   Cluster analysis of multivariate data: Efficiency vs. interp.. - Forgy - 1965
30   A statistical perspective on knowledge discovery in database.. - Pregibon, Elder - 1996
26   Clustering via Concave Minimization - Bradley, Mangasarian et al. - 1997
26   Statistical Themes and Lessons for Data Mining - Glymour, Madigan et al. - 1997
25   Bayesian Classification (AutoClass): Theory and Results - Cheeseman, Stutz - 1996
18   K-Means-Type Algorithms: A Generalized Convergence Theorem a.. - Selim, Ismail - 1984
12   Mining Science Data - Fayyad, Haussler et al. - 1996
7   Industrial Applications of Data Mining and Knowledge Discove.. - Brachman, Khabaza et al. - 1996
6   An experimental comparison of several clustering methods - Meila, Heckerman - 1998
6   Maximum Likelihood from Incomplete Data via the EM algorithm - Dempster, Laird et al. - 1977
4   Application of Classification and Clustering to Sky Survey C.. - Fayyad, Djorgovski et al. - 1997
3   Refining Initialization of Clustering Algorithms - Fayyad, Reina et al. - 1998




Documents on the same site (http://www.ece.nwu.edu/~harsha/Clustering/clus.html):
An Experimental Comparison of Several Clustering and.. - Meila, Heckerman (1998)
Parallel Algorithms for Hierarchical Clustering - Olson (1995)
Wavelet-Based Clustering for Very Large.. - Sheikholeslami, Yu, .. (1998)


CiteSeer.IST - Copyright Penn State and NEC