Abstract: Practical clustering algorithms require multiple data scans to
achieve convergence. For large databases, these scans become
prohibitively expensive. We present a scalable clustering
framework applicable to a wide class of iterative clustering
algorithms. The framework requires at most one scan of the database.
In this work, the framework is instantiated and numerically justified
with the popular K-Means clustering algorithm. The method is based
on identifying regions of the data that are compressible,
regions that...
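The abstract says the method compresses some regions of the data so that K-Means can keep iterating without rescanning them. The Python sketch below illustrates one general way a K-Means update can use such a compressed region. It assumes each region is reduced to its point count and per-dimension sum, and that the whole region is assigned as a single unit according to its mean. This is a generic illustration, not the paper's algorithm. The names `Summary` and `kmeans_step` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class Summary:
    """Hypothetical compressed region: point count n and per-dimension sum s."""
    n: int
    s: List[float]

    @classmethod
    def from_points(cls, pts: Sequence[Vector]) -> "Summary":
        d = len(pts[0])
        return cls(len(pts), [sum(p[j] for p in pts) for j in range(d)])

    def mean(self) -> List[float]:
        return [x / self.n for x in self.s]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_step(centers: Sequence[Sequence[float]],
                points: Sequence[Vector],
                summaries: Sequence[Summary]) -> List[List[float]]:
    """One K-Means iteration over raw points plus compressed summaries.

    Assumption: a summary is assigned to a cluster as a unit (by its mean)
    and contributes its count and sum to that cluster's new centroid.
    """
    k, d = len(centers), len(centers[0])
    counts = [0] * k
    sums = [[0.0] * d for _ in range(k)]
    # Raw points are assigned individually to their nearest center.
    for p in points:
        j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]
    # Each compressed region adds its sufficient statistics in one step.
    for s in summaries:
        m = s.mean()
        j = min(range(k), key=lambda c: sq_dist(m, centers[c]))
        counts[j] += s.n
        for t in range(d):
            sums[j][t] += s.s[t]
    # An empty cluster keeps its previous center.
    return [[x / counts[j] for x in sums[j]] if counts[j] else list(centers[j])
            for j in range(k)]
```

For example, if the points (10, 12) and (10, 8) are replaced by `Summary.from_points([(10.0, 12.0), (10.0, 8.0)])`, the step produces the same centroids as running it on the raw points. This holds only when every point in the region would have been assigned to the same cluster as the region's mean. A compression scheme therefore has to be selective about which regions it summarizes.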
Cited by:
Mining Evolving Web Clickstreams with Explicit Retrieval .. - Nasraoui, Cardona, Rojas (2004)
Using Context to Assist in Personal File Retrieval - Soules (2006)
TECNO-STREAMS: Tracking Evolving Clusters in Noisy.. - Nasraoui, Uribe.. (2003)
Similar documents (at the sentence level):
7.6%: Refining Initial Points for K-Means Clustering - Bradley, Fayyad (1998)
6.0%: Initialization of Iterative Refinement Clustering Algorithms - Fayyad, Reina, Bradley (1998)
Active bibliography (related documents):
1.4: Scaling EM (Expectation-Maximization) Clustering to Large.. - Bradley, Fayyad, Reina (1999)
0.5: Mathematical Programming for Data Mining: Formulations.. - Bradley, Fayyad.. (1998)
0.3: Knowledge Discovery From Distributed And Textual Data - Cho (1999)
Similar documents based on text:
0.3: Compressed Data Cubes for OLAP Aggregate Query.. - Jayavel.. (1988)
0.3: Taming the Giants and the Monsters: Mining Large Databases for.. - Fayyad (1998)
0.2: KDD for Science Data Analysis: Issues and Examples - Fayyad, Haussler, Stolorz (1996)
Related documents from co-citation:
36: Efficient and Effective Clustering Methods for Spatial Data Mining - Ng, Han - 1994
35: BIRCH: An Efficient Data Clustering Method for Very Large Databases - Zhang, Ramakrishnan et al. - 1996
33: Automatic subspace clustering of high dimensional data for data mining applicati.. - Agrawal, Gehrke et al. - 1998
BibTeX entry:
P. S. Bradley, U. Fayyad, and C. Reina, "Scaling Clustering Algorithms to Large Databases", To appear, Proc. 4th International Conf. on Knowledge Discovery and Data Mining (KDD-98). AAAI Press, Aug. 1998. http://citeseer.ist.psu.edu/bradley98scaling.html
@inproceedings{ bradley98scaling,
author = "Paul S. Bradley and Usama M. Fayyad and Cory Reina",
title = "Scaling Clustering Algorithms to Large Databases",
booktitle = "Knowledge Discovery and Data Mining",
pages = "9-15",
year = "1998",
url = "citeseer.ist.psu.edu/bradley98scaling.html" }
Citations (may not include all citations):
2133: Pattern Classification and Scene Analysis - Duda, Hart - 1973
1662: Neural Networks for Pattern Recognition - Bishop - 1995
897: Introduction to Statistical Pattern Recognition - Fukunaga - 1990
512: Density Estimation for Statistics and Data Analysis - Silverman - 1986
474: Advances in Knowledge Discovery and Data Mining - Fayyad, Piatetsky-Shapiro et al. - 1996
349: Knowledge Acquisition via Incremental Conceptual Clustering - Fisher - 1987
283: Some methods for classification and analysis of multivariate.. - MacQueen - 1967
242: Efficient and effective clustering methods for spatial data .. - Ng, Han - 1994
149: Multivariate Density Estimation - Scott - 1992
118: Model-based Gaussian and non-Gaussian Clustering - Banfield, Raftery - 1993
111: Scaling Clustering Algorithms to Large Databases - Bradley, Fayyad et al. - 1998
77: Finding Groups in Data - Kaufman, Rousseeuw - 1989
70: Clustering Algorithms - Rasmussen - 1992
66: Multivariate Observations - Seber - 1984
50: Refining Initial Points for K-Means Clustering - Bradley, Fayyad - 1998
46: A database interface for clustering in large spatial databas.. - Ester, Kriegel et al. - 1995
42: BIRCH: A new data clustering algorithm and its applications - Zhang, Ramakrishnan et al. - 1997
36: Cluster analysis of multivariate data: Efficiency vs. interp.. - Forgy - 1965
30: A statistical perspective on knowledge discovery in database.. - Pregibon, Elder - 1996
26: Clustering via Concave Minimization - Bradley, Mangasarian et al. - 1997
26: Statistical Themes and Lessons for Data Mining - Glymour, Madigan et al. - 1997
25: Bayesian Classification (AutoClass): Theory and Results - Cheeseman, Stutz - 1996
18: K-Means-Type Algorithms: A Generalized Convergence Theorem a.. - Selim, Ismail - 1984
12: Mining Science Data - Fayyad, Haussler et al. - 1996
7: Industrial Applications of Data Mining and Knowledge Discove.. - Brachman, Khabaza et al. - 1996
6: An experimental comparison of several clustering methods - Meila, Heckerman - 1998
6: Maximum Likelihood from Incomplete Data via the EM algorithm - Dempster, Laird et al. - 1977
4: Application of Classification and Clustering to Sky Survey C.. - Fayyad, Djorgovski et al. - 1997
3: Refining Initialization of Clustering Algorithms - Fayyad, Reina et al. - 1998
Documents on the same site (http://www.ece.nwu.edu/~harsha/Clustering/clus.html):
An Experimental Comparison of Several Clustering and.. - Meila, Heckerman (1998)
Parallel Algorithms for Hierarchical Clustering - Olson (1995)
Wavelet-Based Clustering for Very Large.. - Sheikholeslami, Yu, .. (1998)
CiteSeer.IST - Copyright Penn State and NEC