BIRCH: An Efficient Data Clustering Method for Very Large Databases

by Tian Zhang, Raghu Ramakrishnan, Miron Livny
http://www.lans.ece.utexas.edu/course/ee380l/1999fall/papers/list2/p103-zhang.pdf

Abstract:

Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparison of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.
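
The abstract does not say how a single scan can produce a clustering, so the following is only an illustrative sketch of the general idea. It assumes the compact cluster summary described in the full paper (a point count, a linear sum, and a sum of squared norms) and replaces the paper's balanced tree with a flat list, which is a deliberate simplification. The names `ClusteringFeature`, `single_scan_cluster`, and `threshold` are hypothetical and chosen for this example.

```python
import math


class ClusteringFeature:
    """Summary of a set of points: count, linear sum, sum of squared norms.

    Assumption: this mirrors the summary triple from the full paper;
    the abstract itself does not define it.
    """

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, point):
        # Absorb one point without storing it: only the sums change.
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [a / self.n for a in self.ls]

    def radius_if_added(self, point):
        # Radius after a hypothetical insert: R^2 = SS/N - ||LS/N||^2.
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        r2 = ss / n - sum((a / n) ** 2 for a in ls)
        return math.sqrt(max(r2, 0.0))


def single_scan_cluster(points, threshold):
    """One pass over the data; each point is read exactly once.

    Simplification: a flat list of summaries stands in for the
    hierarchical structure, so the nearest-summary search is linear.
    """
    summaries = []
    for p in points:
        nearest = min(
            summaries,
            key=lambda cf: math.dist(cf.centroid(), p),
            default=None,
        )
        if nearest is not None and nearest.radius_if_added(p) <= threshold:
            nearest.add(p)
        else:
            cf = ClusteringFeature(len(p))
            cf.add(p)
            summaries.append(cf)
    return summaries


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
    for cf in single_scan_cluster(data, threshold=0.5):
        print(cf.n, cf.centroid())
```

Because each summary is updated additively, memory use depends on the number of summaries rather than the number of points, which is one plausible reading of how the method keeps I/O costs low. The result depends on the chosen threshold and on input order, the latter being one of the sensitivities the abstract says the authors evaluate.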

Citations

1097 Vector Quantization and Signal Compression – Gersho, Gray - 1992
1 AutoClass: A Bayesian Classification System – Peter - 1988
1 Clustering Methodologies in Exploratory Data Analysis, Advances in Computers, Edited by – Dubes, Jain
1 and Xiaowei Xu, A Database Interface for Clustering – Ester, Kriegel - 1995
1 Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification – Ester, Kriegel, et al. - 1995
1 Knowledge Acquisition via Incremental – Fisher - 1987
1 Iterative Optimization and Simplification of Hierarchical Clusterings – Fisher
1 Finding Groups in Data - An Introduction to Cluster Analysis – Kaufman, Rousseeuw - 1990
1 with Incremental Concept Formation – Lebowitz, Experiment - 1987
1 T. Lee, Clustering analysis and its applications, Advances in Information Systems Science, Edited by – C - 1981
1 A Survey of Recent Advances in Hierarchical Clustering Algorithms, The Computer Journal – Murtagh
1 Efficient and Effective Clustering Methods for – Ng, Han - 1994
1 Olson, Parallel Algorithms for Hierarchical Clustering – Clark
1 Raghu Ramakrishnan, and Miron Livny, BIRCH: An Efficient Data Clustering Method for Very Large Databases – g - 1995