Fast Approximate Spectral Clustering (2009)

by Donghui Yan, Ling Huang, Michael I. Jordan

BibTeX

@MISC{Yan09fastapproximate,
    author = {Donghui Yan and Ling Huang and Michael I. Jordan},
    title = {Fast Approximate Spectral Clustering},
    year = {2009}
}

Abstract

Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n^3), with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectral cluster data sets with a million observations within several minutes.
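
The KASP idea named in the abstract can be illustrated with a minimal sketch: reduce the n points to k representatives with k-means, run spectral clustering on the representatives only, and let each point inherit the label of its representative, so the expensive eigen-step scales with k rather than n. This is not the authors' implementation. The function names (`kmeans`, `spectral_bipartition`, `kasp`), the Gaussian affinity with bandwidth `sigma`, the restriction to a two-way cut, the power-iteration eigen-solver, and the evenly strided k-means initialization are all assumptions made to keep the example short and dependency-free.

```python
import math
import random


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Returns (centroids, cell index of each point).

    Assumption: centroids start at evenly strided data points; a real
    implementation would likely use random restarts or k-means++ seeding.
    """
    step = len(points) // k
    centroids = [tuple(points[j * step]) for j in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda j: math.dist(p, centroids[j]))
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cell keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assign


def spectral_bipartition(reps, sigma=1.0, iters=200, seed=0):
    """Two-way normalized spectral clustering of the representatives."""
    m = len(reps)
    # Gaussian affinity matrix W and degrees d (assumed similarity measure).
    w = [[math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2)) for b in reps]
         for a in reps]
    d = [sum(row) for row in w]
    # The top eigenvector of M = D^{-1/2} W D^{-1/2} is proportional to sqrt(d).
    total = math.sqrt(sum(d))
    top = [math.sqrt(x) / total for x in d]
    # Power iteration for the second eigenvector. A Gaussian kernel matrix is
    # positive semidefinite, so the eigenvalues of M lie in [0, 1].
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(m)]
    for _ in range(iters):
        v = [sum(w[i][j] * v[j] / math.sqrt(d[i] * d[j]) for j in range(m))
             for i in range(m)]
        dot = sum(a * b for a, b in zip(top, v))
        v = [a - dot * b for a, b in zip(v, top)]  # deflate the top eigenvector
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # The sign of each entry gives the side of the cut (rescaling by
    # 1 / sqrt(d_i) > 0 would not change the sign).
    return [1 if x > 0 else 0 for x in v]


def kasp(points, k, sigma=1.0):
    """KASP-style pipeline: k-means, spectral step on centroids, label recovery."""
    reps, assign = kmeans(points, k)
    rep_labels = spectral_bipartition(reps, sigma)
    return [rep_labels[a] for a in assign]
```

Calling `kasp(points, k=6)` on a list of coordinate tuples returns one 0/1 label per point; which group receives label 1 depends on the arbitrary sign of the eigenvector. In this sketch the spectral step builds only a k-by-k affinity matrix, which is where the savings over a full n-by-n decomposition come from.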
