Approximate Clustering without the Approximation

by Maria-Florina Balcan, Avrim Blum, Anupam Gupta
Citations: 22 (14 self)

BibTeX

@MISC{Balcan_approximateclustering,
    author = {Maria-Florina Balcan and Avrim Blum and Anupam Gupta},
    title = {Approximate Clustering without the Approximation},
    year = {}
}

Abstract

Approximation algorithms for clustering points in metric spaces are a flourishing area of research, with much effort spent on understanding the approximation guarantees possible for objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that better approximations also yield more accurate clusterings. For example, for many problems such as clustering proteins by function or clustering images by subject, there is some unknown “correct” target clustering, and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close (in symmetric difference) to the truth. In this paper, we show that if we make this implicit assumption explicit, that is, if we assume that any c-approximation to the given clustering objective F is ε-close to the target, then we can produce clusterings that are O(ε)-close to the target, even for values of c for which obtaining a c-approximation is NP-hard. In particular, for the k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2. Our results also highlight a somewhat surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ε-close to the target, and assuming that any approximately optimal solution is ε-close to the target, even for an approximation factor of, say, c = 1.01. In the former case, the problem of finding a solution that is O(ε)-close to the target remains computationally hard, yet in the latter we have an efficient algorithm.
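
The two quantities the abstract relates, the objective value of a clustering and its closeness to the target, can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes that the distance between two k-clusterings is the fraction of points on which they disagree under the best matching of cluster labels, which is one reading of "close (in symmetric difference)". The function names and the toy data are hypothetical.

```python
from itertools import permutations


def kmedian_cost(points, centers, dist):
    """k-median objective: sum of each point's distance to its nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def assign(points, centers, dist):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def clustering_distance(labels_a, labels_b, k):
    """Fraction of points on which two k-clusterings disagree,
    minimized over all relabelings of the first clustering.
    Brute force over k! permutations, so only suitable for small k."""
    n = len(labels_a)
    best = n
    for perm in permutations(range(k)):
        mismatches = sum(1 for a, b in zip(labels_a, labels_b) if perm[a] != b)
        best = min(best, mismatches)
    return best / n


# Toy instance on the real line (assumed data, for illustration only).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
target = [0, 0, 0, 1, 1, 1]          # the unknown "correct" clustering


def dist(x, y):
    return abs(x - y)


good_centers = [1.0, 11.0]           # optimal for this instance
poor_centers = [1.0, 2.0]            # a much worse solution

good_cost = kmedian_cost(points, good_centers, dist)
poor_cost = kmedian_cost(points, poor_centers, dist)
good_eps = clustering_distance(assign(points, good_centers, dist), target, 2)
poor_eps = clustering_distance(assign(points, poor_centers, dist), target, 2)

# Approximation factor c of the poor solution relative to the good one.
c = poor_cost / good_cost
```

On this toy instance the second solution costs seven times as much as the first and misplaces one of the six points. The assumption studied in the abstract is a promise of this kind holding for every solution within a factor c of the optimum: all of them are ε-close to the target.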
