Results 1 - 10 of 29
A local search approximation algorithm for k-means clustering
, 2004
"... In k-means clustering we are given a set of n data points in d-dimensional space ℜd and an integer k, and the problem is to determine a set of k points in ℜd, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are kno ..."
Cited by 113 (1 self)
In k-means clustering we are given a set of n data points in d-dimensional space ℜ^d and an integer k, and the problem is to determine a set of k points in ℜ^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the very high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance. We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9 + ε)-approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9 − ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with ...
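The swap-based local search can be illustrated with a minimal sketch (not the authors' implementation): `kmeans_cost` is the objective defined above, and `single_swap_step` tries every (center, candidate) swap and keeps the best strict improvement. The function names and the choice of candidate set are illustrative assumptions.

```python
import itertools

def kmeans_cost(points, centers):
    # Sum of squared distances from each point to its nearest center.
    return sum(min(sum((a - b) ** 2 for a, b in zip(pt, ctr))
                   for ctr in centers)
               for pt in points)

def single_swap_step(points, centers, candidates):
    # Try swapping one current center for one candidate center; keep the
    # cheapest strict improvement, or return the centers unchanged.
    best, best_cost = centers, kmeans_cost(points, centers)
    for i, cand in itertools.product(range(len(centers)), candidates):
        trial = centers[:i] + [cand] + centers[i + 1:]
        cost = kmeans_cost(points, trial)
        if cost < best_cost:
            best, best_cost = trial, cost
    return best
```

On four points forming two well-separated pairs, a single swap already moves one of two co-located centers next to the far pair and drops the cost sharply; the paper's heuristic iterates such swaps (and multi-swaps) to convergence.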
Applications of weighted voronoi diagrams and randomization to variance-based k-clustering (Extended Abstract)
"... In this paper we consider the k-clustering problem for a set S of n points pi = (xi) in the d-dimensional space with variance-based errors as clustering criteria, motivated from the color quantization problem of computing a color lookup table for frame buffer display. As the inter-cluster criterion ..."
Cited by 106 (4 self)
In this paper we consider the k-clustering problem for a set S of n points p_i = (x_i) in d-dimensional space with variance-based errors as clustering criteria, motivated by the color quantization problem of computing a color lookup table for frame-buffer display. As the inter-cluster criterion to minimize, the sum of intracluster errors over all clusters is used, and as the intracluster criterion of a cluster S_j, |S_j|^(α−1) Σ_{p_i ∈ S_j} ‖x_i − x̄(S_j)‖² is considered, where ‖·‖ is the L2 norm and x̄(S_j) is the centroid of the points in S_j, i.e., (1/|S_j|) Σ_{p_i ∈ S_j} x_i. The cases α = 1, 2 correspond to the sum of squared errors and the all-pairs sum of squared errors, respectively. The k-clustering problems under the criteria with α = 1, 2 are treated in a unified manner by characterizing the optimum solution to the k-clustering problem by the ordinary Euclidean Voronoi diagram and the weighted Voronoi diagram with both multiplicative and additive weights. With this framework, the problem is related to the generalized primary shutter function for the Voronoi diagrams. The primary shutter function is shown to be O(n^{O(kd)}), which implies that, for fixed k, this clustering problem can be solved in polynomial time. For the problem with the most typical intracluster criterion, the sum of squared errors, we also present an efficient randomized algorithm which, roughly speaking, finds an ε-approximate 2-clustering in O(n(1/ε)^d) time, which is quite practical and may be applied to real large-scale problems such as the color quantization problem.
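The correspondence between the α = 1 and α = 2 criteria can be checked numerically. The sketch below (hypothetical helper names) uses the identity that the all-pairs sum of squared distances within a cluster equals |S_j| times its sum of squared errors, which is exactly the α = 2 criterion.

```python
def centroid(S):
    # Coordinate-wise mean of the points in S.
    n = len(S)
    return tuple(sum(x[d] for x in S) / n for d in range(len(S[0])))

def sq(a, b):
    # Squared Euclidean distance.
    return sum((u - v) ** 2 for u, v in zip(a, b))

def intracluster(S, alpha):
    # |S|^(alpha - 1) * sum of squared distances to the centroid,
    # the criterion from the abstract with exponent alpha.
    c = centroid(S)
    return len(S) ** (alpha - 1) * sum(sq(x, c) for x in S)
```

For the unit-square-like example in the test, the α = 2 value coincides with the sum of ‖p − q‖² over all unordered pairs, while α = 1 gives the plain sum of squared errors.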
Cluster analysis and mathematical programming
- MATHEMATICAL PROGRAMMING
, 1997
"... ..."
(Show Context)
Optimal Energy Aware Clustering in Sensor Networks
, 2002
"... Sensor networks is among the fastest growing technologies that have the potential of changing our lives drastically. These collaborative, dynamic and distributed computing and communicating systems will be self organizing. They will have capabilities of distributing a task among themselves for effic ..."
Cited by 39 (1 self)
Sensor networks are among the fastest-growing technologies with the potential to change our lives drastically. These collaborative, dynamic and distributed computing and communicating systems will be self-organizing. They will have capabilities of distributing a task among themselves for efficient computation. There are many challenges in the implementation of such systems, energy dissipation and clustering among them. In order to maintain a certain degree of service quality and a reasonable system lifetime, energy needs to be optimized at every stage of system operation. Sensor node clustering is another very important optimization problem. Nodes that are clustered together will easily be able to communicate with each other. Considering energy as an optimization parameter while clustering is imperative. In this paper we study the theoretical aspects of the clustering problem in sensor networks with application to energy optimization. We illustrate an optimal algorithm for clustering the sensor nodes such that each cluster (which has a master) is balanced and the total distance between sensor nodes and master nodes is minimized. Balancing the clusters is needed for evenly distributing the load on all master nodes. Minimizing the total distance helps in reducing the communication overhead and hence the energy dissipation. This problem (which we call balanced k-clustering) is modeled as a min-cost flow problem, which can be solved optimally using existing techniques.
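The balanced-clustering objective can be illustrated on a toy instance. The paper solves it at scale as a min-cost flow; the exhaustive sketch below (hypothetical names, squared distances for simplicity) only demonstrates what "balanced with minimum total distance" means for two masters.

```python
from itertools import combinations

def balanced_2clustering(sensors, masters):
    # Assign sensors to two masters so each master gets exactly half,
    # minimizing total sensor-to-master (squared) distance. Exhaustive
    # enumeration; the paper's min-cost flow formulation avoids this.
    n = len(sensors)
    def d(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    best, best_cost = None, float("inf")
    for group in combinations(range(n), n // 2):
        g = set(group)
        cost = sum(d(sensors[i], masters[0] if i in g else masters[1])
                   for i in range(n))
        if cost < best_cost:
            best, best_cost = g, cost
    return best, best_cost
```

With two masters placed near two sensor pairs, the optimum assigns each pair to its nearby master, as the balance constraint and distance objective together dictate.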
A Randomized Approximation Scheme for Metric MAX-CUT
"... Metric MAX-CUT is the problem of dividing a set of points in metric space into two parts so as to maximize the sum of the distances between points belonging to distinct parts. We show that metric MAX-CUT has a polynomial time randomized approximation scheme. ..."
Cited by 31 (5 self)
Metric MAX-CUT is the problem of dividing a set of points in metric space into two parts so as to maximize the sum of the distances between points belonging to distinct parts. We show that metric MAX-CUT has a polynomial time randomized approximation scheme.
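The objective can be made concrete with an exact brute-force solver for tiny instances (illustrative only; the paper's contribution is a randomized approximation scheme that avoids this exponential enumeration).

```python
from itertools import product

def max_cut_value(dist, n):
    # Exact metric MAX-CUT by enumerating all 2^(n-1) bipartitions;
    # dist[i][j] is the metric distance between points i and j.
    best = 0.0
    for bits in product([0, 1], repeat=n - 1):
        side = (0,) + bits  # fix point 0 on side 0 to halve the search
        cut = sum(dist[i][j]
                  for i in range(n) for j in range(i + 1, n)
                  if side[i] != side[j])
        best = max(best, cut)
    return best
```

For four points on a line at 0, 1, 2, 3 under the absolute-difference metric, the optimal cut splits {0, 1} from {2, 3}.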
On the Number of Crossing-Free Matchings, Cycles, and Partitions
, 2006
"... We show that a set of n points in the plane has at most O(10.05n) perfect matchings with crossing-free straight-line embedding. The expected number of perfect crossing-free matchings of a set of n points drawn i.i.d. from an arbitrary distribution in the plane is at most ..."
Cited by 30 (4 self)
We show that a set of n points in the plane has at most O(10.05^n) perfect matchings with a crossing-free straight-line embedding. The expected number of perfect crossing-free matchings of a set of n points drawn i.i.d. from an arbitrary distribution in the plane is at most ...
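The quantity being bounded can be computed directly for tiny point sets. The sketch below (illustrative names, general position assumed) enumerates all perfect matchings and counts those whose straight-line drawing has no crossing; it is exponential, the trivial counterpart of the paper's O(10.05^n) upper bound.

```python
def crosses(p, q, r, s):
    # Do segments pq and rs properly cross? (General position assumed.)
    def orient(a, b, c):
        v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        return (v > 0) - (v < 0)
    return (orient(p, q, r) != orient(p, q, s) and
            orient(r, s, p) != orient(r, s, q))

def count_crossing_free_matchings(pts):
    # Enumerate perfect matchings of an even point set; count the
    # crossing-free ones.
    n = len(pts)
    def matchings(idx):
        if not idx:
            yield []
            return
        a = idx[0]
        for k in range(1, len(idx)):
            rest = idx[1:k] + idx[k + 1:]
            for m in matchings(rest):
                yield [(a, idx[k])] + m
    count = 0
    for m in matchings(list(range(n))):
        if all(not crosses(pts[a], pts[b], pts[c], pts[d])
               for i, (a, b) in enumerate(m)
               for (c, d) in m[i + 1:]):
            count += 1
    return count
```

Four points in convex position have three perfect matchings, of which the one using both diagonals crosses, leaving two crossing-free.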
On Approximate Geometric K-Clustering
, 1999
"... For a partition of an n-point set X ae R d into k subsets (clusters) S 1 ; S 2 ; : : : ; S k , we consider the cost function P k i=1 P x2S i kx \Gamma c(S i )k 2 , where c(S i ) denotes the center of gravity of S i . For k = 2 and for any fixed d and " ? 0, we present a deterministic alg ..."
Cited by 29 (0 self)
For a partition of an n-point set X ⊆ R^d into k subsets (clusters) S_1, S_2, ..., S_k, we consider the cost function Σ_{i=1}^{k} Σ_{x ∈ S_i} ‖x − c(S_i)‖², where c(S_i) denotes the center of gravity of S_i. For k = 2 and for any fixed d and ε > 0, we present a deterministic algorithm that finds a 2-clustering with cost no worse than (1 + ε) times the minimum cost in time O(n log n); the constant of proportionality depends polynomially on ε. For an arbitrary fixed k, we get an O(n log^k n) algorithm for a fixed ε, again with a polynomial dependence on ε.
1 Introduction
We consider a geometric k-clustering problem: given an n-point set X ⊆ R^d and a natural number k ≥ 2, find a partition (clustering) Π = (S_1, S_2, ..., S_k) of X into k disjoint nonempty subsets that minimizes a suitable cost function among all k-clusterings of X. The cost function should show how tightly each S_i is "packed together" and how well the different S_i are separated from eac...
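The cost function above is easy to state in code, and for toy inputs the optimal 2-clustering can be found exhaustively as a ground truth (the sketch below is exponential, unlike the paper's near-linear (1 + ε)-approximation; names are illustrative).

```python
def sse_cost(clusters):
    # Sum over clusters of the squared distances to the center of gravity.
    total = 0.0
    for S in clusters:
        c = [sum(x[d] for x in S) / len(S) for d in range(len(S[0]))]
        total += sum(sum((x[d] - c[d]) ** 2 for d in range(len(c)))
                     for x in S)
    return total

def best_2clustering(X):
    # Exhaustive optimal 2-clustering over all nontrivial bipartitions.
    n = len(X)
    best, best_cost = None, float("inf")
    for mask in range(1, 2 ** (n - 1)):  # point 0 stays in cluster A
        in_B = {i for i in range(1, n) if (mask >> (i - 1)) & 1}
        A = [X[i] for i in range(n) if i not in in_B]
        B = [X[i] for i in in_B]
        cost = sse_cost([A, B])
        if cost < best_cost:
            best, best_cost = (A, B), cost
    return best, best_cost
```

On two well-separated pairs of points the optimum is, as expected, the pair/pair split, with cost equal to the two within-pair variances.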
The Analysis of a Simple k-Means Clustering Algorithm
, 2000
"... K-means clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R d and an integer k, the problem is to determine a set of k points R d , called centers, so as to minimize the mean squared distance from each data point to its nea ..."
Cited by 26 (2 self)
K-means clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from ...
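For reference, the iteration being accelerated can be sketched in its plain, unaccelerated form (the paper's filtering algorithm computes the same assignment step faster via a kd-tree over the data points; this is not that implementation):

```python
def lloyd(points, centers, iters=20):
    # Plain Lloyd's iteration: assign each point to its nearest center,
    # then move each center to the centroid of its cluster.
    def d2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: d2(p, centers[j]))
            clusters[nearest].append(p)
        # Empty clusters keep their previous center.
        centers = [tuple(sum(x[d] for x in cl) / len(cl)
                         for d in range(len(cl[0]))) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers
```

Starting from centers (0, 0) and (10, 0) on four collinear points, the iteration converges to the two cluster centroids in a single step.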
Geometric clustering to minimize the sum of cluster sizes
- In Proc. 13th European Symp. Algorithms, Vol 3669 of LNCS
, 2005
"... Abstract. We study geometric versions of the min-size k-clustering problem, a clustering problem which generalizes clustering to minimize the sum of cluster radii and has important applications. We prove that the problem can be solved in polynomial time when the points to be clustered are located on ..."
Cited by 20 (0 self)
We study geometric versions of the min-size k-clustering problem, a clustering problem which generalizes clustering to minimize the sum of cluster radii and has important applications. We prove that the problem can be solved in polynomial time when the points to be clustered are located on a line. For Euclidean spaces of higher dimensions, we show that the problem is NP-hard and present polynomial time approximation schemes. The latter result yields an improved approximation algorithm for the related problem of k-clustering to minimize the sum of cluster diameters.
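The tractable line case can be sketched with a dynamic program over sorted points, under the assumption (which holds in one dimension) that every cluster of an optimal min-sum-of-radii solution is a contiguous interval whose radius is half its span. This is an illustrative simplification, not the paper's algorithm for its more general min-size objective.

```python
from functools import lru_cache

def min_sum_radii_on_line(xs, k):
    # Minimize the sum of cluster radii for points on a line using at
    # most k clusters. Assumes optimal clusters are contiguous intervals
    # of the sorted order; an interval's radius is half its span.
    xs = sorted(xs)
    n = len(xs)

    @lru_cache(maxsize=None)
    def best(i, r):
        # Cheapest clustering of xs[i:] into at most r intervals.
        if i == n:
            return 0.0
        if r == 0:
            return float("inf")
        return min((xs[j] - xs[i]) / 2 + best(j + 1, r - 1)
                   for j in range(i, n))

    return best(0, k)
```

With two tight pairs and k = 2, the optimum covers each pair by a radius-0.5 interval; forcing k = 1 makes a single interval spanning everything.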