Results 1–10 of 29
A local search approximation algorithm for k-means clustering
, 2004
Abstract

Cited by 113 (1 self)
In k-means clustering we are given a set of n data points in d-dimensional space ℝ^d and an integer k, and the problem is to determine a set of k points in ℝ^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the very high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance. We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9 + ε)-approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9 − ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with ...
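The swap heuristic described above can be sketched in a few lines. This is an illustrative toy version (function names are mine, candidate centers restricted to the input points, single swaps only), not the paper's analyzed algorithm:

```python
import itertools
import random

def cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
               for px, py in points)

def single_swap_local_search(points, candidates, k, seed=0):
    """Start from k candidate centers; repeatedly swap one current center
    for one unused candidate whenever the swap strictly lowers the cost."""
    rng = random.Random(seed)
    centers = rng.sample(candidates, k)
    improved = True
    while improved:
        improved = False
        for i, c_new in itertools.product(range(k), candidates):
            if c_new in centers:
                continue
            trial = centers[:i] + [c_new] + centers[i + 1:]
            if cost(points, trial) < cost(points, centers):
                centers, improved = trial, True
                break
    return centers
```

Because the cost strictly decreases at every accepted swap and the candidate set is finite, the loop always terminates; on two well-separated groups it ends with one center per group.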
Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (Extended Abstract)
Abstract

Cited by 106 (4 self)
In this paper we consider the k-clustering problem for a set S of n points p_i = (x_i) in d-dimensional space with variance-based errors as clustering criteria, motivated by the color quantization problem of computing a color lookup table for frame buffer display. As the inter-cluster criterion to minimize, the sum of intra-cluster errors over every cluster is used, and as the intra-cluster criterion of a cluster S_j, |S_j|^(α−1) Σ_{p_i ∈ S_j} ‖x_i − x̄(S_j)‖² is considered, where ‖·‖ is the L2 norm and x̄(S_j) is the centroid of the points in S_j, i.e., (1/|S_j|) Σ_{p_i ∈ S_j} x_i. The cases α = 1, 2 correspond to the sum of squared errors and the all-pairs sum of squared errors, respectively. The k-clustering problems under the criteria with α = 1, 2 are treated in a unified manner by characterizing the optimum solution to the k-clustering problem by the ordinary Euclidean Voronoi diagram and the weighted Voronoi diagram with both multiplicative and additive weights. With this framework, the problem is related to the generalized primary shutter function for the Voronoi diagrams. The primary shutter function is shown to be O(n^(O(kd))), which implies that, for fixed k, this clustering problem can be solved in polynomial time. For the problem with the most typical intra-cluster criterion of the sum of squared errors, we also present an efficient randomized algorithm which, roughly speaking, finds an ε-approximate 2-clustering in O(n (1/ε)^d) time, which is quite practical and may be applied to real large-scale problems such as the color quantization problem.
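The two criteria are tightly linked: for α = 2 the criterion equals the all-pairs sum of squared errors, by the standard identity Σ_{i<j} ‖x_i − x_j‖² = |S| · Σ_i ‖x_i − x̄‖². A quick numerical check of that identity (helper names are mine, not the paper's):

```python
import random

def centroid(S):
    """Coordinate-wise mean of a list of points."""
    n = len(S)
    return [sum(p[i] for p in S) / n for i in range(len(S[0]))]

def sq(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def intra_cost(S, alpha):
    """|S|^(alpha-1) * sum of squared distances to the centroid."""
    c = centroid(S)
    return len(S) ** (alpha - 1) * sum(sq(p, c) for p in S)

def all_pairs_sse(S):
    """Sum of squared distances over all unordered pairs."""
    return sum(sq(S[i], S[j])
               for i in range(len(S)) for j in range(i + 1, len(S)))

rng = random.Random(1)
S = [(rng.random(), rng.random()) for _ in range(7)]
assert abs(intra_cost(S, 2) - all_pairs_sse(S)) < 1e-9
```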
Cluster analysis and mathematical programming
 MATHEMATICAL PROGRAMMING
, 1997
Optimal Energy Aware Clustering in Sensor Networks
, 2002
Abstract

Cited by 39 (1 self)
Sensor networks are among the fastest-growing technologies that have the potential of changing our lives drastically. These collaborative, dynamic and distributed computing and communicating systems will be self-organizing. They will have capabilities of distributing a task among themselves for efficient computation. There are many challenges in the implementation of such systems, energy dissipation and clustering among them. In order to maintain a certain degree of service quality and a reasonable system lifetime, energy needs to be optimized at every stage of system operation. Sensor node clustering is another very important optimization problem. Nodes that are clustered together will easily be able to communicate with each other. Considering energy as an optimization parameter while clustering is therefore imperative. In this paper we study the theoretical aspects of the clustering problem in sensor networks with application to energy optimization. We present an optimal algorithm for clustering the sensor nodes such that each cluster (which has a master) is balanced and the total distance between sensor nodes and master nodes is minimized. Balancing the clusters is needed to distribute the load evenly across all master nodes. Minimizing the total distance helps in reducing the communication overhead and hence the energy dissipation. This problem (which we call balanced k-clustering) is modeled as a min-cost flow problem, which can be solved optimally using existing techniques.
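On a tiny instance the balanced k-clustering objective can be brute-forced directly, which is handy as a correctness oracle for a flow-based solver. This sketch (hypothetical helper, stdlib only) enumerates assignments rather than solving the min-cost flow formulation the paper uses:

```python
import itertools
import math

def balanced_assignment(sensors, masters):
    """Brute-force the balanced k-clustering objective on a tiny instance:
    every master receives exactly len(sensors) // len(masters) sensors and
    the total sensor-to-master distance is minimized."""
    k = len(masters)
    slots_per_master = len(sensors) // k
    # One "slot" per sensor a master can accept; a feasible solution is a
    # bijection between sensors and slots.
    slots = [m for m in masters for _ in range(slots_per_master)]
    best_cost = math.inf
    for perm in itertools.permutations(range(len(sensors))):
        c = sum(math.dist(sensors[i], slots[j]) for j, i in enumerate(perm))
        best_cost = min(best_cost, c)
    return best_cost
```

The min-cost flow view is exactly this slot construction: sensors on one side, master slots on the other, edge costs equal to distances, unit capacities.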
A Randomized Approximation Scheme for Metric MAX-CUT
Abstract

Cited by 31 (5 self)
Metric MAX-CUT is the problem of dividing a set of points in a metric space into two parts so as to maximize the sum of the distances between points belonging to distinct parts. We show that metric MAX-CUT has a polynomial-time randomized approximation scheme.
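Since a uniformly random partition separates each pair of points with probability 1/2, the expected cut equals half the total pairwise distance, so the optimum is always at least that much. A brute-force check on a small instance (helper names are mine):

```python
import itertools
import math

def cut_value(points, side):
    """Sum of distances between points on opposite sides of the partition."""
    return sum(math.dist(points[i], points[j])
               for i, j in itertools.combinations(range(len(points)), 2)
               if side[i] != side[j])

def best_cut(points):
    """Exhaustive metric MAX-CUT for small n (point n-1 fixed on side 0,
    since complementing a partition leaves the cut unchanged)."""
    n = len(points)
    return max(cut_value(points, [(mask >> i) & 1 for i in range(n)])
               for mask in range(1 << (n - 1)))
```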
On the Number of Crossing-Free Matchings, Cycles, and Partitions
, 2006
Abstract

Cited by 30 (4 self)
We show that a set of n points in the plane has at most O(10.05^n) perfect matchings with a crossing-free straight-line embedding. The expected number of perfect crossing-free matchings of a set of n points drawn i.i.d. from an arbitrary distribution in the plane is at most ...
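For small point sets the count can be verified exhaustively: enumerate all perfect matchings and discard any whose segments properly cross. A sketch under a general-position assumption (no three points collinear; function names are mine):

```python
import itertools

def ccw(a, b, c):
    """Signed area orientation test."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p, q, r, s):
    """Proper crossing of segments pq and rs (general position assumed)."""
    return (ccw(p, q, r) * ccw(p, q, s) < 0 and
            ccw(r, s, p) * ccw(r, s, q) < 0)

def perfect_matchings(idx):
    """Yield all perfect matchings of an even-sized index list."""
    if not idx:
        yield []
        return
    a = idx[0]
    for j in range(1, len(idx)):
        b = idx[j]
        rest = idx[1:j] + idx[j + 1:]
        for m in perfect_matchings(rest):
            yield [(a, b)] + m

def count_crossing_free(points):
    """Number of crossing-free perfect matchings of the point set."""
    return sum(
        1 for m in perfect_matchings(list(range(len(points))))
        if not any(segments_cross(points[a], points[b], points[c], points[d])
                   for (a, b), (c, d) in itertools.combinations(m, 2)))
```

For 2m points in convex position the count is the Catalan number C_m, which gives an easy sanity check (C_2 = 2 for a square, C_3 = 5 for a hexagon).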
On Approximate Geometric k-Clustering
, 1999
Abstract

Cited by 29 (0 self)
For a partition of an n-point set X ⊂ R^d into k subsets (clusters) S_1, S_2, ..., S_k, we consider the cost function Σ_{i=1..k} Σ_{x ∈ S_i} ‖x − c(S_i)‖², where c(S_i) denotes the center of gravity of S_i. For k = 2 and for any fixed d and ε > 0, we present a deterministic algorithm that finds a 2-clustering with cost no worse than (1 + ε) times the minimum cost in time O(n log n); the constant of proportionality depends polynomially on ε. For an arbitrary fixed k, we get an O(n log^k n) algorithm for a fixed ε, again with a polynomial dependence on ε.

1 Introduction

We consider a geometric k-clustering problem: given an n-point set X ⊂ R^d and a natural number k ≥ 2, find a partition (clustering) Π = (S_1, S_2, ..., S_k) of X into k disjoint nonempty subsets that minimizes a suitable cost function among all k-clusterings of X. The cost function should show how tightly each S_i is "packed together" and how well the different S_i are separated from each ...
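For small inputs the optimal 2-clustering under this cost can be found by exhaustive search over all splits, which is useful as a correctness oracle for the approximation algorithm. A stdlib sketch (helper names are mine):

```python
import math

def cluster_cost(S):
    """Sum of squared distances from each point of S to its center of gravity."""
    c = [sum(x) / len(S) for x in zip(*S)]
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in S)

def best_2_clustering(X):
    """Exhaustive optimum over all splits of X into two nonempty clusters.
    Point n-1 is fixed in S2, since swapping the two clusters changes nothing."""
    n = len(X)
    best = math.inf
    for mask in range(1, 1 << (n - 1)):
        S1 = [X[i] for i in range(n) if (mask >> i) & 1]
        S2 = [X[i] for i in range(n) if not (mask >> i) & 1]
        best = min(best, cluster_cost(S1) + cluster_cost(S2))
    return best
```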
The Analysis of a Simple k-Means Clustering Algorithm
, 2000
Abstract

Cited by 26 (2 self)
k-means clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from ...
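The filtering algorithm accelerates the assignment step with a kd-tree; the underlying Lloyd iteration it implements is simply: assign every point to its nearest center, then recompute each center as the centroid of its assigned points. A plain (unfiltered) sketch for intuition, with names of my own choosing:

```python
import math
import random

def lloyd(points, k, iters=50, seed=0):
    """Plain Lloyd iteration: assign each point to its nearest center,
    then move each center to the centroid of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers
```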
Geometric clustering to minimize the sum of cluster sizes
 In Proc. 13th European Symp. Algorithms, Vol 3669 of LNCS
, 2005
Abstract

Cited by 20 (0 self)
We study geometric versions of the min-size k-clustering problem, a clustering problem which generalizes clustering to minimize the sum of cluster radii and has important applications. We prove that the problem can be solved in polynomial time when the points to be clustered are located on a line. For Euclidean spaces of higher dimensions, we show that the problem is NP-hard and present polynomial-time approximation schemes. The latter result yields an improved approximation algorithm for the related problem of k-clustering to minimize the sum of cluster diameters.
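On a line, the polynomial-time result can be illustrated with a small dynamic program, under the assumption (which one can argue by an exchange argument, but is hedged here) that some optimal solution uses contiguous runs of the sorted points, a run's radius being half its extent. Names and details are mine, not the paper's:

```python
import math

def min_sum_radii_on_line(xs, k):
    """DP sketch: cluster points on a line into at most k contiguous runs,
    minimizing the sum of run radii (half-extents)."""
    xs = sorted(xs)
    n = len(xs)
    INF = math.inf
    # dp[i][j] = min cost of covering the first i points with j runs.
    dp = [[INF] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            for s in range(i):  # last run covers xs[s .. i-1]
                radius = (xs[i - 1] - xs[s]) / 2
                dp[i][j] = min(dp[i][j], dp[s][j - 1] + radius)
    return min(dp[n][j] for j in range(1, k + 1))
```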