19 citations found.
D. Pollard, "Strong consistency of k--means clustering", in The Annals of Statistics, volume 9, pages 135--140, 1981.


This paper is cited in the following contexts:
Principal Curves: Learning, Design, And Applications - Kégl (1999)

....of the empirical loss minimization principle used for vector quantization design by presenting results on consistency and rate of convergence in Section 2.1.2. 2.1.2 Consistency and Rate Of Convergence Consistency of the empirical quantizer design under general conditions was proven by Pollard [Pol81, Pol82]. The first rate of convergence results were obtained by Linder et al. [LLZ94]. In particular, [LLZ94] showed that if the distribution of X is concentrated on a bounded region, there exists a constant c such that [inequality (8), lost in extraction]. An extension of this result to distributions with ....

D. Pollard. Strong consistency of k-means clustering. Annals of Statistics, 9:135--140, 1981.


Unsupervised Curve Clustering using B-Splines - Pierre-Andr (2001)   (6 citations)

....and B of IR^p by: h(A, B) < δ if and only if every point of A is within distance δ of at least one point of B, and vice versa. B_F denotes the Borel field of F derived from the Hausdorff metric. We need the following assumption which, as pointed out by Lemaire (1983), is less restrictive than Pollard's (1981) assumptions: H_1: inf{u(z) | z ∈ F} < inf{u(z) | z ∈ F, card z < k}. Proposition 3.1 Under H_1, the (unique) minimizer z of u exists and there also exists a unique sequence of measurable functions z_n from (A; P) into (F; B_F) such that z n ( M n for all 2 and u n ( n ....

Pollard, D. (1981) Strong consistency of k-means clustering, Ann. Stat., 9, 135-140.


`1 + 1 > 2': Merging Distance and Density Based Clustering - Dash, Liu, Xu (2001)

....in DBSCAN finds all points that are density-connected for a given density threshold. Both approaches have their advantages and disadvantages. K-means is very popular due to its ease of implementation, linear time complexity in the size of the data, and almost sure convergence to local optima [14, 16]. But it is affected by noise, among other disadvantages. On the other hand, density-based clustering is capable of finding arbitrary shape clusters, and in handling noise well. But it is slow in comparison and faces difficulty in setting density threshold. In this paper we analyze the two ....

D. Pollard. Strong consistency of K-means clustering. The Annals of Statistics, 9(1):135--140, 1981.


Multiscale Annealing for Grouping and Unsupervised Texture.. - Puzicha, Buhmann (1999)   (2 citations)

....clear, that a certain number of data points has to be available to reliable estimate a given number of clusters. The question of how many effective data points are needed to distinguish K clusters has been addressed in the context of uniform convergence of empirical means to their expectations [44, 45]. As a key observation in this context, grouping algorithms should be robust with respect to measurement noise in the image recording process and should not be affected by modeling uncertainty and the natural within class variability. More specifically, algorithms should abstract from the ....

D. Pollard, "Strong consistency of k-means clustering," The Annals of Statistics, vol. 9, no. 1, pp. 135-140, 1981.


On the Training Distortion of Vector Quantizers - Linder (2000)   (1 citation)

....Q_n converges (in some sense) to its lower bound D(Q*) as n → ∞. Of particular interest are the empirically optimal quantizers Q_n* minimizing the training distortion: D_n(Q_n*) = min_{Q_n} D_n(Q_n). The consistency of empirically optimal quantizers was first investigated by Pollard [2, 3] for the case of mean squared quantizer distortion. His results show, among other things, that for a stationary and ergodic training sequence, the test distortion D(Q_n*) of an empirically optimal quantizer converges to D(Q*) with probability one as n → ∞. Pollard's results imply that the ....

D. Pollard, "Strong consistency of k-means clustering," Annals of Statistics, vol. 9, no. 1, pp. 135--140, 1981.


Clustering Massive Datasets With Applications in Software Metrics .. - Maitra (1998)   (1 citation)

....algorithm is dependent on the initial guesses, and has been seen to work best with homogeneous, spherical and reasonably well separated clusters. Moreover, in spite of the heuristic and empirical aspects of the methodology, the method has some optimality properties, such as strong consistency [36]. Clustering algorithms, including the k-means algorithm, are not practical to implement on massive datasets, i.e. datasets where the number of observations N is very large. Taking a sample from this dataset and then applying the clustering algorithm followed by classification of the remaining ....

Pollard, D. (1981). Strong consistency of k-means clustering. Ann. Stat. 9:135-40.


Statistical Clustering - Hartigan (2000)

....algorithm of this type; each object is represented by a point in p-dimensional space, the dissimilarity between any two points is squared Euclidean distance, and the ideal object for each class is the centroid of the class. Asymptotic behaviour of the k-means algorithm is described in Pollard [19] (1981). 2.2 Hierarchical clustering Hierarchical clustering constructs trees of clusters of objects, in which any two clusters are disjoint, or one includes the other. The cluster of all objects is the root of the tree. Agglomerative algorithms, Lance and Williams [15] (1967), require a definition ....

Pollard D 1981 Strong consistency of k-means clustering. Annals of Statistics. 9:135-140


Statistical Data Compression by Optimal Segmentation - Theory.. - Steiner (1999)   (1 citation)

....are fixed in advance. For the numerical solution of the segmentation problem we use an equivalent formulation of the optimization problem to deduce an algorithm for the solution (see Pötzelberger and Strasser [26]). These problems are closely related to the idea of vector quantization (e.g. Pollard [22]) and competitive learning (see Kohonen [12]). Note that k-means is normally applied in cluster analysis (e.g. Bock [4]). Our primary interest is neither the detection of clusters nor the establishment of a correct classification, but we search a topological representation of the data set. The ....

....in each iteration step. If the data set is too large, the method becomes too slow. Alternatively, one can use an adaptive approach, which continually improves the initial solution, but does not apply the whole data set in each adaptation step. We use Learning Vector Quantization (LVQ, see Pollard [22], or Martinetz and Schulten [14] and Martinetz, Berkovich and Schulten [15] for a generalized version). This method does not explicitly work with a partition, but directly manipulates the initial representing points. In each iteration step a data point is selected randomly. The nearest prototype ....

POLLARD, D. Strong consistency of k-means clustering. Annals of Statistics 9 (1981), 135--140.


Clustering and Quantization by MSP-Partitions - Pötzelberger, Strasser (2000)   (1 citation)

....kind of problems generalizes the well known minimum variance partition problem of statistical cluster analysis. The extension is a multivariate version of the topic considered by Bock, [4]. It is basically different from those kinds of quantization problems which have been considered by Pollard, [15], and Pärna, [14], and which are called principal point problems by Flury, [7]. As a side result it is shown that some competitive learning problems considered by Kohonen, [10], belong to the class considered in this paper. The paper contains theoretical results like existence of optima, ....

....the field of statistical cluster analysis (see Bock, [3]). Since our main results have a structure which is very similar to this classical theory we present some basics of the case f(x) = x^2 in section 3. There we also discuss alternative approaches to quantization like those of Pollard, [15], Pärna, [14], and Flury, [7]. The results of this paper are contained in sections 4 and 5. In section 4 we give a detailed overview of our theoretical results. The mathematical proofs are collected in section 5. Section 6 contains some examples. Section 7 is an appendix where we provide additional ....

[Article contains additional citation context not shown here]

D. Pollard. Strong consistency of k-means clustering. Annals of Statistics, 9:135--140, 1981.


Reduction of Complexity - Strasser (2000)

....i.e. the generalization of the principal point problem, however, is still tractable by gradient or stochastic gradient methods. In the literature on vector quantization the extension of the principal point problem to general distance measures is a big issue. As an incomplete survey we mention Pollard, 1981 and 1982, Flury, 1990 and 1993, Flury, Tarpey and Li, 1995, Pärna, 1986 and 1990, and Kipper and Pärna, 1992. The mathematical problems related to these generalizations are difficult and thus an attractive challenge to mathematicians. However, the practical importance of results in the ....

Pollard, D. Strong consistency of k-means clustering. Annals of Statistics, 9:135--140, 1981.


Data Compression And Statistical Inference - Strasser

....another function. If this is done with the second part of the variance decomposition, i.e. with the inner variance, then we arrive at optimization problems which are important for vector quantization. This type of problems is an important topic in the literature. Let us mention papers by Pollard, [26] and [27], by Flury, [9] and [10], by Flury, Tarpey and Li, [8], by Pärna, [23] and [24], and by Kipper and Pärna, [13]. However, we are going a different way. We vary the convex function in the first part of the decomposition. This leads to our optimization problem stated at the beginning of ....

D. Pollard. Strong consistency of k-means clustering. Annals of Statistics, 9:135--140, 1981.


Clustering Massive Datasets - Maitra (1998)

....or by performing hierarchical clustering to obtain a quick grouping from where the means of the grouped hierarchies can be chosen as the initial guess. Despite the heuristic and empirical aspects of the methodology, k-means clustering has some optimality properties, such as strong consistency [28]. The k-means clustering algorithm is not practical to implement when the dataset is massive, i.e. N is very large. Taking a sample from this dataset and then applying the clustering algorithm is practical but would ignore the smaller groups, thus compromising on the riches associated with the ....

Pollard, D. (1981). Strong consistency of k-means clustering. Ann. Stat. 9:135-40.


Data Compression By Unsupervised Classification - Pötzelberger, Strasser

....Administration, Augasse 2-6, A-1090 Vienna, Austria. E-mail: Klaus.Poetzelberger@wu-wien.ac.at, Helmut.Strasser@wu-wien.ac.at. Preprint, November 1997. Abstract: This paper deals with a general class of classification methods which are related both to vector quantization in the sense of Pollard, [12], as well as to competitive learning in the sense of Kohonen, [10]. The basic duality of minimum variance partitioning and vector quantization known from statistical cluster analysis is shown to be true for this whole class of classification problems. The paper contains theoretical results like ....

....This idea is based on the optimization of an objective function which is a measure of the information contained in the partition. Our approach to the problem of data compression is closely related to the idea of vector quantization. Vector quantization, which has been considered e.g. by Pollard, [12] and [13], is also the subject of very recent papers like Cuesta Albertos, Gordaliza and Matran, [9], and Bouton and Pages, [2]. The problem of vector quantization in the sense treated by Pollard, [12], contains as a special case the problem of finding a minimum variance partition (MVP). The ....

[Article contains additional citation context not shown here]

D. Pollard. Strong consistency of k-means clustering. Annals of Statistics, 9:135--140, 1981.


The Minimax Distortion Redundancy in Empirical Quantizer.. - Bartlett, Linder, Lugosi (1997)   (10 citations)

....also on the real source. The problem of quantifying how good empirically designed quantizers are compared to the truly optimal ones has been extensively studied for the case when the training data consists of n vectors independently drawn from the source distribution. It was shown by Pollard [16, 18] under general conditions that the method of empirical error minimization is consistent in the following sense. Let D_n be mean squared error (MSE) of the empirically optimal quantizer, when measured on the real source, and let D be the minimum MSE achieved by an optimal quantizer. An ....

....minimax expected distortion redundancy is some constant times d^a sqrt(k^(1 - b/d) / n) for some values of a ∈ [1, 3/2] and b ∈ [2, 4]. Another challenging problem is to find (or give bounds on) the weak minimax convergence rate defined at the end of Section 2. In particular, Pollard's result [16] suggests that the weak minimax rate can still be O(1/n) for a class of sources with sufficiently regular and smooth densities. We have no conjecture at present, however, as to what the weak rate might be for the class of all sources concentrated on S(0, √d). Appendix Proof of Step 3. Let C = ....

D. Pollard. Strong consistency of k-means clustering. Annals of Statistics, 9:135--140, 1981.


Multiscale Annealing for Real-Time Unsupervised Texture.. - Puzicha, Buhmann (1998)   (5 citations)

....reduces at coarser resolution levels, splitting strategy and coarse to fine optimization should be interleaved. The question of how many effective data points are needed to distinguish K clusters has been addressed in the context of uniform convergence of empirical means to their expectations [43, 44]. Assume a sequence of N i.i.d. vectors x_i ∈ IR^d drawn according to an underlying distribution P with compact support, i.e. P(‖x‖ ∈ Z) = 1, and a nearest neighbor rule for M given y. Then bounds for the deviation of the empirical costs Hkm ( x i ) y) from the expected costs of a ....

D. Pollard, "Strong consistency of k--means clustering," The Annals of Statistics, vol. 9, no. 1, pp. 135--140, 1981.


Fuzzy Clustering for Content-based Indexing in Multimedia Database - Yue (2001)

No context found.

D. Pollard, "Strong consistency of k--means clustering", in The Annals of Statistics, volume 9, pages 135--140, 1981.


Lagrangian Empirical Design of Variable-Rate Vector Quantizers.. - Linder (2002)

No context found.

D. Pollard, "Strong consistency of k-means clustering," Annals of Statistics, vol. 9, no. 1, pp. 135--140, 1981.


Learning-Theoretic Methods in Vector Quantization - Linder (2001)   (1 citation)

No context found.

D. Pollard. Strong consistency of k-means clustering. Annals of Statistics, 9, no. 1:135--140, 1981.


A general approach to Bahadur-Kiefer representations .. - Miguel A. Arcones..

No context found.

Pollard, D. (1981). Strong consistency of k--means clustering. Ann. Statist. 9 135--140.


CiteSeer.IST - Copyright Penn State and NEC