| E. Forgy, "Cluster Analysis of multivariate data: efficiency versus interpretability of classifications", Biometrics, 21, 768, 1965. |
....is that as k increases, i.e., when the average cluster size becomes smaller, our refinement algorithm leads to greater improvement. 6 Related Work. The k-means algorithm has been well studied and is one of the most widely used clustering methods [5]. Some of the important early work is due to Forgy [6] and MacQueen [13]. In the vector quantization literature, k-means clustering is also referred to as the Lloyd-Max algorithm [7]; see [8] for a comprehensive history of quantization and its relations to statistical clustering. Many variants of k-means exist; the version we presented in Section 2 is ....
....literature, k-means clustering is also referred to as the Lloyd-Max algorithm [7]; see [8] for a comprehensive history of quantization and its relations to statistical clustering. Many variants of k-means exist; the version we presented in Section 2 is generally attributed to Forgy (see [6, 13]) and is similar to the one given in [5, 10.4.3]; we call this batch k-means since the centroids are updated after a batch of points has been reassigned. Another version, which we call incremental k-means, randomly selects a single vector x whose reassignment from a cluster i to a cluster ....
E. Forgy. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21(3):768, 1965.
....of (complete) user session records collected in the data collection step, and generates as output the set of clusters on each server. For ease of discussion, we shall refer to each resource as a document. For this step, any well-known single-server document clustering algorithm, such as K-Means [5, 12, 10, 6], Single Link [9, 7, 10], Complete Link [7, 10], Leader Algorithm [9], an adaptive clustering algorithm [16], etc., can be used in order to generate non-overlapping clusters of documents. The co-occurrence frequency of documents in complete user session records is used for determining the ....
E. Forgy. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21:768, 1965.
....data smoothing, SEM algorithm, small number sample set, smoothing parameter estimation. I. INTRODUCTION. IN INTELLIGENT statistical data analysis or unsupervised classification, cluster analysis is to determine the cluster number or cluster membership of a set of given samples, [1], [2], [3], [27], by its mean vector. In most cases, the first step of the clustering is to determine the cluster number. The second step is to design a proper clustering algorithm. In recent years, several clustering analysis algorithms have been developed to partition samples into several clusters, in which ....
....samples into several clusters, in which the number of clusters is predetermined. The most notable approaches are, for example, the mean square error (MSE) clustering and finite mixture model algorithms. The MSE clustering algorithm is typically implemented by the well-known mean algorithm [1], [27]. This method requires specifying the number of clusters in advance. If this number is correctly selected, then it can produce a good clustering result; otherwise, data sets cannot be grouped into appropriate clusters. However, in most cases the number of clusters is unknown in advance. Because it is ....
E. W. Forgy, "Cluster analysis of multivariate data: Efficiency versus interpretability of classifications," Biometrics, vol. 21, no. 3, p. 768.
....consists of major iterations that first reassign all the points to their nearest centroids, and then recompute centroids of newly assembled groups. Iterations continue until a stopping criterion is achieved (for example, no reassignments happen). This version is known as Forgy's algorithm [For65] and has many advantages: it easily works with any Lp norm; it allows straightforward parallelization [DM99]; it is insensitive with respect to data ordering. Another version of k-means iterative optimization reassigns points based on more detailed analysis of effects on the objective ....
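The batch iteration described in this excerpt (reassign every point to its nearest centroid, then recompute the centroids, and stop when no reassignments happen) can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the names `squared_distance` and `batch_kmeans`, the squared Euclidean distance, and the `max_iter` safety cap are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_kmeans(
    points: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
    max_iter: int = 100,  # hypothetical safety cap, not from the excerpt
) -> Tuple[List[List[float]], List[int]]:
    """Batch k-means: centroids move only after all points are reassigned."""
    centroids = [list(c) for c in centroids]
    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Step 1: reassign every point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stopping criterion: no reassignments happened.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2: recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, assignment
```

Because the centroids are updated only after a full pass, the result of each pass does not depend on the order in which the points are visited, which matches the ordering remark in the excerpt.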
Forgy, E. Cluster analysis of multivariate data: Efficiency versus interpretability of classification. Biometrics, 21,768-780, 1965.
....is proposed. This method is here presented with reference to two specific bisecting divisive clustering algorithms: the bisecting K-means algorithm and the Principal Direction Divisive Partitioning (PDDP) algorithm. K-means is the most celebrated and widely used clustering technique (see e.g. [F65], [GJJ96], [JD88], [JMF99], [SI84], [SKV00]); hence it is the best representative of the class of iterative centroid-based divisive algorithms. On the other hand, PDDP is a recently proposed technique ([B97], [B98], [BG 00a], [BG 00b]). It is representative of the non-iterative techniques based ....
....algorithm. This bisecting algorithm has been recently discussed and emphasized in [SKV00] and [WW 97]. It is here worth noting that the algorithm recalled above is the very classical and basic version of K-means (except for a slightly modified initialization step), also known as Forgy's algorithm ([F65], [GJJ96]). Many variations of this basic version of the algorithm have been proposed, aiming to reduce the computational demand, at the price of (hopefully little) sub-optimality. PDDP Step 1. Compute the centroid w of M. Step 2. Compute the auxiliary matrix M~ as M~ = M - we, where e is a ....
Forgy, E. (1965). "Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of Classification". Biometrics, pp.768-780.
....involve the tree-like construction process. Instead, the first step is to select a cluster centre or seed, and all objects (data points) within a pre-specified threshold distance are included in the resulting cluster. The k-means is a non-hierarchical algorithm, originally known as Forgy's method [54, 92], and has been used extensively in pattern recognition and visualisation. This approach is to view clustering as an estimation of centroids among data points. It solves the clustering problem using the following algorithm. 1. Choose k initial cluster centroids. 2. Assign each data point to its ....
....assumptions about how the classification is actually done. There are various theories and algorithms, as described in section 2.2 of Chapter 2, which are used to classify objects. In appendix A, we show how to compute the conditional probability vector (3) based on the popular K-center Cluster Model [17, 54]. Hierarchical clustering procedures such as agglomerative and divisive techniques [96] and the nonhierarchical clustering technique like partition around medoids (PAM) [75] can be considered as other sophisticated cluster models. 4.2 Quality of Distributed Information Sources. The first ....
Forgy E., "Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classifications", Biometrics, 21:768, 1965.
....problem [81, 63], the goal is to group or cluster the data into sets of like points. One hopes to obtain clusters revealing some sort of high-level characterization of the points belonging to individual clusters. Exemplar or prototype-based clustering approaches include Forgy's method [61], the MacQueen algorithm [95] (commonly referred to as batch and online k-Mean clustering), Kohonen maps [87], and competitive learning [141]. Probabilistic clustering methods include the COBWEB algorithm [60], AutoClass [39], the Expectation Maximization (EM) algorithm [50, 124] and, more ....
E. W. Forgy. Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometric Soc. Meetings, Riverside CA (Abstract in Biometrics 21, No. 3, 768), 1965.
....X i=1 i (1 Gamma x (l) ij ) log 2 (1 Gamma ij ) x (l) ij log 2 ij j Gamma log 2 j (13) GLA is dependent on the quality of the initial solution generated in step 0. Usually randomization is applied here. The most typical way to generate the initial solution is as follows [39] Step 0.1. Draw k vectors z (1) z (k) from input set X t randomly Step 0.2. Set j = z (l) j 2 [1; k] Note that the number of ways to choose the initial centroids is i t k j , and thus we have to rely on the random centroids. Other widely used way to ....
Forgy, E.: "Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications", Biometrics, 21, 1965, 768.
....(SJ1, RWO, RSA, RSW, SJ2 and CMO) iteratively. This simple algorithm thus applies the operators in a round-robin fashion. The initial classification in Step 1 is generated by taking k randomly chosen data vectors as cluster centroids, and by assigning the data vectors to the nearest clusters [16]. This approach is considered to be weaker than the widely used MacQueen's method [17, 18], but on the other hand, the final result of the LS does not depend strongly upon the quality of the initial solution, and the method of selecting the nearest cluster is fast. Algorithm MOLS Step 1. Draw k ....
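The initial classification described in this excerpt (take k randomly chosen data vectors as cluster centroids, then assign each data vector to the nearest cluster) might look like the sketch below. The function name `random_seed_initialization`, the `seed` parameter, and the squared Euclidean distance are assumptions introduced for illustration; the excerpt does not specify them.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_seed_initialization(
    points: Sequence[Sequence[float]],
    k: int,
    seed: Optional[int] = None,  # hypothetical: only for reproducibility
) -> Tuple[List[List[float]], List[int]]:
    """Pick k distinct data vectors as centroids, then label every vector."""
    rng = random.Random(seed)
    # Draw k data vectors without replacement to serve as initial centroids.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    # Assign each data vector to the nearest centroid
    # (squared Euclidean distance is an assumed choice).
    labels = [
        min(range(k),
            key=lambda j: sum((x - y) ** 2
                              for x, y in zip(p, centroids[j])))
        for p in points
    ]
    return centroids, labels
```

Passing a fixed `seed` makes the draw repeatable, which is convenient when comparing runs; leaving it as `None` gives a different random start each time.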
Forgy E. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics 1965; 21:768
....grouping strategy is that it should have minimal spatial and temporal complexities, and should be dynamic so as to efficiently handle changes at the level of either the user base or the server configuration. Existing algorithms for computing client and/or page groups relate to the data analysis [18, 9, 26, 14, 12] and data mining [1, 27, 21, 8] domains. An evaluation of the various eligible algorithms with respect to our criteria is proposed in [6]. Based on the result of this evaluation, our algorithm is based on associative data mining [2, 13, 4]. Briefly stated, associative data mining computes a set ....
E.W. Forgy. Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometric Society Meetings Riverside, 1965.
....by c(k). The dimension of the feature vectors x(q) = (x_1(q), ..., x_N(q)) and centers c(k) = (c_1(k), ..., c_N(k)) is N. The feature vectors are standardized independently in each component so the vectors belong to the N-cube [0,1]^N. 2.1 The k-means Algorithm (Forgy [10], 1965 and MacQueen [15], 1967). This is the most heavily used clustering algorithm because it is used as an initial process for many other algorithms [18]. The number K of clusters is input first and K centers are initialized by taking K feature vectors as seeds via c(k) = x(k) for k = ....
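The two preprocessing steps mentioned in this excerpt can be sketched as below: scaling each component independently into [0, 1], and seeding K centers from K feature vectors. The excerpt is truncated before it states which vectors serve as seeds, so taking the first K in file order is an assumption here, as are the function names `min_max_standardize` and `first_k_seeds` and the choice to map a constant component to 0.0.

```python
from typing import List, Sequence


def min_max_standardize(
    vectors: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Scale each component independently so every vector lies in [0, 1]^N."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # A constant component has no range; map it to 0.0 (assumption).
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(v, lows, highs)
        ]
        for v in vectors
    ]


def first_k_seeds(
    vectors: Sequence[Sequence[float]], k: int
) -> List[List[float]]:
    """Seed centers with the first k vectors (assumed reading of c(k) = x(k))."""
    return [list(v) for v in vectors[:k]]


data = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
scaled = min_max_standardize(data)
seeds = first_k_seeds(scaled, 2)
```

Under this reading, the seeds depend on the order of the input vectors, so exchanging two vectors in the file can change the starting centers.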
....so forth. This allows a string of small clusters to be merged to form a longer one, provided that the stringent merge criteria are met, with a final shape that can be different from that of a normed ball. 5. Computer Results. 5.1 Tests on Data. To show the effects of ordering on the usual (Forgy [10]) k-means algorithm, we used the data file testfz1a.dta shown in Figure 2 and the file testfz1b.dta formed from it by exchanging feature vector number 1 with feature vector number 8 in the ordering. With K = 5 the different results are shown in Figures 3 (testfz1a.dta) and 4 (testfz1b.dta). Other ....
E. Forgy, "Cluster analysis of multivariate data: efficiency versus interpretability of classifications," Biometrics 21, 768, 1965.
....groups is called a cluster, a region in which the density of objects is locally higher than in other regions. In this paper, data clustering is viewed as a data partitioning problem. Several approaches to find groups in a given database have been developed, but we focus on the K-Means algorithm [1, 11, 12, 14, 17, 20, 28] as it is one of the most used iterative partitional clustering algorithms and because it may also be used to initialize more expensive clustering algorithms (e.g. the EM algorithm) [3, 6, 22]. However, it is well known that the K-Means algorithm suffers from initial starting conditions ....
Forgy, E. (1965). Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21, 768.
....sequential, however, and also was found to lead to problems with roundoff error on large data sets. To avoid these problems, our implementation only updates the centroids after all patterns have been assigned to a cluster, which is the same method that was used originally by Forgy [10]. Experimental Results. We applied P CLUSTER to standard texture images [11], each of which had been passed through a set of Gabor filters to produce 20 features per pixel. Jain and Farrokhnia [1] previously used the sequential CLUSTER program to segment a variety of such textured images, ....
E. Forgy, "Cluster analysis of multivariate data: Efficiency versus interpretability of classifications," Biometrics, vol. 21, p. 768, 1965.
....at where the clusters are centered, and what their initial parameters (covariances) are. The initial guess is typically some random setting unless one has prior knowledge of the data. For more details on initialization of clustering algorithms see [8, 17]. A clustering algorithm such as K-Means [9, 18, 32] or EM [10, 7, 12] simply iterates over the data and maps the initial parameters to a new set of parameters that result in a statistical model which fits the data better. The traditional K-Means algorithm assumes all clusters are modeled by Gaussians that have identical, diagonal covariance matrices ....
E. Forgy. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21(768), 1965.
No context found.
E. Forgy, "Cluster Analysis of multivariate data: efficiency versus interpretability of classifications", Biometrics, 21, 768, 1965.
No context found.
E. Forgy. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21(3):768, 1965.
No context found.
E. Forgy, "Cluster Analysis of multivariate data: efficiency versus interpretability of classifications", Biometrics, 21, 768, 1965.
No context found.
Edward W. Forgy, Cluster analysis of multivariate data: Efficiency versus interpretability of classifications, Biometrics 21 (1965), 768.
No context found.
E. Forgy. Cluster analysis of multivariate data: efficiency vs. interpretability of classifications. In WNAR meetings, Univ of Calif Riverside, number 768, 1965.
No context found.
E. Forgy. Cluster analysis of multivariate data: efficiency versus interpretability of classifications. In Biometrics, volume 21, page 768, 1965.
No context found.
E. W. Forgy, "Cluster Analysis of Multivariate Data: Efficiency Versus Interpretability of Classifications," in Biometric Soc. Meetings, Riverside, California, 1965 (Abstract in Biometrics 21, No. 3, p. 768).
No context found.
E. Forgy, "Cluster analysis of multivariate data: efficiency versus interpretability of classifications," Biometrics, vol. 21, p. 768, 1965.
No context found.
E. Forgy, Cluster analysis of multivariate data: efficiency vs interpretability of classifications, Biometrics, 1965
No context found.
E.W. Forgy, "Cluster Analysis of Multivariate Data: Efficiency Versus Interpretability of Classifications", Biometric Soc. Meetings, Riverside, California (Abstract in Biometrics 21, No. 3, 768), 1965.
No context found.
E. Forgy. Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of Classifications. Biometrics, 21:768, 1965. (Abstract).
CiteSeer.IST - Copyright Penn State and NEC