| Inderjit S. Dhillon and Dharmendra S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Large-Scale Parallel Data Mining, Lecture Notes in Artificial Intelligence, pages 245--260, 2000. |
....the quality of clustering is heavily dependent on the grid size and density threshold parameters. A survey of parallel algorithms for hierarchical clustering using distance-based metrics is given in [Ols95]; these are more theoretical PRAM algorithms. Recently, the k-means algorithm has been parallelized [DM99]; however, it is limited in its applicability, as it requires the user to specify k, the number of clusters, and it does not find clusters in subspaces. Clusters are unions of connected high-density cells. Two k-dimensional cells are connected if they have a common face in the k-dimensional ....
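The grid-based approach in this snippet forms clusters as unions of connected dense cells, where two cells are connected if they share a common face. The following is a minimal Python sketch under assumptions of our own, not the citing paper's algorithm: a uniform grid with a hypothetical cell width `width`, a hypothetical point-count `threshold` for density, and cells named by integer grid coordinates, so that two cells share a face exactly when they differ by one in a single coordinate.

```python
import math
from collections import Counter, deque


def dense_cells(points, width, threshold):
    """Return grid cells (integer coordinate tuples) holding >= threshold points."""
    counts = Counter(tuple(math.floor(x / width) for x in p) for p in points)
    return [cell for cell, n in counts.items() if n >= threshold]


def face_neighbors(cell):
    """Yield the cells that share a face with `cell` (differ by 1 in one coordinate)."""
    for dim in range(len(cell)):
        for step in (-1, 1):
            nb = list(cell)
            nb[dim] += step
            yield tuple(nb)


def dense_cell_clusters(cells):
    """Group dense cells into clusters of face-connected cells (breadth-first search)."""
    dense = set(cells)
    seen = set()
    clusters = []
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            cell = queue.popleft()
            component.append(cell)
            for nb in face_neighbors(cell):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(component))
    return clusters


# Hypothetical 2-D dense cells: (2, 2) meets (1, 1) only at a corner.
print(dense_cell_clusters([(0, 0), (0, 1), (1, 1), (2, 2), (3, 3)]))
# prints [[(0, 0), (0, 1), (1, 1)], [(2, 2)], [(3, 3)]]
```

Changing `width` or `threshold` changes which cells count as dense, and therefore which clusters emerge, which is the parameter sensitivity the snippet points out.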
I.S. Dhillon and D.S. Modha. A data-clustering algorithm on distributed memory multiprocessors. Large-Scale Parallel KDD Systems, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999.
....widely investigated problems in many fields (such as Machine Learning, Data Mining, Computational Geometry, and of course Information Retrieval). However, there is no clear indication as to whether existing algorithms could effectively be employed in large-scale web applications (see, e.g., [4, 5] for a discussion of the difficulties connected to the efficient clustering of very large document collections, and for sequential and distributed algorithms with state-of-the-art performance). In this paper we isolate a problem, which we call the Minimum Redirections Problem, related to the ....
I. S. Dhillon and D. S. Modha, A data clustering algorithm on distributed memory multiprocessors, Large-Scale Parallel Data Mining, Lecture Notes in Artificial Intelligence, Volume 1759, pp. 245-260, 2000.
....fact, if any cluster changes, J is reduced. As J is bounded from below, it converges, and as a consequence the algorithm converges. It is also known that the k-means algorithm always converges to a local minimum [17]. The k-means algorithm may be viewed as a variant of the EM algorithm [56]. In [26] a parallel k-means algorithm is proposed; the authors also provide a careful analysis of the algorithm's computational complexity. There are two .... Algorithm 7 (k-means algorithm): select k arbitrary data points $z_1, \ldots, z_k$; repeat: $T_i := \{x : d(x, z_i) \le d(x, z_s),\ s = 1, \ldots, k\}$, $z_i$ ....
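The iteration behind this convergence argument can be sketched as a plain Lloyd-style k-means in Python. This is a minimal illustration, not a reconstruction of the citing survey's Algorithm 7: it assumes squared Euclidean distance, takes k distinct data points picked by a seeded random generator as initial centers, lets an empty cluster keep its old center, and stops once no assignment changes. The recorded values of J should be non-increasing, as the argument above requires.

```python
import random


def sq_dist(x, z):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means; returns centers, labels and the J value per iteration."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k arbitrary data points
    labels, history = None, []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [min(range(k), key=lambda i: sq_dist(x, centers[i])) for x in points]
        # J = sum of squared distances from points to their assigned centers.
        history.append(sum(sq_dist(x, centers[j]) for x, j in zip(points, new_labels)))
        if new_labels == labels:  # no cluster changed: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [x for x, j in zip(points, labels) if j == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels, history


pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels, J = kmeans(pts, 2)
print(sorted(centers), J)
```

Each iteration evaluates k distances for each of the N points, which is where the kN comparisons and the dominant distance cost discussed in this context come from.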
....Finding the minimum for each point requires a total of kN comparisons; one then needs to compute the new average for each cluster, which requires Nd additions and kd divisions. In data mining the cost is usually dominated by the determination of all the distances, and thus the time is [26]: $T = O(NkdI)$, where I is the number of iterations. For the parallel algorithm (shared nothing), the data is initially distributed over the discs of all the processors. Each processor then computes the distances of its elements to all cluster centers. This is done in parallel and so the most ....
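The shared-nothing scheme described here can be simulated in a single Python process. In this hedged sketch each entry of `blocks` stands for the data held by one processor; each "processor" (a thread here) assigns its own points to the nearest centers and returns per-cluster sums, counts and its share of J, and a plain sum of these partial results plays the role a global reduction such as an MPI all-reduce would play on a distributed-memory machine. The function names, the thread pool and the stopping tolerance `tol` are illustrative assumptions, not details taken from [26].

```python
from concurrent.futures import ThreadPoolExecutor


def sq_dist(x, z):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def local_step(block, centers):
    """One processor: assign local points, return per-cluster sums, counts and local J."""
    k, d = len(centers), len(centers[0])
    sums, counts, local_j = [[0.0] * d for _ in range(k)], [0] * k, 0.0
    for x in block:
        dists = [sq_dist(x, c) for c in centers]  # k distance evaluations per point
        j = min(range(k), key=dists.__getitem__)
        counts[j] += 1
        local_j += dists[j]
        for t in range(d):
            sums[j][t] += x[t]
    return sums, counts, local_j


def global_sum(partials, k, d):
    """Stand-in for an all-reduce: add up the partial results of every processor."""
    sums, counts, total_j = [[0.0] * d for _ in range(k)], [0] * k, 0.0
    for s, c, lj in partials:
        total_j += lj
        for i in range(k):
            counts[i] += c[i]
            for t in range(d):
                sums[i][t] += s[i][t]
    return sums, counts, total_j


def parallel_kmeans(blocks, init_centers, max_iter=100, tol=1e-9):
    centers = [list(c) for c in init_centers]
    k, d = len(centers), len(centers[0])
    prev_j = total_j = float("inf")
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        for _ in range(max_iter):
            partials = list(pool.map(local_step, blocks, [centers] * len(blocks)))
            sums, counts, total_j = global_sum(partials, k, d)
            # Every processor can compute identical new centers from the global sums.
            centers = [[s / counts[i] for s in sums[i]] if counts[i] else centers[i]
                       for i in range(k)]
            if prev_j - total_j <= tol:  # J stopped decreasing noticeably
                break
            prev_j = total_j
    return centers, total_j


# Hypothetical data split over three "processors".
blocks = [[(0, 0), (0, 1)], [(10, 10)], [(10, 11)]]
print(parallel_kmeans(blocks, [(0, 0), (10, 10)]))
# prints ([[0.0, 0.5], [10.0, 10.5]], 1.0)
```

Only k·d sums, k counts and one partial J per processor are combined in each iteration, so the data exchanged does not grow with N, while the distance computations that dominate the $O(NkdI)$ cost are split across the blocks.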
I.S. Dhillon and D.S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Zaki and Ho [73].
....tree [8]. Several researchers study techniques for parallelizing clustering algorithms, clustering being an unsupervised learning problem. Ruocco and Frieder [15] propose parallel single-link and single-pass algorithms for clustering documents on an Intel Paragon. Dhillon and Modha [3] introduce an effective parallelization of the k-means clustering algorithm, implemented on an IBM POWERparallel SP2. Forman and Zhang [4] also present a general technique for parallelizing a class of center-based clustering algorithms, including k-means, k-harmonic means, and the EM algorithm performed ....
Dhillon, I.S., and Modha, D.S. A data-clustering algorithm on distributed memory multiprocessors. Large-Scale Parallel Data Mining, pages 245-260, 1999.
No context found.
Inderjit S. Dhillon and Dharmendra S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Large-Scale Parallel Data Mining, Lecture Notes in Artificial Intelligence, pages 245--260, 2000.
No context found.
I. S. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Proceedings of Workshop on Large-Scale Parallel KDD Systems (in conjunction with SIGKDD), pages 245--260, August 1999.
No context found.
I. S. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In M. Zaki and C. Ho, editors, Large Scale Parallel Data Mining, pages 245--260. LNCS vol 1759. Springer, 2000.
No context found.
I. S. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In KDD, pages 245--260, 1999.
No context found.
I. S. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In ACM SIGKDD, 1999.
No context found.
I.S. Dhillon and D.S. Modha, "A Data-Clustering Algorithm on Distributed Memory Multiprocessors," Proc. Workshop Large-Scale Parallel KDD Systems, in conjunction with the Fifth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '99), pp. 47-56, Aug. 1999.
No context found.
I. S. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Proceedings of Large-scale Parallel KDD Systems Workshop, ACM SIGKDD, Aug. 15-18 1999.
No context found.
Dhillon, I.S. and Modha, D.S. "A data clustering algorithm on distributed memory machines," ACM SIGKDD Workshop on Large-Scale Parallel KDD Systems (with KDD99), August 1999.
No context found.
Inderjit S. Dhillon and Dharmendra S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Proceedings of Workshop on Large-Scale Parallel KDD Systems, in conjunction with the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 99), pages 47--56, August 1999.
No context found.
Dhillon, I. S. and Modha, D. S., A Data Clustering Algorithm on Distributed Memory Multiprocessors, in Large-Scale Parallel Data Mining, Lecture Notes in Artificial Intelligence, Volume 1759, pages 245-260, 2000.
No context found.
I. S. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In KDD, pages 245--260, 1999.
No context found.
I.S. Dhillon and D.S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Zaki and Ho [73].
No context found.
Inderjit S. Dhillon and Dharmendra S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Proceedings of Workshop on Large-Scale Parallel KDD Systems, in conjunction with the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 99), pages 47--56, August 1999.
No context found.
Inderjit S. Dhillon and Dharmendra S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Proceedings of Workshop on Large-Scale Parallel KDD Systems, in conjunction with the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 99), pages 47-56, August 1999.
No context found.
I. Dhillon and D. Modha. A Data-clustering Algorithm on Distributed Memory Multiprocessors. In Proceedings of the KDD'99 Workshop on High Performance Knowledge Discovery, pages 245--260, 1999.
No context found.
I. S. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Large-Scale Parallel Data Mining, Lecture Notes in Artificial Intelligence, pages 245--260, 2000.
No context found.
I. S. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In M. J. Zaki and C.-T. Ho (eds), Large-Scale Parallel Data Mining, Springer-Verlag, LNCS 1759, pages 245--260, 1999.
No context found.
Dhillon I. S., Modha D. S.: "A Data-Clustering Algorithm On Distributed Memory Multiprocessors", Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD 99)
No context found.
I. Dhillon and D. Modha. A data clustering algorithm on distributed memory multiprocessors. In Workshop on Large-Scale Parallel KDD Systems, 1999.
No context found.
I.S. Dhillon and D.S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Large-Scale Parallel Data Mining, Lecture Notes in Artificial Intelligence, pages 245--260, 2000.
No context found.
Dhillon, I. S. and Modha, D. S. (1999). A data-clustering algorithm on distributed memory multiprocessors. In Proc. Large-scale Parallel KDD Systems Workshop, ACM SIGKDD.
CiteSeer.IST - Copyright Penn State and NEC