Results 1 - 10
of
296
A fast and high quality multilevel scheme for partitioning irregular graphs
- SIAM JOURNAL ON SCIENTIFIC COMPUTING
, 1998
"... Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph [Bui and Jones, Proc. ..."
Abstract
-
Cited by 1189 (15 self)
- Add to MetaCart
Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph [Bui and Jones, Proc.
Consistency of spectral clustering
, 2004
"... Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spe ..."
Abstract
-
Cited by 572 (15 self)
- Add to MetaCart
(Show Context)
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two of major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized), is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
Co-clustering documents and words using Bipartite Spectral Graph Partitioning
, 2001
"... ..."
(Show Context)
Survey of clustering data mining techniques
, 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract
-
Cited by 408 (0 self)
- Add to MetaCart
(Show Context)
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
Data Clustering: 50 Years Beyond K-Means
, 2008
"... Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and m ..."
Abstract
-
Cited by 294 (7 self)
- Add to MetaCart
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and methods for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is exploratory in nature to find structure in data. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty of designing a general purpose clustering algorithm and the illposed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection, and data clustering and large scale data clustering.
Transductive Learning via Spectral Graph Partitioning
- In ICML
, 2003
"... We present a new method for transductive learning, which can be seen as a transductive version of the k nearest-neighbor classifier. ..."
Abstract
-
Cited by 237 (0 self)
- Add to MetaCart
(Show Context)
We present a new method for transductive learning, which can be seen as a transductive version of the k nearest-neighbor classifier.
A Min-max Cut Algorithm for Graph Partitioning and Data Clustering
, 2001
"... An important application of graph partitioning is data clustering using a graph model -- the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. Here we propose a new algorithm for graph partition with an objec ..."
Abstract
-
Cited by 213 (15 self)
- Add to MetaCart
An important application of graph partitioning is data clustering using a graph model -- the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. Here we propose a new algorithm for graph partition with an objective function that follows the min-max clustering principle. The relaxed version of the optimization of the min-max cut objective function leads to the Fiedler vector in spectral graph partition. Theoretical analyses of min-max cut indicate that it leads to balanced partitions, and lower bonds are derived. The min-max cut algorithm is tested on newsgroup datasets and is found to outperform other current popular partitioning/clustering methods. The linkage-based re nements in the algorithm further improve the quality of clustering substantially. We also demonstrate that the linearized search order based on linkage differential is better than that based on the Fiedler vector, providing another effective partition method.
Spectral partitioning works: planar graphs and finite element meshes, in:
- Proceedings of the 37th Annual Symposium on Foundations of Computer Science,
, 1996
"... Abstract Spectral partitioning methods use the Fiedler vector-the eigenvector of the second-smallest eigenvalue of the Laplacian matrix-to find a small separator of a graph. These methods are important components of many scientific numerical algorithms and have been demonstrated by experiment to wo ..."
Abstract
-
Cited by 201 (10 self)
- Add to MetaCart
(Show Context)
Abstract Spectral partitioning methods use the Fiedler vector-the eigenvector of the second-smallest eigenvalue of the Laplacian matrix-to find a small separator of a graph. These methods are important components of many scientific numerical algorithms and have been demonstrated by experiment to work extremely well. In this paper, we show that spectral partitioning methods work well on bounded-degree planar graphs and finite element meshes-the classes of graphs to which they are usually applied. While naive spectral bisection does not necessarily work, we prove that spectral partitioning techniques can be used to produce separators whose ratio of vertices removed to edges cut is O( √ n) for bounded-degree planar graphs and two-dimensional meshes and O(n 1/d ) for well-shaped d-dimensional meshes. The heart of our analysis is an upper bound on the second-smallest eigenvalues of the Laplacian matrices of these graphs: we prove a bound of O(1/n) for bounded-degree planar graphs and O(1/n 2/d ) for well-shaped d-dimensional meshes.