Results 11  20
of
572
Learning spectral clustering, with application to speech separation
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost fun ..."
Abstract

Cited by 70 (6 self)
 Add to MetaCart
Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled datasets. We apply our learning algorithm to the blind onemicrophone speech separation problem, casting the problem as one of segmentation of the spectrogram.
Fast Approximate Spectral Clustering
, 2009
"... Spectral clustering refers to a flexible class of clustering procedures that can produce highquality clusterings on small data sets but which has limited applicability to largescale problems due to its computational complexity of O(n 3), with n the number of data points. We extend the range of spe ..."
Abstract

Cited by 58 (1 self)
 Add to MetaCart
(Show Context)
Spectral clustering refers to a flexible class of clustering procedures that can produce highquality clusterings on small data sets but which has limited applicability to largescale problems due to its computational complexity of O(n 3), with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortionminimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the misclustering rate. We develop two concrete instances of our general framework, one based on local kmeans clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform kmeans by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectral cluster data sets with a million observations within several minutes. 1
Unsupervised modeling of object categories using link analysis techniques
 In CVPR
, 2008
"... We propose an approach for learning visual models of object categories in an unsupervised manner in which we first build a largescale complex network which captures the interactions of all unit visual features across the entire training set and we infer information, such as which features are in wh ..."
Abstract

Cited by 54 (5 self)
 Add to MetaCart
(Show Context)
We propose an approach for learning visual models of object categories in an unsupervised manner in which we first build a largescale complex network which captures the interactions of all unit visual features across the entire training set and we infer information, such as which features are in which categories, directly from the graph by using link analysis techniques. The link analysis techniques are based on wellestablished graph mining techniques used in diverse applications such as WWW, bioinformatics, and social networks. The techniques operate directly on the patterns of connections between features in the graph rather than on statistical properties, e.g., from clustering in feature space. We argue that the resulting techniques are simpler, and we show that they perform similarly or better compared to state of the art techniques on common data sets. We also show results on more challenging data sets than those that have been used in prior work on unsupervised modeling.
Clustering partially observed graphs via convex optimization.
 Journal of Machine Learning Research,
, 2014
"... Abstract This paper considers the problem of clustering a partially observed unweighted graphi.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organiz ..."
Abstract

Cited by 47 (13 self)
 Add to MetaCart
Abstract This paper considers the problem of clustering a partially observed unweighted graphi.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements"i.e., the sum of the number of (observed) missing edges within clusters, and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) lowrank matrix and an (unknown) sparse matrix from their partially observed sum. We evaluate the performance of our algorithm on the classical Planted Partition/Stochastic Block Model. Our main theorem provides sufficient conditions for the success of our algorithm as a function of the minimum cluster size, edge density and observation probability; in particular, the results characterize the tradeoff between the observation probability and the edge density gap. When there are a constant number of clusters of equal size, our results are optimal up to logarithmic factors.
A Closed Form Solution to Robust Subspace Estimation and Clustering
"... We consider the problem of fitting one or more subspaces to a collection of data points drawn from the subspaces and corrupted by noise/outliers. We pose this problem as a rank minimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean, selfexpressive, low ..."
Abstract

Cited by 43 (4 self)
 Add to MetaCart
(Show Context)
We consider the problem of fitting one or more subspaces to a collection of data points drawn from the subspaces and corrupted by noise/outliers. We pose this problem as a rank minimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean, selfexpressive, lowrank dictionary plus a matrix of noise/outliers. Our key contribution is to show that, for noisy data, this nonconvex problem can be solved very efficiently and in closed form from the SVD of the noisy data matrix. Remarkably, this is true for both one or more subspaces. An important difference with respect to existing methods is that our framework results in a polynomial thresholding of the singular values with minimal shrinkage. Indeed, a particular case of our framework in the case of a single subspace leads to classical PCA, which requires no shrinkage. In the case of multiple subspaces, our framework provides an affinity matrix that can be used to cluster the data according to the subspaces. In the case of data corrupted by outliers, a closedform solution appears elusive. We thus use an augmented Lagrangian optimization framework, which requires a combination of our proposed polynomial thresholding operator with the more traditional shrinkagethresholding operator. 1.
Spectral clustering of graphs with general degrees in the extended planted partition model
 Journal of Machine Learning Research  Proceedings Track
"... Abstract In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model, where nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition ( ..."
Abstract

Cited by 42 (0 self)
 Add to MetaCart
Abstract In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model, where nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition (EPP) model, a variant of the classical planted partition model. The standard approach to spectral clustering of graphs is to compute the bottom k singular vectors or eigenvectors of a suitable graph Laplacian, project the nodes of the graph onto these vectors, and then use an iterative clustering algorithm on the projected nodes. However a challenge with applying this approach to graphs generated from the EPP model is that unnormalized Laplacians do not work, and normalized Laplacians do not concentrate well when the graph has a number of low degree nodes. We resolve this issue by introducing the notion of a degreecorrected graph Laplacian. For graphs with many low degree nodes, degree correction has a regularizing effect on the Laplacian. Our spectral clustering algorithm projects the nodes in the graph onto the bottom k right singular vectors of the degreecorrected randomwalk Laplacian, and clusters the nodes in this subspace. We show guarantees on the performance of this algorithm, demonstrating that it outputs the correct partition under a wide range of parameter values. Unlike some previous work, our algorithm does not require access to any generative parameters of the model.
Overlapping community detection at scale: a nonnegative matrix factorization approach
 In WSDM
, 2013
"... Network communities represent basic structures for understanding the organization of realworld networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than between its members and the remainder of the n ..."
Abstract

Cited by 41 (5 self)
 Add to MetaCart
(Show Context)
Network communities represent basic structures for understanding the organization of realworld networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than between its members and the remainder of the network. Communities in networks also overlap as nodes belong to multiple clusters at once. Due to the difficulties in evaluating the detected communities and the lack of scalable algorithms, the task of overlapping community detection in large networks largely remains an open problem. In this paper we present BIGCLAM (Cluster Affiliation Model for Big Networks), an overlapping community detection method that scales to large networks of millions of nodes and edges. We build on a novel observation that overlaps between communities are densely connected. This is in sharp contrast with present community detection methods which implicitly assume that overlaps between communities are sparsely connected and thus cannot properly extract overlapping communities in networks. In this paper, we develop a modelbased community detection algorithm that can detect densely overlapping, hierarchically nested as well as nonoverlapping communities in massive networks. We evaluate our algorithm on 6 large social, collaboration and information networks with groundtruth community information. Experiments show state of the art performance both in terms of the quality of detected communities as well as in speed and scalability of our algorithm.
Clustering to find exemplar terms for keyphrase extraction
 In Proceedings of EMNLP
, 2009
"... Keyphrases are widely used as a brief summary of documents. Since manual assignment is timeconsuming, various unsupervised ranking methods based on importance scores are proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the do ..."
Abstract

Cited by 40 (5 self)
 Add to MetaCart
Keyphrases are widely used as a brief summary of documents. Since manual assignment is timeconsuming, various unsupervised ranking methods based on importance scores are proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the document, but also have a good coverage of the document. Based on this observation, we propose an unsupervised method for keyphrase extraction. Firstly, the method finds exemplar terms by leveraging clustering techniques, which guarantees the document to be semantically covered by these exemplar terms. Then the keyphrases are extracted from the document using the exemplar terms. Our method outperforms sateoftheart graphbased ranking methods (TextRank) by 9.5 % in F1measure. 1
C.: Spectral clustering of linear subspaces for motion segmentation
 In: IEEE I. Conf. Comp. Vis
"... This paper studies automatic segmentation of multiple motions from tracked feature points through spectral embedding and clustering of linear subspaces. We show that the dimension of the ambient space is crucial for separability, and that low dimensions chosen in prior work are not optimal. We sugge ..."
Abstract

Cited by 40 (0 self)
 Add to MetaCart
(Show Context)
This paper studies automatic segmentation of multiple motions from tracked feature points through spectral embedding and clustering of linear subspaces. We show that the dimension of the ambient space is crucial for separability, and that low dimensions chosen in prior work are not optimal. We suggest lower and upper bounds together with a datadriven procedure for choosing the optimal ambient dimension. Application of our approach to the Hopkins155 video benchmark database uniformly outperforms a range of stateoftheart methods both in terms of segmentation accuracy and computational speed. 1.
Flexible constrained spectral clustering
 in KDD, 2010
"... Constrained clustering has been wellstudied for algorithms like Kmeans and hierarchical agglomerative clustering. However, how to encode constraints into spectral clustering remains a developing area. In this paper, we propose a flexible and generalized framework for constrained spectral clusterin ..."
Abstract

Cited by 38 (4 self)
 Add to MetaCart
(Show Context)
Constrained clustering has been wellstudied for algorithms like Kmeans and hierarchical agglomerative clustering. However, how to encode constraints into spectral clustering remains a developing area. In this paper, we propose a flexible and generalized framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode MustLink and CannotLink constraints by modifying the graph Laplacian or the resultant eigenspace, we present a more natural and principled formulation, which preserves the original graph Laplacian and explicitly encodes the constraints. Our method offers several practical advantages: it can encode the degree of belief (weight) in MustLink and CannotLink constraints; it guarantees to lowerbound how well the given constraints are satisfied using a userspecified threshold; and it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and explicitly encoding the constraints, much of the existing analysis of spectral clustering techniques is still valid. Consequently our work can be posed as a natural extension to unconstrained spectral clustering and be interpreted as finding the normalized mincut of a labeled graph. We validate the effectiveness of our approach by empirical results on realworld data sets, with applications to constrained image segmentation and clustering benchmark data sets with both binary and degreeofbelief constraints.