Multiway cuts and spectral clustering (2004)

by M Meila, L Xu
Results 1 - 10 of 20

Learning spectral clustering, with application to speech separation

by Francis R. Bach, Michael I. Jordan - Journal of Machine Learning Research, 2006
"... Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost fun ..."
Abstract - Cited by 70 (6 self)
Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled datasets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram.
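
As a concrete anchor for this and the following abstracts, here is a minimal sketch of the standard normalized-cut spectral relaxation they build on: symmetrically normalize the similarity matrix, embed the points with its top k eigenvectors, and cluster the embedded rows. The toy k-means loop and all names are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def spectral_clustering(A, k, n_iter=50, seed=0):
        # Symmetrically normalized affinity: D^{-1/2} A D^{-1/2}.
        d = A.sum(axis=1)
        inv_sqrt_d = 1.0 / np.sqrt(d)
        M = A * np.outer(inv_sqrt_d, inv_sqrt_d)
        # The top-k eigenvectors span the relaxed cluster indicators.
        _, vecs = np.linalg.eigh(M)              # eigenvalues ascending
        X = vecs[:, -k:]
        X = X / np.linalg.norm(X, axis=1, keepdims=True)  # row-normalize
        # Toy k-means on the embedded rows (illustrative, not robust).
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
            for j in range(k):
                if np.any(labels == j):          # guard against empty clusters
                    centers[j] = X[labels == j].mean(axis=0)
        return labels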

Clustering by weighted cuts in directed graphs

by Marina Meilă, William Pentney - In Proceedings of the 2007 SIAM International Conference on Data Mining, 2007
"... In this paper we formulate spectral clustering in directed graphs as an optimization problem, the objective being a weighted cut in the directed graph. This objective extends several popular criteria like the normalized cut and the averaged cut to asymmetric affinity data. We show that this problem ..."
Abstract - Cited by 30 (1 self)
In this paper we formulate spectral clustering in directed graphs as an optimization problem, the objective being a weighted cut in the directed graph. This objective extends several popular criteria like the normalized cut and the averaged cut to asymmetric affinity data. We show that this problem can be relaxed to a Rayleigh quotient problem for a symmetric matrix obtained from the original affinities, and therefore a large body of the results and algorithms developed for spectral clustering of symmetric data immediately extends to asymmetric cuts.

Citation Context

... symmetric affinity matrix A, which is a special case for us. In this case, if one takes ... thought of as “weights” associated to the graph nodes; such an interpretation is central to the symmetric case [4, 18]. Here, in addition to the weights Di, we assume that the user may provide two other sets of positive weights for the nodes: the volume weights denoted by Ti and the row weights T′i (the associated ...
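
To illustrate the reduction the abstract describes, here is a hedged sketch: symmetrize the asymmetric affinities and fall back on the usual symmetric eigenproblem. The plain (A + Aᵀ)/2 symmetrization is an assumption for illustration only; the paper's weighted-cut construction also carries the volume and row weights (Ti, T′i) into the symmetric matrix.

    import numpy as np

    def directed_spectral_embedding(A, k):
        # Naive symmetrization of the asymmetric affinities.
        S = 0.5 * (A + A.T)
        d = S.sum(axis=1)
        inv_sqrt_d = 1.0 / np.sqrt(d)
        # Symmetric matrix whose Rayleigh quotients drive the relaxation.
        M = S * np.outer(inv_sqrt_d, inv_sqrt_d)
        _, vecs = np.linalg.eigh(M)
        return vecs[:, -k:]                 # relaxed k-way indicator vectors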

Model averaging and dimension selection for the singular value decomposition

by Peter D. Hoff - Journal of the American Statistical Association, 2007
"... Many multivariate data analysis techniques for an m × n matrix Y are related to the model Y = M+E, where Y is an m×n matrix of full rank and M is an unobserved mean matrix of rank K < (m ∧ n). Typically the rank of M is estimated in a heuristic way and then the least-squares estimate of M is obta ..."
Abstract - Cited by 17 (2 self)
Many multivariate data analysis techniques for an m × n matrix Y are related to the model Y = M + E, where Y is an m × n matrix of full rank and M is an unobserved mean matrix of rank K < (m ∧ n). Typically the rank of M is estimated in a heuristic way and then the least-squares estimate of M is obtained via the singular value decomposition of Y, yielding an estimate that can have a very high variance. In this paper we suggest a model-based alternative to the above approach by providing prior distributions and posterior estimation for the rank of M and the components of its singular value decomposition. In addition to providing more accurate inference, such an approach has the advantage of being extendable to more general data-analysis situations, such as inference in the presence of missing data and estimation in a generalized linear modeling framework.

Citation Context

...at are larger than 1, often suggested in the factor analysis literature; and K̂e, the index of the largest gap in the eigenvalues of Y′Y, used in machine learning and clustering (see, for example, Meila and Xu 2003). Descriptions of the sampling distributions of these estimators are presented in Table 1. The only case in which the peak of the sampling distribution for one of these estimators obtained the correc...
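
For reference, the eigengap estimator K̂e mentioned in this context can be written in a few lines; the function name and the 1-based rank convention are assumptions.

    import numpy as np

    def eigengap_rank(Y):
        # Eigenvalues of Y'Y in descending order.
        evals = np.linalg.eigvalsh(Y.T @ Y)[::-1]
        gaps = evals[:-1] - evals[1:]       # successive eigenvalue gaps
        return int(np.argmax(gaps)) + 1     # index of the largest gap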

Clustering and Community Detection in Directed Networks: A Survey

by Fragkiskos D. Malliaros, Michalis Vazirgiannis, 2013
"... Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of the aforementioned cases graphs are directed – in the sense that there is directionality on the edges, making the semantics of the edges non symmetric as ..."
Abstract - Cited by 10 (0 self)
Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of the aforementioned cases graphs are directed – in the sense that there is directionality on the edges, making the semantics of the edges non-symmetric, as the source node transmits some property to the target one but not vice versa. An interesting feature that real networks present is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence here is that nodes of the same community are highly similar, while, on the contrary, nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of relevant application domains. Therefore, naturally there is a recent wealth of research production in the area of mining directed graphs – with clustering being the primary method sought and the primary tool for community detection and evaluation. The goal of this paper is to offer an in-depth comparative review of the methods presented so far for clustering directed networks along with the relevant necessary methodological background and also related applications. The survey commences by offering a concise review of the fundamental concepts and methodological base on which graph clustering algorithms ...

An Information Theoretic Approach to Machine Learning

by Robert Jenssen, 2005
"... In this thesis, theory and applications of machine learning systems based on information theoretic criteria as performance measures are studied. A new clustering algorithm based on maximizing the Cauchy-Schwarz (CS) divergence measure between probability density functions (pdfs) is proposed. The CS ..."
Abstract - Cited by 8 (2 self)
In this thesis, theory and applications of machine learning systems based on information theoretic criteria as performance measures are studied. A new clustering algorithm based on maximizing the Cauchy-Schwarz (CS) divergence measure between probability density functions (pdfs) is proposed. The CS divergence is estimated non-parametrically using the Parzen window technique for density estimation. The problem domain is transformed from discrete 0/1 cluster membership values to continuous membership values. A constrained gradient descent maximization algorithm is implemented. The gradients are stochastically approximated to reduce computational complexity, making the algorithm more practical. Parzen window annealing is incorporated into the algorithm to help avoid convergence to a local maximum. The clustering results obtained on synthetic and real data are encouraging. The Parzen window-based estimator for the CS divergence is shown to have a dual expression as a measure of the cosine of the angle between cluster mean vectors in a feature space determined by the eigenspectrum of a Mercer kernel matrix. A spectral clustering ...

Citation Context

...ces. But they use different kernel matrices, and utilize the information contained in the eigenspectrum of the matrices in different manners. Multiway cuts have also been studied (Chang et al., 1994, Meila and Xu, 2004). For other related work, see for example (Weiss, 1999, Kannan et al., 2000, Alpert and Yao, 1995, Azar et al., 2001, Scott and Longuet-Higgins, 1990, Higham and Kibble, 2004, Jenssen et al., 2004). ...
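
A minimal sketch of the Parzen-window CS divergence estimate described in the thesis abstract, assuming a Gaussian window of width sigma (the kernel choice and function name are illustrative). The Gaussian normalizing constants cancel in the ratio, so they are omitted.

    import numpy as np

    def cs_divergence(X1, X2, sigma=1.0):
        # Parzen estimate of the density overlap between two sample sets;
        # the convolution of two Gaussian windows of width sigma is a
        # Gaussian of width sqrt(2) * sigma.
        def overlap(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (4.0 * sigma ** 2)).mean()
        # D_CS = -log( int(p q) / sqrt(int(p^2) int(q^2)) )
        return -np.log(overlap(X1, X2) /
                       np.sqrt(overlap(X1, X1) * overlap(X2, X2)))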

Information Theoretic Spectral Clustering

by Robert Jenssen, Torbjørn Eltoft, Jose C. Principe - In Proceedings of International Joint Conference on Neural Networks
"... Abstract — We discuss a new information-theoretic framework for spectral clustering that is founded on the recently introduced Information Cut. A novel spectral clustering algorithm is proposed, where the clustering solution is given as a linearly weighted combination of certain top eigenvectors of ..."
Abstract - Cited by 8 (3 self)
We discuss a new information-theoretic framework for spectral clustering that is founded on the recently introduced Information Cut. A novel spectral clustering algorithm is proposed, where the clustering solution is given as a linearly weighted combination of certain top eigenvectors of the data affinity matrix. The Information Cut provides us with a theoretically well defined graph-spectral cost function, and also establishes a close link between spectral clustering and non-parametric density estimation. As a result, a natural criterion for creating the data affinity matrix is provided. We present preliminary clustering results to illustrate some of the properties of our algorithm, and we also make comparative remarks.

Citation Context

... or membership vector, to take real values. Normally, finding more than two clusters requires a recursive implementation of the method. Procedures for finding multi-way cuts have also been proposed [8]. Other related methods are presented in [9], [10]. For a unifying review of spectral clustering, see [11]. Despite that spectral clustering methods have been observed empirically to work well in a nu...
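
The abstract's key idea, a clustering solution expressed as a linearly weighted combination of top eigenvectors of the affinity matrix, can be sketched as below. The least-squares projection of an initial membership guess is an assumed stand-in for the paper's Information Cut optimization of the weights.

    import numpy as np

    def membership_from_eigenvectors(K, m_init, num_vecs=5):
        # Top eigenvectors of the affinity matrix (orthonormal columns).
        _, vecs = np.linalg.eigh(K)
        E = vecs[:, -num_vecs:]
        w = E.T @ m_init        # linear weights (least-squares projection)
        return E @ w            # membership vector in the eigenvector span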

On Potts Model Clustering, Kernel K-Means and Density Estimation

by Alejandro Murua, Larissa Stanberry, Werner Stuetzle - Journal of Computational and Graphical Statistics, 2008
"... ... follow the same recipe: (i) choose a measure of similarity between observations; (ii) define a figure of merit assigning a large value to partitions of the data that put similar observations in the same cluster; and (iii) optimize this figure of merit over partitions. Potts model clustering repr ..."
Abstract - Cited by 4 (2 self)
... follow the same recipe: (i) choose a measure of similarity between observations; (ii) define a figure of merit assigning a large value to partitions of the data that put similar observations in the same cluster; and (iii) optimize this figure of merit over partitions. Potts model clustering represents an interesting variation on this recipe. Blatt, Wiseman, and Domany defined a new figure of merit for partitions that is formally similar to the Hamiltonian of the Potts model for ferromagnetism, extensively studied in statistical physics. For each temperature T, the Hamiltonian defines a distribution assigning a probability to each possible configuration of the physical system or, in the language of clustering, to each partition. Instead of searching for a single partition optimizing the Hamiltonian, they sampled a large number of partitions from this distribution for a range of temperatures. They proposed a heuristic for choosing an appropriate temperature and from the sample of partitions associated with this chosen temperature, they then derived what we call a consensus clustering: two observations are put in the same consensus cluster if they belong to the same cluster in the majority of the random partitions. In a sense, the consensus clustering is an “average” of plausible ...
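
The consensus step described here is easy to make concrete: count how often each pair of observations is co-clustered across the sampled partitions, keep pairs that co-occur in a majority of them, and read the consensus clusters off the resulting graph. A hedged sketch (reading clusters as connected components of the majority graph is an assumption):

    import numpy as np

    def consensus_clustering(partitions):
        # partitions: array-like of shape (num_partitions, n) of labels.
        P = np.asarray(partitions)
        n = P.shape[1]
        # Fraction of partitions in which each pair is co-clustered.
        co = (P[:, :, None] == P[:, None, :]).mean(axis=0)
        together = co > 0.5
        # Consensus clusters = connected components of the majority graph.
        labels = -np.ones(n, dtype=int)
        c = 0
        for i in range(n):
            if labels[i] < 0:
                stack = [i]
                while stack:
                    j = stack.pop()
                    if labels[j] < 0:
                        labels[j] = c
                        stack.extend(np.flatnonzero(together[j] & (labels < 0)))
                c += 1
        return labels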
(Show Context)

Citation Context

...e label, and a small similarity to pairs known to have different labels. APPENDIX A: MULTIWAY NORMALIZED CUT. The normalized cut between any two clusters k and k′ is defined as (Shi and Malik 2000; Meila and Xu 2003; Yu and Shi 2003)

$$\mathrm{NCut}(k, k') = \left( \frac{1}{\mathrm{vol}(k)} + \frac{1}{\mathrm{vol}(k')} \right) \sum_{i=1}^{n} \sum_{j=1}^{n} z_{ki}\, z_{k'j}\, k_{ij}, \tag{A.1}$$

where $\mathrm{vol}(\ell) = \sum_{i=1}^{n} \sum_{j=1}^{n} z_{\ell i}\, k_{ij}$ for $\ell = 1, \ldots, q$. The MNCut of any given partition is then def...
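
Equation (A.1) translates directly into code. The sketch below assumes, consistent with the Meila-Xu definition, that the multiway criterion sums NCut(k, k′) over all unordered cluster pairs, and that the affinity matrix is symmetric.

    import numpy as np

    def multiway_ncut(Z, K):
        # Z: q x n binary membership matrix (Z[l, i] = 1 iff point i is in
        # cluster l); K: n x n symmetric affinity matrix with entries k_ij.
        vol = (Z @ K).sum(axis=1)      # vol(l) = sum_ij z_li k_ij
        W = Z @ K @ Z.T                # W[l, l'] = sum_ij z_li z_l'j k_ij
        q = Z.shape[0]
        total = 0.0
        for a in range(q):
            for b in range(a + 1, q):  # each unordered pair once, per (A.1)
                total += (1.0 / vol[a] + 1.0 / vol[b]) * W[a, b]
        return total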

Spectral clustering for speech separation

by Francis R. Bach, Michael I. Jordan
"... Spectral clustering refers to a class of recent techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this chapter, we introduce ..."
Abstract - Cited by 2 (0 self)
Spectral clustering refers to a class of recent techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this chapter, we introduce the main concepts and algorithms together with recent advances in learning the similarity matrix from data. The techniques are illustrated on the blind one-microphone speech separation problem, by casting the problem as one of segmentation of the spectrogram.

Citation Context

...u et al. (2001), we derive the spectral relaxation through normalized cuts. Alternative frameworks, based on Markov random walks (Meila and Shi, 2002), on different definitions of the normalized cut (Meila and Xu, 2003), or on constrained optimization (Higham and Kibble, 2004), lead to similar spectral relaxations. 2.1 Similarity matrices: Spectral clustering refers to a class of techniques for clustering that are...
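
The Markov random walk view mentioned in this context (Meila and Shi, 2002) yields the same embedding up to rescaling, since P = D⁻¹A shares its spectrum with the symmetric normalized affinity. A small sketch, with illustrative names:

    import numpy as np

    def random_walk_embedding(A, k):
        # Row-stochastic transition matrix of the random walk on the graph.
        d = A.sum(axis=1)
        P = A / d[:, None]
        # P is not symmetric in general, so use the general eigensolver;
        # the top eigenvector is constant (eigenvalue 1) and is skipped.
        vals, vecs = np.linalg.eig(P)
        order = np.argsort(-vals.real)
        return vecs[:, order[1:k + 1]].real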

Topics in structured host-antagonist interactions

by Maria Annichia Riolo, 2014
"... For my parents ii ACKNOWLEDGEMENTS Many thanks to my research collaborators and the members of my dissertation committee for making this dissertation possible. ..."
Abstract
For my parents. ACKNOWLEDGEMENTS: Many thanks to my research collaborators and the members of my dissertation committee for making this dissertation possible.

Unsupervised Learning of Boosted Tree Classifier using Graph Cuts for Hand Pose Recognition

by Toufiq Parag, Ahmed Elgammal
"... This study proposes an unsupervised learning approach for the task of hand pose recognition. Considering the large variation in hand poses, classification using a decision tree seems highly suitable for this purpose. Various research works have used boosted decision trees and have shown encouraging ..."
Abstract
This study proposes an unsupervised learning approach for the task of hand pose recognition. Considering the large variation in hand poses, classification using a decision tree seems highly suitable for this purpose. Various research works have used boosted decision trees and have shown encouraging results for pose recognition. This work also employs a boosted classifier tree learned in an unsupervised manner for hand pose recognition. We use a recursive two-way spectral clustering method, namely the Normalized Cut method (NCut), to generate the decision tree. A binary boosting classifier is then learned at each node of the tree generated by the clustering algorithm. Since the output of the clustering algorithm may contain outliers in practice, the variant of the boosting algorithm applied at each node is the Soft Margin version of AdaBoost, which was developed to maximize the classifier margin in a noisy environment. We propose a novel approach to learn the weak classifiers of the boosting process using the partitioning vector given by the NCut algorithm. The algorithm applies a linear regression of feature responses with the partitioning vector and utilizes the sample weights used in boosting to learn the weak hypotheses. Initial results show satisfactory performance in recognizing complex hand poses with large variations in background and illumination. This framework of tree classifiers can also be applied to general multi-class object recognition.

Citation Context

... provided by the clustering algorithm. Using our notation, r (the second eigenvector of the standard eigensystem defined in [13]) stores the partitioning information for subsets IA and IB. As we know [9], in ideal cases, r is a piecewise linear vector characterizing the correspondence between the points and the clusters to be discovered. Therefore, when classifying between the two subsets produced by ...
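
A hedged sketch of the recursive two-way NCut tree construction the abstract describes. The zero threshold on the second eigenvector and the min_size stopping rule are assumed simplifications, and the boosted node classifiers are omitted.

    import numpy as np

    def ncut_bipartition(A):
        # Threshold the second generalized eigenvector of
        # (D - A) r = lambda D r at zero (a common heuristic).
        d = A.sum(axis=1)
        inv_sqrt_d = 1.0 / np.sqrt(d)
        L_sym = (np.diag(d) - A) * np.outer(inv_sqrt_d, inv_sqrt_d)
        _, vecs = np.linalg.eigh(L_sym)
        r = inv_sqrt_d * vecs[:, 1]         # second eigenvector, rescaled
        return r >= 0

    def build_cluster_tree(A, idx=None, min_size=10):
        # Recursively split the samples with two-way NCut.
        if idx is None:
            idx = np.arange(A.shape[0])
        if len(idx) < min_size:
            return idx                      # leaf: one cluster of samples
        mask = ncut_bipartition(A[np.ix_(idx, idx)])
        if mask.all() or not mask.any():
            return idx                      # degenerate split: stop
        return [build_cluster_tree(A, idx[mask], min_size),
                build_cluster_tree(A, idx[~mask], min_size)]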
