Results 1–10 of 20
Learning spectral clustering, with application to speech separation
Journal of Machine Learning Research, 2006
Abstract

Cited by 70 (6 self)
Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled datasets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram.
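The spectral relaxation this abstract builds on can be sketched in a few lines: embed points via top eigenvectors of the normalized similarity matrix, then read clusters off the embedding. This is a minimal sketch of the standard relaxation with an assumed Gaussian-kernel similarity, not the learned cost functions the paper derives:

```python
import numpy as np

def spectral_embedding(S, k):
    """Rows of the top-k eigenvectors of D^{-1/2} S D^{-1/2} embed the points."""
    d = S.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = Dinv @ S @ Dinv                  # normalized similarity matrix
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return vecs[:, -k:]                  # eigenvectors of the k largest eigenvalues

# Two tight, well-separated blobs; similarity via a Gaussian kernel.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
S = np.exp(-sq)
U = spectral_embedding(S, 2)

# Points in the same cluster get nearly identical embedding rows, so a
# simple distance threshold on the rows recovers the two blobs.
dist = np.linalg.norm(U - U[0], axis=1)
labels = (dist > dist.max() / 2).astype(int)
```

In practice the thresholding step is replaced by k-means on the embedding rows; the two-blob toy case only illustrates why the eigenstructure separates the clusters.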
Clustering by weighted cuts in directed graphs
In Proceedings of the 2007 SIAM International Conference on Data Mining, 2007
Abstract

Cited by 30 (1 self)
In this paper we formulate spectral clustering in directed graphs as an optimization problem, the objective being a weighted cut in the directed graph. This objective extends several popular criteria, like the normalized cut and the averaged cut, to asymmetric affinity data. We show that this problem can be relaxed to a Rayleigh quotient problem for a symmetric matrix obtained from the original affinities, and therefore a large body of the results and algorithms developed for spectral clustering of symmetric data immediately extends to asymmetric cuts.
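The reduction the abstract describes, from an asymmetric affinity matrix to a symmetric eigenproblem, can be illustrated with a stand-in symmetrization. The paper derives a specific symmetric matrix from its weighted-cut objective; the plain (A + Aᵀ)/2 used below is only a simple substitute to show why the Rayleigh quotient machinery then applies:

```python
import numpy as np

# Directed (asymmetric) affinity matrix: A[i, j] weights the edge i -> j.
A = np.array([[0., 3., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 2.],
              [0., 0., 4., 0.]])

# Stand-in symmetrization (the paper's matrix is derived from its cut
# objective and is not reproduced here).
H = 0.5 * (A + A.T)

# The Rayleigh quotient x^T H x / x^T x over unit vectors is maximized by
# the eigenvector of H with the largest eigenvalue, so standard symmetric
# spectral algorithms apply unchanged.
vals, vecs = np.linalg.eigh(H)
x = vecs[:, -1]
rayleigh = (x @ H @ x) / (x @ x)
```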
Model averaging and dimension selection for the singular value decomposition
Journal of the American Statistical Association, 2007
Abstract

Cited by 17 (2 self)
Many multivariate data analysis techniques for an m × n matrix Y are related to the model Y = M + E, where Y is an m × n matrix of full rank and M is an unobserved mean matrix of rank K < (m ∧ n). Typically the rank of M is estimated in a heuristic way and then the least-squares estimate of M is obtained via the singular value decomposition of Y, yielding an estimate that can have a very high variance. In this paper we suggest a model-based alternative to the above approach by providing prior distributions and posterior estimation for the rank of M and the components of its singular value decomposition. In addition to providing more accurate inference, such an approach has the advantage of being extendable to more general data-analysis situations, such as inference in the presence of missing data and estimation in a generalized linear modeling framework.
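The least-squares step the abstract contrasts with its Bayesian alternative is the classical truncated SVD: keep the top K singular triples of Y. A minimal sketch on synthetic data (the rank K is given here, whereas estimating it is exactly what the paper addresses):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 2
M = rng.normal(size=(8, K)) @ rng.normal(size=(K, 5))  # unobserved rank-K mean
Y = M + 0.01 * rng.normal(size=M.shape)                # observed full-rank matrix

# Least-squares rank-K estimate of M: truncate the SVD of Y after K terms.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
M_hat = (U[:, :K] * s[:K]) @ Vt[:K, :]

rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
```

With small noise the truncated SVD recovers M closely; the high variance the abstract mentions shows up when the noise is large relative to the trailing singular values of M.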
Clustering and Community Detection in Directed Networks: A Survey, 2013
Abstract

Cited by 10 (0 self)
Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of these cases graphs are directed, in the sense that the edges carry directionality: the source node transmits some property to the target node but not vice versa. An interesting feature of real networks is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence is that nodes of the same community are highly similar, while nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of relevant application domains. There is therefore a recent wealth of research on mining directed graphs, with clustering being the primary method sought and the primary tool for community detection and evaluation. The goal of this paper is to offer an in-depth comparative review of the methods presented so far for clustering directed networks, along with the necessary methodological background and related applications. The survey commences by offering a concise review of the fundamental concepts and the methodological base on which graph clustering algorithms…
An Information Theoretic Approach to Machine Learning, 2005
Abstract

Cited by 8 (2 self)
In this thesis, theory and applications of machine learning systems based on information theoretic criteria as performance measures are studied. A new clustering algorithm based on maximizing the Cauchy-Schwarz (CS) divergence measure between probability density functions (pdfs) is proposed. The CS divergence is estimated nonparametrically using the Parzen window technique for density estimation. The problem domain is transformed from discrete 0/1 cluster membership values to continuous membership values. A constrained gradient descent maximization algorithm is implemented. The gradients are stochastically approximated to reduce computational complexity, making the algorithm more practical. Parzen window annealing is incorporated into the algorithm to help avoid convergence to a local maximum. The clustering results obtained on synthetic and real data are encouraging. The Parzen window-based estimator for the CS divergence is shown to have a dual expression as a measure of the cosine of the angle between cluster mean vectors in a feature space determined by the eigenspectrum of a Mercer kernel matrix. A spectral clustering…
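The Parzen-window estimate of the CS divergence reduces to kernel sums within and across the two samples. The sketch below uses a common simplified form of this estimator (Gaussian windows, normalization constants folded away; the thesis's exact estimator and annealing schedule are not reproduced), and by the Cauchy-Schwarz inequality its value is nonnegative and zero only when the two kernel mean embeddings coincide:

```python
import numpy as np

def cs_divergence(X, Y, sigma=1.0):
    """Simplified Parzen-window estimate of the Cauchy-Schwarz divergence
    between the densities underlying samples X and Y."""
    s2 = 2.0 * sigma ** 2  # convolving two Gaussian windows doubles the variance
    dxy = ((X[:, None] - Y[None, :]) ** 2).sum(-1)
    dxx = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    dyy = ((Y[:, None] - Y[None, :]) ** 2).sum(-1)
    cross = np.exp(-dxy / (2 * s2)).mean()  # estimates the integral of p*q
    px = np.exp(-dxx / (2 * s2)).mean()     # estimates the integral of p^2
    py = np.exp(-dyy / (2 * s2)).mean()     # estimates the integral of q^2
    return -np.log(cross ** 2 / (px * py))

rng = np.random.default_rng(2)
near = cs_divergence(rng.normal(0, 1, (50, 1)), rng.normal(0, 1, (50, 1)))
far = cs_divergence(rng.normal(0, 1, (50, 1)), rng.normal(8, 1, (50, 1)))
```

Overlapping samples give a divergence near zero, while well-separated samples give a large value, which is what makes it usable as a clustering objective.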
Information Theoretic Spectral Clustering
In Proceedings of the International Joint Conference on Neural Networks
Abstract

Cited by 8 (3 self)
We discuss a new information-theoretic framework for spectral clustering that is founded on the recently introduced Information Cut. A novel spectral clustering algorithm is proposed, where the clustering solution is given as a linearly weighted combination of certain top eigenvectors of the data affinity matrix. The Information Cut provides us with a theoretically well-defined graph-spectral cost function, and also establishes a close link between spectral clustering and nonparametric density estimation. As a result, a natural criterion for creating the data affinity matrix is provided. We present preliminary clustering results to illustrate some of the properties of our algorithm, and we also make comparative remarks.
On Potts Model Clustering, Kernel K-Means and Density Estimation
Journal of Computational and Graphical Statistics, 2008
Abstract

Cited by 4 (2 self)
… follow the same recipe: (i) choose a measure of similarity between observations; (ii) define a figure of merit assigning a large value to partitions of the data that put similar observations in the same cluster; and (iii) optimize this figure of merit over partitions. Potts model clustering represents an interesting variation on this recipe. Blatt, Wiseman, and Domany defined a new figure of merit for partitions that is formally similar to the Hamiltonian of the Potts model for ferromagnetism, extensively studied in statistical physics. For each temperature T, the Hamiltonian defines a distribution assigning a probability to each possible configuration of the physical system or, in the language of clustering, to each partition. Instead of searching for a single partition optimizing the Hamiltonian, they sampled a large number of partitions from this distribution for a range of temperatures. They proposed a heuristic for choosing an appropriate temperature and, from the sample of partitions associated with this chosen temperature, derived what we call a consensus clustering: two observations are put in the same consensus cluster if they belong to the same cluster in the majority of the random partitions. In a sense, the consensus clustering is an “average” of plausible…
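The consensus step described above is easy to sketch in isolation. Below, noisy random relabelings stand in for partitions sampled from the Potts Hamiltonian (a hypothetical substitute; actual sampling requires an MCMC scheme such as Swendsen-Wang), and the majority vote over co-cluster indicators yields the consensus clustering:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
true = np.array([0, 0, 0, 1, 1, 1])  # underlying two-cluster structure

# Stand-in for Hamiltonian sampling: each sampled partition perturbs the
# true labels, reassigning each point uniformly with probability 0.2.
samples = []
for _ in range(200):
    labels = true.copy()
    flip = rng.random(n) < 0.2
    labels[flip] = rng.integers(0, 2, int(flip.sum()))
    samples.append(labels)

# Consensus clustering: i and j are co-clustered iff they share a cluster
# in a majority of the sampled partitions.
co = np.zeros((n, n))
for labels in samples:
    co += labels[:, None] == labels[None, :]
consensus = co / len(samples) > 0.5
```

The majority vote averages out the sampling noise: pairs that truly belong together co-occur far more often than chance, so the consensus matrix recovers the block structure.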
Spectral clustering for speech separation
Abstract

Cited by 2 (0 self)
Spectral clustering refers to a class of recent techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this chapter, we introduce the main concepts and algorithms together with recent advances in learning the similarity matrix from data. The techniques are illustrated on the blind one-microphone speech separation problem, by casting the problem as one of segmentation of the spectrogram.
Topics in structured host-antagonist interactions, 2014
Abstract
For my parents. Acknowledgements: Many thanks to my research collaborators and the members of my dissertation committee for making this dissertation possible.
Unsupervised Learning of Boosted Tree Classifier using Graph Cuts for Hand Pose Recognition
Abstract
This study proposes an unsupervised learning approach for the task of hand pose recognition. Considering the large variation in hand poses, classification using a decision tree seems highly suitable for this purpose. Various research works have used boosted decision trees and have shown encouraging results for pose recognition. This work also employs a boosted classifier tree learned in an unsupervised manner for hand pose recognition. We use a recursive two-way spectral clustering method, namely the Normalized Cut method (NCut), to generate the decision tree. A binary boosting classifier is then learned at each node of the tree generated by the clustering algorithm. Since the output of the clustering algorithm may contain outliers in practice, the variant of the boosting algorithm applied at each node is the Soft Margin version of AdaBoost, which was developed to maximize the classifier margin in a noisy environment. We propose a novel approach to learning the weak classifiers of the boosting process using the partitioning vector given by the NCut algorithm. The algorithm applies a linear regression of feature responses against the partitioning vector and utilizes the sample weights used in boosting to learn the weak hypotheses. Initial results show satisfactory performance in recognizing complex hand poses with large variations in background and illumination. This framework of tree classifiers can also be applied to general multi-class object recognition.
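The weak-learner idea in this abstract, regressing feature responses against the NCut partitioning vector under boosting weights, can be sketched with synthetic data. Everything below is a hypothetical stand-in: the features, the partitioning vector (here simply driven by one feature plus noise), and the uniform first-round weights; the paper's actual feature responses and NCut output are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 40, 5
F = rng.normal(size=(n, d))                 # synthetic feature responses
# Stand-in partitioning vector: sign of the first feature with mild noise.
v = np.sign(F[:, 0] + 0.1 * rng.normal(size=n))
w = np.full(n, 1.0 / n)                     # boosting weights (uniform at round 1)

# Weighted least squares: scale rows by sqrt(w) so the squared-error loss
# is weighted by the boosting weights, then solve for the linear fit.
sw = np.sqrt(w)[:, None]
a, *_ = np.linalg.lstsq(sw * F, sw[:, 0] * v, rcond=None)
weak = np.sign(F @ a)                       # weak hypothesis: sign of the fit
accuracy = float((weak == v).mean())
```

At later boosting rounds the weights w concentrate on misclassified samples, so the same regression yields different weak hypotheses at each round.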