Results 1–10 of 30
A geometric analysis of subspace clustering with outliers
Annals of Statistics
, 2012
"... This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information a ..."
Abstract

Cited by 66 (3 self)
This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [11], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretical analysis and demonstrates the effectiveness of these methods.
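The SSC algorithm analyzed in this paper can be sketched in a few lines: each point is expressed as a sparse combination of the other points, and spectral clustering is applied to the resulting affinity graph. A minimal illustration on synthetic data, assuming scikit-learn is available; the lasso penalty and cluster count are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc(X, n_clusters, lam=0.01):
    """Sparse subspace clustering sketch: X holds one data point per column."""
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        # Express column i as a sparse combination of the remaining columns.
        others = np.delete(X, i, axis=1)
        coef = Lasso(alpha=lam, fit_intercept=False,
                     max_iter=5000).fit(others, X[:, i]).coef_
        C[i, np.delete(np.arange(n), i)] = coef
    W = np.abs(C) + np.abs(C).T  # symmetrized affinity graph
    return SpectralClustering(n_clusters=n_clusters, affinity='precomputed',
                              random_state=0).fit_predict(W)

# Two random 2-dimensional subspaces in ambient dimension 10.
rng = np.random.default_rng(0)
U1, U2 = rng.standard_normal((10, 2)), rng.standard_normal((10, 2))
X = np.hstack([U1 @ rng.standard_normal((2, 40)),
               U2 @ rng.standard_normal((2, 40))])
X /= np.linalg.norm(X, axis=0)  # normalize columns, as is standard for SSC
labels = ssc(X, n_clusters=2)
```

On well-separated noiseless subspaces like these, the affinity graph is close to block-diagonal and the two groups are recovered almost perfectly.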
High-Rank Matrix Completion and Subspace Clustering with Missing Data
, 2011
"... This paper considers the problem of completing a matrix with many missing entries under the assumption that the columns of the matrix belong to a union of multiple lowrank subspaces. This generalizes the standard lowrank matrix completion problem to situations in which the matrix rank can be quite ..."
Abstract

Cited by 14 (3 self)
This paper considers the problem of completing a matrix with many missing entries under the assumption that the columns of the matrix belong to a union of multiple low-rank subspaces. This generalizes the standard low-rank matrix completion problem to situations in which the matrix rank can be quite high or even full rank. Since the columns belong to a union of subspaces, this problem may also be viewed as a missing-data version of the subspace clustering problem. Let X be an n × N matrix whose (complete) columns lie in a union of at most k subspaces, each of rank ≤ r < n, and assume N ≫ kn. The main result of the paper shows that under mild assumptions each column of X can be perfectly recovered with high probability from an incomplete version so long as at least CrN log²(n) entries of X are observed uniformly at random, with C > 1 a constant depending on the usual incoherence conditions, the geometrical arrangement of subspaces, and the distribution of columns over the subspaces. The result is illustrated with numerical experiments and an application to Internet distance matrix completion and topology identification.
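The recovery mechanism behind this result can be illustrated in the easy case where the subspace bases are known (in the paper they must be learned from data): fit the observed entries of a column against each candidate basis by least squares, and the basis with the smallest residual reconstructs the full column. A toy sketch under that simplifying assumption:

```python
import numpy as np

def complete_column(x_obs, mask, bases):
    """Recover a full column from observed entries, given candidate bases.

    x_obs: length-n vector (unobserved entries are ignored)
    mask:  boolean length-n vector marking the observed entries
    bases: list of n x r orthonormal basis matrices, one per subspace
    """
    best, best_res = None, np.inf
    for U in bases:
        # Least-squares fit of the observed rows of U to the observed entries.
        coef, *_ = np.linalg.lstsq(U[mask], x_obs[mask], rcond=None)
        res = np.linalg.norm(U[mask] @ coef - x_obs[mask])
        if res < best_res:
            best, best_res = U @ coef, res
    return best

rng = np.random.default_rng(1)
n, r = 20, 2
bases = [np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(3)]
x = bases[1] @ rng.standard_normal(r)              # true column lies in subspace 1
mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=8, replace=False)] = True  # observe only 8 of 20 entries
x_hat = complete_column(x, mask, bases)            # x_hat matches x exactly
```

With 8 observed entries and rank-2 subspaces the correct basis fits the observations with zero residual, while the wrong bases almost surely cannot.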
Graph connectivity in sparse subspace clustering
 In Computer Vision and Pattern Recognition (CVPR '11)
, 2011
"... Sparse Subspace Clustering (SSC) is one of the recent approaches to subspace segmentation. In SSC a graph is constructed whose nodes are the data points and whose edges are inferred from the L1sparse representation of each point by the others. It has been proved that if the points lie on a mixture ..."
Abstract

Cited by 11 (0 self)
Sparse Subspace Clustering (SSC) is one of the recent approaches to subspace segmentation. In SSC a graph is constructed whose nodes are the data points and whose edges are inferred from the L1-sparse representation of each point by the others. It has been proved that if the points lie on a mixture of independent subspaces, the graphical structure of each subspace is disconnected from the others. However, the problem of connectivity within each subspace is still unanswered. This is important since the subspace segmentation in SSC is based on finding the connected components of the graph. Our analysis is built upon the connection between sparse representation through L1-norm minimization and the geometry of convex polytopes, proposed by the compressed sensing community. After introducing some assumptions to make the problem well-defined, it is proved that connectivity within each subspace holds for 2- and 3-dimensional subspaces. The claim of connectivity in the general d-dimensional case, even for generic configurations, is proved false by a counterexample in dimensions greater than 3.
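The segmentation step discussed here, finding the connected components of the sparse-representation graph, can be carried out with standard sparse-graph tools. A sketch assuming the symmetric affinity matrix from SSC has already been computed (the matrix below is a hypothetical hand-built example, not SSC output):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical affinity matrix for 6 points: two groups of 3, no cross edges.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5)]:
    W[i, j] = W[j, i] = 1.0

n_comp, labels = connected_components(csr_matrix(W), directed=False)
# n_comp == 2; labels assign points 0-2 and 3-5 to separate components.
```

The paper's point is exactly that this step can fail for d > 3: a subspace's points may split into several components of the graph even though they belong to one cluster.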
High-Rank Matrix Completion
"... This paper considers the problem of completing a matrix with many missing entries under the assumption that the columns of the matrix belong to a union of multiple lowrank subspaces. This generalizes the standard lowrank matrix completion problem to situations in which the matrix rank can be quite ..."
Abstract

Cited by 8 (0 self)
This paper considers the problem of completing a matrix with many missing entries under the assumption that the columns of the matrix belong to a union of multiple low-rank subspaces. This generalizes the standard low-rank matrix completion problem to situations in which the matrix rank can be quite high or even full rank. Since the columns belong to a union of subspaces, this problem may also be viewed as a missing-data version of the subspace clustering problem. Let X be an n × N matrix whose (complete) columns lie in a union of at most k subspaces, each of rank ≤ r < n, and assume N ≫ kn. The main result of the paper shows that under mild assumptions each column of X can be perfectly recovered with high probability from an incomplete version so long as at least CrN log²(n) entries of X are observed uniformly at random, with C > 1 a constant depending on the usual incoherence conditions, the geometrical arrangement of subspaces, and the distribution of columns over the subspaces. The result is illustrated with numerical experiments and an application to Internet distance matrix completion and topology identification.
Guess who rated this movie: Identifying users through subspace clustering. arXiv preprint arXiv:1208.1544
, 2012
"... It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the userprovided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such ..."
Abstract

Cited by 7 (0 self)
It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the user-provided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to possible improvements in personalized recommendations, but also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering for carrying out the identification task. We show that a significant fraction of such accounts is identifiable in a reliable manner, and illustrate potential uses for personalized recommendation.
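Once per-user subspaces have been learned, attributing a rating vector to a user reduces to picking the subspace with the smallest projection residual. A toy sketch with synthetic "taste" subspaces (all names, dimensions, and the residual rule are illustrative assumptions, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, r = 50, 3
# Two hypothetical users, each with a low-dimensional "taste" subspace.
U_a = np.linalg.qr(rng.standard_normal((n_items, r)))[0]
U_b = np.linalg.qr(rng.standard_normal((n_items, r)))[0]

def owner(ratings, subspaces):
    """Assign a rating vector to the subspace with the smallest residual."""
    resid = [np.linalg.norm(ratings - U @ (U.T @ ratings)) for U in subspaces]
    return int(np.argmin(resid))

# Five rating vectors from each user, all filed under one shared account.
sessions = [U_a @ rng.standard_normal(r) for _ in range(5)] + \
           [U_b @ rng.standard_normal(r) for _ in range(5)]
assignments = [owner(s, [U_a, U_b]) for s in sessions]
```

Each noiseless rating vector projects onto its own subspace with zero residual, so the ten sessions split cleanly between the two users.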
Multivariate Volume Visualization through Dynamic Projections
"... temperature pressure Subspace View Navigation Panel Figure 1: An example of the semiautomatic transfer function (TF) design for Hurricane Isabel dataset. Left: subspace view navigation panel. Here each node represents a subspace view, views from the same subspace are grouped together with the same ..."
Abstract

Cited by 2 (2 self)
[Figure 1: An example of semi-automatic transfer function (TF) design for the Hurricane Isabel dataset. Left: a subspace view navigation panel in which each node represents a subspace view, views from the same subspace share a color, and arrows indicate the current exploration path. Right: the TF is iteratively modified and refined by exploring subspace views via dynamic projections, starting from subspace-clustering labels imported into a PCA view and ending in a cutaway volume visualization that highlights regions of the hurricane eye, with temperature and pressure profiles of a selected view. See Section 5 for details.]

We propose a multivariate volume visualization framework that tightly couples dynamic projections with a high-dimensional trans ...
Clustered factor analysis of multi-neuronal spike data
"... Highdimensional, simultaneous recordings of neural spiking activity are often explored, analyzed and visualized with the help of latent variable or factor models. Such models are however illequipped to extract structure beyond shared, distributed aspects of firing activity across multiple cells. ..."
Abstract

Cited by 2 (1 self)
High-dimensional, simultaneous recordings of neural spiking activity are often explored, analyzed and visualized with the help of latent variable or factor models. Such models are however ill-equipped to extract structure beyond shared, distributed aspects of firing activity across multiple cells. Here, we extend unstructured factor models by proposing a model that discovers subpopulations or groups of cells from the pool of recorded neurons. The model combines aspects of mixture-of-factor-analyzers models for capturing clustering structure and aspects of latent dynamical system models for capturing temporal dependencies. In the resulting model, we infer the subpopulations and the latent factors from data using variational inference, and model parameters are estimated by Expectation Maximization (EM). We also address the crucial problem of initializing parameters for EM by extending a sparse subspace clustering algorithm to integer-valued spike count observations. We illustrate the merits of the proposed model by applying it to calcium-imaging data from spinal cord neurons, and we show that it uncovers meaningful clustering structure in the data.
Online low-rank subspace clustering by basis dictionary pursuit. arXiv preprint arXiv:1503.08356
, 2015
"... Abstract LowRank Representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear norm regularized matrix is nbyn ..."
Abstract

Cited by 1 (1 self)
Low-Rank Representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear-norm-regularized matrix is n-by-n (where n is the number of samples). In this paper, we thereby develop a novel online implementation of LRR that reduces the memory cost from O(n²) to O(pd), with p being the ambient dimension and d being some estimated rank (d < p ≪ n). We also establish the theoretical guarantee that the sequence of solutions produced by our algorithm converges to a stationary point of the expected loss function asymptotically. Extensive experiments on synthetic and realistic datasets further substantiate that our algorithm is fast, robust and memory-efficient.
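The memory saving comes from never forming the n-by-n coefficient matrix: samples are streamed one at a time against a p-by-d basis dictionary, so storage stays at O(pd). A generic online subspace-tracking sketch in that spirit, alternating a coefficient fit with a gradient step on the basis; this is a simplified stand-in, not the paper's exact updates or objective:

```python
import numpy as np

def online_subspace_update(stream, p, d, step=0.01):
    """Maintain a p x d orthonormal basis D from streamed samples; O(p*d) memory."""
    rng = np.random.default_rng(3)
    D = np.linalg.qr(rng.standard_normal((p, d)))[0]
    for x in stream:
        # Coefficients for the current sample (ridge-regularized least squares).
        v = np.linalg.solve(D.T @ D + 1e-6 * np.eye(d), D.T @ x)
        # Gradient step on the reconstruction error, then re-orthonormalize.
        D -= step * np.outer(D @ v - x, v)
        D = np.linalg.qr(D)[0]
    return D

rng = np.random.default_rng(4)
p, d = 30, 2
U = np.linalg.qr(rng.standard_normal((p, d)))[0]  # true low-dimensional subspace
stream = (U @ rng.standard_normal(d) for _ in range(2000))
D = online_subspace_update(stream, p, d)
# After the stream, D should approximately span the true subspace U.
err = np.linalg.norm(U - D @ (D.T @ U))
```

Because the data lie exactly in U, the component of D orthogonal to U contracts at every step and the projection error decays to near zero.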
Intersecting Faces: Nonnegative Matrix Factorization With New Guarantees
Rong Ge
"... Abstract Nonnegative matrix factorization (NMF) is a natural model of admixture and is widely used in science and engineering. A plethora of algorithms have been developed to tackle NMF, but due to the nonconvex nature of the problem, there is little guarantee on how well these methods work. Rece ..."
Abstract

Cited by 1 (0 self)
Nonnegative matrix factorization (NMF) is a natural model of admixture and is widely used in science and engineering. A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work. Recently a surge of research has focused on a very restricted class of NMFs, called separable NMF, for which provably correct algorithms have been developed. In this paper, we propose the notion of subset-separable NMF, which substantially generalizes the property of separability. We show that subset-separability is a natural necessary condition for the factorization to be unique or to have minimum volume. We develop the FaceIntersect algorithm, which provably and efficiently solves subset-separable NMF under natural conditions, and we prove that our algorithm is robust to small noise. We explore the performance of FaceIntersect on simulations and discuss settings where it empirically outperformed state-of-the-art methods. Our work is a step towards finding provably correct algorithms that solve large classes of NMF problems.
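For background on the separability condition that subset-separability generalizes: separable NMF assumes each column of W appears verbatim among the columns of X, and a simple provable method, the successive projection algorithm (SPA), finds those anchor columns greedily. A sketch of plain SPA (this is the classical baseline, not the paper's FaceIntersect algorithm):

```python
import numpy as np

def spa(X, k):
    """Successive projection algorithm: pick k anchor column indices of X."""
    R = X.astype(float).copy()
    anchors = []
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))  # most extreme column
        anchors.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)  # project out the chosen direction
    return anchors

# Separable data: W's columns appear verbatim in X (at indices 0 and 1),
# and every other column is a convex combination of them.
rng = np.random.default_rng(5)
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.2, 0.3]])
H = np.hstack([np.eye(2), rng.dirichlet(np.ones(2), size=8).T])
X = W @ H
anchors = spa(X, 2)  # recovers the two anchor indices {0, 1}
```

Since the norm is strictly maximized at the vertices of the convex hull, each greedy pick is an anchor, and projecting it out preserves the property for the next round.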