Results 1 - 10 of 22
Greedy Feature Selection for Subspace Clustering
"... Unions of subspaces are a powerful nonlinear signal model for collections of highdimensional data. In order to leverage existing methods that exploit this unique signal structure, the subspaces that signals of interest occupy must be known a priori or learned directly from data. In this work, we ana ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
Unions of subspaces are a powerful nonlinear signal model for collections of high-dimensional data. In order to leverage existing methods that exploit this unique signal structure, the subspaces that signals of interest occupy must be known a priori or learned directly from data. In this work, we analyze the performance of greedy feature selection strategies for learning unions of subspaces from ensembles of high-dimensional data. We develop sufficient conditions under which orthogonal matching pursuit (OMP) selects subsets of points from the ensemble that live in the same subspace, a property which we refer to as exact feature selection (EFS). These conditions highlight how the sampling of each subspace in the ensemble and the geometry between pairs of subspaces jointly determine whether EFS is guaranteed. Following this analysis, we provide an empirical study of greedy feature selection strategies and characterize the gap between OMP and nearest neighbor-based approaches. We find that the gap between these two methods is particularly pronounced when the tiling of subspaces in the ensemble is sparse, suggesting that OMP can be used in a number of regimes where nearest neighbor approaches fail to reveal the subspace affinity between points in the ensemble.
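To make the feature selection step concrete, here is a minimal sketch of OMP used to pick, for each query point, a set of other points from the ensemble; EFS holds when every selected point lies in the query's subspace. This is an illustrative implementation under assumed conventions (unit-norm columns, a fixed sparsity level k), not the paper's reference code.

```python
import numpy as np

def omp_select(y, D, k):
    """Greedily select up to k columns of D that best explain y.

    y : (p,) query point; D : (p, m) matrix whose columns are the other
    points in the ensemble (assumed unit norm). Returns selected indices;
    exact feature selection (EFS) means they all share y's subspace.
    """
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        # Column most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        support.append(idx)
        # Re-fit y on the selected columns; the new residual is
        # orthogonal to their span, so no index is picked twice.
        Ds = D[:, support]
        coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)
        residual = y - Ds @ coef
    return support
```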
Low Rank Subspace Clustering (LRSC)
2013
"... We consider the problem of fitting one or more subspaces to a collection of data points drawn from the subspaces and corrupted by noise and/or gross errors. We pose this problem as a non-convex optimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean and s ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
We consider the problem of fitting one or more subspaces to a collection of data points drawn from the subspaces and corrupted by noise and/or gross errors. We pose this problem as a non-convex optimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean and self-expressive dictionary plus a matrix of noise and/or gross errors. By self-expressive we mean a dictionary whose atoms can be expressed as linear combinations of themselves with low-rank coefficients. In the case of noisy data, our key contribution is to show that this non-convex matrix decomposition problem can be solved in closed form from the SVD of the noisy data matrix. The solution involves a novel polynomial thresholding operator on the singular values of the data matrix, which requires minimal shrinkage. For one subspace, a particular case of our framework leads to classical PCA, which requires no shrinkage. For multiple subspaces, the low-rank coefficients obtained by our framework can be used to construct a data affinity matrix from which the clustering of the data according to the subspaces can be obtained by spectral clustering. In the case of data corrupted by gross errors, we solve the problem using an alternating minimization approach, which combines our polynomial thresholding operator with the more traditional shrinkage-thresholding operator. Experiments on motion segmentation and face clustering show that our framework performs on par with state-of-the-art techniques at a reduced computational cost.
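As a rough illustration of the closed-form pattern the abstract describes (SVD of the noisy data followed by thresholding of the singular values), the sketch below computes low-rank self-expressive coefficients for one common LRSC variant; the function name, the parameter tau, and the specific shrinkage rule are our stand-ins, and the paper's polynomial thresholding operator differs in detail.

```python
import numpy as np

def lrsc_coefficients(D, tau):
    """Low-rank self-expressive coefficients from the SVD of noisy data.

    Sketch of the variant min ||C||_* + tau/2 * ||D - D C||_F^2, whose
    solution keeps singular values above 1/sqrt(tau) with mild shrinkage.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    keep = s > 1.0 / np.sqrt(tau)               # surviving singular values
    V1 = Vt[keep, :].T                          # matching right singular vectors
    shrink = 1.0 - 1.0 / (tau * s[keep] ** 2)   # minimal shrinkage factors
    return V1 @ np.diag(shrink) @ V1.T

# The affinity |C| + |C|.T then feeds spectral clustering.
```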
Differentially Private Subspace Clustering
"... Abstract Subspace clustering is an unsupervised learning problem that aims at grouping data points into multiple "clusters" so that data points in a single cluster lie approximately on a low-dimensional linear subspace. It is originally motivated by 3D motion segmentation in computer visi ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Subspace clustering is an unsupervised learning problem that aims at grouping data points into multiple "clusters" so that data points in a single cluster lie approximately on a low-dimensional linear subspace. It was originally motivated by 3D motion segmentation in computer vision, but has recently been applied to a wide range of statistical machine learning problems, which often involve sensitive datasets about human subjects. This raises serious concerns about data privacy. In this work, we build on the framework of differential privacy and present two provably private subspace clustering algorithms. We demonstrate via both theory and experiments that one of the presented methods enjoys formal privacy and utility guarantees, while the other asymptotically preserves differential privacy and performs well in practice. In the course of the proofs, we also obtain two new provable guarantees for agnostic subspace clustering and the graph connectivity problem, which might be of independent interest.
Sparse Subspace Clustering with Missing Entries
"... We consider the problem of clustering incom-plete data drawn from a union of subspaces. Classical subspace clustering methods are not ap-plicable to this problem because the data are in-complete, while classical low-rank matrix com-pletion methods may not be applicable because data in multiple subsp ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
We consider the problem of clustering incomplete data drawn from a union of subspaces. Classical subspace clustering methods are not applicable to this problem because the data are incomplete, while classical low-rank matrix completion methods may not be applicable because data in multiple subspaces may not be low rank. This paper proposes and evaluates two new approaches for subspace clustering and completion. The first one generalizes the sparse subspace clustering algorithm so that it can obtain a sparse representation of the data using only the observed entries. The second one estimates a suitable kernel matrix by assuming a random model for the missing entries and obtains the sparse representation from this kernel. Experiments on synthetic and real data show the advantages and disadvantages of the proposed methods, which all outperform the natural approach (low-rank matrix completion followed by sparse subspace clustering) when the data matrix is high-rank or the percentage of missing entries is large.
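A minimal sketch of the first approach as we read it: for each point, solve a lasso self-expression problem restricted to the rows where that point is observed. The masking convention, the helper name ssc_missing, and the regularization weight lam are our assumptions, not the paper's notation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def ssc_missing(X, mask, lam=0.01):
    """Sparse self-expression using only observed entries.

    X    : (p, n) data with arbitrary values at missing positions
    mask : (p, n) boolean, True where X is observed
    Returns a symmetric affinity matrix for spectral clustering.
    """
    p, n = X.shape
    C = np.zeros((n, n))
    for j in range(n):
        rows = mask[:, j]                    # rows observed for point j
        others = np.arange(n) != j           # exclude the point itself
        A = X[np.ix_(rows, others)]          # other points, observed rows only
        lasso = Lasso(alpha=lam, fit_intercept=False)
        lasso.fit(A, X[rows, j])
        C[others, j] = lasso.coef_
    return np.abs(C) + np.abs(C).T
```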
Learning Structured Low-Rank Representation via Matrix Factorization
"... Abstract A vast body of recent works in the literature have shown that exploring structures beyond data lowrankness can boost the performance of subspace clustering methods such as Low-Rank Representation (LRR). It has also been well recognized that the matrix factorization framework might offer mo ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
A vast body of recent work in the literature has shown that exploring structures beyond data low-rankness can boost the performance of subspace clustering methods such as Low-Rank Representation (LRR). It has also been well recognized that the matrix factorization framework can offer more flexibility in pursuing the underlying structures of the data. In this paper, we propose to learn structured LRR by factorizing the nuclear norm regularized matrix, which leads to our proposed non-convex formulation NLRR. Interestingly, this formulation of NLRR provides a general framework that unifies a variety of popular algorithms, including LRR, dictionary learning, robust principal component analysis, sparse subspace clustering, etc. Several variants of NLRR are also proposed, for example, to promote sparsity while preserving low-rankness. We design a practical algorithm for NLRR and its variants, and establish theoretical guarantees for the stability of the solution and the convergence of the algorithm. Perhaps surprisingly, the computational and memory cost of NLRR can be reduced by roughly one order of magnitude compared to the cost of LRR. Experiments on extensive simulations and real datasets confirm the robustness and efficiency of NLRR and its variants.
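The factorization step presumably rests on the standard variational characterization of the nuclear norm, which replaces an n-by-n nuclear-norm term with two thin factors:

\[
\|Z\|_* \;=\; \min_{U,\,V:\; Z = UV^{\top}} \tfrac{1}{2}\bigl(\|U\|_F^2 + \|V\|_F^2\bigr),
\]

valid whenever the common inner dimension of U and V is at least rank(Z). Optimizing over the small factors rather than Z itself is plausibly what underlies the order-of-magnitude cost reduction the abstract reports.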
Online low-rank subspace clustering by basis dictionary pursuit (arXiv:1503.08356)
2015
"... Abstract Low-Rank Representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear norm regularized matrix is n-by-n ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Low-Rank Representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear norm regularized matrix is n-by-n (where n is the number of samples). In this paper, we therefore develop a novel online implementation of LRR that reduces the memory cost from O(n^2) to O(pd), with p being the ambient dimension and d being some estimated rank (d < p ≪ n). We also establish the theoretical guarantee that the sequence of solutions produced by our algorithm converges to a stationary point of the expected loss function asymptotically. Extensive experiments on synthetic and realistic datasets further substantiate that our algorithm is fast, robust and memory efficient.
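A schematic of why the memory footprint drops to O(pd): the algorithm only ever holds a p-by-d basis and a d-dimensional code per sample, never an n-by-n coefficient matrix. The step below is our simplified stand-in (ridge coding plus a gradient-style basis refinement), not the paper's exact updates.

```python
import numpy as np

def online_step(D, z, lam=0.1, lr=0.01):
    """Process one sample z (p,) against the current basis D (p x d).

    Memory stays O(p*d): one small d x d system per sample; no n x n
    coefficient matrix is ever formed.
    """
    d = D.shape[1]
    # Ridge-regularized code of z in the span of the basis.
    v = np.linalg.solve(D.T @ D + lam * np.eye(d), D.T @ z)
    # Nudge the basis toward explaining this sample's residual.
    D = D + lr * np.outer(z - D @ v, v)
    return D, v
```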
Learning Robust Subspace Clustering
2013
"... We propose a low-rank transformation-learning framework to robustify subspace clustering. Many high-dimensional data, such as face images and motion sequences, lie in a union of low-dimensional subspaces. The subspace clustering problem has been extensively studied in the literature to partition suc ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
We propose a low-rank transformation-learning framework to robustify subspace clustering. Many high-dimensional data, such as face images and motion sequences, lie in a union of low-dimensional subspaces. The subspace clustering problem has been extensively studied in the literature to partition such high-dimensional data into clusters corresponding to their underlying low-dimensional subspaces. However, low-dimensional intrinsic structures are often violated for real-world observations, as they can be corrupted by errors or deviate from ideal models. We propose to address this by learning a linear transformation on subspaces using matrix rank, via its convex surrogate the nuclear norm, as the optimization criterion. The learned linear transformation restores a low-rank structure for data from the same subspace and, at the same time, forces a high-rank structure for data from different subspaces. In this way, we reduce variations within the subspaces and increase separations between the subspaces for more accurate subspace clustering. This learned Robust Subspace Clustering framework significantly enhances the performance of existing subspace clustering methods. To exploit the low-rank structures of the transformed subspaces, we further introduce a subspace clustering technique, called Robust Sparse Subspace Clustering, which efficiently combines robust PCA with sparse modeling. We also discuss online learning of the transformation, and learning of the transformation while simultaneously reducing the data dimensionality. Extensive experiments using public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art subspace clustering methods.
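One schematic way to write the criterion this abstract describes, in our own notation (T is the learned transformation, X_c the data from cluster c, X their union, and λ a trade-off weight; the paper's exact objective and constraints may differ):

\[
\min_{T}\; \sum_{c} \|T X_c\|_* \;-\; \lambda\,\|T X\|_*,
\]

so that the nuclear norm pulls each transformed cluster toward low rank while the subtracted term pushes the transformed union toward high rank, matching the restore-low-rank / force-high-rank behavior described above. Some normalization on T is needed to rule out the trivial solution T = 0.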
Clustering Consistent Sparse Subspace Clustering
2015
"... Subspace clustering is the problem of clustering data points into a union of low-dimensional linear/affine subspaces. It is the mathematical abstraction of many important problems in computer vision, image pro-cessing and has been drawing avid attention in machine learning and statistics recently. I ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Subspace clustering is the problem of clustering data points into a union of low-dimensional linear/affine subspaces. It is the mathematical abstraction of many important problems in computer vision and image processing, and has been drawing avid attention in machine learning and statistics recently. In particular, a line …
Algorithms and theory for clustering . . .
2014
"... In this dissertation we discuss three problems characterized by hidden structure or information. The first part of this thesis focuses on extracting subspace structures from data. Subspace Clustering is the problem of finding a multi-subspace represen-tation that best fits a collection of points tak ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
In this dissertation we discuss three problems characterized by hidden structure or information. The first part of this thesis focuses on extracting subspace structures from data. Subspace clustering is the problem of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. As with most clustering problems, popular techniques for subspace clustering are often difficult to analyze theoretically, as they are often non-convex in nature. Theoretical analysis of these algorithms becomes even more challenging in the presence of noise and missing data. We introduce a collection of subspace clustering algorithms which are tractable and provably robust to various forms of data imperfections. We further illustrate our methods with numerical experiments on a wide variety of data segmentation problems. In the second part of the thesis, we consider the problem of recovering the seemingly hidden phase of an object from intensity-only measurements, a problem which naturally appears in X-ray crystallography and related disciplines. We formulate the …