Results 1–10 of 22
Greedy Feature Selection for Subspace Clustering
Abstract

Cited by 10 (2 self)
Unions of subspaces are a powerful nonlinear signal model for collections of high-dimensional data. In order to leverage existing methods that exploit this unique signal structure, the subspaces that signals of interest occupy must be known a priori or learned directly from data. In this work, we analyze the performance of greedy feature selection strategies for learning unions of subspaces from ensembles of high-dimensional data. We develop sufficient conditions that are required for orthogonal matching pursuit (OMP) to select subsets of points from the ensemble that live in the same subspace, a property which we refer to as exact feature selection (EFS). These conditions highlight the link between the sampling of each subspace in the ensemble and the geometry between pairs of subspaces in order to guarantee EFS. Following this analysis, we provide an empirical study of greedy feature selection strategies and characterize the gap between OMP and nearest-neighbor-based approaches. We find that the gap between these two methods is particularly pronounced when the tiling of subspaces in the ensemble is sparse, suggesting that OMP can be used in a number of regimes where nearest neighbor approaches fail to reveal the subspace affinity between points in the ensemble.
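The OMP-based feature selection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact algorithm; the function name and the toy data (two orthogonal planes in R^4) are invented for the example. EFS corresponds to the selected indices all coming from the query point's own subspace.

```python
import numpy as np

def omp_select(X, i, k):
    """Greedily pick k columns of X (excluding column i) to represent
    column i, in the spirit of OMP-based feature selection."""
    y = X[:, i]
    candidates = [j for j in range(X.shape[1]) if j != i]
    residual = y.copy()
    selected = []
    for _ in range(k):
        # choose the candidate most correlated with the current residual
        corr = [abs(X[:, j] @ residual) for j in candidates]
        best = candidates[int(np.argmax(corr))]
        selected.append(best)
        candidates.remove(best)
        # re-fit y on all selected columns, then update the residual
        A = X[:, selected]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
    return selected

# demo: two orthogonal planes in R^4, four unit-norm points on each;
# columns 0-3 live in span(e1, e2), columns 4-7 in span(e3, e4)
rng = np.random.default_rng(0)
X = np.zeros((4, 8))
for j in range(4):
    v = rng.standard_normal(2); X[:2, j] = v / np.linalg.norm(v)
for j in range(4, 8):
    v = rng.standard_normal(2); X[2:, j] = v / np.linalg.norm(v)
neighbors = omp_select(X, 0, 2)  # EFS holds if both picks are from columns 1-3
```

With orthogonal subspaces the residual never correlates with the other subspace's columns, so exact feature selection holds by construction; the paper's conditions quantify when this survives for non-orthogonal, sampled subspaces.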
Low Rank Subspace Clustering (LRSC)
, 2013
Abstract

Cited by 9 (1 self)
We consider the problem of fitting one or more subspaces to a collection of data points drawn from the subspaces and corrupted by noise and/or gross errors. We pose this problem as a non-convex optimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean and self-expressive dictionary plus a matrix of noise and/or gross errors. By self-expressive we mean a dictionary whose atoms can be expressed as linear combinations of themselves with low-rank coefficients. In the case of noisy data, our key contribution is to show that this non-convex matrix decomposition problem can be solved in closed form from the SVD of the noisy data matrix. The solution involves a novel polynomial thresholding operator on the singular values of the data matrix, which requires minimal shrinkage. For one subspace, a particular case of our framework leads to classical PCA, which requires no shrinkage. For multiple subspaces, the low-rank coefficients obtained by our framework can be used to construct a data affinity matrix from which the clustering of the data according to the subspaces can be obtained by spectral clustering. In the case of data corrupted by gross errors, we solve the problem using an alternating minimization approach, which combines our polynomial thresholding operator with the more traditional shrinkage-thresholding operator. Experiments on motion segmentation and face clustering show that our framework performs on par with state-of-the-art techniques at a reduced computational cost.
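A sketch of the simplest special case of this idea, for intuition only: with noise-free data, the minimum-nuclear-norm self-expressive coefficient matrix satisfying X = XC is C = V V^T from the compact SVD of X, and it is block-diagonal across independent subspaces. The paper's polynomial thresholding operator generalizes this to noisy data; that operator is not reproduced here, and the function name and toy data are invented.

```python
import numpy as np

def self_expressive_coeffs(X, tol=1e-10):
    """Noise-free special case: the minimum-nuclear-norm C with X = XC
    is C = V V^T, built from the right singular vectors of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[s > tol].T  # keep singular vectors up to the numerical rank
    return V @ V.T

# demo: columns 0-2 on one plane of R^4, columns 3-5 on an orthogonal one
rng = np.random.default_rng(1)
X = np.zeros((4, 6))
X[:2, :3] = rng.standard_normal((2, 3))
X[2:, 3:] = rng.standard_normal((2, 3))
C = self_expressive_coeffs(X)
```

The resulting C is symmetric and (for these independent subspaces) block-diagonal, so it can serve directly as an affinity matrix for spectral clustering, as the abstract describes.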
Differentially Private Subspace Clustering
Abstract

Cited by 2 (0 self)
Subspace clustering is an unsupervised learning problem that aims at grouping data points into multiple "clusters" so that data points in a single cluster lie approximately on a low-dimensional linear subspace. It was originally motivated by 3D motion segmentation in computer vision, but has recently been applied generically to a wide range of statistical machine learning problems, which often involve sensitive datasets about human subjects. This raises a dire concern for data privacy. In this work, we build on the framework of differential privacy and present two provably private subspace clustering algorithms. We demonstrate via both theory and experiments that one of the presented methods enjoys formal privacy and utility guarantees; the other asymptotically preserves differential privacy while having good performance in practice. Along the course of the proof, we also obtain two new provable guarantees for agnostic subspace clustering and the graph connectivity problem, which might be of independent interest.
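For intuition on the general mechanism (this is a generic output-perturbation illustration, not either of the paper's two algorithms): one standard way to privatize a subspace estimate is to add symmetric noise to the sample covariance before extracting the top-d eigenvectors. Calibrating the noise scale `sigma` to a formal (epsilon, delta) budget is the hard part and is deliberately left abstract here; the function name and toy data are invented.

```python
import numpy as np

def noisy_top_subspace(X, d, sigma, seed=0):
    """Output-perturbation sketch: perturb the sample covariance with
    symmetric Gaussian noise of scale sigma, then return the top-d
    eigenvectors as a subspace estimate."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    C = X @ X.T / n
    E = rng.standard_normal((p, p)) * sigma
    w, V = np.linalg.eigh(C + (E + E.T) / 2)  # symmetrized perturbation
    return V[:, -d:]                           # orthonormal basis, top-d

# demo: 40 points on a 2-D subspace of R^5
rng = np.random.default_rng(2)
X = np.zeros((5, 40))
X[:2, :] = rng.standard_normal((2, 40))
U = noisy_top_subspace(X, 2, sigma=0.01)
```

When the noise is small relative to the covariance eigengap, the returned basis stays close to the true subspace, which is the kind of utility-vs-privacy trade-off the abstract's guarantees formalize.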
Sparse Subspace Clustering with Missing Entries
Abstract

Cited by 2 (0 self)
We consider the problem of clustering incomplete data drawn from a union of subspaces. Classical subspace clustering methods are not applicable to this problem because the data are incomplete, while classical low-rank matrix completion methods may not be applicable because data in multiple subspaces may not be low rank. This paper proposes and evaluates two new approaches for subspace clustering and completion. The first one generalizes the sparse subspace clustering algorithm so that it can obtain a sparse representation of the data using only the observed entries. The second one estimates a suitable kernel matrix by assuming a random model for the missing entries and obtains the sparse representation from this kernel. Experiments on synthetic and real data show the advantages and disadvantages of the proposed methods, which all outperform the natural approach (low-rank matrix completion followed by sparse subspace clustering) when the data matrix is high-rank or the percentage of missing entries is large.
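The idea behind the first approach can be sketched as follows: restrict the self-expression of each point to its observed rows and solve a lasso over the remaining columns. This is a simplified sketch using plain ISTA, not the paper's exact formulation; the function name, the regularization weight, and the toy data are invented.

```python
import numpy as np

def sparse_repr_observed(X, mask, i, lam=0.01, n_iter=500):
    """Represent column i of X using only its observed rows
    (mask[:, i] == True), via ISTA on a lasso over the other columns."""
    rows = mask[:, i]
    others = np.arange(X.shape[1]) != i
    y = X[rows, i]
    A = X[np.ix_(rows, others)]
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1 / Lipschitz const.
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        c -= step * (A.T @ (A @ c - y))                          # gradient step
        c = np.sign(c) * np.maximum(np.abs(c) - lam * step, 0.0)  # soft-threshold
    return c

# demo: columns 0-3 on one plane of R^4, columns 4-7 on an orthogonal one;
# the first coordinate of column 0 is treated as missing
rng = np.random.default_rng(3)
X = np.zeros((4, 8))
X[:2, :4] = rng.standard_normal((2, 4))
X[2:, 4:] = rng.standard_normal((2, 4))
mask = np.ones((4, 8), dtype=bool)
mask[0, 0] = False
c = sparse_repr_observed(X, mask, 0)
```

Even with the missing entry, the nonzero coefficients land on same-subspace columns (dictionary indices 0-2 here, i.e. original columns 1-3), which is what subspace-preserving sparse representation requires.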
Learning Structured LowRank Representation via Matrix Factorization
Abstract

Cited by 1 (1 self)
A vast body of recent work has shown that exploring structures beyond data low-rankness can boost the performance of subspace clustering methods such as Low-Rank Representation (LRR). It has also been well recognized that the matrix factorization framework might offer more flexibility in pursuing underlying structures of the data. In this paper, we propose to learn structured LRR by factorizing the nuclear-norm-regularized matrix, which leads to our proposed non-convex formulation NLRR. Interestingly, this formulation of NLRR provides a general framework for unifying a variety of popular algorithms, including LRR, dictionary learning, robust principal component analysis, sparse subspace clustering, etc. Several variants of NLRR are also proposed, for example, to promote sparsity while preserving low-rankness. We design a practical algorithm for NLRR and its variants, and establish theoretical guarantees for the stability of the solution and the convergence of the algorithm. Perhaps surprisingly, the computational and memory cost of NLRR can be reduced by roughly one order of magnitude compared to the cost of LRR. Experiments on extensive simulations and real datasets confirm the robustness and efficiency of NLRR and its variants.
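The factorization route rests on a standard variational identity (not the NLRR algorithm itself): the nuclear norm equals the minimum over factorizations X = UV of (||U||_F^2 + ||V||_F^2) / 2, attained at the "balanced" split of the SVD. A quick numerical check:

```python
import numpy as np

# Identity behind matrix-factorization approaches to nuclear-norm problems:
#   ||X||_* = min over X = U V of (||U||_F^2 + ||V||_F^2) / 2,
# attained at U = A sqrt(S), V = sqrt(S) B^T from the SVD X = A S B^T.
rng = np.random.default_rng(4)
X = rng.standard_normal((6, 4))
A, s, Bt = np.linalg.svd(X, full_matrices=False)
U = A * np.sqrt(s)                 # A @ diag(sqrt(s)), column-wise scaling
V = np.sqrt(s)[:, None] * Bt       # diag(sqrt(s)) @ Bt, row-wise scaling
nuclear = s.sum()
factored = 0.5 * (np.linalg.norm(U) ** 2 + np.linalg.norm(V) ** 2)
```

Replacing the nuclear norm with the factored penalty is what makes the problem non-convex but lets the factor sizes, rather than an n-by-n matrix, determine the memory footprint.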
Online low-rank subspace clustering by basis dictionary pursuit. arXiv preprint arXiv:1503.08356
, 2015
Abstract

Cited by 1 (1 self)
Low-Rank Representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear-norm-regularized matrix is n-by-n (where n is the number of samples). In this paper, we thereby develop a novel online implementation of LRR that reduces the memory cost from O(n^2) to O(pd), with p being the ambient dimension and d being some estimated rank (d < p ≪ n). We also establish the theoretical guarantee that the sequence of solutions produced by our algorithm converges to a stationary point of the expected loss function asymptotically. Extensive experiments on synthetic and realistic datasets further substantiate that our algorithm is fast, robust and memory efficient.
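The memory argument can be illustrated with a toy online scheme (this is not the paper's basis dictionary pursuit algorithm; the function name, step sizes, and data are invented): stream samples one at a time and keep only a p-by-d basis in memory, which is the O(pd) footprint, instead of the n-by-n coefficient matrix of batch LRR.

```python
import numpy as np

def online_basis_update(samples, p, d, lr=0.05, lam=1e-3, seed=0):
    """Process samples one at a time, storing only a p-by-d basis D
    (O(pd) memory) rather than an n-by-n coefficient matrix (O(n^2))."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((p, d)) / np.sqrt(p)
    for x in samples:
        # ridge-regularized code for the current sample
        c = np.linalg.solve(D.T @ D + lam * np.eye(d), D.T @ x)
        # stochastic gradient step on the reconstruction error
        D -= lr * np.outer(D @ c - x, c)
    return D

# demo: 300 samples from a 2-D subspace of R^10
rng = np.random.default_rng(5)
basis = np.linalg.qr(rng.standard_normal((10, 2)))[0]
samples = [basis @ rng.standard_normal(2) for _ in range(300)]

def recon_err(D, xs):
    errs = []
    for x in xs:
        c = np.linalg.solve(D.T @ D + 1e-3 * np.eye(D.shape[1]), D.T @ x)
        errs.append(np.linalg.norm(D @ c - x))
    return np.mean(errs)

D0 = np.random.default_rng(0).standard_normal((10, 2)) / np.sqrt(10)  # initial basis
D = online_basis_update(samples, 10, 2)
```

Each update touches only the current sample and the small basis, so memory is independent of the number of samples n, which is the point the O(n^2) → O(pd) reduction makes.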
Learning Robust Subspace Clustering
, 2013
Abstract

Cited by 1 (0 self)
We propose a low-rank transformation-learning framework to robustify subspace clustering. Many high-dimensional data, such as face images and motion sequences, lie in a union of low-dimensional subspaces. The subspace clustering problem has been extensively studied in the literature to partition such high-dimensional data into clusters corresponding to their underlying low-dimensional subspaces. However, low-dimensional intrinsic structures are often violated for real-world observations, as they can be corrupted by errors or deviate from ideal models. We propose to address this by learning a linear transformation on subspaces using matrix rank, via its convex surrogate the nuclear norm, as the optimization criterion. The learned linear transformation restores a low-rank structure for data from the same subspace and, at the same time, forces a high-rank structure for data from different subspaces. In this way, we reduce variations within the subspaces and increase separation between the subspaces for more accurate subspace clustering. This learned Robust Subspace Clustering framework significantly enhances the performance of existing subspace clustering methods. To exploit the low-rank structures of the transformed subspaces, we further introduce a subspace clustering technique, called Robust Sparse Subspace Clustering, which efficiently combines robust PCA with sparse modeling. We also discuss online learning of the transformation, and learning of the transformation while simultaneously reducing the data dimensionality. Extensive experiments using public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art subspace clustering methods.
Clustering Consistent Sparse Subspace Clustering
, 2015
Abstract

Cited by 1 (1 self)
Subspace clustering is the problem of clustering data points into a union of low-dimensional linear/affine subspaces. It is the mathematical abstraction of many important problems in computer vision and image processing, and has been drawing avid attention in machine learning and statistics recently. In particular, a line ...
Algorithms and theory for clustering . . .
, 2014
Abstract

Cited by 1 (0 self)
In this dissertation we discuss three problems characterized by hidden structure or information. The first part of this thesis focuses on extracting subspace structures from data. Subspace clustering is the problem of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. As with most clustering problems, popular techniques for subspace clustering are difficult to analyze theoretically, as they are often non-convex in nature. Theoretical analysis of these algorithms becomes even more challenging in the presence of noise and missing data. We introduce a collection of subspace clustering algorithms, which are tractable and provably robust to various forms of data imperfections. We further illustrate our methods with numerical experiments on a wide variety of data segmentation problems. In the second part of the thesis, we consider the problem of recovering the seemingly hidden phase of an object from intensity-only measurements, a problem which naturally appears in X-ray crystallography and related disciplines. We formulate the ...