Results 1-10 of 67
Robust Recovery of Subspace Structures by Low-Rank Representation
Abstract
Cited by 128 (24 self)
In this work we address the subspace recovery problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to segment the samples into their respective subspaces and to correct the possible errors as well. To this end, we propose a novel method termed Low-Rank Representation (LRR), which seeks the lowest-rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that LRR well solves the subspace recovery problem: when the data is clean, we prove that LRR exactly captures the true subspace structures; for data contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outliers as well; for data corrupted by arbitrary errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these results further imply that LRR can perform robust subspace segmentation and error correction in an efficient way.
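For clean data with the data matrix itself as the dictionary, the lowest-rank representation described above has a known closed-form solution: the projection V V^T onto the row space (the shape-interaction matrix), which is block diagonal exactly when the subspaces are independent. A minimal numpy sketch of that clean-data case; the toy two-subspace construction is my own, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent 2-D subspaces in R^10, 20 samples each (toy data, my construction).
U1 = np.linalg.qr(rng.standard_normal((10, 2)))[0]
U2 = np.linalg.qr(rng.standard_normal((10, 2)))[0]
X = np.hstack([U1 @ rng.standard_normal((2, 20)),
               U2 @ rng.standard_normal((2, 20))])

# Closed-form LRR solution for clean data: Z* = V V^T from the skinny SVD X = U S V^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))     # numerical rank of X
V = Vt[:r].T
Z = V @ V.T                    # lowest-rank representation satisfying X = X Z

# For independent subspaces Z is block diagonal, so |Z| reveals the memberships.
W = np.abs(Z)
cross = W[:20, 20:]            # cross-subspace block should vanish
```

Spectral clustering on the affinity |Z| + |Z|^T would then return the segmentation.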
Factoring nonnegative matrices with linear programs
, 2012
Abstract
Cited by 40 (0 self)
This paper describes a new approach for computing nonnegative matrix factorizations (NMFs) with linear programming. The key idea is a data-driven model for the factorization, in which the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The matrix C selects features, which are then used to compute a low-rank NMF of X. A theoretical analysis demonstrates that this approach has the same type of guarantees as the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method (1) has better noise tolerance, (2) extends to more general noise models, and (3) leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation of the new algorithm can factor a multi-gigabyte matrix in a matter of minutes.
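The linear program itself takes more machinery, but the self-expressive idea X ≈ CX, with a few salient rows expressing the rest, can be illustrated with a greedy successive-projection stand-in plus nonnegative least squares. Note this replaces the paper's LP with a different selection rule, and the separable toy data are my construction:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)

# Separable toy data: the first r rows are the "salient" features, the other
# rows are convex combinations of them (construction mine, not from the paper).
r, n = 3, 8
W = rng.random((r, n))
H = rng.random((5, r))
H /= H.sum(axis=1, keepdims=True)   # rows of H lie in the simplex
X = np.vstack([W, H @ W])           # rows 0..r-1 are the anchors

def pick_anchors(X, r):
    """Greedy stand-in for the LP: repeatedly take the row of largest residual
    norm and project the remaining rows off its direction."""
    R, idx = X.copy(), []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))
        idx.append(j)
        v = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ v, v)
    return idx

anchors = pick_anchors(X, r)
# Express every row nonnegatively in the selected anchors: X ≈ C X with C
# supported on the anchor columns.
Wsel = X[anchors]
C = np.array([nnls(Wsel.T, x)[0] for x in X])
err = np.linalg.norm(C @ Wsel - X)
```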
Hybrid linear modeling via local best-fit flats
 in IEEE Conference on Computer Vision and Pattern Recognition
Abstract
Cited by 37 (4 self)
In this paper we present a simple and fast geometric method for modeling data by a union of affine sets. The method begins by forming a collection of local best-fit affine subspaces. The correct sizes of the local neighborhoods are determined automatically by the Jones' β2 numbers; we prove under certain geometric conditions that good local neighborhoods exist and are found by our method. The collection is further processed by a greedy selection procedure or a spectral method to generate the final model. We discuss applications to tracking-based motion segmentation and clustering of faces under different illumination conditions. We give extensive experimental evidence demonstrating the state-of-the-art accuracy and speed of the suggested algorithms on these problems, on synthetic hybrid linear data, and on the MNIST handwritten digits data; and we demonstrate how to use our algorithms for fast determination of the number of affine subspaces.
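The basic building block, a local best-fit flat, is a PCA fit to a point's neighborhood. A small numpy sketch of that step; the neighborhood size k and the two-line toy data are my assumptions, and the β2-based automatic neighborhood selection is omitted:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy union of two parallel 1-D affine lines in R^3 (my construction).
t = rng.random(30)
X = np.vstack([np.outer(t, [1.0, 0, 0]) + [0, 0, 1.0],    # line 1 (z = +1)
               np.outer(t, [0, 1.0, 0]) - [0, 0, 1.0]])   # line 2 (z = -1)

def local_flat(X, i, k, d):
    """Best-fit d-dim affine flat through the k nearest neighbors of point i."""
    dist = np.linalg.norm(X - X[i], axis=1)
    nb = X[np.argsort(dist)[:k]]
    c = nb.mean(axis=0)                       # flat passes through the centroid
    _, _, Vt = np.linalg.svd(nb - c, full_matrices=False)
    return c, Vt[:d]                          # centroid + orthonormal basis

def dist_to_flat(x, c, B):
    r = x - c
    return np.linalg.norm(r - B.T @ (B @ r))  # residual off the flat

c, B = local_flat(X, 0, k=8, d=1)
d_same = dist_to_flat(X[5], c, B)    # another point on line 1: near zero
d_other = dist_to_flat(X[35], c, B)  # a point on line 2: far from the flat
```

The full method would score every candidate neighborhood with the β2 numbers and then cluster points by their distances to the selected flats.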
See All by Looking at A Few: Sparse Modeling for Finding Representative Objects
Abstract
Cited by 24 (3 self)
We consider the problem of finding a few representatives for a dataset, i.e., a subset of data points that efficiently describes the entire dataset. We assume that each data point can be expressed as a linear combination of the representatives and formulate the problem of finding the representatives as a sparse multiple-measurement-vector problem. In our formulation, both the dictionary and the measurements are given by the data matrix, and the unknown sparse codes select the representatives via convex optimization. In general, we do not assume that the data are low-rank or distributed around cluster centers. When the data do come from a collection of low-rank models, we show that our method automatically selects a few representatives from each low-rank model. We also analyze the geometry of the representatives and discuss their relationship to the vertices of the convex hull of the data. We show that our framework can be extended to detect and reject outliers in datasets, and to efficiently deal with new observations and large datasets. The proposed framework and theoretical foundations are illustrated with examples in video summarization and image classification using representatives.
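One way to see the idea in action is an unconstrained row-sparse variant, min_C 0.5*||X - X C||_F^2 + lam * sum_i ||row_i(C)||_2, solved by proximal gradient descent: rows of C with large norm mark representatives. This solver and the toy data are my stand-ins, not the paper's exact program:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: 30 points in R^5 generated from 3 source directions (my construction).
D = rng.standard_normal((5, 3))
X = np.hstack([D, D @ rng.standard_normal((3, 27))])

# Proximal gradient on the group-lasso objective; the prox of the row-wise
# l2 penalty is row-wise shrinkage.
lam = 0.5
step = 1.0 / np.linalg.norm(X.T @ X, 2)         # 1 / Lipschitz constant
C = np.zeros((X.shape[1], X.shape[1]))
for _ in range(2000):
    C = C - step * (X.T @ (X @ C - X))          # gradient step on 0.5||X - XC||^2
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    C *= np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))

row_energy = np.linalg.norm(C, axis=1)          # large rows mark representatives
reps = np.argsort(row_energy)[::-1][:3]
```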
Robust Subspace Clustering
, 2013
Abstract
Cited by 22 (1 self)
Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [17] to cluster noisy data, and develops some novel theory demonstrating its correctness. In particular, the theory uses ideas from geometric functional analysis to show that the algorithm can accurately recover the underlying subspaces under minimal requirements on their orientation and on the number of samples per subspace. Synthetic as well as real data experiments complement our theoretical study, illustrating our approach and demonstrating its effectiveness.
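The noisy-data variant of SSC codes each point against the remaining points with a lasso program and clusters the resulting affinity. A plain ISTA sketch of that coding step; the regularization weight, noise level, and data construction are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy points from two 2-D subspaces of R^8 (toy construction, mine).
U1, _ = np.linalg.qr(rng.standard_normal((8, 2)))
U2, _ = np.linalg.qr(rng.standard_normal((8, 2)))
X = np.hstack([U1 @ rng.standard_normal((2, 15)),
               U2 @ rng.standard_normal((2, 15))])
X += 0.01 * rng.standard_normal(X.shape)
X /= np.linalg.norm(X, axis=0)                 # unit-norm columns

def lasso_ista(A, b, lam, iters=500):
    """min_c 0.5||b - A c||^2 + lam ||c||_1 via ISTA (soft-thresholding)."""
    step = 1.0 / np.linalg.norm(A.T @ A, 2)
    c = np.zeros(A.shape[1])
    for _ in range(iters):
        c = c - step * (A.T @ (A @ c - b))
        c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)
    return c

# Code each point against the others; nonzeros should fall on its own subspace.
n = X.shape[1]
Cmat = np.zeros((n, n))
for j in range(n):
    c = lasso_ista(np.delete(X, j, axis=1), X[:, j], lam=0.05)
    Cmat[np.delete(np.arange(n), j), j] = c
W = np.abs(Cmat) + np.abs(Cmat).T              # affinity for spectral clustering
```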
Block-Sparse Recovery via Convex Optimization
, 2012
Abstract
Cited by 19 (1 self)
Given a dictionary that consists of multiple blocks and a signal that lives in the range space of only a few blocks, we study the problem of finding a block-sparse representation of the signal, i.e., a representation that uses the minimum number of blocks. Motivated by signal/image processing and computer vision applications, such as face recognition, we consider the block-sparse recovery problem in the case where the number of atoms in each block is arbitrary, possibly much larger than the dimension of the underlying subspace. To find a block-sparse representation of a signal, we propose two classes of non-convex optimization programs, which aim to minimize the number of nonzero coefficient blocks and the number of nonzero reconstructed vectors from the blocks, respectively. Since both classes of problems are NP-hard, we propose convex relaxations and derive conditions under which each class of the convex programs is equivalent to the original non-convex formulation. Our conditions depend on the notions of mutual and cumulative subspace coherence of a dictionary, which are natural generalizations of existing notions of mutual and cumulative coherence. We evaluate the performance of the proposed convex programs through simulations as well as real experiments on face recognition. We show that treating the face recognition problem as a block-sparse recovery problem improves the state-of-the-art results by 10% with only 25% of the training data.
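The subspace coherence the conditions rely on is, at its core, the largest cosine of a principal angle between the spans of two blocks. A small numpy sketch of computing that quantity (the paper's exact definition also accounts for block normalization; the toy blocks are my construction):

```python
import numpy as np

rng = np.random.default_rng(5)

def orth(B, tol=1e-10):
    """Orthonormal basis of the column space of B (rank-truncated SVD)."""
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, s > tol * s[0]]

def subspace_coherence(B1, B2):
    """Largest cosine of a principal angle between span(B1) and span(B2)."""
    Q1, Q2 = orth(B1), orth(B2)
    return np.linalg.svd(Q1.T @ Q2, compute_uv=False)[0]

# Two dictionary blocks in R^20, each with 7 atoms spanning a 3-D subspace,
# i.e., more atoms than the subspace dimension, as in the paper's setting.
B1 = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 7))
B2 = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 7))

mu = subspace_coherence(B1, B2)                       # in [0, 1)
overlap = subspace_coherence(B1, np.hstack([B1, B2])) # shared directions give 1
```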
Fixed-rank representation for unsupervised visual learning
Abstract
Cited by 14 (1 self)
Subspace clustering and feature extraction are two of the most commonly used unsupervised learning techniques in computer vision and pattern recognition. State-of-the-art techniques for subspace clustering make use of recent advances in sparsity and rank minimization. However, existing techniques are computationally expensive and may result in degenerate solutions that degrade clustering performance in the case of insufficient data sampling. To partially solve these problems, and inspired by existing work on matrix factorization, this paper proposes fixed-rank representation (FRR) as a unified framework for unsupervised visual learning. FRR is able to reveal the structure of multiple subspaces in closed form when the data is noiseless. Furthermore, we prove that under some suitable conditions, even with insufficient observations, FRR can still reveal the true subspace memberships. To achieve robustness to outliers and noise, a sparse regularizer is introduced into the FRR framework. Beyond subspace clustering, FRR can be used for unsupervised feature extraction. As a non-trivial by-product, a fast numerical solver is developed for FRR. Experimental results on both synthetic data and real applications validate our theoretical analysis and demonstrate the benefits of FRR for unsupervised visual learning.
Greedy Feature Selection for Subspace Clustering
Abstract
Cited by 10 (2 self)
Unions of subspaces are a powerful nonlinear signal model for collections of high-dimensional data. In order to leverage existing methods that exploit this unique signal structure, the subspaces that signals of interest occupy must be known a priori or learned directly from data. In this work, we analyze the performance of greedy feature selection strategies for learning unions of subspaces from ensembles of high-dimensional data. We develop sufficient conditions that are required for orthogonal matching pursuit (OMP) to select subsets of points from the ensemble that live in the same subspace, a property which we refer to as exact feature selection (EFS). These conditions highlight the link between the sampling of each subspace in the ensemble and the geometry between pairs of subspaces required to guarantee EFS. Following this analysis, we provide an empirical study of greedy feature selection strategies and characterize the gap between OMP and nearest-neighbor-based approaches. We find that the gap between these two methods is particularly pronounced when the tiling of subspaces in the ensemble is sparse, suggesting that OMP can be used in a number of regimes where nearest-neighbor approaches fail to reveal the subspace affinity between points in the ensemble.
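OMP itself is short: greedily pick the atom most correlated with the residual, then refit by least squares over all picked atoms. A sketch on a toy two-subspace ensemble (my construction; in the EFS regime, all selected atoms would come from the query's own subspace):

```python
import numpy as np

rng = np.random.default_rng(6)

def omp(D, x, k):
    """Orthogonal matching pursuit: greedily select k columns of D to fit x."""
    resid, idx = x.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ resid)))          # most correlated atom
        idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        resid = x - D[:, idx] @ coef                     # refit residual
    return idx, resid

# Points from two 2-D subspaces of R^10, unit-normalized (toy data, mine).
U1, _ = np.linalg.qr(rng.standard_normal((10, 2)))
U2, _ = np.linalg.qr(rng.standard_normal((10, 2)))
X = np.hstack([U1 @ rng.standard_normal((2, 12)),
               U2 @ rng.standard_normal((2, 12))])
X /= np.linalg.norm(X, axis=0)

x = X[:, 0]                  # query point from subspace 1
D = X[:, 1:]                 # dictionary = the remaining points
idx, resid = omp(D, x, k=2)  # EFS holds when idx lands in subspace 1 (cols 0..10 of D)
```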
Robust subspace clustering via thresholding. arXiv preprint arXiv:1307.4891
, 2013
Abstract
Cited by 10 (3 self)
The problem of clustering noisy and incompletely observed high-dimensional data points into a union of low-dimensional subspaces and a set of outliers is considered. The number of subspaces, their dimensions, and their orientations are assumed unknown. We propose a simple low-complexity subspace clustering algorithm, which applies spectral clustering to an adjacency matrix obtained by thresholding the correlations between data points. In other words, the adjacency matrix is constructed from the nearest neighbors of each data point in spherical distance. A statistical performance analysis shows that the algorithm succeeds even when the subspaces intersect and that it exhibits robustness to additive noise. Specifically, our results reveal an explicit tradeoff between the affinity of the subspaces and the tolerable noise level. We furthermore prove that the algorithm succeeds even when the data points are incompletely observed, with the number of missing entries allowed to be (up to a log factor) linear in the ambient dimension. We also propose a simple scheme that provably detects outliers, and we present numerical results on real and synthetic data.
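The thresholding step alone is a few lines: keep each point's q largest absolute correlations (its spherical nearest neighbors), zero out the rest, and hand the adjacency matrix to spectral clustering. A numpy sketch of that step; q, the noise level, and the toy data are my choices, and the spectral-clustering stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(7)

def sample(U, n):
    """n unit-norm points from the span of U (toy sampler, mine)."""
    Y = U @ rng.standard_normal((U.shape[1], n))
    return Y / np.linalg.norm(Y, axis=0)

# Points from two 3-D subspaces of R^12, plus mild additive noise.
U1, _ = np.linalg.qr(rng.standard_normal((12, 3)))
U2, _ = np.linalg.qr(rng.standard_normal((12, 3)))
X = np.hstack([sample(U1, 20), sample(U2, 20)])
X += 0.02 * rng.standard_normal(X.shape)

# Threshold the correlations: keep the q largest |<x_i, x_j>| per point.
q = 5
G = np.abs(X.T @ X)
np.fill_diagonal(G, 0.0)
A = np.zeros_like(G)
rows = np.arange(G.shape[0])[:, None]
nbrs = np.argsort(G, axis=1)[:, -q:]      # q nearest neighbors in angle
A[rows, nbrs] = G[rows, nbrs]
A = np.maximum(A, A.T)                    # symmetric adjacency for spectral clustering
```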
A novel M-estimator for robust PCA
Abstract
Cited by 8 (4 self)
We study the basic problem of robust subspace recovery. That is, we assume a data set in which some points are sampled around a fixed subspace and the rest are spread throughout the ambient space, and we aim to recover the fixed underlying subspace. We first estimate a "robust inverse sample covariance" by solving a convex minimization procedure; we then recover the subspace from the bottom eigenvectors of this matrix (their number corresponds to the number of eigenvalues close to 0). We guarantee exact subspace recovery under some conditions on the underlying data. Furthermore, we propose a fast iterative algorithm, which linearly converges to the matrix minimizing the convex problem. We also quantify the effect of noise and regularization and discuss many other practical and theoretical issues for improving subspace recovery in various settings. When replacing the sum of terms in the convex energy function (that we minimize) with the sum of squares of terms, we obtain that the new minimizer is a scaled version of the inverse sample covariance (when it exists). We thus interpret our minimizer and its subspace (spanned by its bottom eigenvectors) as robust versions of the empirical inverse covariance and the PCA subspace, respectively. We compare our method with many other algorithms for robust PCA on synthetic and real data sets and demonstrate state-of-the-art speed and accuracy.
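The least-squares analogue mentioned above is easy to check numerically: the inverse sample covariance has its smallest eigenvalues along the high-variance directions, so its bottom eigenvectors span the (non-robust) PCA subspace. A toy numpy check; the construction is mine, and the paper's robust estimator replaces this plain inverse covariance:

```python
import numpy as np

rng = np.random.default_rng(8)

# Points near a 2-D subspace of R^6, plus small isotropic noise (my construction).
U, _ = np.linalg.qr(rng.standard_normal((6, 2)))
X = U @ rng.standard_normal((2, 200)) + 0.05 * rng.standard_normal((6, 200))

# Non-robust analogue: the inverse sample covariance. High-variance directions
# of Sigma become the smallest eigenvalues of Sigma^{-1}, so the *bottom*
# eigenvectors of Q span the underlying subspace.
Sigma = X @ X.T / X.shape[1]
Q = np.linalg.inv(Sigma)
w, V = np.linalg.eigh(Q)       # eigenvalues in ascending order
B = V[:, :2]                   # bottom two eigenvectors of Q

# span(B) should match span(U): principal-angle cosines close to 1.
cosines = np.linalg.svd(B.T @ U, compute_uv=False)
```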