Results 1–10 of 31
Incoherence-optimal matrix completion
, 2013
"... This paper considers the matrix completion problem. We show that it is not necessary to assume joint incoherence, which is a standard but unintuitive and restrictive condition that is imposed by previous studies. This leads to a sample complexity bound that is orderwise optimal with respect to the ..."
Abstract

Cited by 16 (3 self)
This paper considers the matrix completion problem. We show that it is not necessary to assume joint incoherence, which is a standard but unintuitive and restrictive condition that is imposed by previous studies. This leads to a sample complexity bound that is order-wise optimal with respect to the incoherence parameter (as well as to the rank r and the matrix dimension n, except for a log n factor). As a consequence, we improve the sample complexity of recovering a semidefinite matrix from O(nr² log² n) to O(nr log² n), and the highest allowable rank from Θ(√n / log n) to Θ(n / log² n). The key step in the proof is to obtain new bounds on the ℓ∞,2-norm, defined as the maximum of the row and column norms of a matrix. To demonstrate the applicability of our techniques, we discuss extensions to SVD projection, semi-supervised clustering and structured matrix completion. Finally, we turn to the low-rank-plus-sparse matrix decomposition problem, and show that the joint incoherence condition is unavoidable here under computational complexity assumptions on the classical planted clique problem. This means that it is intractable in general to separate a rank-ω(√n) positive semidefinite matrix and a sparse matrix.
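The ℓ∞,2-norm that drives the proof is straightforward to compute directly from its definition quoted above. A minimal numpy sketch (the function name is ours, for illustration only):

```python
import numpy as np

def linf2_norm(M):
    """l_{inf,2}-norm: the largest Euclidean norm among all rows and columns of M."""
    row_norms = np.linalg.norm(M, axis=1)  # Euclidean norm of each row
    col_norms = np.linalg.norm(M, axis=0)  # Euclidean norm of each column
    return max(row_norms.max(), col_norms.max())

M = np.array([[3.0, 4.0],
              [0.0, 1.0]])
print(linf2_norm(M))  # row (3, 4) dominates: 5.0
```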
More data speeds up training time in learning halfspaces over sparse vectors
 In NIPS
, 2013
"... The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a computational resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required ..."
Abstract

Cited by 8 (3 self)
The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a computational resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a natural supervised learning problem: we consider agnostic PAC learning of halfspaces over 3-sparse vectors in {−1, 1, 0}^n. This class is inefficiently learnable using O(n/ε²) examples. Our main contribution is a novel, non-cryptographic methodology for establishing computational-statistical gaps, which allows us to show that, under the widely believed assumption that refuting random 3-CNF formulas is hard, it is impossible to efficiently learn this class using only O(n/ε²) examples. We further show that under stronger hardness assumptions, even O(n^1.499/ε²) examples do not suffice. On the other hand, we show a new algorithm that learns this class efficiently using Ω̃(n²/ε²) examples. This formally establishes the tradeoff between sample and computational complexity for a natural supervised learning problem.
Near-optimal compressed sensing of sparse rank-one matrices via sparse power factorization. arXiv preprint arXiv:1312.0525
, 2013
"... ar ..."
(Show Context)
Statistical and computational tradeoffs in estimation of sparse principal components
, 2014
"... In recent years, Sparse Principal Component Analysis has emerged as an extremely popular dimension reduction technique for highdimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this e ..."
Abstract

Cited by 6 (3 self)
In recent years, Sparse Principal Component Analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators has been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper we show that, under a widely believed assumption from computational complexity theory, there is a fundamental tradeoff between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a Restricted Covariance Concentration condition, we show that no randomised polynomial-time algorithm can achieve the minimax optimal rate. On the other hand, we also study a (polynomial-time) variant of the well-known semidefinite relaxation estimator, and show that it attains essentially the optimal rate among all randomised polynomial-time algorithms.
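Among the "fast to compute" estimators mentioned above, a classical baseline is diagonal thresholding: restrict attention to the highest-variance coordinates, then diagonalize. A hedged numpy sketch of that heuristic (it is illustrative background, not the paper's SDP-based estimator, and the function name is ours):

```python
import numpy as np

def diagonal_thresholding_spca(X, k):
    """Sparse PCA by diagonal thresholding: keep the k coordinates with the
    largest sample variance, then take the top eigenvector of the covariance
    restricted to that support. A classical fast (but statistically
    suboptimal) heuristic; the paper instead analyses an SDP relaxation."""
    S = X.T @ X / X.shape[0]               # sample covariance (data assumed centred)
    support = np.argsort(np.diag(S))[-k:]  # k highest-variance coordinates
    evals, evecs = np.linalg.eigh(S[np.ix_(support, support)])
    v = np.zeros(X.shape[1])
    v[support] = evecs[:, -1]              # top eigenvector on the support
    return v
```

On data with a strong sparse spike this recovers the support quickly, which is exactly why such fast estimators set the benchmark against which the statistical-computational tradeoff is measured.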
Nonconvex statistical optimization: Minimax-optimal sparse PCA in polynomial time. Available at arXiv:1408.5352
, 2014
"... Sparse principal component analysis (PCA) involves nonconvex optimization for which the global solution is hard to obtain. To address this issue, one popular approach is convex relaxation. However, such an approach may produce suboptimal estimators due to the relaxation effect. To optimally estimate ..."
Abstract

Cited by 4 (1 self)
Sparse principal component analysis (PCA) involves nonconvex optimization for which the global solution is hard to obtain. To address this issue, one popular approach is convex relaxation. However, such an approach may produce suboptimal estimators due to the relaxation effect. To optimally estimate sparse principal subspaces, we propose a two-stage computational framework named “tighten after relax”: within the “relax” stage, we approximately solve a convex relaxation of sparse PCA with early stopping to obtain a desired initial estimator; for the “tighten” stage, we propose a novel algorithm called sparse orthogonal iteration pursuit (SOAP), which iteratively refines the initial estimator by directly solving the underlying nonconvex problem. A key concept of this two-stage framework is the basin of attraction. It represents a local region within which the “tighten” stage has desired computational and statistical guarantees. We prove that the initial estimator obtained from the “relax” stage falls into such a region, and hence SOAP geometrically converges to a principal subspace estimator which is minimax-optimal within a certain model class. Unlike most existing sparse PCA estimators, our approach applies to non-spiked covariance models, and adapts to non-Gaussianity as well as dependent data settings. Moreover, through analyzing the computational complexity of the two stages, we illustrate an interesting phenomenon: a larger sample size can reduce the total iteration complexity. Our framework motivates a general paradigm for solving many complex statistical problems which involve nonconvex optimization with provable guarantees.
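The iterate-then-truncate idea behind the “tighten” stage can be illustrated in a few lines of numpy. This is a deliberately simplified single-vector stand-in, not the paper's SOAP algorithm, which refines a whole subspace via sparse orthogonal iteration; here the default initializer merely plays the role that the “relax”-stage estimator would play:

```python
import numpy as np

def truncated_power_iteration(S, k, v0=None, iters=50):
    """Single-vector sketch of iterate-then-truncate: power iteration on the
    (sample) covariance S, hard-truncating to the k largest-magnitude
    coordinates after every step. Illustration only -- the paper's SOAP
    refines a whole subspace, and v0 would come from the "relax" stage
    rather than a fixed default."""
    p = S.shape[0]
    v = np.ones(p) / np.sqrt(p) if v0 is None else v0 / np.linalg.norm(v0)
    for _ in range(iters):
        v = S @ v
        v[np.argsort(np.abs(v))[:-k]] = 0.0  # zero all but the k largest entries
        v /= np.linalg.norm(v)
    return v
```

When the initializer already has nontrivial overlap with a k-sparse leading eigenvector, each multiply-and-truncate step amplifies that overlap, which is the geometric-convergence phenomenon the abstract describes inside the basin of attraction.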
Embedding hard learning problems into Gaussian space
 In RANDOM
, 2014
"... We give the first representationindependent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution. We reduce from the problem of learning sparse parities with noise with respect to the uniform distribution on the hypercube (sparse LPN), a notoriously hard pr ..."
Abstract

Cited by 3 (0 self)
We give the first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution. We reduce from the problem of learning sparse parities with noise with respect to the uniform distribution on the hypercube (sparse LPN), a notoriously hard problem in computer science, and show that any algorithm for agnostically learning halfspaces requires n^Ω(log(1/ε)) time, ruling out a polynomial-time algorithm for the problem. As far as we are aware, this is the first representation-independent hardness result for supervised learning when the underlying distribution is restricted to be a Gaussian. We also show that the problem of agnostically learning sparse polynomials with respect to the Gaussian distribution in polynomial time is as hard as PAC learning DNFs over the uniform distribution in polynomial time. This complements the surprising result of [APVZ14], who show that sparse polynomials are learnable under random Gaussian noise in polynomial time. Taken together, these results show the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions. Our results use a novel embedding of random labeled examples from the uniform distribution on the Boolean hypercube into random labeled examples from the Gaussian distribution that allows us to relate the hardness of learning problems on two different domains and distributions.
Tight convex relaxations for sparse matrix factorization
, 2014
"... Based on a new atomic norm, we propose a new convex formulation for sparse matrix factorization problems in which the number of nonzero elements of the factors is assumed fixed and known. The formulation counts sparse PCA with multiple factors, subspace clustering and lowrank sparse bilinear regre ..."
Abstract

Cited by 3 (0 self)
Based on a new atomic norm, we propose a new convex formulation for sparse matrix factorization problems in which the number of nonzero elements of the factors is assumed fixed and known. The formulation counts sparse PCA with multiple factors, subspace clustering and low-rank sparse bilinear regression as potential applications. We compute slow rates and an upper bound on the statistical dimension (Amelunxen et al., 2013) of the suggested norm for rank-1 matrices, showing that its statistical dimension is an order of magnitude smaller than for the usual ℓ1-norm, trace norm and their combinations. Even though our convex formulation is in theory hard and does not lead to provably polynomial-time algorithmic schemes, we propose an active set algorithm leveraging the structure of the convex problem to solve it and show promising numerical results.
Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems
, 2015
"... Given a large data matrix A ∈ Rn×n, we consider the problem of determining whether its entries are i.i.d. with some known marginal distribution Aij ∼ P0, or instead A contains a principal submatrix AQ,Q whose entries have marginal distribution Aij ∼ P1 6 = P0. As a special case, the hidden (or plant ..."
Abstract

Cited by 2 (0 self)
Given a large data matrix A ∈ R^{n×n}, we consider the problem of determining whether its entries are i.i.d. with some known marginal distribution Aij ∼ P0, or instead A contains a principal submatrix A_{Q,Q} whose entries have marginal distribution Aij ∼ P1 ≠ P0. As a special case, the hidden (or planted) clique problem requires finding a planted clique in an otherwise uniformly random graph. Assuming unbounded computational resources, this hypothesis testing problem is statistically solvable provided |Q| ≥ C log n for a suitable constant C. However, despite substantial effort, no polynomial-time algorithm is known that succeeds with high probability when |Q| = o(√n). Recently, Meka and Wigderson [MW13b] proposed a method to establish lower bounds within the Sum of Squares (SOS) semidefinite hierarchy. Here we consider the degree-4 SOS relaxation, and study the construction of [MW13b] to prove that SOS fails unless k ≥ C n^{1/3} / log n. An argument presented by Barak implies that this lower bound cannot be substantially improved unless the witness construction is changed in the proof. Our proof uses the moments method to bound the spectrum of a certain random …
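For clique sizes k = Ω(√n) the two hypotheses are in fact separable in polynomial time by a standard spectral test on the ±1 adjacency matrix. A hedged numpy sketch of that baseline (illustrative background for the hardness regime discussed above, not the SOS machinery the paper analyses; both function names are ours):

```python
import numpy as np

def sample_graph(n, clique=0, seed=0):
    """Adjacency matrix of G(n, 1/2), optionally with a clique planted on the
    first `clique` vertices (helper for the demo below)."""
    rng = np.random.default_rng(seed)
    A = rng.integers(0, 2, size=(n, n))
    A = np.triu(A, 1)
    A = A + A.T                      # symmetric, zero diagonal
    A[:clique, :clique] = 1          # plant the clique
    np.fill_diagonal(A, 0)
    return A.astype(float)

def spectral_clique_statistic(A):
    """Top eigenvalue of the +/-1 adjacency matrix. Under the null it
    concentrates near 2*sqrt(n); a clique planted on k vertices pushes it up
    to at least about k - 1, so thresholding the statistic succeeds once
    k >> sqrt(n)."""
    B = 2.0 * A - 1.0                # edges -> +1, non-edges -> -1
    np.fill_diagonal(B, 0.0)
    return np.linalg.eigvalsh(B)[-1]
```

With n = 300, the null statistic sits near 2√300 ≈ 35 while a planted 60-clique lifts it above 59; it is precisely below the √n scale that this easy test breaks down and the SOS lower bounds of the paper become relevant.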