Results 1–10 of 42
Supplement to “Asymptotic power of sphericity tests for high-dimensional data.” DOI: 10.1214/13-AOS1100SUPP
, 2013
Complexity theoretic lower bounds for sparse principal component detection
 In COLT 2013 – The 26th Conference on Learning Theory
, 2013
Abstract

Cited by 31 (3 self)
In the context of sparse principal component detection, we bring evidence towards the existence of a statistical price to pay for computational efficiency. We measure the performance of a test by the smallest signal strength that it can detect, and we propose a computationally efficient method based on semidefinite programming. We also prove that the statistical performance of this test cannot be strictly improved by any computationally efficient method. Our results can be viewed as complexity-theoretic lower bounds, conditional on the assumption that some instances of the planted clique problem cannot be solved in randomized polynomial time.
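As an illustration of the detection statistic behind this line of work (this is a hedged sketch, not the paper's semidefinite-programming method): the exhaustive benchmark is the k-sparse largest eigenvalue of the sample covariance matrix, maximized over all supports of size k. It is statistically optimal but exponential in k, which is exactly the computational cost the SDP relaxation is meant to avoid. All names below are illustrative; the power-iteration helper assumes a small positive semidefinite input.

```python
# Hedged sketch: brute-force k-sparse largest eigenvalue of a symmetric PSD
# matrix A, i.e. the max over all size-k supports S of the top eigenvalue of
# A restricted to S. Exponential in k -- the baseline the SDP method relaxes.
from itertools import combinations

def top_eigenvalue(A):
    """Top eigenvalue of a small symmetric PSD matrix via power iteration."""
    n = len(A)
    v = [1.0] * n
    for _ in range(500):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v of the converged unit vector.
    return sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))

def k_sparse_top_eigenvalue(A, k):
    """Maximize the top eigenvalue over all principal k-by-k submatrices."""
    n = len(A)
    best = float("-inf")
    for S in combinations(range(n), k):
        sub = [[A[i][j] for j in S] for i in S]
        best = max(best, top_eigenvalue(sub))
    return best

A = [[2.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print(k_sparse_top_eigenvalue(A, 1))  # 2.0: the largest diagonal entry
```

A test rejects the null when this statistic exceeds a threshold; the smallest signal strength separating null from alternative is the performance measure the abstract refers to.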
Incoherence-optimal matrix completion
, 2013
Abstract

Cited by 16 (3 self)
This paper considers the matrix completion problem. We show that it is not necessary to assume joint incoherence, which is a standard but unintuitive and restrictive condition that is imposed by previous studies. This leads to a sample complexity bound that is order-wise optimal with respect to the incoherence parameter (as well as to the rank r and the matrix dimension n, except for a log n factor). As a consequence, we improve the sample complexity of recovering a semidefinite matrix from O(nr^2 log^2 n) to O(nr log^2 n), and the highest allowable rank from Θ(√n / log n) to Θ(n / log^2 n). The key step in the proof is to obtain new bounds on the ℓ∞,2-norm, defined as the maximum of the row and column norms of a matrix. To demonstrate the applicability of our techniques, we discuss extensions to SVD projection, semi-supervised clustering and structured matrix completion. Finally, we turn to the low-rank-plus-sparse matrix decomposition problem, and show that the joint incoherence condition is unavoidable here conditioned on computational complexity assumptions on the classical planted clique problem. This means that it is intractable in general to separate a rank-ω(√n) positive semidefinite matrix and a sparse matrix.
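The ℓ∞,2-norm that the abstract defines is simple enough to state in code. A minimal sketch (the function name and example matrix are illustrative, not from the paper):

```python
# Hedged sketch: the l_{inf,2} norm of a matrix, defined in the abstract as
# the maximum over the Euclidean norms of all rows and all columns.
import math

def linf2_norm(A):
    """Return the largest Euclidean norm among the rows and columns of A."""
    row_norms = [math.sqrt(sum(x * x for x in row)) for row in A]
    col_norms = [math.sqrt(sum(row[j] ** 2 for row in A))
                 for j in range(len(A[0]))]
    return max(row_norms + col_norms)

A = [[3.0, 4.0],
     [0.0, 0.0]]
print(linf2_norm(A))  # 5.0: the row (3, 4) has the largest norm
```

Note that this interpolates between the spectral norm and the entrywise maximum: it is small exactly when no single row or column carries too much mass, which is the sense in which it controls incoherence.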
Detecting positive correlations in a multivariate sample. arXiv preprint arXiv:1202.5536
, 2012
Abstract

Cited by 9 (2 self)
We consider the problem of testing whether a correlation matrix of a multivariate normal population is the identity matrix. We focus on sparse classes of alternatives where only a few entries are nonzero and, in fact, positive. We derive a general lower bound applicable to various classes and study the performance of some near-optimal tests. We pay special attention to computational feasibility and construct near-optimal tests that can be computed efficiently. Finally, we apply our results to prove new lower bounds for the clique number of high-dimensional random geometric graphs.
Community Detection in Sparse Random Networks
, 2013
Abstract

Cited by 7 (0 self)
We consider the problem of detecting a tight community in a sparse random network. This is formalized as testing for the existence of a dense random subgraph in a random graph. Under the null hypothesis, the graph is a realization of an Erdős–Rényi graph on N vertices with connection probability p0; under the alternative, there is an unknown subgraph on n vertices where the connection probability is p1 > p0. In (Arias-Castro and Verzelen, 2012), we focused on the asymptotically dense regime where p0 is large enough that log(1 ∨ (np0)^{-1}) = o(log(N/n)). We consider here the asymptotically sparse regime where p0 is small enough that log(N/n) = O(log(1 ∨ (np0)^{-1})). As before, we derive information-theoretic lower bounds, and also establish the performance of various tests. Compared to our previous work (Arias-Castro and Verzelen, 2012), the arguments for the lower bounds are based on the same technology, but are substantially more technical in the details; also, the methods we study are different: besides a variant of the scan statistic, we study other statistics such as the size of the largest connected component, the number of triangles, the eigengap of the adjacency matrix, etc. Our detection bounds are sharp, except in the Poisson regime, where we were not able to fully characterize the constant arising in the bound.
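Two of the graph statistics the abstract lists, the number of triangles and the size of the largest connected component, can be sketched directly. This is an illustrative stand-alone sketch (the adjacency-set representation and helper names are our own, not from the paper), naive O(n^3) for triangles rather than anything optimized:

```python
# Hedged sketch: two test statistics mentioned in the abstract, computed on
# a small undirected graph given as a dict mapping vertex -> set of neighbors.
from itertools import combinations

def triangle_count(adj):
    """Count triangles: unordered vertex triples with all three edges present."""
    return sum(1 for u, v, w in combinations(sorted(adj), 3)
               if v in adj[u] and w in adj[u] and w in adj[v])

def largest_component(adj):
    """Size of the largest connected component, via iterative DFS."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

# Triangle on {0, 1, 2} plus an isolated edge {3, 4}.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
print(triangle_count(adj), largest_component(adj))  # 1 3
```

A planted dense subgraph inflates both statistics relative to their Erdős–Rényi null distributions, which is why they can serve as detection tests in the sparse regime.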
Statistical and computational trade-offs in estimation of sparse principal components
, 2014
Abstract

Cited by 6 (3 self)
In recent years, Sparse Principal Component Analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators have been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper we show that, under a widely believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a Restricted Covariance Concentration condition, we show that no randomised polynomial-time algorithm can achieve the minimax optimal rate. On the other hand, we also study a (polynomial-time) variant of the well-known semidefinite relaxation estimator, and show that it attains essentially the optimal rate among all randomised polynomial-time algorithms.
Approximation Bounds for Sparse Principal Component Analysis
Abstract

Cited by 4 (0 self)
High-dimensional data sets: n sample points in dimension p, with [...] for some fixed γ > 0.
Computational Lower Bounds for Sparse PCA
Abstract

Cited by 4 (0 self)
In the context of sparse principal component detection, we bring evidence towards the existence of a statistical price to pay for computational efficiency. We measure the performance of a test by the smallest signal strength that it can detect, and we propose a computationally efficient method based on semidefinite programming. We also prove that the statistical performance of this test cannot be strictly improved by any computationally efficient method. Our results can be viewed as complexity-theoretic lower bounds, conditional on the assumption that some instances of the planted clique problem cannot be solved in randomized polynomial time.