Results 1-10 of 21
Complexity theoretic lower bounds for sparse principal component detection
 In COLT 2013 – The 26th Conference on Learning Theory
, 2013
Cited by 31 (3 self)
In the context of sparse principal component detection, we provide evidence for the existence of a statistical price to pay for computational efficiency. We measure the performance of a test by the smallest signal strength that it can detect, and we propose a computationally efficient method based on semidefinite programming. We also prove that the statistical performance of this test cannot be strictly improved by any computationally efficient method. Our results can be viewed as complexity-theoretic lower bounds, conditional on the assumption that some instances of the planted clique problem cannot be solved in randomized polynomial time.
Statistical Algorithms and a Lower Bound for Detecting Planted Cliques
Cited by 12 (2 self)
We introduce a framework for proving lower bounds on computational problems over distributions, based on defining a restricted class of algorithms called statistical algorithms. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution, rather than directly accessing samples. Our definition captures most natural algorithms of interest in theory and in practice, e.g., moments-based methods, local search, standard iterative methods for convex optimization, MCMC and simulated annealing. Our definition and techniques are inspired by and generalize the statistical query model in learning theory [35]. For well-known problems over distributions, we give lower bounds on the complexity of any statistical algorithm. These include exponential lower bounds for moment maximization in R^n, and a nearly optimal lower bound for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size O(n^{1/2−δ}) for any constant δ > 0. The hardness of variants of the latter has been assumed in order to prove hardness for other problems and for cryptographic applications. Our lower bounds provide concrete evidence
Energy landscape for large average submatrix detection problems in Gaussian random matrices. arXiv preprint arXiv:1211.2284
, 2012
Cited by 7 (0 self)
The problem of finding large average submatrices of a real-valued matrix arises in the exploratory analysis of data from a variety of disciplines, ranging from genomics to social sciences. In this paper we provide a detailed asymptotic analysis of large average submatrices of an n × n Gaussian random matrix. The first part of the paper addresses global maxima. For fixed k we identify the average and the joint distribution of the k × k submatrix having largest average value. As a dual result, we establish that the size of the largest square submatrix with average bigger than a fixed positive constant is, with high probability, equal to one of two consecutive integers that depend on the threshold and the matrix dimension n. The second part of the paper addresses local maxima. Specifically, we consider submatrices with dominant row and column sums that arise as the local optima of iterative search procedures for large average submatrices. For fixed k, we identify the limiting average value and joint distribution of a k × k submatrix conditioned to be a local maximum. In order to understand the density of such local optima and explain the quick convergence of such iterative procedures, we analyze the number L_n(k) of local maxima, beginning with exact asymptotic expressions for the mean and fluctuation behavior of L_n(k). For fixed k, the mean of L_n(k) is Θ(n^k/(log n)^{(k−1)/2}) while the standard deviation is Θ(n2k
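The kind of iterative search procedure whose local optima the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: the alternating row/column update, the function names `top_k` and `iterative_search`, the tie-breaking rule, and the `max_iter` cap are all assumptions made for illustration. The procedure stops at a k × k submatrix whose rows have the k largest sums over the chosen columns and whose columns have the k largest sums over the chosen rows, which is one reading of "dominant row and column sums".

```python
import random


def top_k(scores, k):
    """Indices of the k largest scores as a sorted list (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])


def iterative_search(W, k, seed=None, max_iter=1000):
    """Alternate between the best k rows and the best k columns until stable.

    W is a list of equal-length rows. Returns (rows, cols, average).
    Hypothetical illustration, not a reference implementation.
    """
    n_rows, n_cols = len(W), len(W[0])
    rng = random.Random(seed)
    cols = sorted(rng.sample(range(n_cols), k))  # random starting columns
    rows = []
    for _ in range(max_iter):
        # Best k rows given the current columns.
        row_sums = [sum(W[i][j] for j in cols) for i in range(n_rows)]
        new_rows = top_k(row_sums, k)
        # Best k columns given those rows.
        col_sums = [sum(W[i][j] for i in new_rows) for j in range(n_cols)]
        new_cols = top_k(col_sums, k)
        if new_rows == rows and new_cols == cols:
            break  # neither index set changed: a local optimum
        rows, cols = new_rows, new_cols
    average = sum(W[i][j] for i in rows for j in cols) / (k * k)
    return rows, cols, average
```

Running the search from many random starting column sets on a Gaussian matrix and counting the distinct stopping points gives an empirical feel for the number of local maxima that the abstract calls L_n(k).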
Community Detection in Sparse Random Networks
, 2013
Cited by 7 (0 self)
We consider the problem of detecting a tight community in a sparse random network. This is formalized as testing for the existence of a dense random subgraph in a random graph. Under the null hypothesis, the graph is a realization of an Erdős–Rényi graph on N vertices and with connection probability p0; under the alternative, there is an unknown subgraph on n vertices where the connection probability is p1 > p0. In (Arias-Castro and Verzelen, 2012), we focused on the asymptotically dense regime where p0 is large enough that log(1 ∨ (np0)^{−1}) = o(log(N/n)). We consider here the asymptotically sparse regime where p0 is small enough that log(N/n) = O(log(1 ∨ (np0)^{−1})). As before, we derive information-theoretic lower bounds, and also establish the performance of various tests. Compared to our previous work (Arias-Castro and Verzelen, 2012), the arguments for the lower bounds are based on the same technology, but are substantially more technical in the details; also, the methods we study are different: besides a variant of the scan statistic, we study other statistics such as the size of the largest connected component, the number of triangles, the eigengap of the adjacency matrix, etc. Our detection bounds are sharp, except in the Poisson regime where we were not able to fully characterize the constant arising in the bound.
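Two of the statistics named above, the number of triangles and the size of the largest connected component, are simple to compute on a simulated graph. The sketch below is illustrative only: the function names are hypothetical, the denser subgraph is planted on the first n vertices purely for convenience (both statistics are invariant to vertex relabeling), and no calibrated test threshold is given, since the abstract does not state one.

```python
import random
from itertools import combinations


def sample_graph(N, p0, n=0, p1=0.0, seed=None):
    """Adjacency sets of G(N, p0); if n > 0, pairs among the first n
    vertices are connected with probability p1 instead (the alternative)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(N)]
    for i, j in combinations(range(N), 2):
        p = p1 if (i < n and j < n) else p0
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_triangles(adj):
    """Number of triangles, each counted once via the ordering i < j < k."""
    total = 0
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            if j > i:
                total += sum(1 for k in adj[i] & adj[j] if k > j)
    return total


def largest_component_size(adj):
    """Size of the largest connected component (iterative depth-first search)."""
    seen = set()
    best = 0
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best
```

Comparing the values of these statistics on samples drawn with n = 0 and on samples with a planted subgraph gives a rough empirical picture of when each statistic separates the two hypotheses.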
Community Detection in Random Networks
, 2013
Cited by 3 (0 self)
We formalize the problem of detecting a community in a network as testing whether a given (random) graph contains a subgraph that is unusually dense. We observe an undirected and unweighted graph on N nodes. Under the null hypothesis, the graph is a realization of an Erdős–Rényi graph with connection probability p0. Under the (composite) alternative, there is a subgraph of n nodes where the probability of connection is p1 > p0. We derive a detection lower bound for detecting such a subgraph in terms of N, n, p0, p1 and exhibit a test that achieves that lower bound. We do this both when p0 is known and unknown. We also consider the problem of testing in polynomial time. As an aside, we consider the problem of detecting a clique, which is intimately related to the planted clique problem. Our focus in this paper is on the quasi-normal regime where np0 is either bounded away from zero, or tends to zero slowly.
On the Hardness of Signaling
, 2014
Cited by 3 (2 self)
There has been a recent surge of interest in the role of information in strategic interactions. Much of this work seeks to understand how the realized equilibrium of a game is influenced by uncertainty in the environment and the information available to players in the game. Lurking beneath this literature is a fundamental, yet largely unexplored, algorithmic question: how should a “market maker” who is privy to additional information, and equipped with a specified objective, inform the players in the game? This is an informational analogue of the mechanism design question, and views the information structure of a game as a mathematical object to be designed, rather than an exogenous variable. We initiate a complexity-theoretic examination of the design of optimal information structures in general Bayesian games, a task often referred to as signaling. We focus on one of the simplest instantiations of the signaling question: Bayesian zero-sum games, and a principal who must choose an information structure maximizing the equilibrium payoff of one of the players. In this setting, we show that optimal signaling is computationally intractable, and in some cases hard to approximate, assuming that it is hard to recover a planted clique from an Erdős–Rényi random graph. This is despite the fact that equilibria in these games are computable in polynomial time, and therefore suggests that the hardness of optimal signaling is a distinct phenomenon from the hardness of equilibrium computation. Necessitated by the non-local nature of information structures, en route to our results we prove an “amplification lemma” for the planted clique problem which may be of independent interest. Specifically, we show that even if we plant many cliques in an Erdős–Rényi random graph, so much so that most nodes in the graph are in some planted clique, recovering a constant fraction of the planted cliques is no easier than the traditional planted clique problem.
Robust convex relaxation for the planted clique and densest k-subgraph problems: additional proofs.
, 2013
Cited by 3 (0 self)
We consider the problem of identifying the densest k-node subgraph in a given graph. We write this problem as an instance of rank-constrained cardinality minimization and then relax using the nuclear and ℓ1 norms. Although the original combinatorial problem is NP-hard, we show that the densest k-subgraph can be recovered from the solution of our convex relaxation for certain program inputs. In particular, we establish exact recovery in the case that the input graph contains a single planted clique plus noise in the form of corrupted adjacency relationships. We consider two constructions for this noise. In the first, noise is introduced by an adversary deterministically deleting edges within the planted clique and placing diversionary edges. In the second, these edge corruptions are performed at random. Analogous recovery guarantees for identifying the densest subgraph of fixed size in a bipartite graph are also established, and results of numerical simulations for randomly generated graphs are included to demonstrate the efficacy of our algorithm.