Results 1-10 of 14
Near optimal compressed sensing of sparse rank-one matrices via sparse power factorization. arXiv preprint arXiv:1312.0525, 2013
Statistical and computational tradeoffs in estimation of sparse principal components
, 2014
Abstract

Cited by 6 (3 self)
In recent years, Sparse Principal Component Analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators has been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper we show that, under a widely believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a Restricted Covariance Concentration condition, we show that no randomised polynomial-time algorithm can achieve the minimax optimal rate. On the other hand, we also study a (polynomial-time) variant of the well-known semidefinite relaxation estimator, and show that it attains essentially the optimal rate among all randomised polynomial-time algorithms.
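The fast estimators alluded to in this abstract can be illustrated with a toy sketch. The snippet below implements a simple truncated power iteration for recovering a k-sparse leading eigenvector from a sample covariance matrix; it is an illustrative stand-in, not the paper's semidefinite relaxation estimator, and all parameter choices (spike strength, sample size) are invented for the example.

```python
import numpy as np

def truncated_power_iteration(S, k, iters=100, seed=0):
    """Estimate a k-sparse leading eigenvector of a covariance estimate S.

    A simple iterative scheme in the spirit of fast (polynomial-time)
    sparse PCA estimators; not the paper's SDP-based estimator.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = S @ v
        # keep the k largest-magnitude coordinates, zero out the rest
        w[np.argsort(np.abs(w))[:-k]] = 0.0
        v = w / np.linalg.norm(w)
    return v

# toy spiked covariance: sparse spike supported on the first 3 coordinates
p, k = 20, 3
u = np.zeros(p)
u[:3] = 1 / np.sqrt(3)
Sigma = np.eye(p) + 5.0 * np.outer(u, u)
X = np.random.default_rng(1).multivariate_normal(np.zeros(p), Sigma, size=500)
S = X.T @ X / X.shape[0]
v_hat = truncated_power_iteration(S, k)
print(abs(v_hat @ u))  # near 1 when the sparse spike is recovered
```

With a strong spike and ample samples the iteration recovers the support quickly; the statistical-computational gap the paper studies appears only in harder parameter regimes.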
Evaluating the statistical significance of biclusters
Abstract
Biclustering (also known as submatrix localization) is a problem of high practical relevance in exploratory analysis of high-dimensional data. We develop a framework for performing statistical inference on biclusters found by score-based algorithms. Since the bicluster was selected in a data-dependent manner by a biclustering or localization algorithm, this is a form of selective inference. Our framework gives exact (non-asymptotic) confidence intervals and p-values for the significance of the selected biclusters.
Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting
Abstract
Planted models assume that a graph is generated from some unknown clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. Special cases include planted clique, planted partition, planted densest subgraph and planted coloring. Of particular interest is the high-dimensional setting, where the number of clusters is allowed to grow with the number of nodes. We show that the space of model parameters can be partitioned into four disjoint regions: (1) the impossible regime, where all algorithms fail; (2) the hard regime, where the computationally intractable Maximum Likelihood Estimator (MLE) succeeds, and no polynomial-time method is known; (3) the easy regime, where the polynomial-time convexified MLE succeeds; (4) the simple regime, where a simple counting/thresholding procedure succeeds. Moreover, each of these algorithms provably fails in the previous harder regimes. Our theorems establish the first minimax recovery results for the high-dimensional setting, and provide the best known guarantees for polynomial-time algorithms. These results demonstrate the trade-offs between statistical and computational considerations.
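The "simple regime" counting/thresholding idea can be sketched on a toy planted densest subgraph instance: when the planted set is dense enough, ranking nodes by degree already recovers it. The parameters below are invented for illustration and sit deep inside the simple regime.

```python
import numpy as np

# Planted densest subgraph: n nodes, a planted set of size k whose
# internal edges appear with probability q_in; all other pairs with p_out.
rng = np.random.default_rng(0)
n, k = 200, 40
q_in, p_out = 0.9, 0.05

planted = set(range(k))
A = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        prob = q_in if (i in planted and j in planted) else p_out
        if rng.random() < prob:
            A[i, j] = A[j, i] = 1

# counting/thresholding: the k highest-degree nodes form the estimate
degrees = A.sum(axis=1)
recovered = set(np.argsort(degrees)[-k:].tolist())
print(recovered == planted)
```

In harder regimes (weaker density gap, smaller planted set) this counting rule provably fails and one must move to convexified MLE or beyond, which is exactly the phase diagram the abstract describes.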
ISIT 2015 Tutorial: Information Theory and Machine Learning
Abstract
We are in the midst of a data deluge, with an explosion in the volume and richness of data sets in fields including social networks, biology, natural language processing, and computer vision, among others. In all of these areas, machine learning has been extraordinarily successful in providing tools and practical algorithms for extracting information from massive data sets (e.g., genetics, multispectral imaging, Google and Facebook). Despite this tremendous practical success, relatively less attention has been paid to fundamental limits and trade-offs, and information theory has a crucial role to play in this context. The goal of this tutorial is to demonstrate how information-theoretic techniques and concepts can be brought to bear on machine learning problems in unorthodox and fruitful ways. We discuss how any learning problem can be formalized in a Shannon-theoretic sense, albeit one that involves non-traditional notions of codewords and channels. This perspective allows information-theoretic tools, including information measures, Fano's inequality, random coding arguments, and so on, to be brought to bear on learning problems. We illustrate this broad perspective with discussions of several learning problems, including sparse approximation, dimensionality reduction, graph recovery, clustering, and community detection. We emphasise recent results establishing the fundamental limits of graphical model learning and community detection. We also discuss the distinction between the learning-theoretic capacity when arbitrary "decoding" algorithms are allowed, and notions of computationally constrained capacity. Finally, a number of open problems and conjectures at the interface of information theory and machine learning will be discussed.
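As a small illustration of one tool named above: Fano's inequality lower-bounds the error of any "decoder" choosing among M equiprobable hypotheses from data carrying I nats of mutual information. The numbers plugged in below are arbitrary, chosen only to show the computation.

```python
import math

def fano_lower_bound(I, M):
    """Fano's inequality: P(error) >= 1 - (I + log 2) / log M,
    for M equiprobable hypotheses and mutual information I in nats."""
    return max(0.0, 1.0 - (I + math.log(2)) / math.log(M))

# e.g. M = 1024 hypotheses, I = 2 nats of information from the data:
# no decoder, however clever, errs with probability below this bound.
print(round(fano_lower_bound(2.0, 1024), 3))  # -> 0.611
```

This is the standard route to minimax lower bounds in the learning problems the tutorial lists: construct a packing of M hypotheses, bound I, and read off the unavoidable error.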
Comment on “Hypothesis testing by convex optimization”
, 2015
Abstract
 Add to MetaCart
(Show Context)
With the growing size of problems at hand, convexity has become preponderant in modern statistics. Indeed, convex relaxations of NP-hard problems have been successfully employed in a variety of statistical problems such as classification [2, 16], linear regression [7, 5], matrix estimation [8, 12], graphical models [15, 9] and sparse principal component analysis (PCA) [10, 4]. The paper “Hypothesis testing by convex optimization” by Alexander Goldenshluger, Anatoli Juditsky and Arkadi Nemirovski, hereafter denoted GJN, brings a new perspective on the role of convexity in a fundamental statistical problem: composite hypothesis testing. The importance of this problem is illustrated by several interesting applications in Section 4 of GJN. One of the key insights in GJN is that there exists a pair of distributions, one in each of the composite hypotheses, on which the statistician should focus her efforts. Indeed, Theorem 2.1(ii) guarantees that any test that is optimal for this simple hypothesis problem is also near-optimal for the composite hypothesis problem. Moreover, this pair can be found by solving a convex optimization problem.
Urbana-Champaign
Abstract
In standard clustering problems, data points are represented by vectors, and by stacking them together one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we study three algorithms with different running times and compare the number of observations needed by them for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity when increasingly more observations are available.
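A minimal sketch of the counting idea behind such recovery guarantees: rows in the same cluster agree on almost all commonly observed entries, so thresholding pairwise agreement separates clusters. This toy is noiseless apart from missing entries, its sizes and thresholds are invented for illustration, and it is not one of the paper's three algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 60, 200

# two row clusters, each sharing one of two random binary column patterns
patterns = rng.integers(0, 2, size=(2, n_cols))
labels = np.array([0] * 30 + [1] * 30)
M = patterns[labels]

# observe only ~30% of the entries (missing-data mask)
obs = rng.random((n_rows, n_cols)) < 0.3

def agreement(i, j):
    """Fraction of matching bits over columns observed in both rows."""
    common = obs[i] & obs[j]
    if not common.any():
        return 0.5  # no shared observations: uninformative
    return (M[i, common] == M[j, common]).mean()

# counting rule: rows agreeing with row 0 on >90% of shared entries
cluster0 = {j for j in range(n_rows) if agreement(0, j) > 0.9}
print(cluster0 == set(range(30)))
```

Same-cluster pairs agree on every commonly observed column, while cross-cluster pairs agree on roughly half of them, so the 0.9 threshold cleanly separates the two; with noisy entries the same idea needs more observations, which is the time-data trade-off the abstract describes.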
Resource Allocation for Statistical Estimation
, 2014
Abstract
Statistical estimation in many contemporary settings involves the acquisition, analysis, and aggregation of datasets from multiple sources, which can have significant differences in character and in value. Due to these variations, the effectiveness of employing a given resource – e.g., a sensing device or computing power – for gathering or processing data from a particular source depends on the nature of that source. As a result, the appropriate division and assignment of a collection of resources to a set of data sources can substantially impact the overall performance of an inferential strategy. In this expository article, we adopt a general view of the notion of a resource and its effect on the quality of a data source, and we describe a framework for the allocation of a given set of resources to a collection of sources in order to optimize a specified metric of statistical efficiency. We discuss several stylized examples involving inferential tasks such as parameter estimation and hypothesis testing based on heterogeneous data sources, in which optimal allocations can be computed either in closed form or via efficient numerical procedures based on convex optimization.