Results 1 - 10 of 96
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
Nuclear norm penalization and optimal rates for noisy low rank matrix completion.
Annals of Statistics, 2011
Cited by 107 (7 self)
This paper deals with the trace regression model where n entries or linear combinations of entries of an unknown m1 × m2 matrix A0 corrupted by noise are observed. We propose a new nuclear norm penalized estimator of A0 and establish a general sharp oracle inequality for this estimator for arbitrary values of n, m1, m2 under the condition of isometry in expectation. Then this method is applied to the matrix completion problem. In this case, the estimator admits a simple explicit form and we prove that it satisfies oracle inequalities with faster rates of convergence than in the previous works. They are valid, in particular, in the high-dimensional setting m1m2 ≫ n. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix A0, a non-minimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides an exact recovery of the rank of A0 with probability close to 1. We also discuss the statistical learning setting where there is no underlying model determined by A0 and the aim is to find the best trace regression model approximating the data.
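The "simple explicit form" mentioned in the abstract above can be illustrated by soft-thresholding the singular values of a rescaled matrix of observations. The following is a minimal sketch, assuming uniform sampling of single entries. The function names `svd_soft_threshold` and `completion_estimate` are hypothetical, and the m1·m2/n rescaling and the threshold `tau` are illustrative choices, not constants taken from the paper.

```python
import numpy as np

def svd_soft_threshold(M, tau):
    """Shrink every singular value of M by tau, clipping at zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * s_shrunk) @ Vt

def completion_estimate(rows, cols, y, shape, tau):
    """Hypothetical helper: build a rescaled observation matrix and shrink it.

    rows, cols : integer indices of the observed entries
    y          : noisy observed values at those entries
    shape      : (m1, m2) of the unknown matrix
    tau        : threshold applied to the singular values (assumed tuning value)
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    y = np.asarray(y, dtype=float)
    m1, m2 = shape
    n = len(y)
    X = np.zeros(shape)
    # Accumulate observations; repeated indices are summed.
    np.add.at(X, (rows, cols), y)
    # Rescale so that, under uniform sampling, X matches the target in expectation.
    X *= m1 * m2 / n
    return svd_soft_threshold(X, tau)
```

Larger values of `tau` zero out more singular values and therefore return a lower-rank estimate, which is the mechanism behind rank recovery in estimators of this type.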
Restricted strong convexity and weighted matrix completion: Optimal bounds with noise
2012
Cited by 84 (10 self)
We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. Our results are based on measures of the “spikiness” and “low-rankness” of matrices that are less restrictive than the incoherence conditions imposed in previous work. Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying within ℓq-“balls” of bounded spikiness. Using information-theoretic methods, we show that no algorithm can achieve better estimates (up to a logarithmic factor) over these same sets, showing that our conditions on matrices and associated rates are essentially optimal.
High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
2011
Cited by 75 (10 self)
Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able to both analyze the statistical error associated with any global optimum, and prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings.
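The projected gradient descent scheme named in the abstract above can be sketched for a quadratic objective constrained to an ℓ1 ball. This is a minimal illustration under assumptions: the names `project_l1_ball` and `projected_gradient` are hypothetical, the objective 0.5·βᵀΓβ − γᵀβ with an ℓ1-norm constraint is an assumed formulation, and the paper's corrected surrogates for noisy or missing covariates are not reproduced here. With clean data one could take Γ = XᵀX/n and γ = Xᵀy/n.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the set {b : ||b||_1 <= radius}."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    # Sort magnitudes in decreasing order and find the shrinkage level theta.
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_gradient(Gamma, gamma, radius, step, n_iter=200):
    """Minimize 0.5 * b^T Gamma b - gamma^T b subject to ||b||_1 <= radius.

    Gamma may be indefinite (a non-convex objective); this sketch makes no
    convergence claim in that case.
    """
    beta = np.zeros_like(gamma, dtype=float)
    for _ in range(n_iter):
        grad = Gamma @ beta - gamma
        beta = project_l1_ball(beta - step * grad, radius)
    return beta
```

The ℓ1-ball constraint keeps every iterate in a bounded set even when `Gamma` has negative eigenvalues, which is why a constrained form is a natural fit for non-convex objectives of this kind.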
A Dirty Model for Multi-task Learning
In NIPS, 2010
Cited by 67 (2 self)
We consider multi-task learning in the setting of multiple linear regression, and where some relevant features could be shared across the tasks. Recent research has studied the use of ℓ1/ℓq norm block-regularizations with q > 1 for such block-sparse structured problems, establishing strong guarantees on recovery even under high-dimensional scaling where the number of features scales with the number of observations. However, these papers also caution that the performance of such block-regularized methods is very dependent on the extent to which the features are shared across tasks. Indeed they show [8] that if the extent of overlap is less than a threshold, or even if parameter values in the shared features are highly uneven, then block ℓ1/ℓq regularization could actually perform worse than simple separate element-wise ℓ1 regularization. Since these caveats depend on the unknown true parameters, we might not know when and which method to apply. Even otherwise, we are far away from a realistic multi-task setting: not only does the set of relevant features have to be exactly the same across tasks, but their values
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions
Annals of Statistics, 40(2):1171, 2013
Cited by 61 (8 self)
We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation X of the sum of an (approximately) low rank matrix Θ⋆ with a second matrix Γ⋆ endowed with a complementary form of low-dimensional structure; this setup includes many statistical models of interest, including factor analysis, multi-task regression and robust covariance estimation. We derive a general theorem that bounds the Frobenius norm error for an estimate of the pair (Θ⋆, Γ⋆) obtained by solving a convex optimization problem that combines the nuclear norm with a general decomposable regularizer. Our results use a “spikiness” condition that is related to, but milder than, singular vector incoherence. We specialize our general result to two cases that have been studied in past work: low rank plus an entrywise sparse matrix, and low rank plus a columnwise sparse matrix. For both models, our theory yields non-asymptotic Frobenius error bounds for both deterministic and stochastic noise matrices, and applies to matrices Θ⋆ that can be exactly or approximately low rank, and matrices Γ⋆ that can be exactly or approximately sparse. Moreover, for the case of stochastic noise matrices and the identity observation operator, we establish matching lower bounds on the minimax error. The sharpness of our non-asymptotic predictions is confirmed by numerical simulations.
Graphical Models Concepts in Compressed Sensing
Cited by 37 (2 self)
This paper surveys recent work in applying ideas from graphical models and message passing algorithms to solve large scale regularized regression problems. In particular, the focus is on compressed sensing reconstruction via ℓ1 penalized least-squares (known as LASSO or BPDN). We discuss how to derive fast approximate message passing algorithms to solve this problem. Surprisingly, the analysis of such algorithms allows one to prove exact high-dimensional limit results for the LASSO risk. This paper will appear as a chapter in a book on ‘Compressed Sensing’ edited by Yonina Eldar and Gitta Kutyniok.
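For orientation, the ℓ1 penalized least-squares objective referred to above, 0.5·||y − Ax||² + λ·||x||₁, can be minimized with plain iterative soft-thresholding (ISTA). The sketch below is not the approximate message passing algorithm the chapter derives: message passing variants of this type add a correction term to the residual update, which is omitted here. The function names `soft_threshold` and `ista_lasso` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Componentwise soft-thresholding: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||x||_1 by ISTA.

    This is a baseline solver for the LASSO objective, not approximate
    message passing.
    """
    # Lipschitz constant of the gradient of the smooth part: squared spectral norm.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

When `A` is the identity, the minimizer is simply `soft_threshold(y, lam)`, which gives a quick sanity check for the solver.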
Statistical Performance of Convex Tensor Decomposition
Cited by 36 (5 self)
We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.