Results 1–10 of 84
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
Estimation of (near) low-rank matrices with noise and high-dimensional scaling
Abstract

Cited by 103 (19 self)
We study an instance of high-dimensional statistical inference in which the goal is to use N noisy observations to estimate a matrix Θ* ∈ R^{k×p} that is assumed to be either exactly low rank, or “near” low-rank, meaning that it can be well-approximated by a matrix with low rank. We consider an M-estimator based on regularization by the trace or nuclear norm over matrices, and analyze its performance under high-dimensional scaling. We provide non-asymptotic bounds on the Frobenius norm error that hold for a general class of noisy observation models, and apply to both exactly low-rank and approximately low-rank matrices. We then illustrate their consequences for a number of specific learning models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections. Simulations show excellent agreement with the high-dimensional scaling of the error predicted by our theory.
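The nuclear-norm regularizer this abstract describes is tractable because its proximal operator has a closed form: soft-thresholding of the singular values. The NumPy sketch below illustrates that building block on a noisy rank-1 matrix; it is an illustrative fragment, not the paper's estimator, and the sizes and threshold `tau` are arbitrary choices.

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: the proximal operator of
    tau * ||.||_* (the nuclear / trace norm). Each singular value is
    shrunk toward zero by tau; values below tau are zeroed out."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Toy data: a rank-1 matrix Theta* plus small entrywise noise.
rng = np.random.default_rng(0)
Theta_star = np.outer(rng.standard_normal(5), rng.standard_normal(4))
Y = Theta_star + 0.01 * rng.standard_normal((5, 4))

Theta_hat = svt(Y, tau=0.1)
print("singular values of Y:        ", np.round(np.linalg.svd(Y, compute_uv=False), 3))
print("singular values of Theta_hat:", np.round(np.linalg.svd(Theta_hat, compute_uv=False), 3))
```

The thresholding kills the small noise-induced singular values while only slightly shrinking the dominant one, which is the mechanism behind the Frobenius-error bounds the abstract mentions.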
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions
Annals of Statistics, 40(2):1171
Abstract

Cited by 63 (11 self)
We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation X of the sum of an (approximately) low-rank matrix Θ⋆ with a second matrix Γ⋆ endowed with a complementary form of low-dimensional structure; this setup includes many statistical models of interest, including factor analysis, multi-task regression and robust covariance estimation. We derive a general theorem that bounds the Frobenius norm error for an estimate of the pair (Θ⋆, Γ⋆) obtained by solving a convex optimization problem that combines the nuclear norm with a general decomposable regularizer. Our results use a “spikiness” condition that is related to, but milder than, singular vector incoherence. We specialize our general result to two cases that have been studied in past work: low rank plus an entrywise sparse matrix, and low rank plus a columnwise sparse matrix. For both models, our theory yields non-asymptotic Frobenius error bounds for both deterministic and stochastic noise matrices, and applies to matrices Θ⋆ that can be exactly or approximately low rank, and matrices Γ⋆ that can be exactly or approximately sparse. Moreover, for the case of stochastic noise matrices and the identity observation operator, we establish matching lower bounds on the minimax error. The sharpness of our non-asymptotic predictions is confirmed by numerical simulations.
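For the special case of the identity observation operator and an entrywise-sparse second component, a convex program of the kind this abstract describes (nuclear norm plus an ℓ1 regularizer) can be solved by proximal gradient steps that combine singular-value and entrywise soft-thresholding. A minimal NumPy sketch, with arbitrary regularization weights and step size and not the authors' own code:

```python
import numpy as np

def soft(x, t):
    """Entrywise soft-thresholding: prox of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def svt(M, t):
    """Singular-value soft-thresholding: prox of t * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def lowrank_plus_sparse(Y, lam_nuc, lam_l1, step=0.4, iters=300):
    """Proximal gradient for
       min_{T, G}  0.5 ||Y - T - G||_F^2 + lam_nuc ||T||_* + lam_l1 ||G||_1
    (identity observation operator). step < 0.5 because the joint
    gradient in (T, G) is 2-Lipschitz."""
    T = np.zeros_like(Y)
    G = np.zeros_like(Y)
    for _ in range(iters):
        R = Y - T - G                        # shared residual
        T = svt(T + step * R, step * lam_nuc)
        G = soft(G + step * R, step * lam_l1)
    return T, G

# Toy decomposition: rank-1 background plus two large sparse spikes.
rng = np.random.default_rng(1)
L0 = np.outer(rng.standard_normal(8), rng.standard_normal(8))
S0 = np.zeros((8, 8)); S0[2, 5] = 5.0; S0[6, 1] = -4.0
T_hat, G_hat = lowrank_plus_sparse(L0 + S0, lam_nuc=0.5, lam_l1=0.2)
```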
FAST GLOBAL CONVERGENCE OF GRADIENT METHODS FOR HIGH-DIMENSIONAL STATISTICAL RECOVERY
Submitted to the Annals of Statistics, 2012
Abstract

Cited by 36 (5 self)
Many statistical M-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods for solving such problems, working within a high-dimensional framework that allows the ambient dimension d to grow with (and possibly exceed) the sample size n. Our theory identifies conditions under which projected gradient descent enjoys globally linear convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter θ* and an optimal solution θ̂. By establishing these conditions with high probability for numerous statistical models, our analysis applies to a wide range of M-estimators, including sparse linear regression using the Lasso; group Lasso for block sparsity; log-linear models with regularization; low-rank matrix recovery using nuclear norm regularization; and matrix decomposition.
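As a concrete instance of the composite gradient methods this abstract analyzes, the sketch below implements ISTA for the Lasso: a gradient step on the smooth squared loss followed by the prox of the ℓ1 regularizer. It is a hedged illustration with a fixed step size 1/L, not the paper's algorithm or step-size schedule.

```python
import numpy as np

def ista_lasso(X, y, lam, iters=500):
    """Composite gradient descent (ISTA) for the Lasso
       min_theta  (1/2n) ||y - X theta||_2^2 + lam ||theta||_1.
    Each iteration: a gradient step of size 1/L on the smooth loss,
    then soft-thresholding (the prox of the l1 norm)."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n       # Lipschitz constant of the gradient
    theta = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / n
        z = theta - grad / L
        theta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return theta

# High-dimensional toy problem (d > n) with a 3-sparse truth.
rng = np.random.default_rng(2)
n, d = 40, 100
X = rng.standard_normal((n, d))
theta_true = np.zeros(d); theta_true[[3, 17, 60]] = [2.0, -1.5, 1.0]
y = X @ theta_true + 0.1 * rng.standard_normal(n)
theta_hat = ista_lasso(X, y, lam=0.1)
```

In the regime the abstract describes, each such iteration contracts the distance to the Lasso optimum geometrically until it reaches the statistical precision of the model.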
Statistical Performance of Convex Tensor Decomposition
Abstract

Cited by 35 (5 self)
We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally, tensor decomposition has been formulated as a non-convex optimization problem, which has hindered the analysis of its performance. We show under some conditions that the mean squared error of the convex method scales linearly with a quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.
Iterative ranking from pairwise comparisons
Advances in Neural Information Processing Systems 25 (NIPS), 2012
Abstract

Cited by 24 (1 self)
The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining a ranking, finding ‘scores’ for each object (e.g. a player’s rating) is of interest for understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects, with edges present between two objects if they are compared; the scores turn out to be the stationary probabilities of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model, in which each object has an associated score that determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to an order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1].
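The random-walk construction this abstract describes can be sketched in a few lines. The recipe below follows the commonly cited "Rank Centrality" form (transition probabilities proportional to opponents' win fractions, normalized by the maximum degree); details such as the normalization may differ from the paper, and the variable names are illustrative.

```python
import numpy as np

def rank_scores(wins, iters=2000):
    """Scores from pairwise comparisons via a random walk on the
    comparison graph. wins[i][j] = number of times object i beat j.
    From object i, the walk moves to neighbor j with probability
    proportional to the fraction of their comparisons that j won;
    the stationary distribution is returned as the score vector."""
    W = np.asarray(wins, dtype=float)
    n = W.shape[0]
    total = W + W.T                               # comparisons per pair
    d_max = int(max((total[i] > 0).sum() for i in range(n))) or 1
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and total[i, j] > 0:
                P[i, j] = (W[j, i] / total[i, j]) / d_max
        P[i, i] = 1.0 - P[i].sum()                # lazy self-loop
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):                        # power iteration
        pi = pi @ P
    return pi / pi.sum()

# A beats B and C most of the time; B usually beats C.
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
scores = rank_scores(wins)
```

Mass flows toward objects that win their comparisons, so the stationary probabilities order the objects by strength, consistent with the 'scores' interpretation in the abstract.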
Concentration-based guarantees for low-rank matrix reconstruction
24th Annual Conference on Learning Theory (COLT), 2011
Abstract

Cited by 21 (6 self)
We consider the problem of approximately reconstructing a partially observed, approximately low-rank matrix. This problem has received much attention lately, mostly using the trace-norm as a surrogate for the rank. Here we study low-rank matrix reconstruction using both the trace-norm and the less-studied max-norm, and present reconstruction guarantees based on existing analyses of the Rademacher complexity of the unit balls of these norms. We show how these are superior in several ways to recently published guarantees based on specialized analysis.
Universal low-rank matrix recovery from Pauli measurements
Abstract

Cited by 20 (0 self)
We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd poly log d) Pauli measurements. This has applications in quantum state tomography, and is a noncommutative analogue of a well-known problem in compressed sensing: recovering a sparse vector from a few of its Fourier coefficients. We show that almost all sets of O(rd log^6 d) Pauli measurements satisfy the rank-r restricted isometry property (RIP). This implies that M can be recovered from a fixed (“universal”) set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly optimal bounds on the error. A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm. Our proof uses Dudley’s inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality.
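To make the measurement model concrete, the sketch below samples random n-qubit Pauli observables and forms the linear measurements tr(P_i M)/√d. The normalization and the noiseless setup are illustrative assumptions for this sketch, not the paper's exact tomography model.

```python
import numpy as np

# Single-qubit Pauli matrices; n-qubit Paulis are their tensor products.
PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),       # X
          np.array([[0, -1j], [1j, 0]]),                   # Y
          np.array([[1, 0], [0, -1]], dtype=complex)]      # Z

def random_pauli(n_qubits, rng):
    """A uniformly random element of the n-qubit Pauli operator basis.
    Each element is Hermitian, unitary, and has operator norm 1 -- the
    'small operator norm' property the abstract's general result uses."""
    P = np.array([[1.0 + 0j]])
    for _ in range(n_qubits):
        P = np.kron(P, PAULIS[rng.integers(4)])
    return P

def pauli_measurements(M, n_meas, rng):
    """Noiseless linear measurements y_i = tr(P_i M) / sqrt(d), sampling
    Paulis with replacement (a sketch; real tomography adds statistical
    noise from finitely many measurement repetitions)."""
    d = M.shape[0]
    n_qubits = int(np.log2(d))
    ops = [random_pauli(n_qubits, rng) for _ in range(n_meas)]
    y = np.array([np.trace(P @ M) / np.sqrt(d) for P in ops])
    return ops, y
```

Given O(rd log^6 d) such measurements, the abstract's RIP result guarantees that nuclear-norm minimization over matrices consistent with y recovers M with nearly optimal error.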
Learning with the Weighted Trace-norm under Arbitrary Sampling Distributions
Abstract

Cited by 18 (4 self)
We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial.
Spectral Compressed Sensing via Structured Matrix Completion
Abstract

Cited by 18 (6 self)
The paper studies the problem of recovering a spectrally sparse object from a small number of time-domain samples. Specifically, the object of interest with ambient dimension n is assumed to be a mixture of r complex multi-dimensional sinusoids, while the underlying frequencies can assume any value in the unit disk. Conventional compressed sensing paradigms suffer from the basis mismatch issue when imposing a discrete dictionary on the Fourier representation. To address this problem, we develop a novel non-parametric algorithm, called enhanced matrix completion (EMaC), based on structured matrix completion. The algorithm starts by arranging the data into a low-rank enhanced form with multi-fold Hankel structure, then attempts recovery via nuclear norm minimization. Under mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the order of O(r log^2 n). We also show that, in many instances, accurate completion of a low-rank multi-fold Hankel matrix is possible when the number of observed entries is proportional to the information-theoretic limit (except for a logarithmic gap). The robustness of EMaC against bounded noise and its applicability to super-resolution are further demonstrated by numerical experiments.
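In the 1-D case, the "enhanced form" is simply a Hankel matrix built from the samples, and for a mixture of r complex sinusoids that matrix has rank r, which is what the nuclear-norm step exploits. A small illustrative sketch (the multi-fold, multi-dimensional enhancement in the paper generalizes this):

```python
import numpy as np

def hankel_enhance(x, k):
    """Arrange a length-n sequence into the k x (n - k + 1) Hankel
    matrix H[i, j] = x[i + j] -- the 1-D analogue of EMaC's enhanced
    form. A mixture of r sinusoids x[t] = sum_l c_l z_l^t yields a
    Hankel matrix of rank r, since each sinusoid contributes the
    rank-1 outer product z_l^i * z_l^j."""
    x = np.asarray(x)
    n = len(x)
    return np.array([[x[i + j] for j in range(n - k + 1)] for i in range(k)])

# Two sinusoids at off-grid frequencies -> a rank-2 enhanced matrix,
# with no discrete dictionary (and hence no basis mismatch) involved.
t = np.arange(16)
x = np.exp(2j * np.pi * 0.137 * t) + 0.5 * np.exp(2j * np.pi * 0.3141 * t)
H = hankel_enhance(x, 8)
```

EMaC then completes this structured matrix from a subset of its entries by nuclear norm minimization; the rank of the enhanced matrix, not the grid resolution, governs the sample complexity.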