Results 1 - 10 of 10
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
2010
Sparsistent learning of varying-coefficient models with structural changes
In NIPS, 2009
Cited by 6 (0 self)
Estimating the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, with applications such as learning the trajectories of stock prices or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a sub-family of this model: piecewise constant VCVS models. We analyze the two main issues in this problem: inferring the time points at which structural changes occur, and estimating the model structure (i.e., performing model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies the jump points of structural changes and then identifies the covariates relevant to the response on each segment. We provide an asymptotic analysis of the procedure, showing that as the sample size, the number of structural changes, and the number of variables increase, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to a brain-computer interface dataset. We also consider how this applies to structure estimation of time-varying probabilistic graphical models.
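The two-stage idea (find jump points first, then select covariates per segment) can be illustrated with a minimal NumPy sketch. The abstract does not specify the estimators used in either stage, so this sketch swaps in generic tools: optimal-partitioning change-point detection with a least-squares cost for stage 1, and a coordinate-descent Lasso on each detected segment for stage 2. The function names (`detect_jumps`, `lasso_cd`), the penalty, `min_len`, `lam`, and the synthetic data are all illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def segment_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit on one segment."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ coef
    return float(r @ r)

def detect_jumps(X, y, penalty, min_len):
    """Stage 1 (stand-in): optimal partitioning, penalizing each extra segment."""
    T = len(y)
    best = np.full(T + 1, np.inf)  # best[t] = minimal penalized cost of y[:t]
    best[0] = -penalty             # so a fit with K segments pays (K - 1) penalties
    prev = np.zeros(T + 1, dtype=int)
    for t in range(min_len, T + 1):
        for s in range(0, t - min_len + 1):
            if np.isinf(best[s]):
                continue
            cost = best[s] + segment_rss(X[s:t], y[s:t]) + penalty
            if cost < best[t]:
                best[t], prev[t] = cost, s
    jumps, t = [], T
    while t > 0:                   # backtrack segment start points
        t = int(prev[t])
        if t > 0:
            jumps.append(t)
    return sorted(jumps)

def lasso_cd(X, y, lam, iters=200):
    """Stage 2 (stand-in): coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()               # equals y - X @ beta while beta = 0
    for _ in range(iters):
        for j in range(p):
            resid += X[:, j] * beta[j]          # drop coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated coordinate back
    return beta

# Hypothetical piecewise-constant coefficients with jumps at t = 60 and t = 120.
rng = np.random.default_rng(0)
T, p = 180, 6
true_jumps = [60, 120]
betas = np.array([[2.0, 0, 0, -1.5, 0, 0],
                  [0, 0, 2.0, 0, 0, 0],
                  [0, -2.0, 0, 0, 1.5, 0]])
X = rng.standard_normal((T, p))
seg_id = np.searchsorted(true_jumps, np.arange(T), side="right")
y = np.einsum("tp,tp->t", X, betas[seg_id]) + 0.2 * rng.standard_normal(T)

jumps = detect_jumps(X, y, penalty=5.0, min_len=10)
bounds = [0, *jumps, T]
supports = [np.flatnonzero(lasso_cd(X[a:b], y[a:b], lam=0.3))
            for a, b in zip(bounds[:-1], bounds[1:])]
```

Splitting the problem this way means each Lasso only sees data from one (estimated) constant regime; the quality of stage 2 therefore depends on stage 1 placing the jumps close to their true positions.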
A Probabilistic and RIPless Theory of Compressed Sensing
2010
Cited by 4 (0 self)
This paper introduces a simple and very general theory of compressive sensing. In this theory, the sensing mechanism simply selects sensing vectors independently at random from a probability distribution F; it includes all models discussed in the literature (e.g., Gaussian and frequency measurements) and also provides a framework for new measurement strategies. We prove that if the probability distribution F obeys a simple incoherence property and an isotropy property, one can faithfully recover approximately sparse signals from a minimal number of noisy measurements. The novelty is that our recovery results require neither the restricted isometry property (RIP), relying instead on a much weaker notion, nor a random model for the signal. As an example, the paper shows that a signal with s nonzero entries can be faithfully recovered from about s log n Fourier coefficients that are contaminated with noise.
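To make the measurement model concrete, the sketch below draws sensing vectors independently from an isotropic distribution F (here a standard Gaussian, one of the models the abstract mentions) and takes on the order of s log n measurements. The recovery step is an assumption: the paper's guarantees concern convex (ℓ1-type) decoders, whereas this sketch swaps in greedy orthogonal matching pursuit (OMP) so that it needs only NumPy. The constant 4 in the measurement count and all sizes are illustrative choices.

```python
import numpy as np

def omp(A, y, s):
    """Orthogonal matching pursuit: greedily pick s columns, refit by least squares."""
    residual = y.copy()
    support = []
    for _ in range(s):
        j = int(np.argmax(np.abs(A.T @ residual)))  # column most correlated with residual
        support.append(j)
        coef = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
n, s = 256, 5
m = int(np.ceil(4 * s * np.log(n)))   # "about s log n" measurements; constant 4 is ad hoc
A = rng.standard_normal((m, n)) / np.sqrt(m)  # rows drawn i.i.d. from a scaled Gaussian F

x_true = np.zeros(n)
idx = rng.choice(n, s, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], s) * (1.0 + rng.random(s))  # magnitudes in [1, 2)

x_hat = omp(A, A @ x_true, s)                             # noiseless measurements
y_noisy = A @ x_true + 0.01 * rng.standard_normal(m)
x_hat_noisy = omp(A, y_noisy, s)                          # mildly noisy measurements
```

OMP is used here only because it is short and dependency-free; it does not carry the RIP-free guarantees proved in the paper.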
High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
2011
Cited by 4 (2 self)
Although the standard formulations of prediction problems involve fully observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly with dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able both to analyze the statistical error associated with any global optimum and to prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings.
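The following sketch illustrates the additive-noise case only, under stated assumptions: covariates are observed as Z = X + W with known noise covariance Σ_w = σ_w² I, the quadratic surrogate uses the bias-corrected Γ̂ = ZᵀZ/n − Σ_w together with γ̂ = Zᵀy/n, and the objective ½θᵀΓ̂θ − γ̂ᵀθ is minimized by projected gradient descent over an ℓ1 ball. The oracle radius R = ‖θ*‖₁, the step size, and the synthetic data are illustrative choices, and the missing-data and dependent-data variants are not shown.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto {x : ||x||_1 <= radius} (sort-based method)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    shift = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - shift, 0.0)

def projected_gradient(Gamma, gamma, radius, iters=500):
    """Minimize 0.5 * theta^T Gamma theta - gamma^T theta over the l1 ball."""
    step = 1.0 / np.linalg.norm(Gamma, 2)   # Gamma may be indefinite in general
    theta = np.zeros(len(gamma))
    for _ in range(iters):
        theta = project_l1_ball(theta - step * (Gamma @ theta - gamma), radius)
    return theta

rng = np.random.default_rng(3)
n, p, sigma_w = 2000, 50, 0.5
theta_star = np.zeros(p)
theta_star[:3] = [1.0, -1.0, 0.5]
X = rng.standard_normal((n, p))
y = X @ theta_star + 0.1 * rng.standard_normal(n)
Z = X + sigma_w * rng.standard_normal((n, p))   # covariates observed with additive noise

gamma_hat = Z.T @ y / n
Gamma_naive = Z.T @ Z / n                          # ignores the measurement noise
Gamma_corr = Gamma_naive - sigma_w ** 2 * np.eye(p)  # subtracts the known Sigma_w
R = np.abs(theta_star).sum()                       # oracle radius, for illustration only

theta_naive = projected_gradient(Gamma_naive, gamma_hat, R)
theta_corr = projected_gradient(Gamma_corr, gamma_hat, R)
err_naive = np.linalg.norm(theta_naive - theta_star)
err_corr = np.linalg.norm(theta_corr - theta_star)
```

Without the correction, ZᵀZ/n overestimates the covariate covariance, which shrinks the estimate toward zero; subtracting Σ_w removes that bias at the price of a possibly non-convex objective, which is why the ℓ1-ball constraint matters.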
Atomic norm denoising with applications to line spectral estimation
2012
Cited by 1 (1 self)
The sub-Nyquist estimation of line spectra is a classical problem in signal processing, but currently popular subspace-based techniques have few guarantees in the presence of noise and rely on a priori knowledge about system model order. Motivated by recent work on atomic norms in inverse problems, we propose a new approach to line spectral estimation that provides theoretical guarantees for the mean-squared-error performance in the presence of noise and without advance knowledge of the model order. We propose an abstract theory of denoising with atomic norms and specialize this theory to provide a convex optimization problem for estimating the frequencies and phases of a mixture of complex exponentials with guaranteed bounds on the mean-squared error. We show that the associated convex optimization problem, called Atomic norm Soft Thresholding (AST), can be solved in polynomial time via semidefinite programming. For very large-scale problems we provide an alternative, efficient algorithm, called Discretized Atomic norm Soft Thresholding (DAST), based on the Fast Fourier Transform, that achieves nearly the same error rate as that guaranteed by the semidefinite programming approach. We compare both AST and DAST with Cadzow’s canonical alternating projection algorithm and demonstrate that AST outperforms DAST, which in turn outperforms Cadzow, in terms of mean-square reconstruction error over a wide range of signal-to-noise ratios. For very large problems DAST is considerably faster than both AST and Cadzow.
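In the spirit of DAST, the sketch below restricts frequencies to a uniform grid of N points and solves an ℓ1-regularized least-squares problem, ½‖y − Ac‖² + τ‖c‖₁, with ISTA, applying the dictionary of complex exponentials and its adjoint through `np.fft`. This is an illustration under assumptions, not the paper's algorithm: the solver, the grid size N, the weight τ, and the iteration count are hypothetical choices.

```python
import numpy as np

def synth(c, n):
    """Apply the grid dictionary: x_j = sum_k c_k exp(2*pi*i*k*j/N) for j < n."""
    N = len(c)
    return N * np.fft.ifft(c)[:n]

def analyze(r, N):
    """Adjoint of synth: (A^H r)_k = sum_j r_j exp(-2*pi*i*k*j/N)."""
    return np.fft.fft(r, N)   # zero-pads r to length N

def grid_soft_threshold(y, N, tau, iters=3000):
    """ISTA for 0.5*||y - A c||^2 + tau*||c||_1 over complex c on an N-point grid."""
    n = len(y)
    step = 1.0 / N            # A A^H = N I when n <= N, so ||A||^2 = N
    c = np.zeros(N, dtype=complex)
    for _ in range(iters):
        z = c - step * analyze(synth(c, n) - y, N)
        mag = np.abs(z)
        # complex soft-thresholding: shrink magnitudes, keep phases
        c = z * np.maximum(1.0 - step * tau / np.maximum(mag, 1e-12), 0.0)
    return c, synth(c, n)

rng = np.random.default_rng(4)
n, N, sigma = 64, 256, 0.05
idx = np.arange(n)
# Two on-grid tones at grid indices 40 and 100 (hypothetical test signal).
x_clean = (np.exp(2j * np.pi * 40 * idx / N)
           + 0.7 * np.exp(1j * 0.7) * np.exp(2j * np.pi * 100 * idx / N))
noise = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
y = x_clean + noise
tau = 2 * sigma * np.sqrt(n * np.log(n))   # ad hoc regularization weight
c_hat, x_hat = grid_soft_threshold(y, N, tau)
```

Because both the forward operator and its adjoint are FFTs, each iteration costs O(N log N), which is the kind of speed-up the abstract attributes to a discretized, FFT-based approach compared with semidefinite programming.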
Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
Journal of Machine Learning Research, 2012
We consider the problem of online linear regression on arbitrary deterministic sequences when the ambient dimension d can be much larger than the number of time rounds T. We introduce the notion of a sparsity regret bound, which is a deterministic online counterpart of recent risk bounds derived in the stochastic setting under a sparsity scenario. We prove such regret bounds for an online-learning algorithm called SeqSEW, which is based on exponential weighting and data-driven truncation. In a second part, we apply a parameter-free version of this algorithm to the stochastic setting (regression model with random design). This yields risk bounds of the same flavor as in Dalalyan and Tsybakov (2011) that also resolve two questions left open there. In particular, our risk bounds are adaptive (up to a logarithmic factor) to the unknown variance of the noise if the latter is Gaussian. We also address the regression model with fixed design.
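The two ingredients named here, exponential weighting and data-driven truncation, can be shown in a deliberately simplified sketch. It is not SeqSEW: as a hypothetical stand-in, a finite set of 1-sparse linear experts (±e_j) replaces SeqSEW's weighting over the whole parameter space, and truncation is imitated by clipping predictions to [−B_t, B_t], where B_t is the largest |y_s| observed before round t. The function name `ewa_clipped`, the learning rate `eta`, and the synthetic data are assumptions.

```python
import numpy as np

def ewa_clipped(X, y, experts, eta):
    """Exponentially weighted average of linear experts with data-driven clipping.

    Hypothetical simplification: a finite expert set stands in for SeqSEW's
    continuous sparsity-favoring weighting.
    """
    T = len(y)
    log_w = np.zeros(len(experts))       # uniform prior over experts
    bound = 0.0                          # B_t = max |y_s| over rounds s < t
    preds = np.empty(T)
    for t in range(T):
        expert_preds = np.clip(experts @ X[t], -bound, bound)  # truncated predictions
        w = np.exp(log_w - log_w.max())  # normalize in log space for stability
        w /= w.sum()
        preds[t] = w @ expert_preds
        log_w -= eta * (y[t] - expert_preds) ** 2   # exponential weighting, square loss
        bound = max(bound, abs(y[t]))    # update the data-driven truncation level
    w = np.exp(log_w - log_w.max())
    return preds, w / w.sum()

rng = np.random.default_rng(5)
T, d = 300, 20
X = rng.standard_normal((T, d))
y = X[:, 3] + 0.1 * rng.standard_normal(T)        # sequence generated by a 1-sparse rule
experts = np.vstack([np.eye(d), -np.eye(d)])      # all 1-sparse experts +e_j and -e_j
preds, weights = ewa_clipped(X, y, experts, eta=0.05)
```

The clipping level is learned from past outputs only, so the forecaster never needs an a priori bound on |y_t|; this mirrors the role the abstract gives to data-driven truncation, although SeqSEW's actual truncation rule and prior are not reproduced here.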
Supplementary Material: Fast Global Convergence of Gradient Methods for High-Dimensional Statistical Recovery
Submitted to the Annals of Statistics
In this supplement, we provide the proofs of our main results and their corollaries in Section 7. The more technical arguments are deferred to the appendices.

7. Proofs. Recall that we use $\hat{\Delta}^t := \theta^t - \hat{\theta}$ to denote the optimization error, and $\Delta^* = \hat{\theta} - \theta^*$ to denote the statistical error. For future reference, we point out a slight weakening of restricted strong convexity (RSC), useful for obtaining parts of our results. As the proofs to follow reveal, it is only necessary to enforce an RSC condition of the form

$$\mathcal{T}_{\mathcal{L}}(\theta^t; \hat{\theta}) \geq \frac{\gamma_\ell}{2} \|\theta^t - \hat{\theta}\|^2 - \tau_\ell(\mathcal{L}_n)\, \mathcal{R}^2(\theta^t - \hat{\theta}) - \delta^2, \tag{47}$$

which is milder than the original RSC condition (8), in that it applies only to differences of the form $\theta^t - \hat{\theta}$, and allows for additional slack $\delta$. We make use of this refined notion in the proofs of various results to follow. With this relaxed RSC condition and the same RSM condition as before, our proof shows that

$$\|\theta^{t+1} - \hat{\theta}\|^2 \leq \kappa^t \|\theta^0 - \hat{\theta}\|^2 + \epsilon^2(\Delta^*; \mathcal{M}, \bar{\mathcal{M}}) + 2\delta^2/\gamma_u \tag{48}$$
Fast Global Convergence of Gradient Methods for High-Dimensional Statistical Recovery
Submitted to the Annals of Statistics
Many statistical M-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods for solving such problems, working within a high-dimensional framework that allows the ambient dimension d to grow with (and possibly exceed) the sample size n. Our theory identifies conditions under which projected gradient descent enjoys globally linear convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter $\theta^*$ and an optimal solution $\hat{\theta}$. By establishing these conditions with high probability for numerous statistical models, our analysis applies to a wide range of M-estimators, including sparse linear regression using Lasso; group Lasso for block sparsity; log-linear models with regularization; low-rank matrix recovery using nuclear norm regularization; and matrix decomposition
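The geometric decay of the optimization error $\|\theta^t - \hat{\theta}\|$ can be observed on a small Lasso instance with d > n. The sketch below runs the composite (proximal) gradient method on (1/2n)‖y − Xθ‖² + λ‖θ‖₁ with step 1/L, L = ‖X‖₂²/n, and measures each iterate's distance to a reference solution obtained by running many more iterations. The problem sizes, λ, the iteration counts, and the use of a long run as a stand-in for $\hat{\theta}$ are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def composite_gradient(X, y, lam, iters):
    """Proximal gradient for (1/2n)||y - X theta||^2 + lam * ||theta||_1; returns all iterates."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2      # 1/L with L = ||X||_2^2 / n
    theta = np.zeros(d)
    path = [theta.copy()]
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - step * grad, step * lam)
        path.append(theta.copy())
    return theta, path

rng = np.random.default_rng(2)
n, d, s, sigma = 200, 500, 5, 0.5            # high-dimensional regime: d > n
X = rng.standard_normal((n, d))
theta_star = np.zeros(d)
theta_star[:s] = 1.0
y = X @ theta_star + sigma * rng.standard_normal(n)
lam = 2 * sigma * np.sqrt(np.log(d) / n)     # a common scaling for the Lasso weight

theta_hat, _ = composite_gradient(X, y, lam, 3000)  # long run used as the reference solution
_, path = composite_gradient(X, y, lam, 300)
opt_err = np.array([np.linalg.norm(th - theta_hat) for th in path])
stat_err = np.linalg.norm(theta_hat - theta_star)
```

Plotting `np.log(opt_err)` against the iteration index should show a roughly straight-line decrease; `opt_err` falls far below `stat_err` within a few hundred iterations, so further optimization buys little statistically.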

