Results 1–10 of 23
Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation
Abstract

Cited by 67 (9 self)
The ℓ1-regularized Gaussian maximum likelihood estimator has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem, which is a regularized log-determinant program. In contrast to other state-of-the-art methods that largely use first-order gradient information, our algorithm is based on Newton’s method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and also present experimental results using synthetic and real application data that demonstrate the considerable improvements in performance of our method when compared to other state-of-the-art methods.
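For orientation, the regularized log-determinant objective described in this abstract can be written out directly. A minimal NumPy sketch of the objective and its smooth gradient (an illustration under assumed conventions, not the paper's implementation; here `lam` penalizes all entries of X, including the diagonal):

```python
import numpy as np

def sparse_gaussian_mle_objective(X, S, lam):
    """f(X) = -log det X + tr(S X) + lam * ||X||_1 for positive definite X."""
    sign, logdet = np.linalg.slogdet(X)
    assert sign > 0, "X must be positive definite"
    return -logdet + np.trace(S @ X) + lam * np.abs(X).sum()

def smooth_gradient(X, S):
    """Gradient of the smooth part: S - X^{-1}."""
    return S - np.linalg.inv(X)

# Tiny example: empirical covariance of random data, identity starting point.
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 4))
S = np.cov(data, rowvar=False)
X = np.eye(4)
f0 = sparse_gaussian_mle_objective(X, S, lam=0.1)
```

At X = I the log-det term vanishes, so f0 reduces to tr(S) plus the penalty on the four diagonal ones, which makes the sketch easy to sanity-check by hand.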
Proximal Newton-type methods for convex optimization
Abstract

Cited by 14 (0 self)
We seek to solve convex optimization problems in composite form: minimize_{x ∈ R^n} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h: R^n → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics.
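As a concrete (hypothetical) instance of the scheme this abstract describes, the sketch below takes one proximal Newton step for h(x) = lam·||x||_1, solving the quadratic-plus-ℓ1 subproblem inexactly with proximal gradient (ISTA) as the inner solver; all names and the inner-solver choice are illustrative, not the paper's:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_newton_step(x, grad_g, hess_g, lam, inner_iters=200):
    """One proximal Newton step for min g(x) + lam * ||x||_1.

    The subproblem  min_d  grad^T d + 0.5 d^T H d + lam * ||x + d||_1
    is solved inexactly by running ISTA on the quadratic model."""
    g, H = grad_g(x), hess_g(x)
    L = np.linalg.norm(H, 2)          # step bound for the inner ISTA loop
    z = x.copy()
    for _ in range(inner_iters):
        model_grad = g + H @ (z - x)  # gradient of the quadratic model at z
        z = soft_threshold(z - model_grad / L, lam / L)
    return z  # unit step; a line search on the true objective would go here

# Usage: a diagonal lasso, where the quadratic model equals g itself.
A = np.diag([2.0, 1.0])
b = np.array([1.0, 1.0])
grad_g = lambda x: A.T @ (A @ x - b)   # g(x) = 0.5 * ||A x - b||^2
hess_g = lambda x: A.T @ A
x1 = prox_newton_step(np.zeros(2), grad_g, hess_g, lam=0.1)
```

Because g is quadratic here, a single step with an accurate inner solve lands on the lasso solution; for general g, an outer loop repeats this step with a line search, which is where the superlinear local convergence analysis applies.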
PROXIMAL NEWTON-TYPE METHODS FOR MINIMIZING COMPOSITE FUNCTIONS
Abstract

Cited by 11 (1 self)
We generalize Newton-type methods for minimizing smooth functions to handle a sum of two convex functions: a smooth function and a nonsmooth function with a simple proximal mapping. We show that the resulting proximal Newton-type methods inherit the desirable convergence behavior of Newton-type methods for minimizing smooth functions, even when search directions are computed inexactly. Many popular methods tailored to problems arising in bioinformatics, signal processing, and statistical learning are special cases of proximal Newton-type methods, and our analysis yields new convergence results for some of these methods.
An Inexact Successive Quadratic Approximation Method for Convex L1 Regularized Optimization, arXiv preprint arXiv:1309.3529, 2013
Abstract

Cited by 10 (0 self)
We study a Newton-like method for the minimization of an objective function φ that is the sum of a smooth convex function and an ℓ1 regularization term. This method, which is sometimes referred to in the literature as a proximal Newton method, computes a step by minimizing a piecewise quadratic model q_k of the objective function φ. In order to make this approach efficient in practice, it is imperative to perform this inner minimization inexactly. In this paper, we give inexactness conditions that guarantee global convergence and that can be used to control the local rate of convergence of the iteration. Our inexactness conditions are based on a semismooth function that represents a (continuous) measure of the optimality conditions of the problem, and that embodies the soft-thresholding iteration. We give careful consideration to the algorithm employed for the inner minimization, and report numerical results on two test sets originating in machine learning.
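The "continuous measure of the optimality conditions" built on soft-thresholding that this abstract mentions can be sketched as the residual of the ISTA fixed-point map; this is an illustrative reconstruction of the idea, not the paper's exact function:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def optimality_residual(x, grad, lam, alpha=1.0):
    """||x - S_{alpha*lam}(x - alpha*grad)||: continuous in x, and zero
    exactly at minimizers of g(x) + lam * ||x||_1."""
    return np.linalg.norm(x - soft_threshold(x - alpha * grad, alpha * lam))

# Check on a problem with a closed-form solution: g(x) = 0.5*||x - c||^2,
# whose l1-regularized minimizer is x* = S_lam(c).
c = np.array([1.5, -0.2, 0.0])
lam = 0.5
x_star = soft_threshold(c, lam)
res_opt = optimality_residual(x_star, x_star - c, lam)  # vanishes at x*
res_far = optimality_residual(c, np.zeros(3), lam)      # positive elsewhere
```

A measure of this form is useful precisely because it stays meaningful for inexact inner solves: the inner minimization can be stopped once the residual of the model falls below a fraction of the residual of the outer problem.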
Composite Self-Concordant Minimization
Abstract

Cited by 7 (5 self)
We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly nonsmooth convex function endowed with a computable proximal operator. We theoretically establish the convergence of our framework without relying on the usual Lipschitz gradient assumption on the smooth part. An important highlight of our work is a new set of analytic step-size selection and correction procedures based on the structure of the problem. We describe concrete algorithmic instances of our framework for several interesting large-scale applications and demonstrate them numerically on both synthetic and real data.
BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
Abstract

Cited by 7 (2 self)
The ℓ1-regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix even under high-dimensional settings. However, it requires solving a difficult nonsmooth log-determinant program whose number of parameters scales quadratically with the number of Gaussian variables. State-of-the-art methods thus do not scale to problems with more than 20,000 variables. In this paper, we develop an algorithm BIGQUIC, which can solve 1-million-dimensional ℓ1-regularized Gaussian MLE problems (which would thus have 1000 billion parameters) using a single machine, with bounded memory. In order to do so, we carefully exploit the underlying structure of the problem. Our innovations include a novel block-coordinate descent method with the blocks chosen via a clustering scheme to minimize repeated computations, and allowing for inexact computation of specific components. In spite of these modifications, we are able to theoretically analyze our procedure and show that BIGQUIC can achieve superlinear or even quadratic convergence rates.
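The block-coordinate idea underlying the method described above can be illustrated on a small dense quadratic; the clustering heuristic BIGQUIC uses to choose blocks, and the log-determinant objective itself, are not reproduced here, and all names are illustrative:

```python
import numpy as np

def block_coordinate_descent(Q, b, blocks, sweeps=100):
    """Minimize 0.5 x^T Q x - b^T x (Q positive definite) by exactly
    solving for one block of variables at a time while the rest are
    held fixed -- the block analogue of Gauss-Seidel."""
    x = np.zeros_like(b)
    for _ in range(sweeps):
        for blk in blocks:
            rest = [i for i in range(len(b)) if i not in blk]
            # Block optimality: Q[blk,blk] x[blk] = b[blk] - Q[blk,rest] x[rest]
            rhs = b[blk] - Q[np.ix_(blk, rest)] @ x[rest]
            x[blk] = np.linalg.solve(Q[np.ix_(blk, blk)], rhs)
    return x

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = block_coordinate_descent(Q, b, blocks=[[0, 1], [2]])
```

Grouping strongly coupled variables into the same block (which is what a clustering scheme approximates) means the cross-block terms Q[blk, rest] are small, so each sweep makes more progress and fewer off-block quantities need to be recomputed.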
Sparse Gaussian Conditional Random Fields: Algorithms, Theory, and Application to Energy Forecasting
Abstract

Cited by 7 (0 self)
This paper considers the sparse Gaussian conditional random field, a discriminative extension of sparse inverse covariance estimation, where we use convex methods to learn a high-dimensional conditional distribution of outputs given inputs. The model has been proposed by multiple researchers within the past year, yet previous papers have been substantially limited in their analysis of the method and in the ability to solve large-scale problems. In this paper, we make three contributions: 1) we develop a second-order active-set method which is several orders of magnitude faster than previously proposed optimization approaches for this problem, 2) we analyze the model from a theoretical standpoint, improving upon past bounds with convergence rates that depend logarithmically on the data dimension, and 3) we apply the method to large-scale energy forecasting problems, demonstrating state-of-the-art performance on two real-world tasks.
Practical Inexact Proximal Quasi-Newton Method with Global Complexity Analysis
Abstract

Cited by 4 (0 self)
Recently, several methods were proposed for sparse optimization which make careful use of second-order information [11, 30, 17, 3] to improve local convergence rates. These methods construct a composite quadratic approximation using Hessian information, optimize this approximation using a first-order method such as coordinate descent, and employ a line search to ensure sufficient descent. Here we propose a general framework, which includes slightly modified versions of existing algorithms and also a new algorithm that uses limited-memory BFGS Hessian approximations, and provide a global convergence rate analysis in the spirit of proximal gradient methods, including an analysis of the method based on coordinate descent.
FAST PROXIMAL ALGORITHMS FOR SELF-CONCORDANT FUNCTION MINIMIZATION WITH APPLICATION TO SPARSE GRAPH SELECTION
Abstract

Cited by 2 (2 self)
The convex ℓ1-regularized log det divergence criterion has been shown to produce theoretically consistent graph learning. However, this objective function is challenging since the ℓ1 regularization is nonsmooth, the log det objective does not have a globally Lipschitz gradient, and the problem is high-dimensional. Using the self-concordant property of the objective, we propose a new adaptive step size selection and present the (F)PS ((F)ast Proximal algorithms for Self-concordant functions) algorithmic framework, which has linear convergence and exhibits superior empirical results as compared to state-of-the-art first-order methods. Index Terms — Sparse inverse covariance estimation, self-concordance, step size selection
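The adaptive step size proposed in this paper is not reproduced here, but the classical self-concordant rule it builds on is the damped Newton step t = 1/(1 + λ(x)), where λ(x) is the Newton decrement; this keeps iterates inside the domain with no Lipschitz-gradient assumption. A hypothetical sketch on an unconstrained self-concordant function:

```python
import numpy as np

def damped_newton_step(x, grad, hess):
    """Damped Newton step for a self-concordant f: the step size
    1/(1 + lambda(x)) guarantees the new point stays in the domain."""
    d = np.linalg.solve(hess, -grad)                     # Newton direction
    decrement = np.sqrt(grad @ np.linalg.solve(hess, grad))
    return x + d / (1.0 + decrement)

# Example: f(x) = -sum(log x_i) + c^T x is self-concordant on x > 0,
# with minimizer x_i = 1 / c_i.
c = np.array([1.0, 2.0])
grad_f = lambda x: -1.0 / x + c
hess_f = lambda x: np.diag(1.0 / x**2)
x = np.array([5.0, 5.0])
for _ in range(50):
    x = damped_newton_step(x, grad_f(x), hess_f(x))
```

Note that every iterate remains strictly positive even though the gradient of -log blows up near the boundary, which is exactly what a globally Lipschitz-gradient analysis cannot handle.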
QUIC: Quadratic Approximation for Sparse Inverse Covariance Estimation
Abstract

Cited by 2 (1 self)
The ℓ1-regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem, which is a regularized log-determinant program. In contrast to recent state-of-the-art methods that largely use first-order gradient information, our algorithm is based on Newton’s method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and present experimental results using synthetic and real-world application data that demonstrate the considerable improvements in performance of our method when compared to previous methods.