SparseNet: Coordinate descent with non-convex penalties (2009)

by R Mazumder, J Friedman, T Hastie
Results 1 - 10 of 71 citing documents

Multinomial inverse regression for text analysis

by Matt Taddy - J. Amer. Statist. Assoc., 2013
"... ABSTRACT: Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. It is also very high dimensional and difficult to incorporate into statistical analyses. This article intro ..."
Abstract - Cited by 18 (3 self) - Add to MetaCart
ABSTRACT: Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. Such data are also very high dimensional and difficult to incorporate into statistical analyses. This article introduces a straightforward framework of sentiment-sufficient dimension reduction for text data. Multinomial inverse regression is introduced as a general tool for simplifying predictor sets that can be represented as draws from a multinomial distribution, and we show that logistic regression of phrase counts onto document annotations can be used to obtain low-dimension document representations that are rich in sentiment information. To facilitate this modeling, a novel estimation technique is developed for multinomial logistic regression with very high-dimension response. In particular, independent Laplace priors with unknown variance are assigned to each regression coefficient, and we detail an efficient routine for maximization of the joint posterior over coefficients and their prior scale. This 'gamma-lasso' scheme yields stable and effective estimation for general high-dimension logistic regression, and we argue that it will be superior to current methods in many settings. Guidelines for prior specification are provided, algorithm convergence is detailed, and estimator properties are outlined from the perspective of the literature on non-concave likelihood penalization. Related work on sentiment analysis from statistics, econometrics, and machine learning is surveyed and connected. Finally, the methods are applied in two detailed examples and we provide out-of-sample prediction studies to illustrate their effectiveness.
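The 'gamma-lasso' alternation the abstract describes has a compact form: with independent Laplace(lam_j) priors and Gamma(s, r) hyperpriors on each rate, the conditional MAP of lam_j given beta_j is s / (r + |beta_j|), and the coefficient update is an ordinary soft-thresholding step. A minimal sketch under those assumptions, using squared-error loss in place of the paper's multinomial logistic likelihood (function names and defaults are ours, not the author's code):

    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * max(abs(z) - t, 0.0)

    def gamma_lasso_sketch(X, y, s=1.0, r=1.0, n_sweeps=100):
        # Joint MAP over coefficients beta and per-coefficient Laplace
        # rates lam_j with Gamma(s, r) hyperpriors. Squared-error loss
        # stands in for the paper's multinomial logistic likelihood.
        n, p = X.shape
        beta = np.zeros(p)
        col_ss = (X ** 2).sum(axis=0)      # per-column sum of squares
        resid = y.copy()                   # residual for beta = 0
        for _ in range(n_sweeps):
            for j in range(p):
                # conditional MAP of the rate: maximizer of
                # s*log(lam) - lam*(r + |beta_j|)
                lam_j = s / (r + abs(beta[j]))
                zj = X[:, j] @ resid + col_ss[j] * beta[j]
                new = soft_threshold(zj, lam_j) / col_ss[j]
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
        return beta

Because lam_j shrinks as |beta_j| grows, large coefficients are penalized less than under a fixed-lambda lasso, which is exactly the nonconvex, log-penalty-like behavior the abstract connects to the non-concave penalization literature.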

Citation Context

...cally overwhelms penalty concavity and real examples behave like those shown in Figure 3. Moreover, although it is possible to show stationary limit points for CD on such nonconvex functions (e.g. Mazumder et al., 2011), we advocate avoiding the issue through prior specification. In particular, hyperprior shape and rate can be raised to decrease var(λjk) while keeping E[λjk] unchanged. Although this may require mor...
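The truncated context appeals to a standard gamma-distribution identity; making it explicit (notation ours):

\[
\lambda_{jk} \sim \mathrm{Gamma}(s, r) \;\Rightarrow\; \mathbb{E}[\lambda_{jk}] = \frac{s}{r}, \qquad \mathrm{var}(\lambda_{jk}) = \frac{s}{r^2},
\]
\[
(s, r) \mapsto (cs, cr),\; c > 1 \;\Rightarrow\; \mathbb{E}[\lambda_{jk}] = \frac{cs}{cr} = \frac{s}{r}, \qquad \mathrm{var}(\lambda_{jk}) = \frac{cs}{(cr)^2} = \frac{1}{c}\cdot\frac{s}{r^2}.
\]

Raising shape and rate together thus concentrates the hyperprior without moving its mean, which is the fix the authors advocate.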

Strong oracle optimality of folded concave penalized estimation

by Jianqing Fan, Lingzhou Xue, Hui Zou - Ann. Statist., 2014
"... All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately. ..."
Abstract - Cited by 13 (4 self) - Add to MetaCart
All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately.

Citation Context

...hang (2010a) devised a PLUS algorithm for solving the penalized least squares using the MCP and the SCAD. Recently, coordinate descent was applied to solve the folded concave penalized least squares (Mazumder et al., 2011; Fan and Lv, 2011). With these advances in computing algorithms, one can now at least efficiently compute a local solution of the folded concave penalized problem. It has been shown repeatedly that t...
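For a standardized design, the coordinate descent cited here reduces to a univariate threshold step; for the MCP with gamma > 1 that step has the closed form below. A minimal sketch in the standard parameterization (not any of the cited authors' code):

    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * max(abs(z) - t, 0.0)

    def mcp_threshold(z, lam, gamma):
        # Univariate minimizer of 0.5*(z - b)**2 + MCP(b; lam, gamma)
        # for gamma > 1; beyond gamma*lam the penalty is flat, so the
        # estimate is left unshrunk (the source of the MCP's low bias).
        if abs(z) <= gamma * lam:
            return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
        return z

    def cd_mcp(X, y, lam, gamma=3.0, n_sweeps=100):
        # Cyclic coordinate descent for MCP-penalized least squares,
        # assuming columns of X are standardized so that X_j' X_j = n.
        n, p = X.shape
        beta = np.zeros(p)
        resid = y.copy()
        for _ in range(n_sweeps):
            for j in range(p):
                zj = X[:, j] @ resid / n + beta[j]
                new = mcp_threshold(zj, lam, gamma)
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
        return beta

Swapping mcp_threshold for the SCAD threshold operator gives the other folded concave case the context mentions.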

A generalized least-square matrix decomposition

by Genevera I. Allen, Logan Grosenick, Jonathan Taylor - Journal of the American Statistical Association
"... Variables in many massive high-dimensional data sets are structured, arising for example from measurements on a regular grid as in imaging and time series or from spatial-temporal measurements as in climate studies. Classical multivariate techniques ignore these structural relationships often result ..."
Abstract - Cited by 13 (6 self) - Add to MetaCart
Variables in many massive high-dimensional data sets are structured, arising for example from measurements on a regular grid as in imaging and time series or from spatial-temporal measurements as in climate studies. Classical multivariate techniques ignore these structural relationships, often resulting in poor performance. We propose a generalization of the singular value decomposition (SVD) and principal components analysis (PCA) that is appropriate for massive data sets with structured variables or known two-way dependencies. By finding the best low rank approximation of the data with respect to a transposable quadratic norm, our decomposition, entitled the Generalized least squares Matrix Decomposition (GMD), directly accounts for structural relationships. As many variables in high-dimensional settings are often irrelevant or noisy, we also regularize our matrix decomposition by adding two-way penalties to encourage sparsity or smoothness. We develop fast computational algorithms using our methods to perform generalized PCA (GPCA), sparse GPCA, and functional GPCA on massive data sets. Through simulations and a whole brain functional MRI example we demonstrate the utility of our methodology for dimension reduction, signal recovery, and feature selection with high-dimensional structured data.
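The unpenalized core of the GMD can be sketched as a power iteration in which Euclidean normalization is replaced by Q- and R-norm normalization. This sketch assumes the rank-one updates u proportional to XRv and v proportional to X'Qu, which may differ in detail from the paper's actual algorithm:

    import numpy as np

    def gmd_rank1(X, Q, R, n_iter=50):
        # Rank-one sketch: minimize ||X - d*u*v'||_{Q,R}^2 subject to
        # u'Qu = 1 and v'Rv = 1, by alternating the two updates.
        n, p = X.shape
        v = np.ones(p)
        v /= np.sqrt(v @ R @ v)            # R-norm normalization
        for _ in range(n_iter):
            u = X @ (R @ v)
            u /= np.sqrt(u @ Q @ u)        # Q-norm normalization
            v = X.T @ (Q @ u)
            v /= np.sqrt(v @ R @ v)
        d = u @ Q @ X @ (R @ v)
        return d, u, v

With Q and R equal to identity matrices this collapses to the ordinary SVD power method, which is the sense in which the GMD generalizes PCA.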

Citation Context

...lty (Fan and Li, 2001b). As mentioned above, these penalties, however, can still be used in our framework as concave penalized regression problems can be solved via iterative weighted lasso problems (Mazumder et al., 2009). Thus, our GPMF framework can be used with a wide range of penalties to obtain sparsity in the factors. Finally, we note that as our GMD Algorithm can be used to perform GPCA, the sparse GPMF framew...
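The "iterative weighted lasso" reduction in this snippet is the local linear approximation idea: majorize the concave penalty by its tangent at the current |beta_j| and solve a weighted l1 problem. A sketch assuming a penalty-derivative hook, with the weights folded into the columns so a stock lasso solver applies (helper names are ours):

    import numpy as np
    from sklearn.linear_model import Lasso

    def lla_concave_lasso(X, y, lam, penalty_deriv, n_outer=10):
        # Local linear approximation: each round majorizes the concave
        # penalty by its tangent at the current |beta| and solves a
        # weighted lasso. Folding the weights into the columns lets a
        # stock l1 solver do the work: beta_j = theta_j / w_j after
        # fitting on X / w (loss convention: (1/2n)||y - Xb||^2).
        n, p = X.shape
        beta = np.zeros(p)
        for _ in range(n_outer):
            w = penalty_deriv(np.abs(beta), lam)  # tangent slopes
            w = np.maximum(w, 1e-10)              # guard: flat regions
            fit = Lasso(alpha=1.0, fit_intercept=False).fit(X / w, y)
            beta = fit.coef_ / w
        return beta

    # e.g. the MCP derivative with gamma = 3 (weights include lam):
    mcp_deriv = lambda t, lam: np.maximum(lam - t / 3.0, 0.0)

At beta = 0 the weights equal lam, so the first round is a plain lasso; later rounds relax the penalty on large coefficients, which is how the concave penalty reduces bias.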

Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. arXiv preprint, arXiv

by Zhaoran Wang, Han Liu, Tong Zhang , 2013
"... We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization prob-lem. Many important estimators fall in this category, including least squares regression with nonconvex regulariz ..."
Abstract - Cited by 11 (5 self) - Add to MetaCart
We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall in this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence of any local solution obtained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods that only attain geometric rates of convergence for one single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the usage of nonconvex penalty.
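The warm-start pattern the abstract builds on can be sketched in a few lines: solve along a geometric grid of lambda values, initializing each solve at the previous solution. For brevity this sketch uses the l1 proximal step; the nonconvex variants the paper analyzes swap in the corresponding threshold operator (all names and defaults are illustrative):

    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def approx_path(X, y, lam_min_ratio=0.01, n_lam=50, n_inner=200):
        # Geometric grid of lambdas from lam_max (where beta = 0 is
        # optimal) downward; each solve warm-starts at the previous
        # solution. Proximal gradient with the l1 prox for brevity.
        n, p = X.shape
        lam_max = np.max(np.abs(X.T @ y)) / n
        lams = lam_max * np.geomspace(1.0, lam_min_ratio, n_lam)
        step = n / np.linalg.norm(X, 2) ** 2  # 1/L for (1/2n)||y-Xb||^2
        beta = np.zeros(p)
        path = []
        for lam in lams:
            for _ in range(n_inner):
                grad = X.T @ (X @ beta - y) / n
                beta = soft_threshold(beta - step * grad, step * lam)
            path.append(beta.copy())
        return lams, np.array(path)

Warm starts keep each inner solve close to its solution, which is why the full path can share one overall iteration complexity rather than paying it per grid point.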

Citation Context

...approximation (LQA) (Fan and Li, 2001), minorize-maximize (MM) algorithm (Hunter and Li, 2005), local linear approximation (LLA) (Zou and Li, 2008), and coordinate descent (Breheny and Huang, 2011; Mazumder et al., 2011). The theoretical properties of the local solutions obtained by these numerical procedures are in general unestablished. Only recently Zhang and Zhang (2012) showed that the gradient descent method i...

The sparse Laplacian shrinkage estimator for high-dimensional regression

by Jian Huang, Shuangge Ma, Hongzhe Li , 2010
"... ar ..."
Abstract - Cited by 10 (3 self) - Add to MetaCart
Abstract not found

A Non-convex Relaxation Approach to Sparse Dictionary Learning

by Jianping Shi, Xiang Ren, Guang Dai, Jingdong Wang, Zhihua Zhang
"... Dictionary learning is a challenging theme in computer vision. The basic goal is to learn a sparse representation from an overcomplete basis set. Most existing approaches employ a convex relaxation scheme to tackle this challenge due to the strong ability of convexity in computation and theoretical ..."
Abstract - Cited by 9 (3 self) - Add to MetaCart
Dictionary learning is a challenging theme in computer vision. The basic goal is to learn a sparse representation from an overcomplete basis set. Most existing approaches employ a convex relaxation scheme to tackle this challenge due to the strong ability of convexity in computation and theoretical analysis. In this paper we propose a non-convex online approach for dictionary learning. To achieve sparseness, our approach treats a so-called minimax concave (MC) penalty as a non-convex relaxation of the ℓ0 penalty. This treatment is expected to yield a more robust and sparse representation than existing convex approaches. In addition, we employ an online algorithm to adaptively learn the dictionary, which makes the non-convex formulation computationally feasible. Experimental results on the sparseness comparison and the applications in image denoising and image inpainting demonstrate that our approach is more effective and flexible.
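The MC penalty used here (Zhang's minimax concave penalty) is linear near the origin like ℓ1 and flat beyond gamma*lam, mimicking ℓ0 for large arguments. A direct transcription of the standard definition (parameter names ours):

    import numpy as np

    def mc_penalty(t, lam, gamma):
        # Minimax concave penalty: linear (l1-like) near zero, then
        # concave, then constant at gamma*lam**2/2 beyond gamma*lam.
        a = np.abs(t)
        return np.where(a <= gamma * lam,
                        lam * a - a ** 2 / (2.0 * gamma),
                        0.5 * gamma * lam ** 2)

As gamma grows the penalty approaches lam*|t| (the ℓ1 relaxation), while small gamma approaches the hard-thresholding, ℓ0-like regime, which is the interpolation the abstract exploits.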

Citation Context

...ARS [18], coordinate-descent algorithms [6], etc. The lasso enjoys some attractive statistical properties. However, the ℓ1 penalty may over-penalize some good variables in some cases [15, 20]. A number of non-convex relaxation approaches, such as the log penalty [7], the smoothly clipped absolute deviation (SCAD) penalty [5] and the minimax concave (MC) penalty [20], have been also propos...

EP-GIG Priors and Applications in Bayesian Sparse Learning

by Zhihua Zhang, Shusen Wang, Dehua Liu, Michael I. Jordan
"... In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, and the ..."
Abstract - Cited by 8 (2 self) - Add to MetaCart
In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, and the special cases include Gaussian scale mixtures and Laplace scale mixtures. Furthermore, Laplace scale mixtures can subserve a Bayesian framework for sparse learning with nonconvex penalization. The densities of EP-GIG can be explicitly expressed. Moreover, the corresponding posterior distribution also follows a generalized inverse Gaussian distribution. We exploit these properties to develop EM algorithms for sparse empirical Bayesian learning. We also show that these algorithms bear an interesting resemblance to iteratively reweighted ℓ2 or ℓ1 methods. Finally, we present two extensions for grouped variable selection and logistic regression.
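The resemblance to iteratively reweighted ℓ2 that the abstract mentions can be sketched generically for a Gaussian scale-mixture prior: the E-step supplies w_j = E[1/tau_j | beta_j], whose closed form is distribution-specific (in the EP-GIG family it comes from the GIG conditional posterior), and the M-step is a weighted ridge solve. The hook name below is ours:

    import numpy as np

    def em_scale_mixture(X, y, inv_scale_expect, n_iter=50):
        # EM for a Gaussian scale-mixture prior beta_j ~ N(0, tau_j):
        # the E-step supplies w_j = E[1/tau_j | beta_j] (closed form is
        # distribution-specific; inv_scale_expect is our placeholder),
        # and the M-step is a weighted ridge solve (unit noise variance).
        n, p = X.shape
        beta = np.zeros(p)
        for _ in range(n_iter):
            w = inv_scale_expect(beta)                    # E-step
            beta = np.linalg.solve(X.T @ X + np.diag(w),  # M-step
                                   X.T @ y)
        return beta

With exponential mixing (the Laplace prior), for example, the conditional posterior of 1/tau_j is inverse Gaussian and the resulting weight is proportional to 1/|beta_j|, so the hook must guard against beta_j = 0.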

Citation Context

...ni, 1996). But in addition a number of studies have emphasized the advantages of nonconvex penalties—such as the bridge penalty and the log-penalty—for achieving sparsity (Fu, 1998, Fan and Li, 2001, Mazumder et al., 2011). The regularized optimization problem can be cast into a maximum a posteriori (MAP) framework. This is done by taking a Bayesian decision-theoretic approach in which the loss function L(b; X) is bas...
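The MAP casting the snippet refers to is the standard penalty/log-prior correspondence; in the snippet's notation:

\[
\hat{b} \;=\; \operatorname*{arg\,max}_{b}\; p(b \mid X) \;=\; \operatorname*{arg\,min}_{b}\; \bigl\{ L(b; X) \;-\; \log \pi(b) \bigr\},
\]

so a Laplace prior \(\pi(b) \propto \exp(-\lambda \lVert b \rVert_1)\) recovers the lasso penalty, while heavier-tailed priors induce nonconvex penalties such as the bridge and log-penalty named above.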

Moving Object Detection by Detecting Contiguous Outliers in the Low-Rank Representation

by Xiaowei Zhou, Student Member, Can Yang, Weichuan Yu
"... Abstract—Object detection is a fundamental step for automated video analysis in many vision applications. Object detection in a video is usually performed by object detectors or background subtraction techniques. Often, an object detector requires manually labeled examples to train a binary classifi ..."
Abstract - Cited by 8 (0 self) - Add to MetaCart
Abstract—Object detection is a fundamental step for automated video analysis in many vision applications. Object detection in a video is usually performed by object detectors or background subtraction techniques. Often, an object detector requires manually labeled examples to train a binary classifier, while background subtraction needs a training sequence that contains no objects to build a background model. To automate the analysis, object detection without a separate training phase becomes a critical task. People have tried to tackle this task by using motion information. But existing motion-based methods are usually limited when coping with complex scenarios such as nonrigid motion and dynamic background. In this paper, we show that the above challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR). This formulation integrates object detection and background learning into a single process of optimization, which can be solved by an alternating algorithm efficiently. We explain the relations between DECOLOR and other sparsity-based methods. Experiments on both simulated data and real sequences demonstrate that DECOLOR outperforms the state-of-the-art approaches and it can work effectively on a wide range of complex scenarios. Index Terms—Moving object detection, low-rank modeling, Markov Random Fields, motion segmentation
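Stripped of the Markov random field smoothness term, the alternation the abstract outlines can be sketched as: fit a low-rank background on pixels currently marked as inliers, then re-mark as outliers the pixels with large residuals. A deliberately simplified sketch (thresholds and the singular-value shrinkage step are illustrative; the paper's formulation penalizes rank and mask size jointly and smooths the mask with an MRF):

    import numpy as np

    def decolor_sketch(D, lam_rank=1.0, thresh=1.0, n_iter=20):
        # D: pixels x frames. Alternate (1) a low-rank background fit on
        # pixels currently marked as inliers (outliers imputed from the
        # current background) and (2) an outlier mask from big residuals.
        mask = np.zeros(D.shape, dtype=bool)  # True = outlier/foreground
        B = D.copy()
        for _ in range(n_iter):
            filled = np.where(mask, B, D)
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            s = np.maximum(s - lam_rank, 0.0)  # singular value shrinkage
            B = (U * s) @ Vt
            mask = (D - B) ** 2 > thresh
        return B, mask

The foreground mask here is purely per-pixel; DECOLOR's contribution is precisely to make it contiguous via the MRF term omitted from this sketch.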

Citation Context

...at the irrepresentable condition [53] is often not satisfied in the outlier detection problem. In order to go beyond the ℓ1-penalty, non-convex penalties have been explored in recent literature [52], [54]. Compared with the ℓ1-norm, non-convex penalties give an estimation with less bias but higher variance. Thus, these non-convex penalties are superior to the ℓ1-penalty when the signal-noise-ratio (SN...

High-dimensional sparse additive hazards regression

by Wei Lin, Jinchi Lv - Journal of the American Statistical Association
"... ar ..."
Abstract - Cited by 6 (1 self) - Add to MetaCart
Abstract not found

Path following and empirical Bayes model selection for sparse regressions, manuscript in preparation

by Hua Zhou, Artin Armagan, David B. Dunson , 2011
"... ar ..."
Abstract - Cited by 6 (3 self) - Add to MetaCart
Abstract not found

Citation Context

...thm for optimization in sparse regression at a fixed tuning parameter value and has found success in ultra-high dimensional problems (Friedman et al., 2007; Wu and Lange, 2008; Friedman et al., 2010; Mazumder et al., 2011). Nevertheless the optimization has to be performed at a large number of grid points, making the computation rather demanding compared to path following algorithms such as LARS. For both algorithms t...
