Results 1 - 10 of 71
Multinomial inverse regression for text analysis
- J. Amer. Statist. Assoc
, 2013
"... ABSTRACT: Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. It is also very high dimensional and difficult to incorporate into statistical analyses. This article intro ..."
Abstract
-
Cited by 18 (3 self)
- Add to MetaCart
(Show Context)
ABSTRACT: Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. Such data are also very high dimensional and difficult to incorporate into statistical analyses. This article introduces a straightforward framework of sentiment-sufficient dimension reduction for text data. Multinomial inverse regression is introduced as a general tool for simplifying predictor sets that can be represented as draws from a multinomial distribution, and we show that logistic regression of phrase counts onto document annotations can be used to obtain low-dimension document representations that are rich in sentiment information. To facilitate this modeling, a novel estimation technique is developed for multinomial logistic regression with a very high-dimensional response. In particular, independent Laplace priors with unknown variance are assigned to each regression coefficient, and we detail an efficient routine for maximization of the joint posterior over coefficients and their prior scales. This ‘gamma-lasso’ scheme yields stable and effective estimation for general high-dimensional logistic regression, and we argue that it will be superior to current methods in many settings. Guidelines for prior specification are provided, algorithm convergence is detailed, and estimator properties are outlined from the perspective of the literature on non-concave likelihood penalization. Related work on sentiment analysis from statistics, econometrics, and machine learning is surveyed and connected. Finally, the methods are applied in two detailed examples and we provide out-of-sample prediction studies to illustrate their effectiveness.
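The sufficient-reduction step is easy to make concrete. The sketch below is a minimal illustration, not the paper's gamma-lasso: it substitutes a crude per-phrase OLS slope for the multinomial inverse-regression fit, and the count matrix X and sentiment variable y are simulated placeholders.

```python
# Minimal sketch of sentiment-sufficient dimension reduction, assuming a
# crude moment-based stand-in for the multinomial inverse-regression fit.
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(200, 500))   # hypothetical phrase counts (docs x phrases)
y = rng.standard_normal(200)            # hypothetical sentiment variable

# Phrase frequencies: each document's counts normalized to sum to one.
f = X / np.maximum(X.sum(axis=1, keepdims=True), 1)

# Stand-in for the inverse-regression coefficients: regress each phrase's
# frequency on sentiment (one OLS slope per phrase).
yc = y - y.mean()
phi = (f - f.mean(axis=0)).T @ yc / (yc @ yc)

# Sufficient-reduction score: project each document's frequencies onto phi,
# collapsing 500 phrase dimensions into one sentiment-rich index per document.
z = f @ phi
```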
Strong oracle optimality of folded concave penalized estimation
- Ann. Statist
, 2014
"... All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately. ..."
Abstract
-
Cited by 13 (4 self)
- Add to MetaCart
(Show Context)
All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately.
A generalized least-square matrix decomposition
- Journal of the American Statistical Association
"... Variables in many massive high-dimensional data sets are structured, arising for example from measurements on a regular grid as in imaging and time series or from spatial-temporal measurements as in climate studies. Classical multivariate techniques ignore these structural relationships often result ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
(Show Context)
Variables in many massive high-dimensional data sets are structured, arising for example from measurements on a regular grid as in imaging and time series or from spatial-temporal measurements as in climate studies. Classical multivariate techniques ignore these structural relationships, often resulting in poor performance. We propose a generalization of the singular value decomposition (SVD) and principal components analysis (PCA) that is appropriate for massive data sets with structured variables or known two-way dependencies. By finding the best low-rank approximation of the data with respect to a transposable quadratic norm, our decomposition, entitled the Generalized least squares Matrix Decomposition (GMD), directly accounts for structural relationships. As many variables in high-dimensional settings are often irrelevant or noisy, we also regularize our matrix decomposition by adding two-way penalties to encourage sparsity or smoothness. We develop fast computational algorithms using our methods to perform generalized PCA (GPCA), sparse GPCA, and functional GPCA on massive data sets. Through simulations and a whole-brain functional MRI example we demonstrate the utility of our methodology for dimension reduction, signal recovery, and feature selection with high-dimensional structured data.
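The transposable quadratic norm makes the construction easy to sketch. Assuming the quadratic operators Q and R are positive definite, the GMD can be recovered from an ordinary SVD after whitening; the paper's own algorithm avoids forming matrix square roots for scalability, so this is an illustration of the underlying math rather than their implementation.

```python
# Sketch of a generalized low-rank decomposition under the transposable
# quadratic norm ||A||_{Q,R}^2 = tr(Q A R A'). Assumes Q and R are positive
# definite so that whitening by Q^{1/2} and R^{1/2} is well defined.
import numpy as np
from scipy.linalg import sqrtm

def gmd(X, Q, R, k):
    Qh = np.real(sqrtm(Q))                 # Q^{1/2}
    Rh = np.real(sqrtm(R))                 # R^{1/2}
    Xt = Qh @ X @ Rh                       # whiten: ordinary SVD now applies
    U_, d, Vt_ = np.linalg.svd(Xt, full_matrices=False)
    Qi, Ri = np.linalg.inv(Qh), np.linalg.inv(Rh)
    U = Qi @ U_[:, :k]                     # Q-orthonormal left factors: U'QU = I
    V = Ri @ Vt_[:k].T                     # R-orthonormal right factors: V'RV = I
    return U, d[:k], V

# Toy usage: the inverse of an AR(1)-style row correlation as Q.
n, p, k = 50, 30, 3
X = np.random.default_rng(1).standard_normal((n, p))
Q = np.linalg.inv(0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n))))
U, d, V = gmd(X, Q, np.eye(p), k)
```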
Optimal computational and statistical rates of convergence for sparse nonconvex learning problems
- arXiv preprint
, 2013
"... We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization prob-lem. Many important estimators fall in this category, including least squares regression with nonconvex regulariz ..."
Abstract
-
Cited by 11 (5 self)
- Add to MetaCart
(Show Context)
We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall in this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution obtained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods, which only attain geometric rates of convergence for a single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the use of the nonconvex penalty.
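A simplified version of the path-following idea can be sketched in a few lines: proximal gradient steps with a nonconvex (here MCP) penalty, warm-started along a decreasing lambda grid. This is a stand-in for the paper's algorithm, not a reproduction of it; the grid, penalty parameter, and iteration count are illustrative.

```python
# Sketch of approximate regularization-path following with an MCP penalty:
# proximal gradient steps, warm-started stage by stage along a lambda grid.
import numpy as np

def mcp_prox(z, t, lam, gamma):
    """Proximal map of the MCP penalty with step size t (requires t < gamma)."""
    out = z.copy()
    small = np.abs(z) <= gamma * lam            # concave region: firm threshold
    out[small] = (np.sign(z[small])
                  * np.maximum(np.abs(z[small]) - t * lam, 0) / (1 - t / gamma))
    return out                                   # |z| > gamma*lam is left as-is

def mcp_path(X, y, lams, gamma=3.0, iters=200):
    n, p = X.shape
    t = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # safe step (t < gamma here)
    beta, path = np.zeros(p), []
    for lam in lams:                                  # warm start at each stage
        for _ in range(iters):
            grad = X.T @ (X @ beta - y) / n
            beta = mcp_prox(beta - t * grad, t, lam, gamma)
        path.append(beta.copy())
    return path

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 40))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.1 * rng.standard_normal(100)
path = mcp_path(X, y, np.geomspace(1.0, 0.01, 10))    # decreasing lambda grid
```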
A Non-convex Relaxation Approach to Sparse Dictionary Learning
"... Dictionary learning is a challenging theme in computer vision. The basic goal is to learn a sparse representation from an overcomplete basis set. Most existing approaches employ a convex relaxation scheme to tackle this challenge due to the strong ability of convexity in computation and theoretical ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
(Show Context)
Dictionary learning is a challenging theme in computer vision. The basic goal is to learn a sparse representation from an overcomplete basis set. Most existing approaches employ a convex relaxation scheme to tackle this challenge, because convex formulations are tractable both computationally and for theoretical analysis. In this paper we propose a non-convex online approach for dictionary learning. To achieve sparseness, our approach uses the so-called minimax concave (MC) penalty as a non-convex relaxation of the ℓ0 penalty. This treatment is expected to yield a more robust and sparse representation than existing convex approaches. In addition, we employ an online algorithm to adaptively learn the dictionary, which makes the non-convex formulation computationally feasible. Experimental results on sparseness comparisons and on applications in image denoising and image inpainting demonstrate that our approach is more effective and flexible.
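For reference, the MC penalty the abstract invokes has a standard closed form (parameterizations vary across papers); with level λ > 0 and concavity parameter γ > 1 it is commonly written as:

```latex
% Minimax concave (MC) penalty with level \lambda > 0 and concavity \gamma > 1:
p_{\lambda,\gamma}(t) \;=\; \lambda \int_0^{|t|} \Big(1 - \frac{x}{\gamma\lambda}\Big)_{+} \, dx
\;=\;
\begin{cases}
\lambda |t| - \dfrac{t^2}{2\gamma}, & |t| \le \gamma\lambda,\\[4pt]
\dfrac{\gamma\lambda^2}{2}, & |t| > \gamma\lambda.
\end{cases}
```

The penalty is ℓ1-like near zero and flat beyond γλ, which is what makes it a tighter relaxation of ℓ0 than the ℓ1 norm: large coefficients escape shrinkage entirely.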
EP-GIG Priors and Applications in Bayesian Sparse Learning
"... In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, and the ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
(Show Context)
In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of the generalized hyperbolic distributions, and its special cases include Gaussian scale mixtures and Laplace scale mixtures. Furthermore, Laplace scale mixtures can subserve a Bayesian framework for sparse learning with nonconvex penalization. The densities of EP-GIG can be explicitly expressed. Moreover, the corresponding posterior distribution also follows a generalized inverse Gaussian distribution. We exploit these properties to develop EM algorithms for sparse empirical Bayesian learning. We also show that these algorithms bear an interesting resemblance to iteratively reweighted ℓ2 or ℓ1 methods. Finally, we present two extensions for grouped variable selection and logistic regression.
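The reweighted-ℓ2 connection is easiest to see in the simplest member of the family, a Laplace prior represented as a Gaussian scale mixture. The sketch below is a generic EM of that flavor, not the EP-GIG algorithm itself; the hyperparameters and the stabilizing floor eps are illustrative.

```python
# Sketch of the iteratively-reweighted-l2 flavor of EM for a Gaussian
# scale-mixture (Laplace) prior: each E-step updates local scale weights,
# each M-step is a weighted ridge solve.
import numpy as np

def em_sparse(X, y, lam=1.0, sigma2=1.0, iters=50, eps=1e-8):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        # E-step: expected inverse local scales; for a Laplace prior this
        # behaves like lam / |beta_j| (floored for numerical stability).
        w = lam / np.maximum(np.abs(beta), eps)
        # M-step: weighted ridge solve -- the "reweighted l2" update.
        beta = np.linalg.solve(X.T @ X + sigma2 * np.diag(w), X.T @ y)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 30))
y = X[:, :2] @ np.array([3.0, -2.0]) + 0.1 * rng.standard_normal(80)
beta = em_sparse(X, y)   # small coefficients are driven toward zero
```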
Moving Object Detection by Detecting Contiguous Outliers in the Low-Rank Representation
"... Abstract—Object detection is a fundamental step for automated video analysis in many vision applications. Object detection in a video is usually performed by object detectors or background subtraction techniques. Often, an object detector requires manually labeled examples to train a binary classifi ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
(Show Context)
Object detection is a fundamental step for automated video analysis in many vision applications. Object detection in a video is usually performed by object detectors or background subtraction techniques. Often, an object detector requires manually labeled examples to train a binary classifier, while background subtraction needs a training sequence that contains no objects to build a background model. To automate the analysis, object detection without a separate training phase becomes a critical task. Previous work has tackled this task using motion information, but existing motion-based methods are usually limited when coping with complex scenarios such as nonrigid motion and dynamic backgrounds. In this paper, we show that the above challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR). This formulation integrates object detection and background learning into a single optimization process, which can be solved efficiently by an alternating algorithm. We explain the relations between DECOLOR and other sparsity-based methods. Experiments on both simulated data and real sequences demonstrate that DECOLOR outperforms state-of-the-art approaches and can work effectively on a wide range of complex scenarios. Index Terms: moving object detection, low-rank modeling, Markov random fields, motion segmentation.
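A toy version of the alternation behind DECOLOR is easy to write down, with the caveat that it omits the Markov random field prior that enforces contiguous outlier supports; the rank and threshold values here are illustrative.

```python
# Toy sketch of the alternating idea: fit a low-rank background with
# outlier pixels imputed, then re-flag outliers from the residual.
# The real method adds an MRF term for contiguity; this sketch omits it.
import numpy as np

def detect(X, rank=2, thresh=2.0, iters=20):
    S = np.zeros(X.shape, dtype=bool)            # outlier (foreground) support
    B = X.copy()                                 # background estimate
    for _ in range(iters):
        # Low-rank fit, imputing flagged pixels from the current background.
        Z = np.where(S, B, X)
        U, d, Vt = np.linalg.svd(Z, full_matrices=False)
        B = (U[:, :rank] * d[:rank]) @ Vt[:rank]
        # Re-detect outliers: residuals large relative to a robust noise scale.
        r = X - B
        S = np.abs(r) > thresh * np.median(np.abs(r[~S])) / 0.6745
    return B, S
```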
High-dimensional sparse additive hazards regression
- Journal of the American Statistical Association
"... ar ..."