Multinomial inverse regression for text analysis
 J. Amer. Statist. Assoc., 2013
Cited by 18 (3 self)

Abstract
Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. They are also very high dimensional and difficult to incorporate into statistical analyses. This article introduces a straightforward framework of sentiment-sufficient dimension reduction for text data. Multinomial inverse regression is introduced as a general tool for simplifying predictor sets that can be represented as draws from a multinomial distribution, and we show that logistic regression of phrase counts onto document annotations can be used to obtain low-dimension document representations that are rich in sentiment information. To facilitate this modeling, a novel estimation technique is developed for multinomial logistic regression with a very high-dimension response. In particular, independent Laplace priors with unknown variance are assigned to each regression coefficient, and we detail an efficient routine for maximization of the joint posterior over coefficients and their prior scale. This 'gamma-lasso' scheme yields stable and effective estimation for general high-dimension logistic regression, and we argue that it will be superior to current methods in many settings. Guidelines for prior specification are provided, algorithm convergence is detailed, and estimator properties are outlined from the perspective of the literature on nonconcave likelihood penalization. Related work on sentiment analysis from statistics, econometrics, and machine learning is surveyed and connected. Finally, the methods are applied in two detailed examples and we provide out-of-sample prediction studies to illustrate their effectiveness.

Taddy is an Associate Professor of Econometrics and Statistics and Neubauer Family Faculty Fellow at the Uni
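The Laplace-prior logistic regression at the heart of this abstract can be illustrated with its simpler convex cousin: ℓ1-penalized (lasso) logistic regression fit by proximal gradient descent. This is a hedged sketch of the general idea only, not the paper's gamma-lasso routine (which additionally maximizes over the prior scales); the function names and settings are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise soft-thresholding: the proximal operator of the l1 penalty.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_logistic(X, y, lam=0.1, step=0.1, iters=2000):
    """Proximal gradient descent for l1-penalized logistic regression.
    y is in {0, 1}; X is an (n, p) matrix of counts or features."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        # Gradient of the average negative log-likelihood.
        grad = X.T @ (1.0 / (1.0 + np.exp(-(X @ beta))) - y) / n
        # Gradient step followed by the l1 prox: sparse iterates.
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

On synthetic data with two informative features, the fit recovers their signs while shrinking the irrelevant coefficients toward zero.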
Strong oracle optimality of folded concave penalized estimation
 Ann. Statist., 2014
Cited by 13 (4 self)
A generalized least-square matrix decomposition
 Journal of the American Statistical Association
Cited by 13 (6 self)

Abstract
Variables in many massive high-dimensional data sets are structured, arising for example from measurements on a regular grid, as in imaging and time series, or from spatial-temporal measurements, as in climate studies. Classical multivariate techniques ignore these structural relationships, often resulting in poor performance. We propose a generalization of the singular value decomposition (SVD) and principal components analysis (PCA) that is appropriate for massive data sets with structured variables or known two-way dependencies. By finding the best low-rank approximation of the data with respect to a transposable quadratic norm, our decomposition, entitled the Generalized least-squares Matrix Decomposition (GMD), directly accounts for structural relationships. As many variables in high-dimensional settings are often irrelevant or noisy, we also regularize our matrix decomposition by adding two-way penalties to encourage sparsity or smoothness. We develop fast computational algorithms using our methods to perform generalized PCA (GPCA), sparse GPCA, and functional GPCA on massive data sets. Through simulations and a whole-brain functional MRI example we demonstrate the utility of our methodology for dimension reduction, signal recovery, and feature selection with high-dimensional structured data.
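When the two weighting matrices are full rank, a low-rank decomposition under a transposable quadratic norm like the one described above reduces to an ordinary SVD in a transformed space. A minimal sketch under that assumption (illustrative only; the authors' algorithms avoid forming matrix square roots on massive data):

```python
import numpy as np

def gmd(X, Q, R, k):
    """Rank-k decomposition minimizing the weighted norm
    tr(Q (X - U D V^T) R (X - U D V^T)^T), for symmetric positive
    definite Q (n x n) and R (p x p)."""
    def sqrt_and_inv_sqrt(A):
        # Symmetric square root and its inverse via eigendecomposition.
        w, V = np.linalg.eigh(A)
        return (V * np.sqrt(w)) @ V.T, (V / np.sqrt(w)) @ V.T
    Qh, Qih = sqrt_and_inv_sqrt(Q)
    Rh, Rih = sqrt_and_inv_sqrt(R)
    # Ordinary SVD in the transformed space, then map the factors back.
    U_, s, Vt_ = np.linalg.svd(Qh @ X @ Rh, full_matrices=False)
    U = Qih @ U_[:, :k]
    V = Rih @ Vt_[:k].T
    return U, s[:k], V
```

The returned factors are orthonormal in the Q- and R-weighted inner products rather than the Euclidean one, which is what lets the decomposition respect the structural dependencies.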
Optimal computational and statistical rates of convergence for sparse nonconvex learning problems
 arXiv preprint, 2013
Cited by 11 (5 self)

Abstract
We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall in this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution obtained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods, which only attain geometric rates of convergence for a single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the usage of the nonconvex penalty.
A Nonconvex Relaxation Approach to Sparse Dictionary Learning
"... Dictionary learning is a challenging theme in computer vision. The basic goal is to learn a sparse representation from an overcomplete basis set. Most existing approaches employ a convex relaxation scheme to tackle this challenge due to the strong ability of convexity in computation and theoretical ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
(Show Context)
Dictionary learning is a challenging theme in computer vision. The basic goal is to learn a sparse representation from an overcomplete basis set. Most existing approaches employ a convex relaxation scheme to tackle this challenge due to the strong ability of convexity in computation and theoretical analysis. In this paper we propose a nonconvex online approach for dictionary learning. To achieve the sparseness, our approach treats a socalled minimax concave (MC) penalty as a nonconvex relaxation of the ℓ0 penalty. This treatment expects to obtain a more robust and sparse representation than existing convex approaches. In addition, we employ an online algorithm to adaptively learn the dictionary, which makes the nonconvex formulation computationally feasible. Experimental results on the sparseness comparison and the applications in image denoising and image inpainting demonstrate that our approach is more effective and flexible. 1.
EPGIG Priors and Applications in Bayesian Sparse Learning
"... In this paper we propose a novel framework for the construction of sparsityinducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EPGIG). EPGIG is a variant of generalized hyperbolic distributions, and the ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
(Show Context)
In this paper we propose a novel framework for the construction of sparsityinducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EPGIG). EPGIG is a variant of generalized hyperbolic distributions, and the special cases include Gaussian scale mixtures and Laplace scale mixtures. Furthermore, Laplace scale mixtures can subserve a Bayesian framework for sparse learning with nonconvex penalization. The densities of EPGIG can be explicitly expressed. Moreover, the corresponding posterior distribution also follows a generalized inverse Gaussian distribution. We exploit these properties to develop EM algorithms for sparse empirical Bayesian learning. We also show that these algorithms bear an interesting resemblance to iteratively reweightedℓ2 orℓ1 methods. Finally, we present two extensions for grouped variable selection and logistic regression.
Moving Object Detection by Detecting Contiguous Outliers in the LowRank Representation
"... Abstract—Object detection is a fundamental step for automated video analysis in many vision applications. Object detection in a video is usually performed by object detectors or background subtraction techniques. Often, an object detector requires manually labeled examples to train a binary classifi ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
(Show Context)
Abstract—Object detection is a fundamental step for automated video analysis in many vision applications. Object detection in a video is usually performed by object detectors or background subtraction techniques. Often, an object detector requires manually labeled examples to train a binary classifier, while background subtraction needs a training sequence that contains no objects to build a background model. To automate the analysis, object detection without a separate training phase becomes a critical task. People have tried to tackle this task by using motion information. But existing motionbased methods are usually limited when coping with complex scenarios such as nonrigid motion and dynamic background. In this paper, we show that the above challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOwrank Representation (DECOLOR). This formulation integrates object detection and background learning into a single process of optimization, which can be solved by an alternating algorithm efficiently. We explain the relations between DECOLOR and other sparsitybased methods. Experiments on both simulated data and real sequences demonstrate that DECOLOR outperforms the stateoftheart approaches and it can work effectively on a wide range of complex scenarios. Index Terms—Moving object detection, lowrank modeling, Markov Random Fields, motion segmentation Ç 1
Highdimensional sparse additive hazards regression
 Journal of the American Statistical Association
"... ar ..."