Results 1-10 of 11
Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. arXiv preprint, 2013
Cited by 11 (5 self)
Abstract:
We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall in this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution obtained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods, which only attain geometric rates of convergence for a single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the usage of the nonconvex penalty.
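The path-following idea in this abstract can be sketched with a warm-started proximal gradient loop. This is a minimal illustration rather than the paper's algorithm: it uses the convex l1 penalty as a stand-in for the nonconvex penalties analyzed there, and the function names are invented for the example.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the l1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def regularization_path(X, y, lambdas, n_iter=200):
    """Warm-started proximal gradient over a decreasing grid of
    regularization parameters: each stage starts from the previous
    stage's solution, so the whole path shares iteration complexity."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the loss
    beta = np.zeros(d)
    path = []
    for lam in lambdas:
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - y) / n  # gradient of the squared loss
            beta = soft_threshold(beta - step * grad, step * lam)
        path.append(beta.copy())
    return path
```

Running the grid from large to small lambda is what makes warm starting pay off: each stage only has to correct a small perturbation of the previous solution.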
Multivariate generalized gaussian distribution: Convexity and graphical models
 IEEE Transactions on Signal Processing, 2013
Cited by 10 (3 self)
Abstract:
We consider covariance estimation in the multivariate generalized Gaussian distribution (MGGD) and elliptically symmetric (ES) distribution. The maximum likelihood optimization associated with this problem is nonconvex, yet it has been proved that its global solution can often be computed via simple fixed-point iterations. Our first contribution is a new analysis of this likelihood based on geodesic convexity that requires weaker assumptions. Our second contribution is a generalized framework for structured covariance estimation under sparsity constraints. We show that the optimizations can be formulated as convex minimization as long as the MGGD shape parameter is larger than one half and the sparsity pattern is chordal. These include, for example, maximum likelihood estimation of banded inverse covariances in multivariate Laplace distributions, which are associated with time-varying autoregressive processes. Index Terms: Cholesky decomposition, geodesic convexity, graphical models, multivariate generalized Gaussian distribution.
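The "simple fixed point iterations" mentioned above can be illustrated with Tyler's M-estimator of scatter, a well-known heavy-tailed relative of the MGGD likelihood equations. This is a hedged stand-in for the paper's actual iterations, not a reproduction of them.

```python
import numpy as np

def tyler_shape(X, n_iter=100, tol=1e-8):
    """Fixed-point iteration for Tyler's M-estimator of the scatter
    (shape) matrix, normalized to trace p at every step."""
    n, p = X.shape
    S = np.eye(p)
    for _ in range(n_iter):
        # Mahalanobis distances x_i' S^{-1} x_i for every row.
        d = np.einsum('ij,jk,ik->i', X, np.linalg.inv(S), X)
        S_new = (p / n) * (X / d[:, None]).T @ X
        S_new *= p / np.trace(S_new)  # fix the scale ambiguity
        if np.linalg.norm(S_new - S) < tol:
            return S_new
        S = S_new
    return S
```

Because the weights downweight far-out observations, the same iteration is robust to the heavy tails that motivate the MGGD family in the first place.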
Nonconvex statistical optimization: Minimax-optimal sparse PCA in polynomial time. Available at arXiv:1408.5352, 2014
Cited by 4 (1 self)
Abstract:
Sparse principal component analysis (PCA) involves nonconvex optimization for which the global solution is hard to obtain. To address this issue, one popular approach is convex relaxation. However, such an approach may produce suboptimal estimators due to the relaxation effect. To optimally estimate sparse principal subspaces, we propose a two-stage computational framework named “tighten after relax”: within the “relax” stage, we approximately solve a convex relaxation of sparse PCA with early stopping to obtain a desired initial estimator; for the “tighten” stage, we propose a novel algorithm called sparse orthogonal iteration pursuit (SOAP), which iteratively refines the initial estimator by directly solving the underlying nonconvex problem. A key concept of this two-stage framework is the basin of attraction. It represents a local region within which the “tighten” stage has the desired computational and statistical guarantees. We prove that the initial estimator obtained from the “relax” stage falls into such a region, and hence SOAP geometrically converges to a principal subspace estimator which is minimax-optimal within a certain model class. Unlike most existing sparse PCA estimators, our approach applies to non-spiked covariance models and adapts to non-Gaussianity as well as dependent data settings. Moreover, by analyzing the computational complexity of the two stages, we illustrate an interesting phenomenon: a larger sample size can reduce the total iteration complexity. Our framework motivates a general paradigm for solving many complex statistical problems which involve nonconvex optimization with provable guarantees.
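A single-vector caricature of the "tighten" stage is truncated power iteration: multiply by the covariance, keep the k largest coordinates, renormalize. This is a sketch of the refine-by-truncation idea rather than SOAP itself; the dense eigenvector used as `v0` mimics the initializer the "relax" stage would supply.

```python
import numpy as np

def truncate(v, k):
    # Keep the k largest-magnitude entries, zero the rest, renormalize.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out / np.linalg.norm(out)

def sparse_power_iteration(S, v0, k, n_iter=100):
    """Power iteration with hard truncation to the k largest coordinates,
    a single-vector analogue of the 'tighten' stage."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_iter):
        v = truncate(S @ v, k)
    return v
```

The initializer matters: if `v0` has no overlap with the true sparse direction, truncation can lock the iteration out of the right support, which is exactly the basin-of-attraction issue the abstract describes.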
Joint Estimation of Multiple Graphical Models from High Dimensional Time Series, 2013
Optimal Rates of Convergence of Transelliptical Component Analysis, 2013
Cited by 3 (1 self)
Abstract:
Han and Liu (2012) proposed a method named transelliptical component analysis (TCA) for conducting scale-invariant principal component analysis on high dimensional data with transelliptical distributions. The transelliptical family assumes that the data follow an elliptical distribution after unspecified marginal monotone transformations. In a double asymptotic framework where the dimension d is allowed to increase with the sample size n, Han and Liu (2012) showed that one version of TCA attains a “nearly parametric” rate of convergence in parameter estimation when the parameter of interest is assumed to be sparse. This paper improves upon their results in two aspects: (i) under the nonsparse setting (i.e., the parameter of interest is not assumed to be sparse), we show that a version of TCA attains the optimal rate of convergence up to a logarithmic factor; (ii) under the sparse setting, we also lay out avenues for analyzing the performance of the TCA estimator proposed in Han and Liu (2012). In particular, we provide a “sign sub-Gaussian condition” which is sufficient for TCA to attain an improved rate of convergence, and we verify a subfamily of the transelliptical distributions satisfying this condition.
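The rank-based correlation estimate underlying TCA-style methods can be sketched with the standard Kendall's tau plus sine transform construction; because tau depends only on ranks, the estimate is invariant to the unspecified marginal monotone transformations. The function name is invented for this example.

```python
import numpy as np
from scipy.stats import kendalltau

def transformed_tau_matrix(X):
    """Latent correlation estimate: Kendall's tau between each pair of
    columns, mapped through sin(pi/2 * tau) to recover the correlation
    of the underlying elliptical distribution."""
    n, p = X.shape
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau, _ = kendalltau(X[:, j], X[:, k])
            R[j, k] = R[k, j] = np.sin(np.pi / 2 * tau)
    return R
```

A scale-invariant PCA along these lines would then eigendecompose `R` instead of the ordinary sample correlation matrix.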
Statistical Analysis of Big Data on Pharmacogenomics
Cited by 2 (0 self)
Abstract:
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively overview several prominent statistical methods: estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes, proteins, and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules and understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big Data analysis, including complex data distributions, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed.
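One of the large covariance estimators typically covered in such surveys can be sketched as entrywise hard thresholding of the sample covariance; the helper name below is invented for the example, and the choice of threshold is left to the reader.

```python
import numpy as np

def hard_threshold_covariance(X, t):
    """Entrywise hard thresholding of the sample covariance, a standard
    estimator for large sparse covariance matrices; the diagonal is left
    untouched since variances are not assumed to be sparse."""
    S = np.cov(X, rowvar=False)
    T = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T
```

With a threshold just above the sampling noise level, spurious small correlations are zeroed out while genuine ones survive.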
Multivariate analysis of nonparametric estimates of large correlation matrices. arXiv preprint, 2014
A Junction Tree Framework for Undirected Graphical Model Selection, 2013
Cited by 1 (1 self)
Abstract:
An undirected graphical model is a joint probability distribution defined on an undirected graph G∗, where the vertices in the graph index a collection of random variables and the edges encode conditional independence relationships amongst the random variables. The undirected graphical model selection (UGMS) problem is to estimate the graph G∗ given observations drawn from the undirected graphical model. This paper proposes a framework for decomposing the UGMS problem into multiple subproblems over clusters and subsets of the separators in a junction tree. The junction tree is constructed using a graph that contains a superset of the edges in G∗. We highlight three main properties of using junction trees for UGMS. First, different regularization parameters or different UGMS algorithms can be used to learn different parts of the graph. This is possible since the subproblems we identify can be solved independently of each other. Second, under certain conditions, a junction tree based UGMS algorithm can produce consistent results with exponentially fewer observations than the usual requirements of existing algorithms. Third, both our theoretical and experimental results show that the junction tree framework does a significantly better job at finding the weakest edges in a graph than existing methods. This property is a consequence of both the first and second properties. Finally, we note that our framework is independent of the choice of UGMS algorithm and can be used as a wrapper around standard UGMS algorithms for more accurate graph estimation.
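The wrapper idea, running any base graph-selection routine on each cluster and taking the union of the results, can be sketched as follows. This is a simplification under stated assumptions: the clusters are passed in directly rather than derived from a junction tree of a supergraph, and the correlation-thresholding base routine is a toy stand-in for a real UGMS algorithm.

```python
import numpy as np

def corr_threshold_ugms(X, t=0.3):
    # Toy base routine: connect variable pairs with |correlation| above t.
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[1]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(R[i, j]) > t}

def ugms_by_clusters(X, clusters, base_ugms=corr_threshold_ugms):
    """Run the base routine on each cluster independently and union the
    estimated edges, mapping local indices back to global ones."""
    edges = set()
    for c in clusters:
        c = sorted(c)
        for (i, j) in base_ugms(X[:, c]):
            edges.add((c[i], c[j]))
    return edges
```

Because each cluster is handled independently, different regularization levels or even different base algorithms could be used per cluster, which is the first property the abstract highlights.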
Controlling the precision-recall tradeoff in differential dependency network analysis. In The Seventh Workshop on Machine Learning in Systems Biology, 2013
Cited by 1 (1 self)
Abstract:
Graphical models have gained a lot of attention recently as a tool for learning and representing dependencies among variables in multivariate data. Often, domain scientists are looking specifically for differences among the dependency networks of different conditions or populations (e.g., differences between regulatory networks of different species, or differences between dependency networks of diseased versus healthy populations). The standard method for finding these differences is to learn the dependency networks for each condition independently and compare them. We show that this approach is prone to high false discovery rates (low precision) that can render the analysis useless. We then show that by imposing a bias towards learning similar dependency networks for each condition, the false discovery rates can be reduced to acceptable levels, at the cost of finding a reduced number of differences. Algorithms developed in the transfer learning literature can be used to vary the strength of the imposed similarity bias and provide a natural mechanism to smoothly adjust this differential precision-recall tradeoff to cater to the requirements of the analysis conducted. We present real case studies (oncological and neurological) where domain experts use the proposed technique to extract useful differential networks that shed light on the biological processes involved in cancer and brain function.
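The precision-recall knob described here can be caricatured by soft-thresholding the difference of two estimated networks: a larger penalty biases the two conditions toward identical networks, so fewer but more reliable differences are reported. This is an invented toy, not the transfer-learning algorithms the abstract refers to.

```python
import numpy as np

def differential_edges(X1, X2, lam):
    """Soft-threshold the difference of the two sample correlation
    matrices; only differences that clear lam are reported as
    differential edges. Larger lam = higher precision, lower recall."""
    D = np.corrcoef(X1, rowvar=False) - np.corrcoef(X2, rowvar=False)
    D = np.sign(D) * np.maximum(np.abs(D) - lam, 0.0)
    p = D.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if D[i, j] != 0.0}
```

Sweeping `lam` from small to large traces out exactly the differential precision-recall curve the paper proposes to control.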