Joint Estimation of Multiple Graphical Models from High Dimensional Time Series
2013
"... copyright holder. Copyright © 2011 by the authors ..."
Optimal Rates of Convergence of Transelliptical Component Analysis
2013
Cited by 3 (1 self)
Han and Liu (2012) proposed a method named transelliptical component analysis (TCA) for conducting scale-invariant principal component analysis on high-dimensional data with transelliptical distributions. The transelliptical family assumes that the data follow an elliptical distribution after unspecified marginal monotone transformations. In a double asymptotic framework where the dimension d is allowed to increase with the sample size n, Han and Liu (2012) showed that one version of TCA attains a "nearly parametric" rate of convergence in parameter estimation when the parameter of interest is assumed to be sparse. This paper improves upon their results in two respects. (i) Under the nonsparse setting (i.e., when the parameter of interest is not assumed to be sparse), we show that a version of TCA attains the optimal rate of convergence up to a logarithmic factor. (ii) Under the sparse setting, we also lay out avenues for analyzing the performance of the TCA estimator proposed in Han and Liu (2012). In particular, we provide a "sign subgaussian condition" that is sufficient for TCA to attain an improved rate of convergence, and we verify that a subfamily of the transelliptical distributions satisfies this condition.
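The following NumPy sketch illustrates the non-sparse version of TCA. It assumes the rank-based construction used for transelliptical and nonparanormal families, in which the latent correlation is recovered from Kendall's tau via sin(π/2 · τ). The function names are hypothetical, and this is not the authors' code. A sparse TCA estimator would replace the plain eigendecomposition with a sparse eigenvector method.

```python
import numpy as np


def kendall_tau_matrix(X):
    """Kendall's tau-a for every pair of columns of X (n x p), vectorized.

    Builds an (n, n, p) array of pairwise sign differences, so memory is
    O(n^2 p); this is only suitable for small illustrative examples.
    """
    n = X.shape[0]
    D = np.sign(X[:, None, :] - X[None, :, :])
    # Summing over all ordered pairs (a, b) counts each unordered pair twice,
    # and the a == b terms are zero, so divide by n(n - 1).
    return np.einsum("abj,abk->jk", D, D) / (n * (n - 1))


def tca_leading_component(X):
    """Hypothetical non-sparse TCA: leading eigenvector of sin(pi/2 * tau)."""
    R = np.sin(np.pi / 2 * kendall_tau_matrix(X))
    w, V = np.linalg.eigh(R)  # eigenvalues are returned in ascending order
    return R, V[:, -1]
```

Kendall's tau depends only on ranks. Applying strictly increasing transforms such as `np.exp` or cubing to individual columns therefore leaves `R` unchanged. This is what makes the estimator insensitive to the unspecified marginal monotone transformations.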
The cluster graphical lasso for improved estimation of Gaussian graphical models
2014
Cited by 3 (0 self)
We consider the task of estimating a Gaussian graphical model in the high-dimensional setting. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to an ℓ1 penalty, is a well-studied approach for this task. We begin by introducing a surprising connection between the graphical lasso and hierarchical clustering: the graphical lasso in effect performs a two-step procedure, in which (1) single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then (2) an ℓ1-penalized log likelihood is maximized on the subset of variables within each connected component. In other words, the graphical lasso determines the connected components of the estimated network via single linkage clustering. Unfortunately, single linkage clustering is known to perform poorly in certain settings. Therefore, we propose the cluster graphical lasso, which involves clustering the features using an alternative to single linkage clustering, and then performing the graphical lasso on the subset of variables within each cluster. We establish model selection consistency for this technique, and demonstrate its improved performance relative to the graphical lasso in a simulation study, as well as in applications to an equities data set, a university webpage data set, and a gene expression data set.
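The two-step view can be sketched with SciPy and scikit-learn.

- `glasso_components` illustrates step (1). The connected components of the graphical lasso estimate at penalty α coincide with those of the graph whose edges are the pairs with |S_ij| > α. This is equivalent to cutting a single-linkage dendrogram built on the similarities |S_ij| at level α.
- `cluster_graphical_lasso` is a hypothetical sketch of the proposed alternative. It uses average linkage on the dissimilarity 1 − |correlation| with a user-chosen number of clusters. The dissimilarity, the linkage, and the shared penalty across clusters are assumptions, not necessarily the paper's choices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import squareform
from sklearn.covariance import graphical_lasso


def glasso_components(S, alpha):
    """Components of the graph with an edge wherever |S_ij| > alpha (i != j).

    Equivalent to cutting a single-linkage dendrogram on |S_ij| at alpha.
    Returns (n_components, labels).
    """
    A = np.abs(S) > alpha
    np.fill_diagonal(A, False)
    return connected_components(csr_matrix(A), directed=False)


def cluster_graphical_lasso(X, n_clusters, alpha, method="average"):
    """Hypothetical sketch: cluster the variables, then run the graphical lasso per cluster."""
    S = np.cov(X, rowvar=False)
    sd = np.sqrt(np.diag(S))
    # Assumed dissimilarity between variables: 1 - |correlation|.
    D = np.clip(1.0 - np.abs(S / np.outer(sd, sd)), 0.0, None)
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method=method)
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    p = S.shape[0]
    precision = np.zeros((p, p))  # entries across clusters stay exactly zero
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        block = S[np.ix_(idx, idx)]
        if idx.size == 1:
            prec_block = 1.0 / block  # a singleton needs no penalty
        else:
            _, prec_block = graphical_lasso(block, alpha=alpha)
        precision[np.ix_(idx, idx)] = prec_block
    return labels, precision
```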
Statistical Analysis of Big Data on Pharmacogenomics
Cited by 2 (0 self)
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively overview several prominent statistical methods:

- estimation of large covariance matrices for understanding correlation structure;
- estimation of inverse covariance matrices for network modeling;
- large-scale simultaneous tests for selecting significantly differentially expressed genes and proteins, and genetic markers for complex diseases;
- high-dimensional variable selection for identifying important molecules, in order to understand molecular mechanisms in pharmacogenomics.

Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big Data analysis are also discussed, including complex data distributions, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods.
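Large-scale simultaneous testing over thousands of genes is commonly controlled at a target false discovery rate. One standard procedure for this is the Benjamini-Hochberg step-up rule; the overview may discuss others. A minimal NumPy sketch follows. The function name and the default level q = 0.05 are illustrative assumptions.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # q * i / m for the i-th smallest p-value
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: reject every hypothesis up to the LARGEST i with p_(i) <= q * i / m.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```

For example, `benjamini_hochberg([0.001, 0.04, 0.045]).tolist()` gives `[True, True, True]`. The value 0.04 exceeds its own threshold 0.05 · 2/3 ≈ 0.033. It is still rejected because the larger p-value 0.045 passes its threshold 0.05, which is the step-up behaviour.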
Optimal estimation of sparse correlation matrices of semiparametric Gaussian copulas
Cited by 2 (2 self)
Statistical inference for semiparametric Gaussian copulas is well studied in the classical setting of fixed dimension and large sample size. Nevertheless, optimal estimation of the correlation matrix of a semiparametric Gaussian copula is understudied, especially when the dimension can far exceed the sample size. In this paper we derive the minimax rate of convergence under the matrix ℓ1 norm and ℓ2 norm for estimating large correlation matrices of semiparametric Gaussian copulas when the correlation matrices are in a weak ℓq ball. We further show that an explicit rank-based thresholding estimator adaptively attains the minimax optimal rate of convergence simultaneously for all 0 ≤ q < 1. Numerical examples are provided to demonstrate the finite-sample performance of the rank-based thresholding estimator.
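Below is a minimal sketch of a rank-based thresholding estimator. It assumes a common construction for Gaussian copulas:

1. Turn pairwise Kendall's tau into a correlation estimate via sin(π/2 · τ).
2. Hard-threshold the off-diagonal entries at λ = c√(log p / n).

The constant c, the hard-thresholding rule, and the function name are assumptions for illustration. The paper's explicit estimator may use a different rank statistic or threshold.

```python
import numpy as np
from scipy.stats import kendalltau


def rank_threshold_corr(X, c=1.0):
    """Hypothetical rank-based thresholding estimator of a copula correlation matrix."""
    n, p = X.shape
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau, _ = kendalltau(X[:, j], X[:, k])
            # Under a Gaussian copula, sin(pi/2 * tau) estimates the latent correlation.
            R[j, k] = R[k, j] = np.sin(np.pi / 2 * tau)
    lam = c * np.sqrt(np.log(p) / n)  # assumed threshold level
    small = (np.abs(R) < lam) & ~np.eye(p, dtype=bool)
    R[small] = 0.0  # hard-threshold off-diagonal entries only
    return R, lam
```

Because only ranks enter, the estimate is the same whether it is computed from the latent Gaussian data or from any strictly increasing marginal transform of it.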
Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model
Cited by 2 (1 self)
In this manuscript a unified framework for conducting inference on complex aggregated data in high-dimensional settings is proposed. The data are assumed to be a collection of multiple non-Gaussian realizations with underlying undirected graphical structures. The concept of median graphs is used to summarize the commonality across these graphical structures. On this basis, a novel semiparametric approach to modeling such complex aggregated data is provided, along with robust estimation of the median graph, which is assumed to be sparse. The estimator is proved to be consistent in graph recovery and an upper bound on the rate of convergence is given. Experiments on both synthetic and real datasets are conducted to illustrate the empirical usefulness of the proposed models and methods.
Analysis of elliptical copula correlation factor model with Kendall’s tau. Personal Communication
2013
Multivariate analysis of nonparametric estimates of large correlation matrices. arXiv preprint
2014
Graph Estimation with Joint Additive Models
2012
Cited by 2 (1 self)
In recent years, there has been considerable interest in estimating conditional independence graphs in high dimensions. Most previous work has assumed that the variables are multivariate Gaussian, or that the conditional means of the variables are linear; in fact, these two assumptions are nearly equivalent. Unfortunately, if these assumptions are violated, the resulting conditional independence estimates can be inaccurate. We propose a semiparametric method, graph estimation with joint additive models, which allows the conditional means of the features to take on an arbitrary additive form. We present an efficient algorithm for computing our estimator, and prove that it is consistent. We extend our method to the estimation of directed graphs with known causal ordering. Using simulated data, we show two things: our method performs better than existing methods when there are nonlinear relationships among the features, and it is comparable to methods that assume multivariate normality when the conditional means are linear. We illustrate our method on a cell-signaling data set.
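The sketch below illustrates why additive conditional means matter, using nodewise neighborhood selection:

- Each variable is regressed on a cubic polynomial basis of every other variable, with one penalty group per variable.
- The group lasso problem is solved by proximal gradient.
- An edge is kept if either regression selects the other variable.

This is a simplified stand-in, not the paper's joint estimator. The polynomial basis, the OR rule, the penalty level, and all function names are assumptions.

```python
import numpy as np


def basis(x):
    """Cubic polynomial basis, column-standardized (an assumed basis choice)."""
    B = np.column_stack([x, x**2, x**3])
    return (B - B.mean(axis=0)) / B.std(axis=0)


def group_lasso(y, groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - sum_g B_g b_g||^2 + lam * sum_g ||b_g||_2."""
    B = np.hstack(groups)
    n = B.shape[0]
    step = n / np.linalg.norm(B, 2) ** 2  # 1 / Lipschitz constant of the gradient
    bounds = np.cumsum([0] + [g.shape[1] for g in groups])
    beta = np.zeros(B.shape[1])
    for _ in range(n_iter):
        z = beta + step * B.T @ (y - B @ beta) / n  # gradient step
        for a, b in zip(bounds[:-1], bounds[1:]):
            norm = np.linalg.norm(z[a:b])
            # Group soft-thresholding: the proximal map of step * lam * ||.||_2.
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            beta[a:b] = shrink * z[a:b]
    return [beta[a:b] for a, b in zip(bounds[:-1], bounds[1:])]


def additive_neighborhood_graph(X, lam):
    """Hypothetical nodewise sparse additive regressions combined with an OR rule."""
    n, p = X.shape
    bases = [basis(X[:, j]) for j in range(p)]
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        y = (X[:, j] - X[:, j].mean()) / X[:, j].std()
        others = [k for k in range(p) if k != j]
        coefs = group_lasso(y, [bases[k] for k in others], lam)
        for k, coef in zip(others, coefs):
            if np.linalg.norm(coef) > 0:
                adj[j, k] = adj[k, j] = True
    return adj
```

Consider x1 = x0² + noise with x0 symmetric about zero. The population linear correlation between x0 and x1 is zero, so a purely linear neighborhood regression has nothing to detect. The quadratic basis term lets the additive regression of x1 pick up x0.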
Gaussian Copula Precision Estimation with Missing Values. International Conference on Artificial Intelligence and Statistics
2014
Cited by 2 (0 self)
We consider the problem of estimating the sparse precision matrix of Gaussian copula distributions using samples with missing values in high dimensions. Existing approaches, primarily designed for Gaussian distributions, suggest plug-in estimators that simply disregard the missing values. In this paper, we propose double plug-in Gaussian (DoPinG) copula estimators to estimate the sparse precision matrix corresponding to nonparanormal distributions. DoPinG uses two plug-in procedures and consists of three steps:

(1) estimate nonparametric correlations, including Kendall’s tau and Spearman’s rho, based on the observed values;
(2) estimate the nonparanormal correlation matrix;
(3) plug this estimate into existing sparse precision estimators.

We prove that DoPinG copula estimators consistently estimate the nonparanormal correlation matrix at a rate of O((1/(1−δ))·√(log p / n)), where δ is the probability of missing values. We provide experimental results to illustrate the effect of sample size and percentage of missing data on the model performance. Experimental results show that DoPinG is significantly better than estimators like mGlasso, which are primarily designed for Gaussian data.
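The three DoPinG steps can be sketched for the Kendall's tau variant. This sketch rests on several assumptions:

- Missing entries are encoded as NaN.
- Tau is computed on the rows where both variables are observed.
- `kendalltau` uses SciPy's default tau-b, which equals tau-a for continuous data without ties.
- An eigenvalue-clipping step, not described in the abstract, makes the plug-in matrix positive definite before it is passed to scikit-learn's `graphical_lasso`.
- All function names are hypothetical; this is not the authors' implementation.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.covariance import graphical_lasso


def pairwise_kendall_corr(X):
    """Steps (1)-(2): Kendall's tau on pairwise-observed rows, then sin(pi/2 * tau)."""
    n, p = X.shape
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            ok = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])  # rows observed for both
            tau, _ = kendalltau(X[ok, j], X[ok, k])
            R[j, k] = R[k, j] = np.sin(np.pi / 2 * tau)
    return R


def nearest_pd_corr(R, eps=1e-4):
    """Assumed fix-up: clip eigenvalues at eps, then rescale to unit diagonal."""
    w, V = np.linalg.eigh(R)
    R_pd = (V * np.maximum(w, eps)) @ V.T  # V diag(w) V^T with clipped w
    d = np.sqrt(np.diag(R_pd))
    return R_pd / np.outer(d, d)


def doping_kendall(X, alpha):
    """Step (3): plug the rank-based correlation estimate into the graphical lasso."""
    R = nearest_pd_corr(pairwise_kendall_corr(X))
    _, precision = graphical_lasso(R, alpha=alpha)
    return R, precision
```

The Spearman's rho variant mentioned in the abstract would swap in a different rank correlation and transform at step (2). The plug-in at step (3) would be unchanged.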