The singular values and vectors of low rank perturbations of large rectangular random matrices
 J. MULTIVARIATE ANAL
, 2012
Abstract

Cited by 28 (6 self)
In this paper, we consider the singular values and singular vectors of finite, low rank perturbations of large rectangular random matrices. Specifically, we prove almost sure convergence of the extreme singular values and appropriate projections of the corresponding singular vectors of the perturbed matrix. As in the prequel, where we considered the eigenvalues of Hermitian matrices, the nonrandom limiting value is shown to depend explicitly on the limiting singular value distribution of the unperturbed matrix via an integral transform that linearizes rectangular additive convolution in free probability theory. The asymptotic position of the extreme singular values of the perturbed matrix differs from that of the original matrix if and only if the singular values of the perturbing matrix are above a certain critical threshold which depends on this same aforementioned integral transform. We examine the consequence of this singular value phase transition on the associated left and right singular vectors and discuss the fluctuations around these nonrandom limits.
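The phase transition described above can be seen numerically in the simplest square Gaussian case. The following is a minimal sketch (not the paper's general setting): for an n×n matrix with i.i.d. Gaussian entries scaled by 1/√n, the bulk of singular values fills [0, 2], the critical threshold for a rank-one perturbation θuvᵀ is θ = 1, and above it the top singular value separates from the bulk toward θ + 1/θ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400  # square case: aspect ratio 1, bulk edge at 2, critical threshold theta = 1

# Unperturbed matrix: i.i.d. Gaussian entries, scaled so the singular
# values asymptotically fill [0, 2].
X = rng.standard_normal((n, n)) / np.sqrt(n)

# Rank-one perturbation theta * u v^T with fixed unit vectors u and v.
u = rng.standard_normal(n); u /= np.linalg.norm(u)
v = rng.standard_normal(n); v /= np.linalg.norm(v)

tops = {}
for theta in (0.5, 3.0):
    tops[theta] = np.linalg.svd(X + theta * np.outer(u, v), compute_uv=False)[0]

# Below the threshold the top singular value sticks to the bulk edge 2;
# above it, it separates and converges to theta + 1/theta.
print(tops)
```

For θ = 3 the top singular value lands near 3 + 1/3 ≈ 3.33, well outside the bulk, while for θ = 0.5 it remains pinned near the bulk edge 2.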
Expectation Propagation for Bayesian Multitask Feature Selection
Abstract

Cited by 10 (4 self)
Abstract. In this paper we propose a Bayesian model for multitask feature selection. This model is based on a generalized spike and slab sparse prior distribution that enforces the selection of a common subset of features across several tasks. Since exact Bayesian inference in this model is intractable, approximate inference is performed through expectation propagation (EP). EP approximates the posterior distribution of the model using a parametric probability distribution. This posterior approximation is particularly useful to identify relevant features for prediction. We focus on problems for which the number of features d is significantly larger than the number of instances for each task. We propose an efficient parametrization of the EP algorithm that offers a computational complexity linear in d. Experiments on several multitask datasets show that the proposed model outperforms baseline approaches for single-task learning or data pooling across all tasks, as well as two state-of-the-art multitask learning approaches. Additional experiments confirm the stability of the proposed feature selection with respect to various subsamplings of the training data. Keywords: Multitask learning, feature selection, expectation propagation, approximate Bayesian inference.
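The coupling mechanism in the prior can be sketched generatively. This is an illustrative simulation of the shared-support idea only, not the paper's EP inference; all sizes and parameter names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_tasks, n = 50, 4, 30          # d features >> n instances per task
pi0, slab_sd, noise_sd = 0.1, 1.0, 0.1

# A single binary selection vector z shared by all tasks: the spike and
# slab prior forces every task to use the same subset of features.
z = rng.random(d) < pi0

# Each task draws its own slab coefficients, but only on the shared support.
W = np.where(z, rng.normal(0.0, slab_sd, size=(n_tasks, d)), 0.0)

# Simulated per-task regression data.
X = rng.standard_normal((n_tasks, n, d))
Y = np.einsum('tnd,td->tn', X, W) + noise_sd * rng.standard_normal((n_tasks, n))
print(z.sum(), W.shape, Y.shape)
```

Pooling the tasks' evidence about the shared z is what lets the model identify relevant features even though each task alone has far fewer instances than features.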
Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models
, 2012
Abstract

Cited by 3 (1 self)
Summary: Imaginary training samples are often used in Bayesian statistics to develop prior distributions, with appealing interpretations, for use in model comparison. Expected-posterior priors are defined via imaginary training samples coming from a common underlying predictive distribution m∗, using an initial baseline prior distribution. These priors can have subjective and also default Bayesian implementations, based on different choices of m∗ and of the baseline prior. One of the main advantages of the expected-posterior priors is that impropriety of baseline priors causes no indeterminacy of Bayes factors; but at the same time they strongly depend on the selection and the size of the training sample. Here we combine ideas from the power-prior and the unit-information-prior methodologies to greatly diminish the effect of training samples on a Bayesian variable-selection problem using the expected-posterior prior approach: we raise the likelihood involved in the expected-posterior prior distribution to a power that produces a prior information content equivalent to one data point. The result is that in practice our power-expected-posterior (PEP) methodology is sufficiently insensitive to the size n∗ of the training sample that one may take n∗ equal to the full-data sample size and dispense with training samples altogether; this promotes stability of the resulting Bayes factors, removes the arbitrariness arising from individual
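The "one data point of information" claim is easy to verify in a toy conjugate case. The sketch below (a normal mean with known variance; all variable names are illustrative, not the paper's notation beyond n∗) shows that raising the likelihood of n∗ imaginary observations to the power 1/n∗ leaves the posterior variance independent of n∗.

```python
# Toy conjugate illustration of the power-expected-posterior idea.
sigma2, v0 = 1.0, 100.0   # likelihood variance, diffuse baseline prior variance

def pep_posterior_var(n_star):
    # The likelihood of n* imaginary observations, raised to the power
    # delta = 1/n*, contributes precision (1/n*) * n* / sigma2 = 1/sigma2:
    # exactly one data point's worth of information, whatever n* is.
    delta = 1.0 / n_star
    return 1.0 / (1.0 / v0 + delta * n_star / sigma2)

for n_star in (10, 100, 1000):
    print(n_star, pep_posterior_var(n_star))
```

All three printed variances coincide, which is the insensitivity to n∗ that lets one take n∗ equal to the full-data sample size.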
Learning feature selection dependencies in multitask learning
 in Proceedings of Advances in Neural Information Processing Systems 26 (NIPS ’13)
, 2013
Abstract

Cited by 3 (0 self)
A probabilistic model based on the horseshoe prior is proposed for learning dependencies in the process of identifying relevant features for prediction. Exact inference is intractable in this model. However, expectation propagation offers an approximate alternative. Because the process of estimating feature selection dependencies may suffer from overfitting in the model proposed, additional data from a multitask learning scenario are considered for induction. The same model can be used in this setting with few modifications. Furthermore, the assumptions made are less restrictive than in other multitask methods: The different tasks must share feature selection dependencies, but can have different relevant features and model coefficients. Experiments with real and synthetic data show that this model performs better than other multitask alternatives from the literature. The experiments also show that the model is able to induce suitable feature selection dependencies for the problems considered, only from the training data.
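The horseshoe prior at the heart of this model is simple to sample from. The sketch below shows only the basic prior (independent coefficients, no learned dependencies, parameter names illustrative): a global scale τ, half-Cauchy local scales, and conditionally Gaussian coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)
d, tau = 10_000, 1.0

# Horseshoe prior: global scale tau, local half-Cauchy scales lambda_j,
# coefficients beta_j ~ N(0, (tau * lambda_j)^2) conditionally.
lam = np.abs(rng.standard_cauchy(d))
beta = rng.normal(0.0, tau * lam)

# The half-Cauchy local scales produce both a spike near zero (most
# coefficients heavily shrunk) and heavy tails (a few escape untouched).
print(np.median(np.abs(beta)), np.abs(beta).max())
```

The typical draw is small while the largest draws are orders of magnitude bigger, which is the sparsity-inducing behavior the prior is chosen for.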
Generalized Spike and Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
Abstract

Cited by 1 (0 self)
We describe a Bayesian method for group feature selection in linear regression problems. The method is based on a generalized version of the standard spike and slab prior distribution which is often used for individual feature selection. Exact Bayesian inference under the prior considered is infeasible for typical regression problems. However, approximate inference can be carried out efficiently using Expectation Propagation (EP). A detailed analysis of the generalized spike and slab prior shows that it is well suited for regression problems that are sparse at the group level. Furthermore, this prior can be used to introduce prior knowledge about specific groups of features that are a priori believed to be more relevant. An experimental evaluation compares the performance of the proposed method with those of group LASSO, Bayesian group LASSO, automatic relevance determination and additional variants used for group feature selection. The results of these experiments show that a model based on the generalized spike and slab prior and the EP algorithm has state-of-the-art prediction performance in the problems analyzed. Furthermore, this model is also very useful to carry out sequential experimental design (also known as active learning), where the data instances that are most informative are iteratively included in the training set, reducing the number of instances needed to obtain a particular level of prediction accuracy.
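The group-level version of the prior, and the way per-group prior knowledge enters, can be sketched generatively. This is an illustrative simulation only (groups, probabilities, and scales are made up), not the paper's EP procedure.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 20
groups = [list(range(0, 5)), list(range(5, 12)), list(range(12, 20))]
p_g = np.array([0.5, 0.5, 0.9])   # per-group prior inclusion probabilities;
                                  # a larger value encodes the a priori belief
                                  # that a group is more relevant

# Group-level spike and slab: one Bernoulli indicator per group switches
# every coefficient in that group between the spike (exact zero) and the slab.
z_group = rng.random(len(groups)) < p_g
beta = np.zeros(d)
for idx, active in zip(groups, z_group):
    if active:
        beta[idx] = rng.normal(0.0, 1.0, size=len(idx))
print(z_group, np.round(beta, 2))
```

Sparsity here acts on whole groups at once, which is what distinguishes this prior from the individual-feature spike and slab.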
A variational Bayes approach to variable selection
, 2014
Abstract

Cited by 1 (0 self)
We develop methodology and theory for a mean field variational Bayes approximation to a linear model with a spike and slab prior on the regression coefficients. In particular we show how our method forces a subset of regression coefficients to be numerically indistinguishable from zero; under mild regularity conditions, estimators based on our method consistently estimate the model parameters, with easily obtainable and appropriately sized standard error estimates; and, most strikingly, the method selects the true model at an exponential rate in the sample size. We also develop a practical method for simultaneously choosing reasonable initial parameter values and setting the main tuning parameter of our algorithms, which is computationally efficient and empirically performs as well as or better than some popular variable selection approaches. Our method is also faster and highly accurate when compared to MCMC.
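A minimal coordinate-ascent variational sketch for spike-and-slab regression illustrates the zero-forcing behavior. These are the standard mean-field updates for this model family (not necessarily this paper's exact algorithm), with all hyperparameter values chosen for the example: each feature gets a posterior inclusion probability α_j, and irrelevant features are driven toward α_j ≈ 0 and contribution ≈ 0.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 10
beta_true = np.zeros(d); beta_true[:3] = 2.0       # 3 relevant features
X = rng.standard_normal((n, d))
y = X @ beta_true + rng.standard_normal(n)

sigma2, sb2, pi = 1.0, 1.0, 0.1   # noise var, slab var, prior inclusion prob
xtx = (X ** 2).sum(axis=0)
s2 = sigma2 / (xtx + sigma2 / sb2)                 # variational slab variances
alpha = np.full(d, 0.5); mu = np.zeros(d)
Xm = X @ (alpha * mu)                              # current fitted mean

for _ in range(50):                                # coordinate-ascent sweeps
    for j in range(d):
        Xm -= X[:, j] * (alpha[j] * mu[j])         # remove feature j's part
        mu[j] = s2[j] / sigma2 * (X[:, j] @ (y - Xm))
        logit = (np.log(pi / (1 - pi))             # update inclusion prob
                 + 0.5 * np.log(s2[j] / sb2)
                 + mu[j] ** 2 / (2 * s2[j]))
        alpha[j] = 1.0 / (1.0 + np.exp(-logit))
        Xm += X[:, j] * (alpha[j] * mu[j])         # add it back

print(np.round(alpha, 3))
```

After convergence the relevant features carry inclusion probabilities near 1 while the rest collapse toward 0, so their estimated contributions α_j·μ_j are numerically indistinguishable from zero.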
Partial Sufficient Dimension Reduction in Regression
, 2011
Abstract
This dissertation would not have been possible without the guidance and the help of several individuals who in one way or another contributed and extended their valuable assistance in the preparation and completion of this study. First and foremost, my utmost gratitude goes to my adviser, R. Dennis Cook. His encouragement, guidance and support from the initial to the final level enabled me to develop an understanding of sufficient dimension reduction and to complete this work. He was and remains my ultimate role model as a scientist, mentor, and teacher. I also would like to thank my thesis committee, Prof. Meeden, Prof. Weisberg, and Prof. Reilly, for their direction, dedication, and invaluable advice along this work. I am deeply indebted to my many colleagues for providing a stimulating and fun environment in which to learn and grow. I am especially grateful to Leif, Sai, and Ka Young, who always make me laugh; I have never been tired or bored in the office because of them. I also thank the Dimension Reduction group members, Zhihua, Xin, Shanshan, Sen, Jie, Eric, Adam, and Liliana. They helped me to understand the ideas of the research area and to build an effective and efficient group web site; their continuous support and valuable advice allowed me to complete my work as the group leader. Last but not least, I would like to thank my family: my parents, Manjin Kim and Miran Kim, for giving birth to me in the first place and supporting me spiritually throughout my life, and my younger brother Jihyuk, for sharing fresh thoughts and stimulating me to try something new. As one of the family members, I want to give my special thanks to my roommate, Yonsil. She always helped me in every possible way, physically or emotionally, to get over difficult situations and loneliness. I offer my regards and blessings to all of those who supported me in any respect during the completion of this study.
CORRELATION AND VARIANCE STABILIZATION IN THE TWO GROUP COMPARISON CASE IN HIGH DIMENSIONAL DATA UNDER DEPENDENCIES
Abstract
Multiple testing research has undergone renewed focus in recent years as advances in high throughput technologies have produced data on unprecedented scales. Much of the focus has been on false discovery rates (FDR) and related quantities that are estimated (or controlled for) in large scale multiple testing situations. Recent papers by Efron have directly addressed this issue and incorporated measures to account for high-dimensional correlation structure when estimating false discovery rates and when estimating a density. Other authors also have proposed methods to control or estimate FDR under dependencies with certain assumptions. However, little attention has been given in the literature to the stability of the results obtained under dependencies. This work begins by demonstrating the effect of dependence structure on the variance of the number of discoveries and the false discovery proportion (FDP). An expression for the variance of the number of discoveries is derived, and the density of a test statistic, conditioned on the status (reject or fail to reject) of a different correlated test, is obtained. A closed-form expression for the correlation between test statistics is also derived. This correlation is a combination of correlations and variances of the data within the groups being compared. It is shown that these correlations among
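The effect of dependence on the variance of the number of discoveries is easy to demonstrate by simulation. The sketch below (an equicorrelated one-factor null model; all parameters illustrative, not from this work) compares the variance of the discovery count under independence against strong positive correlation.

```python
import numpy as np

rng = np.random.default_rng(5)
m, reps, thresh = 1000, 2000, 2.0   # tests per experiment, replications, cutoff

def n_discoveries(rho):
    # Equicorrelated null z-scores built from a single shared factor.
    shared = rng.standard_normal((reps, 1))
    z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.standard_normal((reps, m))
    return (np.abs(z) > thresh).sum(axis=1)   # discoveries per replication

var_indep = n_discoveries(0.0).var()
var_corr = n_discoveries(0.8).var()
print(var_indep, var_corr)
```

Under independence the count is close to binomial with a modest variance, while under ρ = 0.8 the shared factor makes the count swing wildly across replications, so the FDP is far less stable even though its expectation is unchanged.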
Working Paper 01/12: Independence Test for High Dimensional Random Vectors
, 2012
Abstract
This paper proposes a new mutual independence test for a large number of high dimensional random vectors. The test statistic is based on the characteristic function of the empirical spectral distribution of the sample covariance matrix. The asymptotic distributions of the test statistic under the null and local alternative hypotheses are established when the dimensionality and the sample size of the data are comparable. We apply this test to examine multiple MA(1) and AR(1) models, and panel data models with spatial cross-sectional structures. In addition, applied in a flexible fashion, the proposed test can capture dependent but uncorrelated structures, for example nonlinear MA(1) models, multiple ARCH(1) models and Vandermonde matrices. Simulation results are provided for detecting these dependent structures. An empirical study of dependence between the closing stock prices of several companies from the New York Stock Exchange (NYSE) demonstrates that cross-sectional dependence is prevalent in stock markets.
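The building blocks of such a statistic can be sketched directly. The code below (illustrative only; it computes the ingredients, not the paper's test or its null distribution) forms the empirical spectral distribution of a sample covariance matrix from independent data and evaluates its characteristic function at a few points; under independence the ESD should track the Marchenko-Pastur law.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 500, 250          # comparable dimension and sample size, ratio c = p/n = 0.5

# Sample covariance of i.i.d. data; its ESD approaches Marchenko-Pastur,
# with support [(1 - sqrt(c))^2, (1 + sqrt(c))^2] ~ [0.086, 2.914] here.
X = rng.standard_normal((n, p))
S = X.T @ X / n
eig = np.linalg.eigvalsh(S)

# Characteristic function of the empirical spectral distribution at points t:
# cf(t) = mean over eigenvalues of exp(i * t * lambda).
t = np.linspace(0.5, 3.0, 6)
cf = np.exp(1j * np.outer(t, eig)).mean(axis=1)
print(eig.min(), eig.max(), np.abs(cf))
```

Departures of this empirical characteristic function from its Marchenko-Pastur counterpart are the kind of signal a test of this form aggregates.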
Submitted to the Annals of Applied Probability
HIGH DIMENSIONAL SPARSE COVARIANCE ESTIMATION: ACCURATE THRESHOLDS FOR THE MAXIMAL DIAGONAL ENTRY AND FOR THE LARGEST CORRELATION COEFFICIENT
Abstract
The maxima of many independent, or weakly dependent, random variables, and their corresponding thresholds for given right tail probabilities are classical and well studied problems. In this paper we focus on two specific cases of interest related to estimation and hypothesis testing of high dimensional sparse covariance matrices. These are the distribution of the maximal diagonal entry of a sample covariance matrix and the largest off-diagonal correlation coefficient, both under the assumption of an identity population covariance. In both cases, as sample size and dimension tend to infinity, upon centering and scaling there is asymptotic convergence to a Gumbel distribution. We show, however, that this convergence is slow and that finite-sample distributions may be quite far from these asymptotic ones. Applying a perturbation approach, we identify the leading error terms and derive more accurate distributions and corresponding thresholds. For non-Gaussian data, these depend explicitly on higher-order moments via appropriate Edgeworth expansions. As a side result, we also derive sharp bounds for the left and right tail probabilities of a single χ²_n random variable, which may be of independent interest.
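The first quantity is straightforward to simulate. The sketch below (sizes illustrative) draws the maximal diagonal entry of a sample covariance matrix under an identity population covariance, where each diagonal entry is a scaled χ²_n variable, and compares its typical location against the crude Gaussian heuristic 1 + √(4 log p / n); the gap between the two reflects the slow finite-sample convergence the paper quantifies.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, reps = 100, 500, 200

# Under an identity population covariance, each diagonal entry of the
# sample covariance S = X'X/n is a chi-square_n variable divided by n;
# we take the maximum over the p diagonal entries, repeatedly.
maxima = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, p))
    maxima[r] = (X ** 2).mean(axis=0).max()

# Gaussian heuristic: diag entries ~ 1 + sqrt(2/n) * Z, so the maximum
# sits near 1 + sqrt(4 * log(p) / n); chi-square skewness pushes the
# true maximum above this, which is the finite-sample error at issue.
print(maxima.mean(), 1 + np.sqrt(4 * np.log(p) / n))
```

Thresholds built from the limiting Gumbel approximation inherit this kind of bias, which motivates the corrected thresholds derived in the paper.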