Bayesian and L1 approaches to sparse unsupervised learning
arXiv preprint arXiv:1106.1157, 2011
Abstract

Cited by 16 (2 self)
The use of L1 regularisation for sparse learning has generated immense research interest, with many successful applications in diverse areas such as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms in terms of predictive performance when compared to other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L1 minimising factor models, Bayesian variants of “L1”, and Bayesian models with a stronger L0-like sparsity induced through spike-and-slab distributions. These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner, and avoid unnecessary shrinkage of nonzero values. We demonstrate on a number of data sets that in practice spike-and-slab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to reassess the wide use of L1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance.
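The contrast the authors draw between L1-style shrinkage and spike-and-slab sparsity can be seen directly in prior draws. The following toy numpy sketch (an illustration of the two prior families, not the paper's factor model; the inclusion probability and slab width are arbitrary) shows that a Laplace prior essentially never produces exact zeros, while a spike-and-slab prior does:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 1000

# Bayesian analogue of L1: a Laplace prior shrinks coefficients, but a draw
# from it is (almost surely) nonzero in every coordinate.
laplace_draw = rng.laplace(loc=0.0, scale=1.0, size=p)

# Spike-and-slab: a point mass at zero (spike) mixed with a Gaussian slab,
# so a draw contains exact zeros wherever the indicator excludes a coordinate.
pi_slab = 0.1                                # prior inclusion probability
included = rng.random(p) < pi_slab           # per-coordinate indicators
slab = rng.normal(0.0, 2.0, size=p)
spike_slab_draw = np.where(included, slab, 0.0)

print(int((laplace_draw == 0).sum()))        # no exact zeros
print(int((spike_slab_draw == 0).sum()))     # roughly 90% exact zeros
```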
On the half-Cauchy prior for a global scale parameter
, 2010
Abstract

Cited by 16 (2 self)
This paper argues that the half-Cauchy distribution should replace the inverse-Gamma distribution as a default prior for a top-level scale parameter in Bayesian hierarchical models, at least for cases where a proper prior is necessary. Our arguments involve a blend of Bayesian and frequentist reasoning, and are intended to complement the original case made by Gelman (2006) in support of the folded-t family of priors. First, we generalize the half-Cauchy prior to the wider class of hypergeometric inverted-beta priors. We derive expressions for posterior moments and marginal densities when these priors are used for a top-level normal variance in a Bayesian hierarchical model. We go on to prove a proposition that, together with the results for moments and marginals, allows us to characterize the frequentist risk of the Bayes estimators under all global-shrinkage priors in the class. These theoretical results, in turn, allow us to study the frequentist properties of the half-Cauchy prior versus a wide class of alternatives. The half-Cauchy occupies a sensible “middle ground” within this class: it performs very well near the origin, but does not lead to drastic compromises in other parts of the parameter space. This provides an alternative, classical justification for the repeated, routine use of this prior. We also consider situations where the underlying mean vector is sparse, where we argue that the usual conjugate choice of an inverse-gamma prior is particularly inappropriate, and can lead to highly distorted posterior inferences. Finally, we briefly summarize some open issues in the specification of default priors for scale terms in hierarchical models.
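The contrast with the conjugate default is easy to check by simulation. This minimal numpy sketch (eps = 0.01 is chosen here as a typical "vague" inverse-Gamma setting, not a value taken from the paper) compares how much prior mass each choice puts near zero, where hierarchical models often need it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Half-Cauchy(0, A) draw for a scale parameter tau: |A * standard Cauchy|.
A = 1.0
tau_hc = np.abs(A * rng.standard_cauchy(n))

# "Vague" conjugate alternative: tau^2 ~ inverse-Gamma(eps, eps),
# simulated as the reciprocal of Gamma(shape=eps, rate=eps) draws.
eps = 0.01
tau_ig = np.sqrt(1.0 / rng.gamma(shape=eps, scale=1.0 / eps, size=n))

# The half-Cauchy keeps appreciable mass near tau = 0; the vague
# inverse-Gamma pushes mass away from zero.
print(round(float(np.mean(tau_hc < 0.1)), 3))
print(round(float(np.mean(tau_ig < 0.1)), 3))
```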
Posterior contraction in sparse Bayesian factor models for massive covariance matrices
, 2012
Local shrinkage rules, Lévy processes, and regularized regression
, 2010
Abstract

Cited by 13 (4 self)
We use Lévy processes to generate joint prior distributions, and therefore penalty functions, for a location parameter β = (β1,...,βp) as p grows large. This generalizes the class of local-global shrinkage rules based on scale mixtures of normals, illuminates new connections among disparate methods, and leads to new results for computing posterior means and modes under a wide class of priors. We extend this framework to large-scale regularized regression problems where p > n, and provide comparisons with other methodologies.
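The scale-mixture-of-normals class this paper generalizes includes the Laplace prior of the Bayesian lasso: a normal with an exponentially distributed variance has a Laplace marginal. This quick simulation confirms that standard identity (it is a textbook representation, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
lam = 1.0

# Scale-mixture representation: beta | tau2 ~ N(0, tau2) with
# tau2 ~ Exponential(rate = lam^2 / 2)  =>  beta ~ Laplace(scale = 1/lam).
tau2 = rng.exponential(scale=2.0 / lam**2, size=n)
beta_mixture = rng.normal(0.0, np.sqrt(tau2))

# Direct Laplace draws for comparison.
beta_direct = rng.laplace(loc=0.0, scale=1.0 / lam, size=n)

# The two samples match in distribution (compare a few quantiles).
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(beta_mixture, qs).round(2))
print(np.quantile(beta_direct, qs).round(2))
```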
Bayesian Gaussian copula factor models for mixed data
arXiv preprint arXiv:1111.0317, 2011
Abstract

Cited by 12 (3 self)
Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.
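The copula idea of decoupling dependence from the marginals can be sketched with a plug-in version of the rank transform. Note this is an illustrative simplification: the extended rank likelihood treats the latent normal scores as unknowns constrained by the observed ordering and samples them inside the Gibbs sampler, rather than fixing them at these plug-in values as below:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
n = 500

# A skewed continuous margin; the copula factor model uses only its ranks.
x = rng.exponential(size=n)

# Plug-in normal scores: map each observation to a latent Gaussian value
# consistent with the observed ordering (no ties here, since x is continuous;
# ordinal margins would instead bound the latent values between cutpoints).
ranks = x.argsort().argsort() + 1                  # ranks 1..n
z = np.array([NormalDist().inv_cdf(r / (n + 1)) for r in ranks])

# Whatever the margin, the latent scores look standard normal, so the factor
# structure can be modelled on z without distorting the marginals.
print(round(float(z.std()), 2))
```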
spikeSlabGAM: Bayesian variable selection, model choice and regularization for generalized additive mixed models in R
 Journal of Statistical Software
Abstract

Cited by 8 (2 self)
The R package spikeSlabGAM implements Bayesian variable selection, model choice, and regularized estimation in (geo)additive mixed models for Gaussian, binomial, and Poisson responses. Its purpose is to (1) choose an appropriate subset of potential covariates and their interactions, (2) determine whether linear or more flexible functional forms are required to model the effects of the respective covariates, and (3) estimate their shapes. Selection and regularization of the model terms is based on a novel spike-and-slab-type prior on coefficient groups associated with parametric and semiparametric effects.
EP-GIG Priors and Applications in Bayesian Sparse Learning
Abstract

Cited by 8 (2 self)
In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, and the special cases include Gaussian scale mixtures and Laplace scale mixtures. Furthermore, Laplace scale mixtures can subserve a Bayesian framework for sparse learning with nonconvex penalization. The densities of EP-GIG can be explicitly expressed. Moreover, the corresponding posterior distribution also follows a generalized inverse Gaussian distribution. We exploit these properties to develop EM algorithms for sparse empirical Bayesian learning. We also show that these algorithms bear an interesting resemblance to iteratively reweighted ℓ2 or ℓ1 methods. Finally, we present two extensions for grouped variable selection and logistic regression.
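The resemblance to iteratively reweighted ℓ2 that the authors note can be sketched generically: each E-step produces per-coefficient weights, and each M-step solves a weighted ridge system. This numpy sketch uses the simple weights 1/(|β|+ε), an adaptive-L1 surrogate chosen for illustration, not the paper's exact EP-GIG updates:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Iteratively reweighted ridge: E-step-style weights, M-step ridge solve.
# Weights 1/(|beta_j| + eps) make the quadratic penalty mimic an L1 one.
lam, eps = 1.0, 1e-6
beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # initialize at OLS
for _ in range(50):
    w = 1.0 / (np.abs(beta) + eps)                 # reweighting step
    beta = np.linalg.solve(X.T @ X + lam * np.diag(w), X.T @ y)

print(np.round(beta[:3], 1))            # near the true nonzero coefficients
print(int(np.sum(np.abs(beta) > 0.1)))  # the rest are shrunk essentially to 0
```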
Dirichlet–Laplace priors for optimal shrinkage
arXiv preprint arXiv:1401.5398, 2014
Abstract

Cited by 6 (0 self)
Penalized regression methods, such as L1 regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. In sharp contrast to the frequentist literature, little is known about the properties of such priors and the convergence and concentration of the corresponding posterior distribution. In this article, we propose a new class of Dirichlet–Laplace (DL) priors, which possess optimal posterior concentration and lead to efficient posterior computation exploiting results from normalized random measure theory. Finite sample performance of Dirichlet–Laplace priors relative to alternatives is assessed in simulated and real data examples.
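A draw from the DL hierarchy is easy to simulate. This sketch follows the standard construction of this prior family (θ_j | φ, τ ~ Laplace with scale φ_j τ, φ ~ Dirichlet(a,…,a), τ ~ Gamma(pa, rate 1/2)); the choice a = 1/p is one commonly used setting, assumed here for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 1000
a = 1.0 / p                        # concentration parameter; a = 1/p assumed

# Dirichlet-Laplace hierarchy:
#   phi ~ Dirichlet(a, ..., a), tau ~ Gamma(p*a, rate=1/2),
#   theta_j | phi, tau ~ Laplace(scale = phi_j * tau).
phi = rng.dirichlet(np.full(p, a))
tau = rng.gamma(shape=p * a, scale=2.0)
theta = rng.laplace(loc=0.0, scale=phi * tau)

# With small a the Dirichlet weights concentrate on a few coordinates, so a
# typical draw is near-sparse: a handful of sizable entries, the rest
# essentially zero (continuous shrinkage, no exact point mass needed).
print(int(np.sum(np.abs(theta) > 0.01)))
```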
Spike-and-slab priors for function selection in structured additive regression models
 Journal of the American Statistical Association
Abstract

Cited by 5 (1 self)
Structured additive regression provides a general framework for complex Gaussian and non-Gaussian regression models, with predictors comprising arbitrary combinations of nonlinear functions and surfaces, spatial effects, varying coefficients, random effects and further regression terms. The large flexibility of structured additive regression makes function selection a challenging and important task, aiming at (1) selecting the relevant covariates, (2) choosing an appropriate and parsimonious representation of the impact of covariates on the predictor and (3) determining the required interactions. We propose a spike-and-slab prior structure for function selection that allows us to include or exclude single coefficients as well as blocks of coefficients representing specific model terms. A novel multiplicative parameter expansion is required to obtain good mixing and convergence properties in a Markov chain Monte Carlo simulation approach and is shown to induce desirable shrinkage properties. In simulation studies and with (real) benchmark classification data, we investigate sensitivity to hyperparameter settings and compare performance to competitors. The flexibility and applicability of our approach are demonstrated in an additive piecewise exponential model with time-varying effects for right-censored survival times of intensive care patients with sepsis. Geoadditive and additive mixed logit model applications are discussed in an extensive appendix. Keywords: parameter expansion, penalized splines, stochastic search variable selection, generalized additive mixed models, spatial regression
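The block-wise selection idea, one binary inclusion indicator per model term rather than per coefficient, can be sketched as a prior draw. This is a simplified illustration with a narrow continuous spike and invented term names and sizes; the paper's actual prior additionally uses a multiplicative parameter expansion for MCMC mixing:

```python
import numpy as np

rng = np.random.default_rng(6)

# One inclusion indicator per model term: each term (e.g. the spline basis
# of one covariate) is switched in or out as a whole block of coefficients.
blocks = {"f(x1)": 8, "f(x2)": 8, "x3_linear": 1}   # basis sizes (invented)
pi_incl, slab_sd, spike_sd = 0.5, 1.0, 0.001

coefs, gamma = {}, {}
for term, k in blocks.items():
    gamma[term] = bool(rng.random() < pi_incl)      # term in or out
    sd = slab_sd if gamma[term] else spike_sd       # wide slab vs narrow spike
    coefs[term] = rng.normal(0.0, sd, size=k)

for term in blocks:
    print(term, gamma[term], round(float(np.abs(coefs[term]).max()), 3))
```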
Rao-Blackwellization for Bayesian Variable Selection and Model Averaging in Linear and Binary Regression: A Novel Data Augmentation Approach
Abstract

Cited by 5 (1 self)
Choosing the subset of covariates to use in regression or generalized linear models is a ubiquitous problem. The Bayesian paradigm addresses the problem of model uncertainty by considering models corresponding to all possible subsets of the covariates, where the posterior distribution over models is used to select models or combine them via Bayesian model averaging (BMA). Although conceptually straightforward, BMA is often difficult to implement in practice, since either the number of covariates is too large for enumeration of all subsets, calculations cannot be done analytically, or both. For orthogonal designs with the appropriate choice of prior, the posterior probability of any model can be calculated without having to enumerate the entire model space and scales linearly with the number of predictors, p. In this article we extend this idea to a much broader class of non-orthogonal design matrices. We propose a novel method which augments the observed non-orthogonal design by at most p new rows to obtain a design matrix with orthogonal columns and generate the “missing” response variables in a data augmentation algorithm. We show that our data augmentation approach keeps the original posterior distribution of interest unaltered, and develop methods to construct Rao-Blackwellized estimates of several quantities of interest, including posterior model probabilities of any model, which may not be available from an ordinary Gibbs sampler. Our method can be used for BMA in linear regression and binary regression with non-orthogonal design matrices in conjunction with independent “spike and slab” priors with a continuous prior component that is a Cauchy or other heavy tailed distribution that may be represented as a scale mixture of normals. We provide simulated and real examples to illustrate the methodology. Supplemental materials for the manuscript are available online.
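The core augmentation step, adding at most p rows so the stacked design has orthogonal columns, can be sketched with a Cholesky factor: pick c ≥ λmax(XᵀX) and append A with AᵀA = cI − XᵀX. (In the paper's algorithm the responses for the appended rows are the "missing" data imputed inside the sampler; this sketch only constructs the design.)

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 50, 5
X = rng.normal(size=(n, p))

# Choose c >= lambda_max(X'X) so that c*I - X'X is positive semidefinite,
# then take A with A'A = c*I - X'X (a small jitter keeps Cholesky stable).
G = X.T @ X
c = np.linalg.eigvalsh(G).max()
L = np.linalg.cholesky(c * np.eye(p) - G + 1e-9 * np.eye(p))
A = L.T                                   # A'A = L L' = c*I - G (+ jitter)
X_aug = np.vstack([X, A])                 # only p extra rows

# Columns of the augmented design are orthogonal: Gram matrix = c*I.
G_aug = X_aug.T @ X_aug
print(np.allclose(G_aug, c * np.eye(p), atol=1e-6))   # True
```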