Results 1–10 of 51
Bayesian variable selection in clustering high-dimensional data
 Journal of the American Statistical Association
, 2005
Abstract
Cited by 63 (4 self)
Over the last decade, technological advances have generated an explosion of data with substantially smaller sample size relative to the number of covariates (p ≫ n). A common goal in the analysis of such data involves uncovering the group structure of the observations and identifying the discriminating variables. In this article we propose a methodology for addressing these problems simultaneously. Given a set of variables, we formulate the clustering problem in terms of a multivariate normal mixture model with an unknown number of components and use the reversible-jump Markov chain Monte Carlo technique to define a sampler that moves between different dimensional spaces. We handle the problem of selecting a few predictors among the prohibitively vast number of variable subsets by introducing a binary exclusion/inclusion latent vector, which gets updated via stochastic search techniques. We specify conjugate priors and exploit the conjugacy by integrating out some of the parameters. We describe strategies for posterior inference and explore the performance of the methodology with simulated and real datasets.
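The binary exclusion/inclusion latent vector updated by stochastic search, as described in this abstract, can be illustrated with a toy sketch. This is not the authors' sampler: it is a minimal Python stand-in that uses a BIC-style score in place of the integrated marginal likelihood and a simple regression model in place of the mixture model; all data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n = 30 observations, p = 10 variables; only the first two discriminate.
n, p = 30, 10
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.8 * X[:, 1] + 0.1 * rng.normal(size=n)

def log_score(gamma):
    """BIC-style log model score for the variables flagged by gamma
    (a crude stand-in for the integrated marginal likelihood)."""
    k = int(gamma.sum())
    if k == 0:
        rss = np.sum((y - y.mean()) ** 2)
    else:
        Xg = X[:, gamma.astype(bool)]
        beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        rss = np.sum((y - Xg @ beta) ** 2)
    return -0.5 * n * np.log(rss / n) - 0.5 * k * np.log(n)

# Metropolis stochastic search over inclusion vectors (flat prior over gamma,
# so the acceptance ratio reduces to the score difference).
gamma = np.zeros(p)
cur = log_score(gamma)
for _ in range(2000):
    j = rng.integers(p)            # propose flipping one inclusion indicator
    prop = gamma.copy()
    prop[j] = 1 - prop[j]
    new = log_score(prop)
    if np.log(rng.random()) < new - cur:
        gamma, cur = prop, new     # accept the flip

print(gamma[:2])   # the two truly discriminating variables should be included
```

The essential mechanics match the abstract's description: the sampler walks over subsets by flipping one indicator at a time, so the prohibitively vast model space is never enumerated.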
Bayes model averaging with selection of regressors
 Journal of the Royal Statistical Society. Series B, Statistical Methodology
, 2002
Abstract
Cited by 58 (10 self)
Summary. When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution.
Bayesian model averaging: development of an improved multiclass, gene selection and classification tool for microarray data
, 2005
Variable selection in clustering via Dirichlet process mixture models
, 2006
Abstract
Cited by 41 (3 self)
The increased collection of high-dimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a model-based method that addresses the two problems simultaneously. We introduce a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. We update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge Markov chain Monte Carlo technique. We explore the performance of the methodology on simulated data and illustrate an application with a DNA microarray study.
GALGO: an R package for multivariate variable selection using genetic algorithms
 Bioinformatics
Abstract
Cited by 23 (5 self)
Summary: The development of statistical models linking the molecular state of a cell to its physiology is one of the most important tasks in the analysis of Functional Genomics data. Because of the large number of variables measured, a comprehensive evaluation of variable subsets cannot be performed with available computational resources. It follows that an efficient variable selection strategy is required. However, although software packages to perform univariate variable selection are available, a comprehensive software environment to develop and evaluate multivariate statistical models using a multivariate variable selection strategy is still needed. In order to address this issue, we developed GALGO, an R package based on a genetic algorithm variable selection strategy, primarily designed to develop statistical models from large-scale datasets. Availability: GALGO, along with supplementary information, can be downloaded from:
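GALGO itself is an R package; the genetic-algorithm variable selection strategy it is built on can be sketched in a few lines. The following is a hypothetical Python analogue, not GALGO's actual implementation: chromosomes are binary inclusion vectors, fitness is a negative-BIC least-squares score (an assumption, since GALGO supports several fitness functions), and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: only variables 0 and 1 carry signal.
n, p = 40, 12
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=n)

def fitness(chrom):
    """Negative BIC of a least-squares fit on the selected variables."""
    idx = chrom.astype(bool)
    if not idx.any():
        return -np.inf
    Xg = X[:, idx]
    beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    rss = np.sum((y - Xg @ beta) ** 2)
    return -(n * np.log(rss / n) + idx.sum() * np.log(n))

pop = (rng.random((20, p)) < 0.3).astype(int)    # 20 random chromosomes
for _ in range(60):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]  # keep the fitter half (elitism)
    kids = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]  # pick two parents
        cut = rng.integers(1, p)                  # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(p) < 0.05               # bit-flip mutation
        kids.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(c) for c in pop])]
print(best)
```

Because the population evaluates whole variable subsets at once, this multivariate search can find jointly predictive sets that univariate filters miss, which is the motivation the summary gives for GALGO.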
On the consistency of Bayesian variable selection for high dimensional binary regression and classification
 Neural Comput
, 2006
Abstract
Cited by 23 (1 self)
Bayesian variable selection has gained much empirical success recently in a variety of applications when the number K of explanatory variables (x1, ..., xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj’s have very small effects on the response y, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality K ≫ n. In this approach a suitable prior can be used to choose a few out of the many xj’s to model y, so that the posterior will propose probability densities p that are “often close” to the true density p∗ in some sense. The closeness can be described by a Hellinger distance between p and p∗ that scales at a power very close to n^(−1/2), which is the “finite-dimensional rate” corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 0502 (2005), Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.
Variable selection for nonparametric Gaussian process priors: Models and computational strategies
 Statistical Science
, 2011
Abstract
Cited by 12 (0 self)
This paper presents a unified treatment of Gaussian process models that extends to data from the exponential dispersion family and to survival data. Our specific interest is in the analysis of data sets with predictors that have an a priori unknown form of possibly nonlinear associations to the response. The modeling approach we describe incorporates Gaussian processes in a generalized linear model framework to obtain a class of nonparametric regression models where the covariance matrix depends on the predictors. We consider, in particular, continuous, categorical and count responses. We also look into models that account for survival outcomes. We explore alternative covariance formulations for the Gaussian process prior and demonstrate the flexibility of the construction. Next, we focus on the important problem of selecting variables from the set of possible predictors and describe a general framework that employs mixture priors. We compare alternative MCMC strategies for posterior inference and achieve a computationally efficient and practical approach. We demonstrate performance on simulated and benchmark data sets. Key words and phrases: Bayesian variable selection, generalized linear models, Gaussian processes, latent variables, MCMC, nonparametric regression, survival data.
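The core idea that variable selection enters through a covariance matrix depending only on the selected predictors can be shown in a minimal sketch. This is not the paper's mixture-prior MCMC: it is a hypothetical Python toy that builds a squared-exponential GP covariance on a chosen subset of columns and compares Gaussian log marginal likelihoods directly (fixed unit lengthscale and fixed noise variance are assumptions).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: the response depends nonlinearly on variable 0 only.
n, p = 25, 4
X = rng.normal(size=(n, p))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.normal(size=n)

def log_marginal(gamma):
    """GP log marginal likelihood with a squared-exponential kernel
    computed only over the predictors selected by gamma."""
    mask = np.asarray(gamma, dtype=bool)
    Xg = X[:, mask]
    # Pairwise squared distances over the selected dimensions
    # (with no variables selected this is all zeros: a constant-function prior).
    d2 = ((Xg[:, None, :] - Xg[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2) + 0.05 * np.eye(n)   # kernel + noise variance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

# Selecting the relevant variable should dominate an irrelevant one.
print(log_marginal([1, 0, 0, 0]) > log_marginal([0, 1, 0, 0]))
```

A stochastic search over gamma scored this way, with a mixture (spike-and-slab) prior on the inclusion indicators, is the shape of the framework the abstract describes.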
A BAYESIAN GRAPHICAL MODELING APPROACH TO MICRORNA REGULATORY NETWORK INFERENCE
Abstract
Cited by 7 (3 self)
It has been estimated that about 30% of the genes in the human genome are regulated by microRNAs (miRNAs). These are short RNA sequences that can downregulate the levels of mRNAs or proteins in animals and plants. Genes regulated by miRNAs are called targets. Typically, methods for target prediction are based solely on sequence data and on structure information. In this paper we propose a Bayesian graphical modeling approach that infers the miRNA regulatory network by integrating expression levels of miRNAs with their potential mRNA targets and, via the prior probability model, with their sequence/structure information. We use a directed graphical model with a particular structure adapted to our data based on biological considerations. We then achieve network inference using stochastic search methods for variable selection that allow us to explore the huge model space via MCMC. A time-dependent coefficients model is also implemented. We consider experimental data from a study on hyperthermia, a very well-known developmental toxicant causing neural tube defects. Some of the pairs of target gene and miRNA we identify seem very plausible and warrant future investigation. Our proposed method is general and can be easily applied to other types of network inference by integrating multiple data sources.
Simultaneous cancer classification and gene selection with Bayesian nearest . . .
 COMPUTATIONAL STATISTICS AND DATA ANALYSIS
, 2009