Linear models and empirical Bayes methods for assessing differential expression in microarray experiments
Stat. Appl. Genet. Mol. Biol., 2004
Cited by 1321 (24 self)
The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lönnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. The purpose of this paper is to develop the hierarchical model of Lönnstedt and Speed (2002) into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples. The model is reset in the context of general linear models with arbitrary coefficients and contrasts of interest. The approach applies equally well to both single-channel and two-color microarray experiments. Consistent, closed-form estimators are derived for the hyperparameters in the model. The estimators proposed have robust behavior even for small numbers of arrays and allow for incomplete data arising from spot filtering or spot quality weights. The posterior odds statistic is reformulated in terms of a moderated t-statistic in which posterior residual standard deviations are used in place of ordinary standard deviations. The empirical Bayes approach is equivalent to shrinkage of the estimated sample variances towards a pooled estimate, resulting in far more stable inference when the number of arrays is small. The use of moderated t-statistics has the advantage over the posterior odds that the number of hyperparameters which need to be estimated is reduced; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The performance of the methods is demonstrated in a simulation study. Results are presented for two publicly available data sets.
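The variance shrinkage described in this abstract can be illustrated with a minimal sketch. It assumes the prior degrees of freedom `d0` and prior variance `s0_sq` are already known; the paper estimates these hyperparameters from the data, and that step is not shown. The function name, argument names, and numbers are illustrative and are not taken from the paper or from any software package.

```python
import math

def moderated_t(beta_hat, s2, d, v, d0, s0_sq):
    """Moderated t-statistic for one gene (illustrative sketch).

    beta_hat : estimated coefficient or contrast (e.g. a log fold change)
    s2       : ordinary residual sample variance for this gene
    d        : residual degrees of freedom for this gene
    v        : unscaled variance factor of beta_hat (its variance is v * sigma^2)
    d0, s0_sq: prior degrees of freedom and prior variance (assumed given here)
    """
    # Posterior variance: a weighted average of the prior and sample variances,
    # so an extreme s2 is pulled towards the pooled value s0_sq.
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    t_mod = beta_hat / math.sqrt(s2_post * v)
    # The degrees of freedom are augmented from d to d0 + d.
    return t_mod, s2_post, d0 + d

# A gene with an unusually small sample variance on only 2 residual df.
t_mod, s2_post, df = moderated_t(beta_hat=1.0, s2=0.01, d=2, v=0.5, d0=4.0, s0_sq=0.04)
t_ord = 1.0 / math.sqrt(0.01 * 0.5)  # ordinary t-statistic for comparison
print(round(t_mod, 3), round(t_ord, 3), df)
```

In this example the tiny sample variance inflates the ordinary t-statistic, while the moderated statistic is smaller in magnitude because the variance has been shrunk towards the prior value.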
Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments
Statistica Sinica, 2002
Cited by 438 (12 self)
DNA microarrays are a new and promising biotechnology which allows the monitoring of expression levels in cells for thousands of genes simultaneously. The present paper describes statistical methods for the identification of differentially expressed genes in replicated cDNA microarray experiments. Although it is not the main focus of the paper, new methods for the important preprocessing steps of image analysis and normalization are proposed. Given suitably normalized data, the biological question of differential expression is restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and responses or covariates of interest. Differentially expressed genes are identified based on adjusted p-values for a multiple testing procedure which strongly controls the family-wise Type I error rate and takes into account the dependence structure between the gene expression levels. No specific parametric form is assumed for the distribution of the test statistics and a permutation procedure is used to estimate adjusted p-values. Several data displays are suggested for the visual identification of differentially expressed genes and of important features of these genes. The above methods are applied to microarray data from a study of gene expression in the livers of mice with very low HDL cholesterol levels. The genes identified using data from multiple slides are compared to those identified by recently published single-slide methods.
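As a rough illustration of permutation-based adjusted p-values that control the family-wise error rate, the sketch below uses the single-step maxT adjustment. This is a swapped-in standard technique chosen for brevity; the abstract does not spell out the exact procedure, and the paper's own method may differ (for example in the test statistic or in using a step-down variant). The statistic here is an absolute difference in group means, and all names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def abs_mean_diff(values, group1_idx):
    """Absolute difference in means between group 1 and the remaining samples."""
    idx = set(group1_idx)
    g1 = [x for i, x in enumerate(values) if i in idx]
    g0 = [x for i, x in enumerate(values) if i not in idx]
    return abs(mean(g1) - mean(g0))

def maxt_adjusted_pvalues(data, group1_idx):
    """Single-step maxT adjusted p-values over all relabellings of the samples.

    data       : list of genes, each a list of expression values per sample
    group1_idx : indices of the samples that belong to group 1
    """
    n, k = len(data[0]), len(group1_idx)
    observed = [abs_mean_diff(row, group1_idx) for row in data]
    exceed = [0] * len(data)
    n_perm = 0
    # Permuting whole samples keeps the dependence between genes intact.
    for perm_idx in combinations(range(n), k):
        max_stat = max(abs_mean_diff(row, perm_idx) for row in data)
        for j, obs in enumerate(observed):
            if max_stat >= obs - 1e-12:  # small tolerance for float ties
                exceed[j] += 1
        n_perm += 1
    return [count / n_perm for count in exceed]

# Two hypothetical genes measured on six samples, three per group.
data = [
    [10, 11, 12, 1, 2, 3],  # clear group difference
    [5, 6, 5, 6, 5, 6],     # no real difference
]
adj_p = maxt_adjusted_pvalues(data, (0, 1, 2))
print(adj_p)
```

With only 20 possible relabellings the smallest attainable adjusted p-value is 2/20, which shows why very small sample sizes limit what a permutation approach can detect.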
Normalization of cDNA microarray data
Methods, 2003
Cited by 242 (8 self)
Normalization means to adjust microarray data for effects which arise from variation in the technology rather than from biological differences between the RNA samples or between the printed probes. This article describes normalization methods based on the fact that dye balance typically varies with spot intensity and with spatial position on the array. Print-tip loess normalization provides a well-tested general-purpose normalization method which has given good results on a wide range of arrays. The method may be refined by using quality weights for individual spots. The method is best combined with diagnostic plots of the data which display the spatial and intensity trends. When diagnostic plots show that biases still remain in the data after normalization, further normalization steps such as plate-order normalization or scale normalization between the arrays may be undertaken. Composite normalization may be used when control spots are available which are known to be not differentially expressed. Variations on loess normalization include global loess normalization and 2D normalization. Detailed commands are given to implement the normalization techniques using freely available software.
Use of within-array replicate spots for assessing differential expression in microarray experiments
Bioinformatics, 2005
Cited by 239 (8 self)
Motivation. Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. Usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This loses valuable information about gene-wise variability. Results. A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the gene-wise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the gene-wise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. Availability. The methodology is implemented in the limma software package for R, available from the CRAN repository.
Statistical Issues in cDNA Microarray Data Analysis
2003
Cited by 83 (6 self)
This article summarizes some of the issues involved and provides a brief review of the analysis tools which are available to researchers to deal with them. Any microarray experiment involves a number of distinct stages. Firstly there is the design of the experiment. The researchers must decide which genes are to be printed on the arrays, which sources of RNA are to be hybridized to the arrays and on how many arrays the hybridizations will be replicated. Secondly, after hybridization, there follows a number of data-cleaning steps or 'low-level analysis' of the microarray data. The microarray images must be processed to acquire red and green foreground and background intensities for each spot. The acquired red/green ratios must be normalized to adjust for dye-bias and for any systematic variation other than that due to the differences between the RNA samples being studied. Thirdly, the normalized ratios are analyzed by various graphical and numerical means to select differentially expressed genes or to find groups of genes whose expression profiles can reliably classify the different RNA sources into meaningful groups. The sections of this article correspond roughly to the various analysis steps. The following notation will be used throughout the article. The foreground red and green intensities will be written $R_f$ and $G_f$ for each spot. The background intensities will be $R_b$ and $G_b$. The background-corrected intensities will be $R$ and $G$ where usually $R = R_f - R_b$ and $G = G_f - G_b$. The log-differential expression ratio will be $M = \log_2(R/G)$ for each spot. Finally, the log-intensity of the spot will be $A = \frac{1}{2}\log_2(RG)$, a measure of the overall brightness of the spot. (The letter $M$ is a mnemonic for minus as $M = \log_2 R - \log_2 G$, while $A$ is a mnemonic for add as ...
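The notation in this abstract maps directly onto a small calculation. The sketch below assumes the usual definitions for two-color arrays, M = log2(R/G) and A = (1/2) log2(RG), with R and G obtained by subtracting background from foreground. The function name, the handling of non-positive corrected intensities, and the example intensities are illustrative assumptions, not part of the article.

```python
import math

def ma_values(rf, gf, rb, gb):
    """Return (M, A) for one spot from foreground and background intensities.

    rf, gf : red and green foreground intensities
    rb, gb : red and green background intensities
    """
    r = rf - rb  # background-corrected red intensity
    g = gf - gb  # background-corrected green intensity
    if r <= 0 or g <= 0:
        # The log is undefined here; returning None is one simple convention
        # (an assumption of this sketch, other treatments are possible).
        return None
    m = math.log2(r / g)        # log-ratio: log2(R) minus log2(G)
    a = 0.5 * math.log2(r * g)  # average log-intensity (overall brightness)
    return m, a

m, a = ma_values(rf=1100, gf=356, rb=76, gb=100)
print(m, a)
```

In the example the corrected intensities are R = 1024 and G = 256, so the spot is four-fold brighter in red (M = 2) with an average log-intensity of A = 9.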
Optimal Sample Size for Multiple Testing: the Case of Gene Expression Microarrays
Journal of the American Statistical Association, 2004
Cited by 75 (5 self)
We consider the choice of an optimal sample size for multiple comparison problems. The motivating application is the choice of the number of microarray experiments to be carried out when learning about differential gene expression. However, the approach is valid in any application that involves multiple comparison in a large number of hypothesis tests.
A multivariate empirical Bayes statistic for replicated microarray time course data
Annals of Statistics, 2006
A Bayesian mixture model for differential gene expression
Journal of the Royal Statistical Society C, 2005
Cited by 58 (5 self)
We propose model-based inference for differential gene expression, using a nonparametric Bayesian probability model for the distribution of gene intensities under different conditions. The probability model is essentially a mixture of normals. The resulting inference is similar to the empirical Bayes approach proposed in Efron et al. (2001). The use of fully model-based inference mitigates some of the necessary limitations of the empirical Bayes method. However, the increased generality of our method comes at a price. Computation is not as straightforward as in the empirical Bayes scheme. But we argue that inference is no more difficult than posterior simulation in traditional nonparametric mixture of normal models. We illustrate the proposed method in two examples, including a simulation study and a microarray experiment to screen for genes with differential expression in colon cancer versus normal tissue (Alon et al., 1999).
The optimal discovery procedure I: A new approach to simultaneous significance testing
UW Biostatistics Working Paper Series, Working Paper 259. http://www.bepress.com/uwbiostat/paper259, 2005
Cited by 48 (7 self)
Significance testing is one of the main objectives of statistics. The Neyman-Pearson lemma provides a simple rule for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. As shrinkage estimation borrows strength across point estimates to improve their overall performance, I show here that borrowing strength across multiple significance tests can improve their performance as well. The “optimal discovery procedure” (ODP) is introduced, which shows how to maximize the number of expected true positives for each fixed number of expected false positives. The optimality achieved by this procedure is shown to be closely related to optimality in terms of the false discovery rate. The ODP motivates a new approach to testing multiple hypotheses, especially when the tests are related. As a simple example, a new simultaneous procedure for testing several Normal means is defined; this is surprisingly demonstrated to outperform the optimal single test procedure, showing that an optimal method for single tests may no longer be optimal in the multiple test setting. Connections to other concepts in statistics are discussed, including Stein’s paradox, shrinkage estimation, and Bayesian classification theory.
Bayesian robust inference for differential gene expression in microarrays with multiple samples
Biometrics
Cited by 43 (11 self)
We consider the problem of identifying differentially expressed genes under different conditions using gene expression microarrays. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Errors are modeled explicitly using a t-distribution, which accounts for outliers. The model includes an exchangeable prior for the variances which allows different variances for the genes but still shrinks extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method is illustrated using two publicly available gene expression data sets. We compare our method to six other baseline and commonly used techniques, namely the t-test, the Bonferroni-adjusted t-test, Significance Analysis of Microarrays (SAM), Efron’s empirical Bayes, and EBarrays in both its Lognormal-Normal and Gamma-Gamma forms. In an experiment with HIV data, our method performed better than these alternatives, on the basis of between-replicate agreement and disagreement.