Results 1–10 of 43
Size, power and false discovery rates, 2007
"... Modern scientific technology has provided a new class of largescale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science ..."
Cited by 53 (4 self)
Abstract:
Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the fdr analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates, and used to compare different methodologies: local or tail-area fdr's, and theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets as well as simulations are used to evaluate the methodology, the power diagnostics showing why non-null cases might easily fail to appear on a list of "significant" discoveries. Short title: "Size, Power, and Fdr's".
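The size and power calculations described in this abstract rest on the two-group model, in which the local false discovery rate is fdr(z) = π0·f0(z)/f(z). The sketch below is a minimal illustration of that idea, assuming a theoretical N(0,1) null and a Gaussian kernel estimate of the mixture density; the function name, bandwidth, and toy data are assumptions, not the paper's implementation.

```python
import math
import random

def local_fdr(z, z_all, pi0=1.0, bandwidth=0.3):
    # local fdr(z) = pi0 * f0(z) / fhat(z): theoretical N(0,1) null density
    # divided by a Gaussian kernel estimate of the mixture density
    f0 = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    n = len(z_all)
    fhat = sum(math.exp(-((z - zi) / bandwidth) ** 2 / 2) for zi in z_all)
    fhat /= n * bandwidth * math.sqrt(2 * math.pi)
    return min(1.0, pi0 * f0 / max(fhat, 1e-300))

# toy ensemble: 95% null N(0,1) cases, 5% non-null N(3,1) cases
random.seed(1)
z_scores = [random.gauss(0, 1) for _ in range(1900)] + \
           [random.gauss(3, 1) for _ in range(100)]
```

On such a mixture, fdr is near 1 around z = 0 (almost all null) and small far out in the non-null tail, which is exactly the contrast the paper's size/power diagnostics exploit.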
Local False Discovery Rates, 2005
"... Modern scientific technology is providing a new class of largescale simultaneous inference problems, with hundreds or thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology but similar problems arise in proteomics, time of flight spectroscopy, flow ..."
Cited by 24 (1 self)
Abstract:
Modern scientific technology is providing a new class of large-scale simultaneous inference problems, with hundreds or thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar problems arise in proteomics, time-of-flight spectroscopy, flow cytometry, fMRI imaging, and massive social science surveys. This paper uses local false discovery rate methods to carry out size and power calculations on large-scale data sets. An empirical Bayes approach allows the fdr analysis to proceed from a minimum of frequentist or Bayesian modeling assumptions. Microarray and simulated data sets are used to illustrate a convenient estimation methodology whose accuracy can be calculated in closed form. A crucial part of the methodology is an fdr assessment of "thinned counts": what the histogram of test statistics would look like for just the non-null cases.
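The "thinned counts" idea from this abstract can be illustrated with a toy helper (the function name and example numbers are hypothetical): the expected non-null count in each histogram bin is the observed count scaled by 1 minus that bin's estimated fdr.

```python
def thinned_counts(bin_counts, bin_fdrs):
    # expected number of non-null cases per histogram bin:
    # the observed count scaled by the estimated non-null fraction 1 - fdr
    return [c * (1.0 - f) for c, f in zip(bin_counts, bin_fdrs)]

# bins near z = 0 are almost all null (fdr near 1); tail bins mostly non-null
counts = [120, 40, 12, 5]
fdrs = [0.98, 0.80, 0.30, 0.05]
nonnull = thinned_counts(counts, fdrs)
```

Plotting `nonnull` against the bin centers sketches the histogram the non-null cases alone would produce, which is what the power assessment in the paper examines.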
Flexible empirical Bayes models for differential gene expression, 2007
"... Motivation: Inference about differential expression is a typical objective when analyzing gene expression data. Recently, Bayesian hierarchical models have become increasingly popular for this type of problem. The two most common hierarchical models are the hierarchical Gamma–Gamma (GG) and Lognorma ..."
Cited by 22 (2 self)
Abstract:
Motivation: Inference about differential expression is a typical objective when analyzing gene expression data. Recently, Bayesian hierarchical models have become increasingly popular for this type of problem. The two most common hierarchical models are the hierarchical Gamma–Gamma (GG) and Lognormal–Normal (LNN) models. However, to facilitate inference, some unrealistic assumptions have been made. One such assumption is that of a common coefficient of variation across genes, which can adversely affect the resulting inference. Results: In this paper, we extend both the GG and LNN modeling frameworks to allow for gene-specific variances and propose EM-based algorithms for parameter estimation. The proposed methodology is evaluated on three experimental datasets: one cDNA microarray experiment and two Affymetrix spike-in experiments. The two extended models significantly reduce the false positive rate while keeping high sensitivity compared with the originals. Finally, using a simulation study we show that the new frameworks are also more robust to model misspecification. Availability: The R code for implementing the proposed methodology can be downloaded at
Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics, 2013
"... Motivation: Cell populations are never truly homogeneous; individual cells exist in biochemical states that define functional differences between them. New technology based on microfluidic arrays combined with multiplexed quantitative polymerase chain reactions (qPCR) now enables highthroughput s ..."
Cited by 16 (3 self)
Abstract:
Motivation: Cell populations are never truly homogeneous; individual cells exist in biochemical states that define functional differences between them. New technology based on microfluidic arrays combined with multiplexed quantitative polymerase chain reactions (qPCR) now enables high-throughput single-cell gene expression measurement, allowing assessment of cellular heterogeneity. However, few analytical tools have been developed specifically for the statistical and analytical challenges of single-cell qPCR data. Results: We present a statistical framework for the exploration, quality control, and analysis of single-cell gene expression data from microfluidic arrays. We assess accuracy and within-sample heterogeneity of single-cell expression and develop quality control criteria to filter unreliable
GaGa: A parsimonious and flexible model for differential expression analysis. Ann. Appl. Stat., 2009
"... ar ..."
Markov chain Monte Carlo with mixtures of mutually singular distributions. J. Comput, 2008
"... singular distributions ..."
Selection and estimation for large-scale simultaneous inference. (Can be downloaded from http://wwwstat.stanford.edu/~brad/ – click on "papers and software".)
"... Modern scientific technology is providing a new class of simultaneous inference problems for the applied statistician, where there are hundreds or thousands or even more hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar problems arise in proteo ..."
Cited by 6 (0 self)
Abstract:
Modern scientific technology is providing a new class of simultaneous inference problems for the applied statistician, where there are hundreds or thousands or even more hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar problems arise in proteomics, time-of-flight spectroscopy, flow cytometry, and functional magnetic resonance imaging. The paper considers two related questions: given a large number of simultaneous hypothesis testing problems, how can we Select the Non-Null cases, and how can we Estimate effect sizes for the Non-Nulls? Selection and Estimation combine to assess power, a large-scale experiment's ability to correctly identify individual cases of interest. Microarray data are used to illustrate a simple methodology that requires a minimum of mathematical modeling. Massive data sets have become a fact of current-day life for applied statisticians. Often these sets have a parallel structure comprising a large number "N" of similar problems. For example, a microarray experiment comparing control subjects with treatment subjects might
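The Estimation half of this Selection/Estimation program is closely related to Tweedie's formula, E[μ | z] = z + σ²·(d/dz) log f(z), where f is the marginal density of the z-values. The sketch below uses hypothetical names and a toy known mixture in place of an estimated marginal; it is an illustration of the shrinkage effect, not the paper's methodology.

```python
import math

def tweedie_estimate(z, log_f, sigma2=1.0, eps=1e-4):
    # Tweedie's formula: E[mu | z] = z + sigma^2 * d/dz log f(z),
    # with the derivative of the log marginal density taken numerically
    dlogf = (log_f(z + eps) - log_f(z - eps)) / (2 * eps)
    return z + sigma2 * dlogf

def marginal(z):
    # toy marginal density: 90% null N(0,1) plus 10% non-null N(3,1)
    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return 0.9 * phi(z) + 0.1 * phi(z - 3)

est = tweedie_estimate(2.5, lambda z: math.log(marginal(z)))
# a z-score of 2.5 is shrunk toward 0, since most cases are null
```

Because the estimate is computed from the full ensemble's marginal density, it automatically discounts selected cases, which is the empirical Bayes answer to the selection bias the abstract alludes to.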
Empirical Bayes in the presence of exceptional cases, with application to microarray data, 2013
"... Empirical Bayes is a statistical approach for estimating a series of unknown parameters from a series of associated data observations. It provides an effective means to “borrow strength ” from the ensemble of cases when making inference about each individual case. Such methods are ideally suited to ..."
Cited by 5 (4 self)
Abstract:
Empirical Bayes is a statistical approach for estimating a series of unknown parameters from a series of associated data observations. It provides an effective means to "borrow strength" from the ensemble of cases when making inference about each individual case. Such methods are ideally suited to genomic applications where data are collected for tens of thousands of genes simultaneously. Empirical Bayes methods can, however, be less effective when highly exceptional cases are present. This article proposes a practical solution whereby exceptional cases are identified and the amount of "learning" from the ensemble appropriate for each case is assessed on a case-specific basis. The approach is developed in detail for the problem of estimating gene-wise variances from microarray data. In this context, the proposed robust empirical Bayes procedure recognizes and protects against hypervariable genes. The new procedure improves statistical power for most genes in many microarray data sets. Simulations show that the robust estimation procedure correctly controls the type I error rate and does not increase the number of false discoveries when no hypervariable genes are present. In the presence of hypervariable genes, the robust method improves power to detect differential expression for the majority of genes that are not outliers. The proposed robust method is applied to an example microarray data set, on which it correctly identifies and downweights genes associated with a hidden covariate and detects more genes likely to be scientifically relevant to the experimental conditions. The new procedure is implemented in the limma software package, which is freely available from the Bioconductor repository.
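The gene-wise variance shrinkage this robust procedure builds on can be sketched as follows. This is the standard empirical Bayes moderated-variance estimator, not the robust limma code itself; the function name and the toy prior values are assumptions, and the "robust" call merely illustrates how a smaller case-specific prior weight lets an exceptional gene keep more of its own variance.

```python
def moderated_variance(s2, d, s0_sq, d0):
    # posterior (moderated) variance: the gene's sample variance s2
    # (d degrees of freedom) shrunk toward the ensemble prior variance
    # s0_sq (d0 prior degrees of freedom)
    return (d0 * s0_sq + d * s2) / (d0 + d)

# a hypervariable gene: sample variance 9.0 vs an ensemble prior of 1.0
usual = moderated_variance(9.0, 4, 1.0, 4.0)   # full shrinkage toward 1.0
robust = moderated_variance(9.0, 4, 1.0, 0.5)  # downweighted prior, less shrinkage
```

Giving the exceptional case a smaller effective `d0` is the spirit of the "case-specific learning" in the abstract: ordinary genes still borrow strength, while outliers are protected from being shrunk onto the ensemble.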
Bayesian modeling of MPSS data: gene expression analysis of bovine Salmonella infection. J. Amer. Statist. Assoc., 2011
"... Abstract: Among the countingbased technologies available for gene expression profiling, massively parallel signature sequencing (MPSS) has some advantages over competitors such as serial analysis of gene expression (SAGE) or direct sequencing of cDNA and is ideal for building complex relational da ..."
Cited by 5 (1 self)
Abstract:
Among the counting-based technologies available for gene expression profiling, massively parallel signature sequencing (MPSS) has some advantages over competitors such as serial analysis of gene expression (SAGE) or direct sequencing of cDNA, and is ideal for building complex relational databases for gene expression. The goal of our present study was a comparison between the in vivo global gene expression profiles of tissues infected with different strains of Salmonella, obtained using the MPSS technology. In this article, we develop an exact ANOVA-type model for this count data using a zero-inflated Poisson (ZIP) distribution, different from the existing methods that assume continuous densities. We adopt two Bayesian hierarchical models, one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16–21 base pairs long. Modeling each signature count by a ZIP, we assume a normal density for the log-transformed mean parameter of the Poisson part. The mean of this normal density is assumed to have a linear model structure with parameters capturing the signature effect and the treatment effect. In the parametric model these parameters are given the usual conjugate prior distributions, whereas in the semiparametric case the 'treatment effect' parameter is given a Dirichlet process prior with a normal baseline distribution, resulting in automatic clustering of the signatures. The deviance information criterion (DIC) is used for model choice, and inference on differential expression is based on the posteriors of the 'treatment effect' parameters. To this end, symmetrized Kullback–Leibler (KL) divergences with bootstrapped cutoff values are used, as well as the Kruskal–Wallis test for the equality of medians.
Among the genes associated with the differentially expressed signatures identified by our semiparametric model, there are several important Gene Ontology categories that are consistent with the existing biological knowledge about the host response to Salmonella infection. We conclude with a summary of the biological significance of our discoveries.
BAYESIAN TESTING OF MANY HYPOTHESES × MANY GENES: A STUDY OF SLEEP APNEA
"... Substantial statistical research has recently been devoted to the analysis of largescale microarray experiments which provide a measure of the simultaneous expression of thousands of genes in a particular condition. A typical goal is the comparison of gene expression between two conditions (e.g., d ..."
Cited by 2 (0 self)
Abstract:
Substantial statistical research has recently been devoted to the analysis of large-scale microarray experiments, which provide a measure of the simultaneous expression of thousands of genes in a particular condition. A typical goal is the comparison of gene expression between two conditions (e.g., diseased vs. non-diseased) to detect genes which show differential expression. Classical hypothesis testing procedures have been applied to this problem, and more recent work has employed sophisticated models that allow for the sharing of information across genes. However, many recent gene expression studies have an experimental design with several conditions that requires an even more involved hypothesis testing approach. In this paper, we use a hierarchical Bayesian model to address the situation where there are many hypotheses that must be simultaneously tested for each gene. In addition to having many hypotheses within each gene, our analysis also addresses the more typical multiple comparison issue of testing many genes simultaneously. We illustrate our approach with an application to a study of genes involved in obstructive sleep apnea in humans.