Linear models and empirical bayes methods for assessing differential expression in microarray experiments.
- Stat. Appl. Genet. Mol. Biol.
, 2004
Cited by 1321 (24 self)
The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lonnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. The purpose of this paper is to develop the hierarchical model of Lonnstedt and Speed (2002) into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples. The model is reset in the context of general linear models with arbitrary coefficients and contrasts of interest. The approach applies equally well to both single-channel and two-color microarray experiments. Consistent, closed-form estimators are derived for the hyperparameters in the model. The estimators proposed have robust behavior even for small numbers of arrays and allow for incomplete data arising from spot filtering or spot quality weights. The posterior odds statistic is reformulated in terms of a moderated t-statistic in which posterior residual standard deviations are used in place of ordinary standard deviations. The empirical Bayes approach is equivalent to shrinkage of the estimated sample variances towards a pooled estimate, resulting in far more stable inference when the number of arrays is small. The use of moderated t-statistics has the advantage over the posterior odds that the number of hyperparameters which need to be estimated is reduced; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The performance of the methods is demonstrated in a simulation study. Results are presented for two publicly available data sets.
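The variance shrinkage described in this abstract can be illustrated with a minimal Python sketch. It assumes the prior variance `s0_sq` and prior degrees of freedom `d0` are already known; the paper estimates these hyperparameters from all genes with closed-form estimators that are not reproduced here. The function name and all parameter names are hypothetical, chosen for illustration only.

```python
import math

def moderated_t(beta_hat, s2, df_resid, v, s0_sq, d0):
    """Sketch of a moderated t-statistic for one gene.

    beta_hat : estimated coefficient or contrast (e.g. a log fold change)
    s2       : ordinary residual sample variance for this gene
    df_resid : residual degrees of freedom behind s2
    v        : unscaled variance factor, so Var(beta_hat) = v * sigma^2
    s0_sq, d0: prior variance and prior degrees of freedom
               (assumed given here, not estimated)
    """
    # Posterior variance: the sample variance shrunk towards the prior value,
    # weighted by the respective degrees of freedom.
    s2_post = (d0 * s0_sq + df_resid * s2) / (d0 + df_resid)
    # Same form as an ordinary t-statistic, with the posterior standard
    # deviation in place of the ordinary one.
    t = beta_hat / math.sqrt(s2_post * v)
    # The reference t-distribution has augmented degrees of freedom.
    return t, d0 + df_resid, s2_post
```

For a gene whose sample variance happens to be very small, the shrunken variance is larger than the raw one, so the moderated statistic is less extreme than the ordinary t-statistic. This is the stabilizing effect the abstract refers to for experiments with few arrays.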
Exploration, normalization, and summaries of high density oligonucleotide array probe level data.
- Biostatistics
, 2003
Cited by 854 (33 self)
In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip® system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip® arrays; part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth's Genetics Institute involving 95 HG-U95A human GeneChip® arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip® arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance-mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe-level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix's (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe-level data motivate a new summary measure that is a robust multiarray average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities.
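The last step of this summary, a robust fit of an additive model that separates array-level expression from probe-specific affinities, can be sketched in Python with Tukey's median polish. Using median polish as the robust fit is an assumption of this sketch, since the abstract does not name the fitting procedure. The input is assumed to be already background-adjusted, normalized and log-transformed PM values for a single probe set; the function and variable names are hypothetical.

```python
from statistics import median

def median_polish(log_pm, n_iter=10):
    """Fit log_pm[j][i] ~ overall + probe_eff[j] + array_eff[i] by median polish.

    log_pm : list of rows, one row per probe, one column per array.
    Returns (expression per array, probe effects, residuals).
    """
    resid = [list(row) for row in log_pm]
    n_probes, n_arrays = len(resid), len(resid[0])
    overall = 0.0
    probe_eff = [0.0] * n_probes
    array_eff = [0.0] * n_arrays
    for _ in range(n_iter):
        # Sweep probe (row) medians out of the residuals.
        for j in range(n_probes):
            m = median(resid[j])
            resid[j] = [x - m for x in resid[j]]
            probe_eff[j] += m
        # Re-center the array effects, moving their median into the overall term.
        m = median(array_eff)
        array_eff = [a - m for a in array_eff]
        overall += m
        # Sweep array (column) medians out of the residuals.
        for i in range(n_arrays):
            m = median(resid[j][i] for j in range(n_probes))
            for j in range(n_probes):
                resid[j][i] -= m
            array_eff[i] += m
        # Re-center the probe effects in the same way.
        m = median(probe_eff)
        probe_eff = [p - m for p in probe_eff]
        overall += m
    # Expression summary for each array: overall level plus its array effect.
    expression = [overall + a for a in array_eff]
    return expression, probe_eff, resid
```

Because medians are used in place of means, a single outlying probe has limited influence on the per-array summary, which is the sense in which such a summary is robust. The fixed iteration count is a simplification; a convergence check on the residuals would be a natural refinement.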
Empirical Bayes Analysis of a Microarray Experiment
- Journal of the American Statistical Association
, 2001
Cited by 492 (20 self)
Microarrays are a novel technology that facilitates the simultaneous measurement of thousands of gene expression levels. A typical microarray experiment can produce millions of data points, raising serious problems of data reduction and simultaneous inference. We consider one such experiment in which oligonucleotide arrays were employed to assess the genetic effects of ionizing radiation on seven thousand human genes. A simple nonparametric empirical Bayes model is introduced, which is used to guide the efficient reduction of the data to a single summary statistic per gene, and also to make simultaneous inferences concerning which genes were affected by the radiation. Although our focus is on one specific experiment, the proposed methods can be applied quite generally. The empirical Bayes inferences are closely related to the frequentist false discovery rate (FDR) criterion.
Summaries of Affymetrix GeneChip probe level data
- Nucleic Acids Res
, 2003
Cited by 471 (21 self)
High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11-20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spike-in studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.
Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data
- ATHEY B, JONES EG, BUNNEY WE, MYERS RM, SPEED TP, AKIL H, WATSON SJ, MENG
, 2005
A Model Based Background Adjustment for Oligonucleotide Expression Arrays.
- Journal of the American Statistical Association
, 2004
A Benchmark for Affymetrix GeneChip Expression Measures
- Bioinformatics
, 2003
Cited by 141 (10 self)
Motivation: The defining feature of oligonucleotide expression arrays is the use of several probes to assay each targeted transcript. This is a bonanza for the statistical geneticist, who can create probeset summaries with specific characteristics. There are now several methods available for summarizing probe level data from the popular Affymetrix GeneChips, but it is difficult to identify the best method for a given inquiry. Results: We have developed a graphical tool to evaluate summaries of Affymetrix probe level data. Plots and summary statistics offer a picture of how an expression measure performs in several important areas. This picture facilitates the comparison of competing expression measures and the selection of methods suitable for a specific investigation. The key is a benchmark data set consisting of a dilution study and a spike-in study. Because the truth is known for these data, we can identify statistical features of the data for which the expected outcome is known in advance. Those features highlighted in our suite of graphs are justified by questions of biological interest and motivated by the presence of appropriate data. Availability: In conjunction with the release of a graphics toolbox as part of the Bioconductor project
Analysis of high density expression microarrays with signed-rank call algorithms
- Bioinformatics
, 2002
Cited by 114 (1 self)
Motivation: We consider the detection of expressed genes and the comparison of them in different experiments with the high-density oligonucleotide microarrays. The results are summarized as the detection calls and comparison calls, and they should be robust against data outliers over a wide target concentration range. It is also helpful to provide parameters that can be adjusted by the user to balance specificity and sensitivity under various experimental conditions. Results: We present rank-based algorithms for making detection and comparison calls on expression microarrays. The detection call algorithm utilizes the discrimination scores. The comparison call algorithm utilizes intensity differences. Both algorithms are based on Wilcoxon’s signed-rank test. Several parameters in the algorithms can be adjusted by the user to alter levels of specificity and sensitivity. The algorithms were developed and analyzed using spiked-in genes arrayed in a Latin square format. In the call process, p-values are calculated to give a confidence level for the pertinent hypotheses. For comparison calls made between two arrays, two primary normalization factors are defined. To overcome the difficulty that constant normalization factors do not fit all probe sets, we perturb these primary normalization factors and make increasing or decreasing calls only if all resulting p-values fall within a defined critical region. Our algorithms also automatically handle scanner saturation. Availability: These algorithms are available commercially as part of the MAS 5.0 software package. Contact: wei-min
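A rank-based detection call of the kind described in this abstract can be sketched in Python as follows. Several details are assumptions of the sketch and are not taken from the abstract: the discrimination score is taken to be (PM - MM) / (PM + MM) for each probe pair, and the threshold `tau` and cutoffs `alpha1` and `alpha2` are illustrative values standing in for the user-adjustable parameters. The one-sided p-value is an exact Wilcoxon signed-rank computation with mid-ranks for ties; all function names are hypothetical.

```python
def signed_rank_pvalue(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value for 'median of diffs > 0'."""
    d = [x for x in diffs if x != 0]  # zero differences carry no sign information
    n = len(d)
    if n == 0:
        return 1.0  # no evidence either way (a convention chosen for this sketch)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n  # doubled mid-ranks, so tied ranks stay integers
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # twice the average of ranks i+1 .. j+1
        i = j + 1
    w_pos = sum(r for r, x in zip(ranks2, d) if x > 0)
    # Count sign assignments by their positive-rank sum (subset-sum dynamic program).
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[w_pos:]) / 2 ** n

def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return ('P' | 'M' | 'A', p-value) for one probe set (illustrative parameters)."""
    scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]  # assumed score definition
    p_value = signed_rank_pvalue([s - tau for s in scores])
    if p_value < alpha1:
        return "P", p_value  # present
    if p_value < alpha2:
        return "M", p_value  # marginal
    return "A", p_value      # absent
```

Raising `tau` or lowering the cutoffs makes present calls harder to obtain, which trades sensitivity for specificity in the way the abstract describes for its adjustable parameters.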
Stochastic Models Inspired by Hybridization Theory for Short Oligonucleotide Arrays (Extended Abstract)
- J. Comput. Biol
, 2004
Cited by 80 (4 self)
Zhijin Wu, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, zwu@jhsph.edu; Rafael A. Irizarry, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, rafa@jhu.edu. ABSTRACT: High density oligonucleotide expression arrays are a widely used tool for the measurement of gene expression on a large scale. Affymetrix GeneChip arrays appear to dominate this market. These arrays use short oligonucleotides to probe for genes in an RNA sample. Due to optical noise, nonspecific hybridization, probe-specific effects, and measurement error, ad-hoc measures of expression that summarize probe intensities can lead to imprecise and inaccurate results. Various researchers have demonstrated that expression measures based on simple statistical models can provide great improvements over the ad-hoc procedure offered by Affymetrix. Recently, physical models based on molecular hybridization theory have been proposed as useful tools for prediction of, for example, non-specific hybridization. These physical models show great potential in terms of improving existing expression measures. In this paper we suggest that the system producing the measured intensities is too complex to be fully described with these relatively simple physical models, and we propose empirically motivated stochastic models that complement the above-mentioned molecular hybridization theory to provide a comprehensive description of the data. We discuss how the proposed model can be used to obtain improved measures of expression useful for data analysts.
Evaluation of gene expression measurements from commercial microarray platforms
- Nucleic Acids Res
, 2003
Cited by 72 (0 self)
Multiple commercial microarrays for measuring genome-wide gene expression levels are currently available, including oligonucleotide and cDNA, single- and two-channel formats. This study reports on the results of gene expression measurements generated from identical RNA preparations that were obtained using three commercially available microarray platforms. RNA was collected from PANC-1 cells grown in serum-rich medium and at 24 h following the removal of serum. Three biological replicates were prepared for each condition, and three experimental replicates were produced for the first biological replicate. RNA was labeled and hybridized to microarrays from three major suppliers according to manufacturers' protocols, and gene expression measurements were obtained using each platform's standard software. For each platform, gene targets from a subset of 2009 common genes were compared. Correlations in gene expression levels and comparisons for significant gene expression changes in this subset were calculated, and showed considerable divergence across the different platforms, suggesting the need for establishing industrial manufacturing standards, and further independent and thorough validation of the technology.