Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection (2001)

by C Li, W H Wong
Venue: Proc. Natl Acad. Sci. USA

Results 1 - 10 of 775

Linear models and empirical bayes methods for assessing differential expression in microarray experiments.

by Gordon K Smyth - Stat. Appl. Genet. Mol. Biol., 2004
Cited by 1321 (24 self)
Abstract: The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lonnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. The purpose of this paper is to develop the hierarchical model of Lonnstedt and Speed (2002) into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples. The model is reset in the context of general linear models with arbitrary coefficients and contrasts of interest. The approach applies equally well to both single channel and two-color microarray experiments. Consistent, closed form estimators are derived for the hyperparameters in the model. The estimators proposed have robust behavior even for small numbers of arrays and allow for incomplete data arising from spot filtering or spot quality weights. The posterior odds statistic is reformulated in terms of a moderated t-statistic in which posterior residual standard deviations are used in place of ordinary standard deviations. The empirical Bayes approach is equivalent to shrinkage of the estimated sample variances towards a pooled estimate, resulting in far more stable inference when the number of arrays is small. The use of moderated t-statistics has the advantage over the posterior odds that the number of hyperparameters which need to be estimated is reduced; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The performance of the methods is demonstrated in a simulation study. Results are presented for two publicly available data sets.

Citation Context

...a set of n microarrays yielding a response vector $y_g^T = (y_{g1}, \ldots, y_{gn})$ for the gth gene. The responses will usually be log-ratios for two-color data or log-intensities for single channel data, although other transformations are possible. The responses are assumed to be suitably normalized to remove dye-bias and other technological artifacts; see for example Huber et al (2002) or Smyth and Speed (2003). In the case of high density oligonucleotide arrays, the probes are assumed to have been normalized to produce an expression summary, represented here as $y_{gi}$, for each gene on each array as in Li and Wong (2001) or Irizarry et al (2003). We assume that $E(y_g) = X\alpha_g$ where $X$ is a design matrix of full column rank and $\alpha_g$ is a coefficient vector. We assume $\mathrm{var}(y_g) = W_g \sigma_g^2$ where $W_g$ is a known non-negative definite weight matrix. The vector $y_g$ may contain missing values and the matrix $W_g$ may contain diagonal weights which are zero. Certain contrasts of the coefficients are assumed to be of biological interest and these are defined by $\beta_g = C^T \alpha_g$. We assume that it is of interest to test whether individual contrast values $\beta_{gj}$ are equal to zero. For example, with design (d) above the experimenter might want...
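
The variance shrinkage behind the moderated t-statistic described in the abstract above can be illustrated with a minimal sketch. It assumes the prior degrees of freedom `d0` and prior variance `s0_2` are already known, whereas the paper estimates these hyperparameters from the data; the function name `moderated_t` and all argument names are hypothetical and do not come from any library.

```python
import math

def moderated_t(beta_hat, s2, d, v, s0_2, d0):
    """Moderated t-statistic for one gene and one contrast (illustrative).

    beta_hat : estimated contrast value
    s2       : residual sample variance for this gene
    d        : residual degrees of freedom for this gene
    v        : unscaled variance of the contrast estimate
    s0_2, d0 : prior variance and prior degrees of freedom (assumed given)
    """
    # Shrink the gene-wise variance toward the prior (pooled) value.
    s2_post = (d0 * s0_2 + d * s2) / (d0 + d)
    # Use the posterior variance in place of the ordinary sample variance.
    t_mod = beta_hat / math.sqrt(s2_post * v)
    # The statistic is referred to a t-distribution with d0 + d degrees of freedom.
    return t_mod, d0 + d

t_mod, df = moderated_t(beta_hat=1.0, s2=0.04, d=2, v=0.5, s0_2=0.1, d0=4)
print(t_mod, df)
```

With few arrays, `d` is small and the prior term dominates, which is what stabilizes the statistic for genes whose sample variance happens to be very small.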

Exploration, normalization, and summaries of high density oligonucleotide array probe level data.

by Rafael A Irizarry, Bridget Hobbs, Francois Collin, Yasmin D Beazer-Barclay, Kristen J Antonellis, Uwe Scherf, Terence P Speed - Biostatistics, 2003
Cited by 854 (33 self)
Summary: In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip® system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip® arrays; part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth's Genetics Institute involving 95 HG-U95A human GeneChip® arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip® arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance-mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix's (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multiarray average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities.
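
As a rough illustration of a robust multiarray summary that removes probe-specific affinities, the sketch below fits an additive probe-plus-array model to log-scale PM values with Tukey's median polish. Using median polish is an assumption made here (the abstract only says "robust"), the input is assumed to be already background-adjusted, normalized, and log2-transformed, and the function name `median_polish_summary` is hypothetical.

```python
from statistics import median

def median_polish_summary(log_pm, n_iter=10):
    """Summarize one probe set across arrays (illustrative sketch).

    log_pm[j][i] is the log2 PM value of probe j on array i.
    Returns one expression value per array: overall level + array effect.
    """
    resid = [list(row) for row in log_pm]
    n_probes, n_arrays = len(resid), len(resid[0])
    overall = 0.0
    probe_eff = [0.0] * n_probes
    array_eff = [0.0] * n_arrays
    for _ in range(n_iter):
        # Sweep probe (row) medians out of the residuals.
        for j in range(n_probes):
            m = median(resid[j])
            probe_eff[j] += m
            resid[j] = [x - m for x in resid[j]]
        m = median(array_eff)
        overall += m
        array_eff = [a - m for a in array_eff]
        # Sweep array (column) medians out of the residuals.
        for i in range(n_arrays):
            m = median(resid[j][i] for j in range(n_probes))
            array_eff[i] += m
            for j in range(n_probes):
                resid[j][i] -= m
        m = median(probe_eff)
        overall += m
        probe_eff = [p - m for p in probe_eff]
    return [overall + a for a in array_eff]

# Three probes (rows) measured on three arrays (columns), purely additive data.
example = [[8.0, 9.0, 10.0], [9.0, 10.0, 11.0], [7.0, 8.0, 9.0]]
print(median_polish_summary(example))
```

Because medians are used instead of means, a single aberrant probe on one array has limited influence on that array's summary value.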

Empirical Bayes Analysis of a Microarray Experiment

by Bradley Efron, Robert Tibshirani, John D. Storey, Virginia Tusher - Journal of the American Statistical Association, 2001
Cited by 492 (20 self)
Microarrays are a novel technology that facilitates the simultaneous measurement of thousands of gene expression levels. A typical microarray experiment can produce millions of data points, raising serious problems of data reduction, and simultaneous inference. We consider one such experiment in which oligonucleotide arrays were employed to assess the genetic effects of ionizing radiation on seven thousand human genes. A simple nonparametric empirical Bayes model is introduced, which is used to guide the efficient reduction of the data to a single summary statistic per gene, and also to make simultaneous inferences concerning which genes were affected by the radiation. Although our focus is on one specific experiment, the proposed methods can be applied quite generally. The empirical Bayes inferences are closely related to the frequentist false discovery rate (FDR) criterion.

Citation Context

...erature for microarrays, still in its infancy and with much of it unpublished, has tended to focus on frequentist data-analytic devices such as cluster analysis, bootstrapping, and linear models (see Li and Wong 2000; Kerr and Churchill 2000; Black and Doerge 2000; Van der Laan and Bryan 2000; Eisen, Spellman, Brown, and Botstein 1998). Parametric Bayesian modeling was featured in Newton, Kendziorski, Richmond, B...

Summaries of Affymetrix GeneChip probe level data

by Rafael A. Irizarry, Benjamin M. Bolstad, Francois Collin, Leslie M. Cope, Bridget Hobbs, Terence P. Speed - Nucleic Acids Res, 2003
Cited by 471 (21 self)
High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11-20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spike-in studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.

Citation Context

...< PM, but adjusted to be less than PM when MM > PM, which in general occurs for about one-third of all probes (4,6). A model for MAS 5.0 is $\log(PM_{ij} - CT_{ij}) = \log(\theta_i) + \varepsilon_{ij}$, $j = 1, \ldots, J$. A recent paper (7) reported that variation of a specific probe across multiple arrays could be considerably smaller than the...

Evolving gene/ transcript definitions significantly alter the interpretation of GeneChip data

by Manhong Dai, Pinglang Wang, Andrew D. Boyd, Georgi Kostov, Brian Athey, Edward G. Jones, William E. Bunney, Richard M. Myers, Terry P. Speed, Huda Akil, Stanley J. Watson, Fan Meng, 2005
Cited by 330 (3 self)
Abstract not found

A Model Based Background Adjustment for Oligonucleotide Expression Arrays.

by Z Wu, R A Irizarry, R Gentleman, F Martinez-Murillo, F Spencer - Journal of the American Statistical Association, 2004
Cited by 207 (4 self)
Abstract not found

A Benchmark for Affymetrix GeneChip Expression Measures

by Leslie M. Cope, Rafael A. Irizarry, Harris A. Jaffee, Zhijin Wu, Terence P. Speed - Bioinformatics, 2003
Cited by 141 (10 self)
Motivation: The defining feature of oligonucleotide expression arrays is the use of several probes to assay each targeted transcript. This is a bonanza for the statistical geneticist, who can create probeset summaries with specific characteristics. There are now several methods available for summarizing probe level data from the popular Affymetrix GeneChips, but it is difficult to identify the best method for a given inquiry. Results: We have developed a graphical tool to evaluate summaries of Affymetrix probe level data. Plots and summary statistics offer a picture of how an expression measure performs in several important areas. This picture facilitates the comparison of competing expression measures and the selection of methods suitable for a specific investigation. The key is a benchmark data set consisting of a dilution study and a spike-in study. Because the truth is known for these data, we can identify statistical features of the data for which the expected outcome is known in advance. Those features highlighted in our suite of graphs are justified by questions of biological interest and motivated by the presence of appropriate data. Availability: In conjunction with the release of a graphics toolbox as part of the Bioconductor project

Citation Context

...even identical descriptive methods have been used by a variety of researchers to evaluate measures of expression (Holder et al., 2001; Workman et al., 2002; Irizarry et al., 2003a; Naef et al., 2001; Li and Wong, 2001). Nor is the report comprehensive; we do not claim to have exhausted the possibilities contained even within this data and other benchmark data sets are available as well. Lemon et al. (2002), for ex...

Analysis of high density expression microarrays with signed-rank call algorithms

by W.-M. Liu, R. Mei, X. Di, T. B. Ryder, E. Hubbell, S. Dee, T. A. Webster, C. A. Harrington, M.-H. Ho, J. Baid, S. P. Smeekens - Bioinformatics, 2002
Cited by 114 (1 self)
Motivation: We consider the detection of expressed genes and the comparison of them in different experiments with the high-density oligonucleotide microarrays. The results are summarized as the detection calls and comparison calls, and they should be robust against data outliers over a wide target concentration range. It is also helpful to provide parameters that can be adjusted by the user to balance specificity and sensitivity under various experimental conditions. Results: We present rank-based algorithms for making detection and comparison calls on expression microarrays. The detection call algorithm utilizes the discrimination scores. The comparison call algorithm utilizes intensity differences. Both algorithms are based on Wilcoxon's signed-rank test. Several parameters in the algorithms can be adjusted by the user to alter levels of specificity and sensitivity. The algorithms were developed and analyzed using spiked-in genes arrayed in a Latin square format. In the call process, p-values are calculated to give a confidence level for the pertinent hypotheses. For comparison calls made between two arrays, two primary normalization factors are defined. To overcome the difficulty that constant normalization factors do not fit all probe sets, we perturb these primary normalization factors and make increasing or decreasing calls only if all resulting p-values fall within a defined critical region. Our algorithms also automatically handle scanner saturation. Availability: These algorithms are available commercially as part of the MAS 5.0 software package. Contact: wei-min

Citation Context

...mutations (Fodor et al., 1993; Lockhart et al., 1996; Mei et al., 2000; Li and Wong, 2001). Analysis of the large amounts of data produced by these microarrays, however, requires robust computer algorithms. While heuristic approaches have been used very effectively in the past, it is usua...
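
The rank-based detection call described in the abstract above can be sketched as follows. Several details are assumptions not stated in this listing: the discrimination score is taken to be (PM - MM) / (PM + MM), the threshold `tau` and the cutoffs `alpha1` and `alpha2` are illustrative user-adjustable values, and the function names are hypothetical. The p-value is the exact one-sided Wilcoxon signed-rank p-value, computed by counting sign assignments.

```python
def signed_rank_p_greater(diffs):
    """Exact one-sided p-value P(W+ >= observed) under symmetry about zero."""
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks2 = [0] * n  # twice the mid-rank, so tied ranks stay integer
    k = 0
    while k < n:
        m = k
        while m + 1 < n and abs(d[order[m + 1]]) == abs(d[order[k]]):
            m += 1
        for q in range(k, m + 1):
            ranks2[order[q]] = k + m + 2
        k = m + 1
    w_obs = sum(r for r, x in zip(ranks2, d) if x > 0)
    total = sum(ranks2)
    # counts[s] = number of sign assignments whose positive-rank sum is s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[w_obs:]) / 2 ** n

def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return 'P' (present), 'M' (marginal) or 'A' (absent) for one probe set."""
    scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]
    p_value = signed_rank_p_greater([r - tau for r in scores])
    if p_value < alpha1:
        return "P"
    return "M" if p_value < alpha2 else "A"

pm = [200.0 + 10 * k for k in range(11)]
mm = [100.0] * 11
print(detection_call(pm, mm), detection_call(mm, pm))
```

Raising `tau` or lowering the cutoffs makes present calls harder to obtain, which is the specificity versus sensitivity trade-off the abstract leaves to the user.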

Stochastic Models Inspired by Hybridization Theory for Short Oligonucleotide Arrays (Extended Abstract)

by Zhijin Wu, Rafael A. Irizarry - J. Comput. Biol, 2004
Cited by 80 (4 self)
High density oligonucleotide expression arrays are a widely used tool for the measurement of gene expression on a large scale. Affymetrix GeneChip arrays appear to dominate this market. These arrays use short oligonucleotides to probe for genes in an RNA sample. Due to optical noise, nonspecific hybridization, probe-specific effects, and measurement error, ad-hoc measures of expression that summarize probe intensities can lead to imprecise and inaccurate results. Various researchers have demonstrated that expression measures based on simple statistical models can provide great improvements over the ad-hoc procedure offered by Affymetrix. Recently, physical models based on molecular hybridization theory have been proposed as useful tools for prediction of, for example, non-specific hybridization. These physical models show great potential in terms of improving existing expression measures. In this paper we suggest that the system producing the measured intensities is too complex to be fully described with these relatively simple physical models and we propose empirically motivated stochastic models that complement the above mentioned molecular hybridization theory to provide a comprehensive description of the data. We discuss how the proposed model can be used to obtain improved measures of expression useful for the data analysts.

Citation Context

...ogs of non-positives). Various researchers have developed alternative algorithms, motivated by statistical models, that outperform the default algorithm in many applications. For example, Li and Wong [16] notice a strong probe effect in both PM and PM − MM and describe it via a simple multiplicative model. By analyzing various arrays...
[Figure residue; axis label: Nominal Log (Base 2) Concentration]
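
A multiplicative model of the kind mentioned in this context can be sketched as y_ij = theta_i * phi_j + error, where y_ij is the PM − MM difference for array i and probe j, theta_i is an array-level expression index, and phi_j is a probe effect scaled so that the sum of phi_j squared equals the number of probes. The alternating least-squares fit below is a simplified illustration under those assumptions: it omits the outlier detection and standard errors of the cited method, and the function name `fit_multiplicative` is hypothetical.

```python
def fit_multiplicative(y, n_iter=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y[i][j] is the PM - MM difference for array i and probe j.
    Returns (theta, phi) with sum(phi_j ** 2) equal to the number of probes.
    Assumes y is not identically zero.
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes

    def update_theta(phi):
        # Given phi, each theta_i is a least-squares slope through the origin.
        denom = sum(p * p for p in phi)
        return [sum(y[i][j] * phi[j] for j in range(n_probes)) / denom
                for i in range(n_arrays)]

    for _ in range(n_iter):
        theta = update_theta(phi)
        # Given theta, each phi_j is likewise a slope through the origin.
        denom = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / denom
               for j in range(n_probes)]
        # Rescale phi to satisfy the identifiability constraint.
        scale = (n_probes / sum(p * p for p in phi)) ** 0.5
        phi = [p * scale for p in phi]
    return update_theta(phi), phi

# Noise-free example: three arrays, three probes with different sensitivities.
y = [[t * p for p in (0.5, 1.0, 1.5)] for t in (100.0, 200.0, 400.0)]
theta, phi = fit_multiplicative(y)
print(theta, phi)
```

The scaling constraint is needed because theta and phi are only identified up to a constant factor; ratios of theta between arrays do not depend on that choice.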

Evaluation of gene expression measurements from commercial microarray platforms

by Paul K. Tan, Thomas J. Downey, Edward L. Spitznagel, Pin Xu, Dadin Fu, Dimiter S. Dimitrov, Richard A. Lempicki, Bruce M. Raaka, Margaret C. Cam - Nucleic Acids Res, 2003
Cited by 72 (0 self)
Multiple commercial microarrays for measuring genome-wide gene expression levels are currently available, including oligonucleotide and cDNA, single- and two-channel formats. This study reports on the results of gene expression measurements generated from identical RNA preparations that were obtained using three commercially available microarray platforms. RNA was collected from PANC-1 cells grown in serum-rich medium and at 24 h following the removal of serum. Three biological replicates were prepared for each condition, and three experimental replicates were produced for the first biological replicate. RNA was labeled and hybridized to microarrays from three major suppliers according to manufacturers' protocols, and gene expression measurements were obtained using each platform's standard software. For each platform, gene targets from a subset of 2009 common genes were compared. Correlations in gene expression levels and comparisons for significant gene expression changes in this subset were calculated, and showed considerable divergence across the different platforms, suggesting the need for establishing industrial manufacturing standards, and further independent and thorough validation of the technology.

Citation Context

...he mean signal of these replicates were compared (Fig. 3). Previous studies have shown that gene expression measurements of low-abundance transcripts are more variable than high-abundance transcripts (10,11). As indicated in Figure 3, this trend is exhibited by the Affymetrix GeneChip data and to a lesser degree by data produced by the Agilent technology. Variability of the Amersham Codelink expression d...
