
Linear models and empirical Bayes methods for assessing differential expression in microarray experiments

by G K Smyth
Venue: Stat Appl Genet Mol Biol 3 (2004), Article 3. 120

Results 1 - 10 of 1,321

Limma: linear models for microarray data

by Gordon K. Smyth, Matthew Ritchie, Natalie Thorne, James Wettenhall, Wei Shi - Bioinformatics and Computational Biology Solutions using R and Bioconductor , 2005
Cited by 774 (13 self)
This free open-source software implements academic research by the authors and co-workers. If you use it, please support the project by citing the appropriate journal articles listed in Section 2.1.

Citation Context

...l technologies such as Affymetrix. Empirical Bayes and other shrinkage methods are used to borrow information across genes making the analyses stable even for experiments with small number of arrays [1, 2].
[1] Smyth, G. K. (2005). Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor, R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. H...

NCBI GEO: archive for functional genomics data sets–10 years on

by Tanya Barrett, Dennis B. Troup, Stephen E. Wilhite, Pierre Ledoux, Carlos Evangelista, Irene F. Kim, Maxim Tomashevsky, Kimberly A. Marshall, Patti M. Sherman, Rolf N. Muertter, Michelle Holko, Oluwabukunmi Ayanbule, Andrey Yefanov, Ra Soboleva - Nucleic Acids Res , 2011
Cited by 370 (2 self)
Abstract not found

Citation Context

...put for two R scripts, ‘boxplot’, which draws a boxplot of the distribution of expression values of selected samples helping users to determine whether the data are suitable for analysis, and ‘limma’ (15), which performs the topTable computation to extract a table of the top-ranked genes. The ‘limma’ results are processed according to the type of output requested, formatted in JSON and then used to cr...

A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics

by Juliane Schäfer, Korbinian Strimmer, 2005
Cited by 274 (15 self)
Abstract not found

Use of within-array replicate spots for assessing differential expression in microarray experiments

by Gordon K. Smyth, Joëlle Michaud, Hamish S. Scott - Bioinformatics , 2005
Cited by 239 (8 self)
Motivation. Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. Usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This loses valuable information about gene-wise variability. Results. A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. Availability. The methodology is implemented in the limma software package for R, available from the CRAN repository
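The abstract describes fitting separate linear models per gene while sharing one common between-replicate correlation. Below is a simplified Python sketch of only the pooling step, assuming per-gene correlation estimates are already available; averaging them on Fisher's atanh scale with a trimmed mean is our illustrative choice, and the function names are hypothetical. The second function shows why the correlation matters: for equicorrelated replicate spots it determines the variance of their average.

```python
import math

def consensus_correlation(gene_correlations, trim=0.15):
    """Pool per-gene duplicate-spot correlations into one common value.

    Assumption: per-gene estimates are given. They are averaged on the
    atanh (Fisher z) scale with a symmetric trimmed mean, then mapped back.
    """
    if not gene_correlations:
        raise ValueError("need at least one correlation")
    # Clip away from +/-1 so atanh stays finite, then sort for trimming.
    z = sorted(math.atanh(max(min(r, 0.999), -0.999)) for r in gene_correlations)
    k = int(len(z) * trim)  # values dropped from each end
    kept = z[k:len(z) - k] if len(z) - 2 * k > 0 else z
    return math.tanh(sum(kept) / len(kept))

def variance_of_replicate_mean(sigma2, rho, k):
    """Variance of the mean of k spots with common variance sigma2 and
    common pairwise correlation rho."""
    return sigma2 * (1 + (k - 1) * rho) / k
```

With rho = 0 averaging duplicates halves the variance; with rho = 1 it gains nothing, which is why the strength of the correlation has to be estimated rather than ignored.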

Citation Context

...d from the whole ensemble of genes (Newton et al., 2001; Tusher et al., 2001; Efron et al., 2001; Efron and Tibshirani, 2002; Lönnstedt and Speed, 2002; Kendziorski et al., 2003; Newton et al., 2004; Smyth, 2004). Such methods have not as yet been applied to experimental designs in which there are technical or biological replicates leading to multiple strata of random variation for each gene. This article de...

On testing the significance of sets of genes

by Bradley Efron, Robert Tibshirani - Annals of Applied Statistics
Cited by 166 (3 self)
This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways derived from biological databases. Our starting point is the interesting Gene Set Enrichment Analysis (GSEA) procedure of Subramanian et al. (2005). We study the problem in some generality and propose two potential improvements to GSEA: the maxmean statistic for summarizing gene-sets, and restandardization for more accurate inferences. We discuss a variety of examples and extensions, including the use of gene-set scores for class predictions. We also describe a new R language package GSA that implements our ideas.
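Of the two proposals in this abstract, the maxmean statistic is simple enough to sketch. The Python version below follows our reading of the definition: the positive and negative parts of the gene z-scores are each averaged over all m genes in the set, and the larger of the two, with its sign, is the gene-set score. Restandardization and the permutation step are omitted, the tie-breaking rule is our choice, and the function name is hypothetical; the authors' own implementation is the GSA package for R.

```python
def maxmean(z_scores):
    """Maxmean gene-set statistic (sketch).

    Both parts are averaged over *all* genes in the set, not only over
    the genes with positive (or negative) scores.
    """
    m = len(z_scores)
    if m == 0:
        raise ValueError("gene set is empty")
    s_plus = sum(max(z, 0.0) for z in z_scores) / m    # mean positive part
    s_minus = sum(max(-z, 0.0) for z in z_scores) / m  # mean negative part
    # Keep the larger part; negative direction wins ties in this sketch.
    return s_plus if s_plus > s_minus else -s_minus
```

For z = [3, -3, 0, 0] the plain mean is 0, while maxmean has magnitude 0.75: a set with strong effects in both directions is not cancelled out.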

Citation Context

...ate the false discovery rate of the list of significant gene-sets. The Bioconductor package limma offers an analysis option similar to GSEA, but uses instead the simple average of the scores $z_k$: see (Smyth 2004). Other related ideas may be found in (Pavlidis et al. 2002) and (Rahnenführer et al. 2004). Zahn et al. (2006) propose an alternative to GSEA, that uses a Van der Waerden statistic i...

Moderated Statistical Tests for Assessing Differences in Tag Abundance

by Mark D. Robinson, Gordon K. Smyth, Prof David Rocke
Cited by 143 (7 self)
Motivation: Digital gene expression (DGE) technologies measure gene expression by counting sequence tags. They are sensitive technologies for measuring gene expression on a genomic scale, without the need for prior knowledge of the genome sequence. As the cost of sequencing DNA decreases, the number of DGE datasets is expected to grow dramatically. Various tests of differential expression have been proposed for replicated DGE data using binomial, Poisson, negative binomial or pseudo-likelihood (PL) models for the counts, but none of these are usable when the number of replicates is very small. Results: We develop tests using the negative binomial distribution to model overdispersion relative to the Poisson, and use conditional weighted likelihood to moderate the level of overdispersion across genes. Not only is our strategy applicable even with the smallest number of libraries, but it also proves to be more powerful than previous strategies when more libraries are available. The methodology is equally applicable to other counting technologies, such as proteomic spectral counts. Availability: An R package and supplementary materials can be accessed from
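The central modelling choice here is a negative binomial distribution whose dispersion captures overdispersion relative to the Poisson. A minimal Python sketch of a mean/dispersion parameterization, assumed here to be Var(Y) = mu + phi * mu^2; the function names are hypothetical, and the paper's conditional weighted likelihood for moderating phi across genes is not implemented.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion phi, so that Var(Y) = mu + phi * mu**2."""
    if phi <= 0:
        raise ValueError("phi must be positive; phi -> 0 is the Poisson limit")
    r = 1.0 / phi  # the 'size' parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def poisson_logpmf(y, mu):
    """Poisson log-probability, the phi -> 0 limit of nb_logpmf."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)
```

As phi shrinks toward zero the two log-probabilities agree, and for larger phi the variance exceeds the mean by phi * mu^2.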

Citation Context

... for covariates is very straightforward. Many genome-wide statistical inference methods that share information over all genes, be it in an ad hoc way (Tusher et al., 2001) or via hierarchical models (Smyth, 2004), have proven much more ...

A Hilbert space embedding for distributions

by Alex Smola, Arthur Gretton, Le Song, Bernhard Schölkopf - In Algorithmic Learning Theory: 18th International Conference , 2007
Cited by 113 (44 self)
Abstract. We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4]; however, they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or
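The two-sample application mentioned in the abstract compares the kernel mean embeddings of two samples. A minimal Python sketch of the biased estimate of the squared distance between the two empirical embeddings (the squared maximum mean discrepancy), assuming one-dimensional data and a Gaussian kernel with bandwidth sigma; the kernel, bandwidth and function names are our illustrative choices, and no test threshold is computed.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel on real numbers."""
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))

def mmd2_biased(xs, ys, sigma=1.0):
    """Squared RKHS distance between the empirical mean embeddings
    (1/m) sum_i k(x_i, .) and (1/n) sum_j k(y_j, .)."""
    m, n = len(xs), len(ys)
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (m * m)
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (n * n)
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy
```

The estimate is zero for identical samples and grows as the samples separate; deciding significance would additionally need a null distribution, for example from permutations.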

Citation Context

...dea is to normalize each feature by $\bar s_j = s_j^+ + s_j^-$ instead. Subsequently the $(x_j^+ - x_j^-)/\bar s_j$ are used to score features. Moderated t-score is similar to t-statistic and is used for microarray analysis [57]. Its normalization for the $j$th feature is derived via a Bayes approach as

$$\tilde s_j = \sqrt{\frac{m \bar s_j^2 + m_0 \bar s_0^2}{m + m_0}} \qquad (17)$$

where $\bar s_j$ is from (16), and $\bar s_0$ and $m_0$ are hyperparameters for the prior distribution on ...
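Equation (17) shrinks each feature's scale toward a prior value, with $m$ weighting the feature's own estimate and $m_0$ the prior, much as Smyth's posterior variance combines residual and prior degrees of freedom. A minimal Python sketch, assuming (17) is the square root of the weighted average of squared scales (our reading of the garbled formula) and that the hyperparameters $\bar s_0$ and $m_0$ are simply given rather than estimated; the function names are hypothetical.

```python
import math

def moderated_scale(s_bar_j, m, s_bar_0, m0):
    """Eq. (17): pull the feature's scale s_bar_j toward the prior s_bar_0."""
    return math.sqrt((m * s_bar_j ** 2 + m0 * s_bar_0 ** 2) / (m + m0))

def moderated_score(x_plus, x_minus, s_bar_j, m, s_bar_0, m0):
    """Same numerator as the plain score, moderated denominator."""
    return (x_plus - x_minus) / moderated_scale(s_bar_j, m, s_bar_0, m0)
```

With m0 = 0 the plain scale is returned unchanged; a feature whose own scale is accidentally tiny gets a much less extreme score once the prior is mixed in, which is what stabilizes rankings when there are few arrays.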

Model-based Variance-stabilizing Transformation for Illumina Microarray Data

by Simon M. Lin, Pan Du, Wolfgang Huber, Warren A. Kibbe - Nucleic Acids Res, 2008
Cited by 101 (9 self)
doi:10.1093/nar/gkm1075

Citation Context

....00019, 0.00022 and 0.00023 (VST-quantile, log2-quantile and VSN, respectively). Figure 7. The percentage of concordant probes among the top probes by ranking the t-test p-values estimated by ‘limma’ (16). The concordant probes are defined as those with correlation coefficients between the gene expression profile (measured by probe intensities) and the titration profile larger than 0.8. Note that the ...
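The Figure 7 caption defines a simple evaluation: rank probes by their limma p-values, take the top ones, and count how many have a Pearson correlation above 0.8 with the titration profile. A minimal Python sketch of that computation, assuming the p-values, probe profiles and titration profile are already available as plain lists; the function names are hypothetical, and constant profiles (zero variance) are not handled.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def percent_concordant(p_values, profiles, titration, top_n, threshold=0.8):
    """Percentage of the top_n probes (smallest p-values) whose intensity
    profile correlates with the titration profile above threshold."""
    ranked = sorted(range(len(p_values)), key=lambda i: p_values[i])[:top_n]
    hits = sum(1 for i in ranked if pearson(profiles[i], titration) > threshold)
    return 100.0 * hits / len(ranked)
```

Sweeping top_n and plotting the result for each preprocessing method gives a curve of the kind the caption describes.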

Microarrays, empirical Bayes and the two-groups model

by Bradley Efron - Statist. Sci., 2006
Cited by 75 (10 self)
The classic frequentist theory of hypothesis testing developed by Neyman, Pearson, and Fisher has a claim to being the Twentieth Century’s most influential piece of applied mathematics. Something new is happening in the Twenty-First Century: high throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focussed on Benjamini and Hochberg’s False Discovery Rate method. Topics include the choice and meaning of the null hypothesis in large-scale testing situations, power considerations, the limitations of permutation methods, significance testing for groups of cases (such as pathways in microarray studies), correlation effects, multiple confidence intervals, and Bayesian competitors to the two-groups model.
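The abstract singles out Benjamini and Hochberg's False Discovery Rate method. A minimal Python sketch of the standard BH step-up rule: sort the p-values, find the largest rank k with p_(k) <= k q / n, and reject the k smallest. The default level q = 0.1 and the function name are our illustrative choices.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at FDR level q."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_max = 0
    # Step-up: keep the largest 1-based rank whose p-value clears its bound.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / n:
            k_max = rank
    return sorted(order[:k_max])
```

Because it is a step-up rule, a p-value that misses its own bound can still be rejected when a larger p-value further down the list clears a later one.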

Citation Context

... Subramanian et al. (2005). For a given gene-set “S” with m members, let $\bar z_S$ denote the mean of the m z-values within S; $\bar z_S$ is the enrichment statistic suggested in the Bioconductor R package limma (Smyth, 2004),

$$\bar z_S = 0.842 \qquad (8.1)$$

for the CTL pathway. How significant is this result? I will consider assigning an individual p-value to (8.1), not taking into account multiple inference for a catalogue of possib...
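The gene-set statistic $\bar z_S$ is just the mean of the set's z-values. A minimal Python sketch of it together with the most naive p-value one could attach, assuming the m z-values are independent N(0, 1) under the null so that $\sqrt{m}\,\bar z_S$ is N(0, 1); this ignores correlation between genes and multiple inference across pathways, so treat it as a baseline only. The function names are hypothetical.

```python
import math

def mean_z(z_scores):
    """Enrichment statistic: the mean z-value of the gene set."""
    return sum(z_scores) / len(z_scores)

def naive_two_sided_p(z_scores):
    """Two-sided p-value for the set mean, assuming independent N(0, 1)
    z-values under the null, so sqrt(m) * mean ~ N(0, 1)."""
    m = len(z_scores)
    stat = math.sqrt(m) * mean_z(z_scores)
    # 2 * (1 - Phi(|stat|)) written via the complementary error function.
    return math.erfc(abs(stat) / math.sqrt(2))
```

The sqrt(m) factor is why even a modest mean such as 0.842 can look highly significant for a large set when independence is assumed.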

Regional and cellular gene expression changes in human Huntington's disease brain

by Angela Hodges, Andrew D Strand, Aaron K Aragaki, Alexandre Kuhn, Thierry Sengstag, Gareth Hughes, Lyn A Elliston, Cathy Hartog, Darlene R Goldstein, Doris Thu, Zane R Hollingsworth, Francois Collin, Beth Synek, Peter A Holmans, Anne B Young, Nancy S Wexler, Mauro Delorenzi, Charles Kooperberg, Sarah J Augood, Richard L M Faull, James M Olson, Lesley Jones, Ruth Luthi-Carter - Hum Mol Genet
Cited by 71 (7 self)
Huntington's disease (HD) pathology is well understood at a histological level but a comprehensive molecular analysis of the effect of the disease in the human brain has not previously been available. To elucidate the molecular phenotype of HD on a genome-wide scale, we compared mRNA profiles from 44 human HD brains with those from 36 unaffected controls using microarray analysis. Four brain regions were analyzed: caudate nucleus, cerebellum, prefrontal association cortex [Brodmann's area 9 (BA9)] and motor cortex [Brodmann's area 4 (BA4)]. The greatest number and magnitude of differentially expressed mRNAs were detected in the caudate nucleus, followed by motor cortex, then cerebellum. Thus, the molecular phenotype of HD generally parallels established neuropathology. Surprisingly, no mRNA changes were detected in prefrontal association cortex, thereby revealing subtleties of pathology not previously disclosed by histological methods. To establish that the observed changes were not simply the result of cell loss, we examined mRNA levels in laser-capture microdissected neurons from Grade 1 HD caudate compared to control. These analyses confirmed changes in expression seen in tissue homogenates; we thus conclude that mRNA changes are not attributable to cell loss alone. These data from bona fide HD brains comprise an important reference for hypotheses related to HD and other neurodegenerative diseases.

Citation Context

...the affy package (46). To identify genes differentially expressed between HD (Grades 0–2) and controls for each brain region, we computed empirical Bayes moderated t-statistics with the limma package (47), correcting gene expression for collection site (Boston or New Zealand), gender and age [<45, (45–60), (60–70) and 70+ years]. Unless otherwise stated, reported P-values are nominal, unadjusted. R co...
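Correcting for site, gender and age group means adding those covariates as columns of the design matrix in each gene-wise linear model. A hypothetical Python sketch of how one sample's row might be dummy-coded; the reference levels (control, Boston, female, under 45) and the half-open age bins (so that age 60 falls in 60–70) are our assumptions, and the model fitting itself (done with limma's lmFit and eBayes in R) is not reproduced here.

```python
def age_bin(age):
    """Age groups from the text; bins assumed half-open, e.g. 60 -> '60-70'."""
    if age < 45:
        return "<45"
    if age < 60:
        return "45-60"
    if age < 70:
        return "60-70"
    return "70+"

def design_row(is_hd, site, gender, age):
    """One row of a hypothetical design matrix.

    Reference levels: control, Boston, female, age < 45.
    """
    group = age_bin(age)
    return [
        1.0,                                    # intercept
        1.0 if is_hd else 0.0,                  # HD vs control (effect of interest)
        1.0 if site == "New Zealand" else 0.0,  # collection site
        1.0 if gender == "male" else 0.0,       # gender
        1.0 if group == "45-60" else 0.0,       # age-group dummies
        1.0 if group == "60-70" else 0.0,
        1.0 if group == "70+" else 0.0,
    ]
```

The coefficient on the HD column is then the disease effect adjusted for the other covariates, and it is this coefficient whose moderated t-statistic would be reported.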


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University