A Bayesian Framework for the Analysis of Microarray Expression Data: Regularized t-Test and Statistical Inferences of Gene Changes (2001)

by Pierre Baldi, Anthony D. Long
Venue: Bioinformatics
Results 1 - 10 of 491, sorted by citation count

Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.

by Gordon K. Smyth - Stat. Appl. Genet. Mol. Biol., 2004
"... Abstract The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lonnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. ..."
Cited by 1321 (24 self)
The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lonnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. The purpose of this paper is to develop the hierarchical model of Lonnstedt and Speed (2002) into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples. The model is reset in the context of general linear models with arbitrary coefficients and contrasts of interest. The approach applies equally well to both single channel and two color microarray experiments. Consistent, closed form estimators are derived for the hyperparameters in the model. The estimators proposed have robust behavior even for small numbers of arrays and allow for incomplete data arising from spot filtering or spot quality weights. The posterior odds statistic is reformulated in terms of a moderated t-statistic in which posterior residual standard deviations are used in place of ordinary standard deviations. The empirical Bayes approach is equivalent to shrinkage of the estimated sample variances towards a pooled estimate, resulting in far more stable inference when the number of arrays is small. The use of moderated t-statistics has the advantage over the posterior odds that the number of hyperparameters which need to be estimated is reduced; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The performance of the methods is demonstrated in a simulation study. Results are presented for two publicly available data sets.
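
The shrinkage step described in this abstract can be summarized in a few lines. Below is a minimal Python sketch, not the limma implementation (which is in R and estimates the hyperparameters from the data by closed-form moment methods); here the prior degrees of freedom d0 and prior variance s0_sq are assumed to be given.

```python
import numpy as np
from scipy import stats

def moderated_t(log_ratios, d0, s0_sq):
    """Moderated t-statistics for a one-sample (e.g. two-color) design.

    log_ratios : (genes x arrays) matrix of log fold changes.
    d0, s0_sq  : prior degrees of freedom and prior variance; limma
                 estimates these from the data, here they are inputs.
    """
    n = log_ratios.shape[1]
    dg = n - 1                               # residual df per gene
    mean = log_ratios.mean(axis=1)           # estimated log fold change
    s2 = log_ratios.var(axis=1, ddof=1)      # ordinary sample variance
    # Shrink each gene-wise variance towards the pooled prior value.
    s2_tilde = (d0 * s0_sq + dg * s2) / (d0 + dg)
    # Moderated t: ordinary t with the posterior standard deviation.
    t = mean / np.sqrt(s2_tilde / n)
    # Augmented degrees of freedom d0 + dg give the null distribution.
    p = 2 * stats.t.sf(np.abs(t), df=d0 + dg)
    return t, p

# Example on simulated data: 1000 genes, 4 replicate arrays.
rng = np.random.default_rng(0)
M = rng.normal(0.0, 0.5, size=(1000, 4))
t, p = moderated_t(M, d0=4.0, s0_sq=0.25)
```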

Citation Context

... the posterior odds of reducing the number of hyperparameters which need to be estimated under the hierarchical model; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests involving two or more contrasts through the use of moderated F-statistics. The idea of using a t-statistic with a Bayesian adjusted denominator was also proposed by Baldi and Long (2001), who developed the useful cyberT program. Their work was limited though to two-sample control versus treatment designs and their model did not distinguish between differentially and non-differentially expressed genes. They also did not develop consistent estimators for the hyperparameters. The degrees of freedom associated with the prior distribution of the variances were set to a default value while the prior variance was simply equated to locally pooled sample variances. Tusher et al. (2001), Efron et al. (2001) and Broberg (2003) have used t-statistics with offset standard deviations. This is ...
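
For contrast, here is a rough sketch of the Baldi and Long approach as this excerpt describes it: the prior variance for each gene is taken from a sliding window of genes at similar expression level, and the prior weight nu0 is a fixed default rather than an estimated hyperparameter. The denominator nu0 + n - 2 follows the form commonly quoted for cyberT; the window size and pooling rule below are illustrative choices, not the cyberT defaults.

```python
import numpy as np

def regularized_variance(x, nu0=10, window=101):
    """Regularized gene-wise variances in the spirit of cyberT.

    x      : (genes x replicates) matrix for one condition.
    nu0    : pseudo-counts given to the local background variance
             (a fixed default in cyberT, not estimated from the data).
    window : number of genes, ranked by mean expression, whose sample
             variances are averaged to form the local prior variance.
    """
    n = x.shape[1]
    mean = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)
    order = np.argsort(mean)                 # rank genes by expression
    half = window // 2
    s0_sq = np.empty_like(s2)
    for rank, g in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        # Local prior: mean variance of genes at similar intensity.
        s0_sq[g] = s2[order[lo:hi]].mean()
    # Regularized point estimate combining prior and observed variance.
    return (nu0 * s0_sq + (n - 1) * s2) / (nu0 + n - 2)
```

The regularized t-test then proceeds as an ordinary two-sample t-test with these variances substituted for the raw sample variances.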

Survey of clustering algorithms

by Rui Xu, Donald Wunsch II - IEEE Transactions on Neural Networks, 2005
"... Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the ..."
Cited by 499 (4 self)
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measures and cluster validation, are also discussed.

Citation Context

...age. Gene expression data analysis consists of a three-level framework based on the complexity, ranging from the investigation of single gene activities to the inference of the entire genetic network [20]. The intermediate level explores the relations and interactions between genes under different conditions, and attracts more attention currently. Generally, cluster analysis of gene expression data is ...

A review of feature selection techniques in bioinformatics

by Yvan Saeys, Iñaki Inza, Pedro Larrañaga - BIOINFORMATICS , 2007
"... Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed t ..."
Cited by 358 (10 self)
Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this paper, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of common as well as upcoming bioinformatics applications.
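
As a concrete instance of the simplest branch of such a taxonomy, a univariate filter ranks each feature independently of any classifier. The sketch below is illustrative and not taken from the review: it scores genes by a two-sample t-test between classes and keeps the top k.

```python
import numpy as np
from scipy import stats

def t_filter(X, y, k=50):
    """Univariate filter: rank features by |t| between two classes.

    X : (samples x features) expression matrix.
    y : binary class labels (0/1), e.g. control vs. case.
    k : number of top-ranked features to keep.
    """
    t, _ = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
    keep = np.argsort(-np.abs(t))[:k]        # indices of top-k features
    return keep, X[:, keep]
```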

Use of within-array replicate spots for assessing differential expression in microarray experiments

by Gordon K. Smyth, Joëlle Michaud, Hamish S. Scott - Bioinformatics , 2005
"... Motivation. Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. Usual practice is to average the duplicate or triplicate results for each probe before assessing ..."
Cited by 239 (8 self)
Motivation: Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. Usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This loses valuable information about gene-wise variability.

Results: A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small.

Availability: The methodology is implemented in the limma software package for R, available from the CRAN repository.
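
The gain from modelling the duplicates can be seen with elementary variance algebra: if a gene's two spots have common variance sigma^2 and correlation rho, their average has variance sigma^2 (1 + rho) / 2, so rho must be estimated before any test based on averaged values is calibrated correctly. The sketch below pools a single correlation across genes by averaging Fisher-z transformed per-gene estimates; this is a simplification of limma's duplicateCorrelation, which estimates the common correlation by REML within the linear-model fit.

```python
import numpy as np

def common_duplicate_correlation(dup1, dup2):
    """Pooled correlation between within-array duplicate spots.

    dup1, dup2 : (genes x arrays) log-ratios for the first and second
                 printing of each probe.
    Returns a single correlation shared by all genes (a simplified
    stand-in for limma's REML-based duplicateCorrelation).
    """
    g = dup1.shape[0]
    z = np.empty(g)
    for i in range(g):
        r = np.corrcoef(dup1[i], dup2[i])[0, 1]
        r = np.clip(r, -0.999, 0.999)        # keep arctanh finite
        z[i] = np.arctanh(r)                 # Fisher z transform
    return np.tanh(z.mean())                 # pool on the z scale

def variance_of_duplicate_mean(s2, rho):
    # Variance of the average of two correlated duplicates, variance s2.
    return s2 * (1.0 + rho) / 2.0
```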

Citation Context

...improve on the use of t-statistics for assessing differential expression in microarray experiments by using appropriate shrinkage methods to moderate the genewise sample variances (Tusher et al., 2001; Baldi and Long, 2001; Efron et al., 2001; Lönnstedt and Speed, 2002; Broberg, 2003; Smyth, 2004). We show here that the empirical Bayes method of Smyth (2004) combines in a natural way with the methods of this paper. I...

Cluster Analysis for Gene Expression Data: A Survey

by Daxin Jiang, Chun Tang, Aidong Zhang - IEEE Transactions on Knowledge and Data Engineering , 2004
"... Abstract—DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity f ..."
Cited by 149 (5 self)
DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms have recently been proposed specifically aimed at gene expression data. These clustering algorithms have been proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest promising trends in this field.

Index Terms—Microarray technology, gene expression data, clustering.
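
As one concrete representative of the gene-based clustering the abstract describes, the sketch below applies average-linkage hierarchical clustering under a correlation-based distance, a common choice for expression profiles; the simulated data and the cut height are illustrative, not drawn from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 12))            # 200 genes x 12 conditions

# 1 - Pearson correlation as the distance between expression profiles.
d = pdist(expr, metric="correlation")

# Average-linkage hierarchical clustering, cut into flat clusters.
tree = linkage(d, method="average")
labels = fcluster(tree, t=0.7, criterion="distance")
print(f"{labels.max()} clusters")
```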

Model-based Variance-stabilizing Transformation for Illumina Microarray Data

by Simon M. Lin, Pan Du, Wolfgang Huber, Warren A. Kibbe - Nucleic Acids Res, 2008
"... doi:10.1093/nar/gkm1075 ..."
Cited by 101 (9 self)
doi:10.1093/nar/gkm1075

Citation Context

... with the heteroskedasticity problem, an alternative to data transformation is to abandon the canonical linear models that require making distribution assumptions. For example, the cyberT-test method (18) uses a moving window to reduce the dependency of variance on expression level. However, these ad hoc modifications cannot be easily generalized to complex experimental designs such as a mixed linear ...
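
The transformation family at issue in this paper can be written down compactly: the generalized-log (arsinh) transform behaves like log(x) for large intensities but stays finite and roughly linear near zero, which is what stabilizes the intensity-dependent variance. The sketch below is generic; the model-based method of this paper fits the offset and scale from Illumina bead-level replicates, whereas here a and b are simply assumed inputs.

```python
import numpy as np

def glog(x, a=0.0, b=1.0):
    """Generalized-log (arsinh) variance-stabilizing transform.

    a : background offset, b : scale; in model-based VST these are
    estimated from the data, here they are assumed inputs.
    Equivalent to log((x - a) + sqrt((x - a)**2 + b**2)) up to constants.
    """
    return np.arcsinh((x - a) / b)

intensities = np.array([10.0, 100.0, 1000.0, 10000.0])
print(glog(intensities, a=5.0, b=50.0))
```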

Fundamentals of cDNA Microarray Data Analysis

by Yuk Fai Leung, Duccio Cavalieri - Trends Genet , 2003
"... genomics research. The multi-step, data-intensive nature of this technology has created an unprecedented informatics and analytical challenge. It is important to understand the crucial steps that can affect the outcome of the analysis. In this review, we provide an overview of the contemporary trend ..."
Cited by 81 (0 self)
... genomics research. The multi-step, data-intensive nature of this technology has created an unprecedented informatics and analytical challenge. It is important to understand the crucial steps that can affect the outcome of the analysis. In this review, we provide an overview of contemporary trends in the main steps of the microarray data analysis process, which include experimental design, data standardization, image acquisition and analysis, normalization, statistical significance inference, exploratory data analysis, class prediction and pathway analysis, as well as various considerations relevant to their implementation. The development of microarray technology has been phenomenal in the past few years. It has become a standard tool in many genomics research laboratories. The reason for this popularity is that microarrays have revolutionized the approach to biological research. Instead of working on a gene-by-gene basis, scientists can now study tens of thousands of genes at once. Unfortunately, they are often daunted and confused by the complexity of data analyses. Although it is advisable to collaborate with statisticians and mathematicians on performing a proper data analysis, it is crucial to understand the fundamentals of data analysis. In this review, we explain these fundamentals step-by-step (Figure 1; Table 1). Instead of discussing any particular analysis software, we focus primarily on the rationale behind the analysis processes and the key factors that affect the quality of the result. For a compilation of current microarray analysis software, see a recent article [1] and the author's website.

Optimal Sample Size for Multiple Testing: the Case of Gene Expression Microarrays

by Peter Müller, Giovanni Parmigiani, Christian Robert, Judith Rousseau - Journal of the American Statistical Association , 2004
"... We consider the choice of an optimal sample size for multiple comparison problems. The motivating application is the choice of the number of microarray experiments to be carried out when learning about dierential gene expression. However, the approach is valid in any application that involves multip ..."
Cited by 75 (5 self)
We consider the choice of an optimal sample size for multiple comparison problems. The motivating application is the choice of the number of microarray experiments to be carried out when learning about differential gene expression. However, the approach is valid in any application that involves multiple comparisons in a large number of hypothesis tests.

A multivariate empirical Bayes statistic for replicated microarray time course data

by Chuan Tai, Terence P. Speed - Annals of Statistics , 2006
"... ar ..."
Cited by 71 (4 self)
Abstract not found

Citation Context

...identifying temporally changing genes in replicated microarray experiments is to carry out multiple pairwise comparisons across times, using statistics developed for comparing two independent samples [2, 6, 13, 19, 20, 23, 24, 33]. These methods are not entirely appropriate as they do not incorporate the fact that longitudinal microarray time course samples are correlated. A simple and intuitive approach to our problem is to u...
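
The classical multivariate statistic that this paper's MB-statistic moderates is Hotelling's T^2: treating a gene's T time points as one vector observation per replicate makes the within-gene temporal correlation part of the test, unlike pairwise two-sample t-tests. A plain, unmoderated sketch, assuming more replicates than time points so the sample covariance is invertible:

```python
import numpy as np
from scipy import stats

def hotelling_t2(x):
    """Hotelling's T^2 for H0: mean time-course vector equals zero.

    x : (replicates x timepoints) matrix of one gene's log-ratios.
    Requires more replicates than timepoints; the empirical Bayes
    MB-statistic of Tai and Speed moderates the covariance estimate
    so that this restriction is relaxed.
    """
    n, t = x.shape
    xbar = x.mean(axis=0)
    S = np.cov(x, rowvar=False)              # (t x t) sample covariance
    t2 = n * xbar @ np.linalg.solve(S, xbar)
    # Under normality, a scaled T^2 follows an F distribution.
    f = (n - t) / (t * (n - 1)) * t2
    p = stats.f.sf(f, t, n - t)
    return t2, p
```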

Statistical strategies for avoiding false discoveries in metabolomics and related experiments

by David I. Broadhurst, Douglas B. Kell, 2006
"... Many metabolomics, and other high-content or high-throughput, experiments are set up such that the primary aim is the discovery of biomarker metabolites that can discriminate, with a certain level of certainty, between nominally matched ‘case ’ and ‘control ’ samples. However, it is unfortunately ve ..."
Cited by 61 (11 self)
Many metabolomics, and other high-content or high-throughput, experiments are set up such that the primary aim is the discovery of biomarker metabolites that can discriminate, with a certain level of certainty, between nominally matched ‘case’ and ‘control’ samples. However, it is unfortunately very easy to find markers that are apparently persuasive but that are in fact entirely spurious, and there are well-known examples in the proteomics literature. The main types of danger are not entirely independent of each other, but include bias, inadequate sample size (especially relative to the number of metabolite variables and to the required statistical power to prove that a biomarker is discriminant), excessive false discovery rate due to multiple hypothesis testing, inappropriate choice of particular numerical methods, and overfitting (generally caused by the failure to perform adequate validation and cross-validation). Many studies fail to take these into account, and thereby fail to discover anything of true significance (despite their claims). We summarise these problems, and provide pointers to a substantial existing literature that should assist in the improved design and evaluation of metabolomics experiments, thereby allowing robust scientific conclusions to be drawn from the available data. We provide a list of some of the simpler checks that might improve one’s confidence that a candidate biomarker is not simply a statistical artefact, and suggest a series of preferred tests and visualisation tools that can assist readers and authors in assessing papers. These tools can be applied to individual metabolites by using multiple univariate tests performed in parallel across all metabolite peaks. They may also be applied to the validation of multivariate models. We stress in ...
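
The multiple-testing danger named here is usually addressed by controlling the false discovery rate; the Benjamini-Hochberg step-up procedure is the standard tool for the "multiple univariate tests performed in parallel" that the authors recommend. A minimal sketch:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling FDR at level q.

    Returns a boolean mask of rejected hypotheses.
    """
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Largest k with p_(k) <= (k/m) * q; reject hypotheses 1..k.
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])     # largest passing rank
        reject[order[: k + 1]] = True
    return reject
```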

Citation Context

...into account the prevalences of the two classes in a given population, and in consequence we do not need to be, and largely are not, either explicitly Bayesian (Berry, 1996; Bernardo and Smith, 2000; Baldi and Long, 2001) or explicitly frequentist for these present purposes. However, in terms of the interpretation of data, it is extremely important that prevalences are taken into account (e.g. (Brenner and Gefeller, ...
