Exploring expression data: identification and analysis of coexpressed genes. Genome Res. (1999)

by LJ Heyer, S Kruglyak, S Yooseph

Results 1 - 10 of 316

A Bayesian Framework for the Analysis of Microarray Expression Data: Regularized t-Test and Statistical Inferences of Gene Changes

by Pierre Baldi, Anthony D. Long - Bioinformatics , 2001
Cited by 491 (6 self)
Motivation: DNA microarrays are now capable of providing genome-wide patterns of gene expression across many different conditions. The first level of analysis of these patterns requires determining whether observed differences in expression are significant or not. Current methods are unsatisfactory due to the lack of a systematic framework that can accommodate noise, variability, and low replication often typical of microarray data. Results: We develop a Bayesian probabilistic framework for microarray data analysis. At the simplest level, we model log-expression values by independent normal distributions, parameterized by corresponding means and variances with hierarchical prior distributions. We derive point estimates for both parameters and hyperparameters, and regularized expressions for the variance of each gene by combining the empirical variance with a local background variance associated with neighboring genes. An additional hyperparameter, inversely related to the number of empirical observations, determines the strength of the background variance. Simulations show that these point estimates, combined with a t-test, provide a systematic inference approach that compares favorably with simple t-test or fold methods, and partly compensate for the lack of replication. Availability: The approach is implemented in software called Cyber-T, accessible through a Web interface at www.genomics.uci.edu/software.html. The code is available as Open Source and is written in the freely available statistical language R. Contact: pfbaldi@ics.uci.edu, tdlong@uci.edu.
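The variance regularization described above can be sketched as follows. This is a minimal illustration, not Cyber-T's code: it assumes a simple pseudo-count blend in which the hyperparameter `nu0` counts virtual observations carrying the background variance, and it takes the background variance to be the average variance of the `window` genes whose mean expression is closest. The function names and this exact weighting are assumptions; the paper's own estimator should be checked before relying on the numbers.

```python
import math
import statistics


def background_variance(gene_means, gene_vars, target_mean, window):
    """Mean variance of the `window` genes whose mean expression is closest to target_mean."""
    nearest = sorted(range(len(gene_means)), key=lambda i: abs(gene_means[i] - target_mean))[:window]
    return statistics.fmean(gene_vars[i] for i in nearest)


def regularized_variance(values, background_var, nu0):
    """Blend the sample variance with a background variance (assumed pseudo-count form).

    nu0 virtual observations carry background_var; the n - 1 real degrees of freedom
    carry the sample variance. Needs at least two replicates.
    """
    n = len(values)
    s2 = statistics.variance(values)  # sample variance, n - 1 denominator
    return (nu0 * background_var + (n - 1) * s2) / (nu0 + n - 1)


def regularized_t(x, y, bg_x, bg_y, nu0):
    """Two-sample t statistic with regularized variances in place of the raw ones."""
    vx = regularized_variance(x, bg_x, nu0)
    vy = regularized_variance(y, bg_y, nu0)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx / len(x) + vy / len(y))
```

With few replicates, a gene whose sample variance happens to be tiny is pulled toward the background value, which shrinks an otherwise inflated t statistic; as replicates accumulate, the `(n - 1)` term dominates and the plain sample variance takes over.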

Missing value estimation methods for DNA microarrays

by Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein, Russ B. Altman , 2001
Cited by 477 (24 self)
Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.
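One family of methods the paper evaluates is k-nearest-neighbour imputation (KNNimpute): a missing entry is filled from genes whose profiles resemble the incomplete gene. The sketch below is a simplified, hypothetical version that assumes `None` marks a missing value, measures distance as the root-mean-square difference over the columns both genes have observed, and weights the `k` neighbours by inverse distance; these choices are illustrative rather than the paper's exact settings.

```python
import math


def knn_impute(matrix, k, eps=1e-9):
    """Return a copy of `matrix` (rows = genes, None = missing) with gaps filled.

    Neighbours are drawn from the original matrix, so already-imputed values
    are never reused as evidence.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(matrix):
                if r == i or other[j] is None:
                    continue  # a neighbour must have column j observed
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                # Root-mean-square difference over the columns both genes observed.
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            nearest = sorted(candidates)[:k]
            if not nearest:
                continue  # no usable neighbour: leave the gap as None
            weights = [1.0 / (d + eps) for d, _ in nearest]
            filled[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return filled
```

The `eps` term only keeps an exact duplicate profile (distance 0) from causing a division by zero; such a neighbour then dominates the weighted average.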

Genetic Network Inference: From Co-Expression Clustering To Reverse Engineering

by Patrik D'Haeseleer, Shoudan Liang, Roland Somogyi , 2000
Cited by 336 (0 self)
Motivation: Advances in molecular biological, analytical and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review data mining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e. who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting and bioengineering.
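The Boolean end of that spectrum is small enough to sketch. Below, a hypothetical three-gene network is updated synchronously, and a naive "reverse engineering" step then looks for the smallest set of genes whose current values always determine a target gene's next value across the observed transitions. This consistency check is an illustrative assumption, a crude stand-in for information-theoretic methods such as REVEAL, and it only works when every relevant input combination has been observed.

```python
from itertools import combinations, product

GENES = ("A", "B", "C")

# Hypothetical network: C represses A, A activates B, C needs both A and B.
RULES = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}


def step(state, rules):
    """Synchronous update: every gene reads the same current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def consistent_input_sets(transitions, target, max_inputs=2):
    """Smallest input sets whose values always determine `target`'s next value.

    transitions: list of (state_before, state_after) dicts.
    """
    genes = sorted(transitions[0][0])
    for k in range(1, max_inputs + 1):
        found = []
        for inputs in combinations(genes, k):
            seen = {}  # input pattern -> next value of target seen first
            if all(seen.setdefault(tuple(before[g] for g in inputs), after[target]) == after[target]
                   for before, after in transitions):
                found.append(inputs)
        if found:
            return found
    return []


# Observe one transition from every possible starting state.
ALL_STATES = [dict(zip(GENES, bits)) for bits in product([False, True], repeat=len(GENES))]
TRANSITIONS = [(s, step(s, RULES)) for s in ALL_STATES]
```

Here the search recovers C as the sole input of A, A as the input of B, and the pair (A, B) for C. With real data, unobserved input combinations and noise make this exact-consistency test far too brittle, which is why the review turns to continuous and probabilistic models.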

Relating Whole-Genome Expression Data with Protein-Protein Interactions

by Ronald Jansen, Dov Greenbaum, Mark Gerstein , 2002
Cited by 229 (14 self)
...this paper is the interactions occurring within specific complexes. These were obtained from the MIPS complexes catalog (Fellenberg et al. 2000), which represents a carefully annotated, comprehensive data set of protein complexes culled from the scientific literature. In addition, we looked at other types of protein-protein interactions from large "aggregated" data sets collecting many heterogeneous pair-wise interactions. We collected these from the MIPS catalogs of physical and genetic interactions (Fellenberg et al. 2000), databases of interacting proteins (DIP and BIND) (Bader and Hogue 2000; Xenarios 2000), and a comprehensive collection of yeast two-hybrid experiments (Cagney et al. 2000; Ito et al. 2000; Schwikowski et al. 2000; Uetz et al. 2000; Uetz and Hughes 2000; Ito et al. 2001). These interactions are subdivided into groups based on their method of discovery. They include physical interactions (e.g., collected through coimmunoprecipitation and copurification), genetic interactions (e.g., determined through genetic means such as synthetic lethality or suppression experiments), and yeast two-hybrid pairs...

Citation Context

...we look at the distribution of Pearson correlation coefficients for pairs of genes as the measure of similarity. (Other measures of similarity are possible as well (D'haeseleer 1997; Wen et al. 1998; Heyer et al. 1999; Qian et al. 2001).) As the input for our procedure we use the expression vectors or profiles of all the subunits of a complex and then compute their pair-wise correlations. Like for the normalized d...
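The step described in this context, computing all pair-wise Pearson correlations among the expression profiles of a complex's subunits, can be sketched as below. The subunit names and profiles are hypothetical, and comparing the resulting distribution against that of random gene pairs is left to the caller.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subunit_correlations(profiles):
    """Pearson r for every unordered pair of subunits in one complex.

    profiles: dict mapping a subunit name to its expression vector.
    """
    return {(g, h): pearson(profiles[g], profiles[h])
            for g, h in combinations(sorted(profiles), 2)}


# Hypothetical three-subunit complex measured at four conditions.
example = subunit_correlations({
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
})
```

A complex with n subunits yields n(n - 1)/2 coefficients, so even small complexes give a distribution rather than a single number to compare with the background.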

A Hierarchical Unsupervised Growing Neural Network for Clustering Gene Expression Patterns

by Javier Herrero, Alfonso Valencia, Joaquín Dopazo , 2001
Cited by 200 (20 self)
Motivation: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One of the interests of these studies is the search for correlated gene expression patterns, and this is usually achieved by clustering them. The Self-Organising Tree Algorithm (SOTA) (Dopazo, J. and Carazo, J.M. (1997) J. Mol. Evol., 44, 226-233) is a neural network that grows adopting the topology of a binary tree. The result of the algorithm is a hierarchical cluster obtained with the accuracy and robustness of a neural network. Results: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going to the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion to stop the growing of the tree, based on the approximate distribution of probability obtained by randomisation of the original data set, is provided. By means of this criterion, a statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm. Different neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. The method proposed is very general and applies to any data provided that they can be coded as a series of numbers and t...
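The top-down growth can be illustrated with a heavily simplified, SOTA-style sketch. Assumptions: leaf neurons are keyed by their path from the root, only the winning leaf and its sister adapt (with hypothetical learning rates `eta_winner` and `eta_sister`), training runs for a fixed number of epochs rather than to convergence, and growth stops at a requested number of leaves instead of the randomisation-based criterion the abstract describes. It shows the divisive idea and the built-in prototype profiles, not the published algorithm.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nudge(vec, target, rate):
    """Move vec a fraction `rate` of the way toward target."""
    return [v + rate * (t - v) for v, t in zip(vec, target)]


def sota_sketch(data, n_leaves, epochs=20, eta_winner=0.1, eta_sister=0.01):
    """Simplified divisive growth; returns {path: prototype vector} for the leaves.

    A leaf's path is a tuple of 0/1 steps from the root, so its sister is the
    same path with the last step flipped.
    """
    centroid = [sum(col) / len(data) for col in zip(*data)]
    leaves = {(0,): centroid[:], (1,): centroid[:]}
    while True:
        # Adaptation: the winning leaf moves toward each profile; its sister moves less.
        for _ in range(epochs):
            for x in data:
                win = min(leaves, key=lambda p: euclidean(leaves[p], x))
                leaves[win] = nudge(leaves[win], x, eta_winner)
                sister = win[:-1] + (1 - win[-1],)
                if sister in leaves:
                    leaves[sister] = nudge(leaves[sister], x, eta_sister)
        if len(leaves) >= n_leaves:
            return leaves
        # Growth: split the leaf whose members lie farthest from it on average.
        members = {p: [] for p in leaves}
        for x in data:
            members[min(leaves, key=lambda p: euclidean(leaves[p], x))].append(x)

        def resource(p):
            if not members[p]:
                return 0.0
            return sum(euclidean(leaves[p], x) for x in members[p]) / len(members[p])

        worst = max(leaves, key=resource)
        parent = leaves.pop(worst)
        leaves[worst + (0,)] = parent[:]
        leaves[worst + (1,)] = parent[:]
```

Each returned vector is a prototype that approximates the average of the profiles it attracts (slightly biased by the sister updates); the published neighbourhood, convergence and stopping rules should be taken from the paper itself.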

Cluster Analysis for Gene Expression Data: A Survey

by Daxin Jiang, Chun Tang, Aidong Zhang - IEEE Transactions on Knowledge and Data Engineering , 2004
Cited by 149 (5 self)
DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than to points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms have also recently been proposed specifically for gene expression data. These clustering algorithms have been proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest promising trends in this field.
Index Terms: Microarray technology, gene expression data, clustering.

Citation Context

...oefficient is widely used and has proven effective as a similarity measure for gene expression data [36, 64, 65, 74]. However, empirical study has shown that it is not robust with respect to outliers [30], thus potentially yielding false positives which assign a high similarity score to a pair of dissimilar patterns. If two patterns have a common peak or valley at a single feature, the correlation wil...
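The single-feature effect described here is easy to reproduce. In the hypothetical profiles below, two genes are anti-correlated over five time points and share one large spike at the sixth; that single shared point is enough to push Pearson's r close to 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical profiles: small fluctuations plus one shared spike at the last time point.
x = [0.1, -0.2, 0.0, 0.2, -0.1, 5.0]
y = [-0.1, 0.1, 0.2, -0.2, 0.0, 5.0]

with_spike = pearson(x, y)               # about 0.99
without_spike = pearson(x[:-1], y[:-1])  # about -0.7
```

Dropping the shared spike flips the sign of the coefficient, which is exactly the kind of false positive the passage warns about.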

From patterns to pathways: gene expression data analysis comes of age.

by Donna K Slonim - Nature Genetics , 2002
Cited by 134 (1 self)
Abstract not found

Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions

by Jiang Qian, Marisa Dolled-Filhart, Jimmy Lin, Haiyuan Yu, Mark Gerstein - J. Mol. Biol. , 2001
Cited by 84 (9 self)
The complexity of biological systems provides for a great diversity of relationships between genes. The current analysis of whole-genome expression data focuses on relationships based on global correlation over a whole time-course, identifying clusters of genes whose expression levels simultaneously rise and fall. There are, of course, other potential relationships between genes, which are missed by such global clustering. These include activation, where one expects a time-delay between related expression profiles, and inhibition, where one expects an inverted relationship. Here, we propose a new method, which we call local clustering, for identifying these time-delayed and inverted relationships. It is related to conventional gene-expression clustering in a fashion analogous to the way local sequence alignment (the Smith-Waterman algorithm) is derived from global alignment (Needleman-Wunsch). An integral part of our method is the use of random score distributions to assess the statistical significance of each cluster. We applied our method to the yeast cell-cycle...
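The analogy with local sequence alignment can be sketched as an ungapped, Smith-Waterman-style recursion over two normalised profiles. Assumptions: the per-time-point "match score" is the product of the two normalised values, one running score rewards co-varying segments while a second rewards inverted ones, and the band `max_shift` limits the allowed time delay. This is a minimal sketch under those assumptions, not the authors' program; the random score distributions they use to assess significance are not reproduced.

```python
import statistics


def normalise(profile):
    """Zero mean, unit population standard deviation; assumes a non-constant profile."""
    mu = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    return [(v - mu) / sd for v in profile]


def local_score(x, y, max_shift):
    """Best ungapped local match between two expression profiles.

    Returns (score, shift, inverted). shift = j - i, so a positive shift means
    y lags x; inverted=True means the best segment was anti-correlated.
    """
    x, y = normalise(x), normalise(y)
    n, m = len(x), len(y)
    same = [[0.0] * (m + 1) for _ in range(n + 1)]  # running score, co-varying segments
    inv = [[0.0] * (m + 1) for _ in range(n + 1)]   # running score, inverted segments
    best = (0.0, 0, False)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(i - j) > max_shift:
                continue  # outside the allowed time delay
            prod = x[i - 1] * y[j - 1]
            # Extend along the diagonal only (no gaps) and reset at zero, as in local alignment.
            same[i][j] = max(same[i - 1][j - 1] + prod, 0.0)
            inv[i][j] = max(inv[i - 1][j - 1] - prod, 0.0)
            if same[i][j] > best[0]:
                best = (same[i][j], j - i, False)
            if inv[i][j] > best[0]:
                best = (inv[i][j], j - i, True)
    return best
```

For a peak at time 3 in one gene and time 4 in another, the best diagonal is the one shifted by one step; for a mirrored profile, the inverted score wins at zero shift. A global Pearson coefficient would miss the first case entirely and report the second only as a negative correlation.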

Citation Context

...shown in Figure 1A. Other measures include squared Pearson correlation coefficient (D'haeseleer et al., 1997), Spearman rank correlation (D'haeseleer et al., 1997), jackknife correlation coefficient (Heyer et al., 1999), and Euclidean distance (Wen et al., 1998). The major drawback in these measures is that they ignore many additional relationships implicit in expression time courses. For instance, a gene may contr...
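The jackknife correlation coefficient cited here (Heyer et al., 1999) targets the single-outlier problem. As it is usually described, it is the minimum of the ordinary Pearson coefficient and the n coefficients obtained by leaving out one time point at a time, so a match that hinges on one point is exposed. The sketch below follows that description; the function names and test profiles are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def jackknife_correlation(x, y):
    """Smallest of the full Pearson r and the n leave-one-out Pearson r values.

    x and y are lists; each leave-one-out profile needs at least two non-constant points.
    """
    values = [pearson(x, y)]
    for k in range(len(x)):
        values.append(pearson(x[:k] + x[k + 1:], y[:k] + y[k + 1:]))
    return min(values)


# Hypothetical profiles that agree only through one shared spike.
spiky_x = [0.1, -0.2, 0.0, 0.2, -0.1, 5.0]
spiky_y = [-0.1, 0.1, 0.2, -0.2, 0.0, 5.0]
```

On the spiky pair, Pearson's r is about 0.99 but the jackknife value drops to about -0.7, the correlation once the spike is left out; for a genuinely linear pair every leave-one-out coefficient stays at 1. The price is n + 1 correlations per gene pair instead of one.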

Ensemble learning for independent component analysis

by James W. Miskin - In Advances in Independent Component Analysis , 2000
Cited by 59 (3 self)
This thesis is concerned with the problem of Blind Source Separation. Specifically, we consider the Independent Component Analysis (ICA) model, in which a set of observations is modelled by $x_t = A s_t$ (1), where $A$ is an unknown mixing matrix and $s_t$ is a vector of hidden source components at time $t$. The ICA problem is to find the sources given only a set of observations. In chapter 1, the blind source separation problem is introduced. In chapter 2, the method of Ensemble Learning is explained. Chapter 3 applies Ensemble Learning to the ICA model and chapter 4 assesses the use of Ensemble Learning for model selection. Chapters 5-7 apply the Ensemble Learning ICA algorithm to data sets from physics (a medical imaging data set consisting of images of a tooth), biology (data sets from cDNA micro-arrays) and astrophysics (Planck image separation and galaxy spectra separation).

Citation Context

...ny tissues we should be able to understand the gene expression underlying the biology. Many different techniques have been used to analyse the data, such as a Self Organising Mapping [58], Clustering [3, 26, 29, 50, 62, 74, 75, 118] and modelling the genetic regulatory networks [60]. Each of the techniques implicitly assumes a specific model for the data. For instance, clustering assumes that the different genes can be grouped i...

A duplication growth model of gene expression networks

by Ashish Bhan, David J. Galas, T. Gregory Dewey - Bioinformatics , 2002
Cited by 58 (3 self)
Abstract not found

Citation Context

...works from Dynamic Models of Gene Expression. There have been a number of recent attempts to analyze time series data for whole genome expression profiles (Dewey and Galas, 2001; Holter et al., 2000; Heyer et al., 1999; DeRisi et al., 1997; Spellman et al., 1998). Interestingly, this does not require the complexity of detailed non-linear models of gene expression, but needs only simple, linear models (Dewey and Gal...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University