Results 1 - 10 of 132
Computational Discovery of Gene Modules, Regulatory Networks and Expression Programs
, 2007
Cited by 236 (17 self)
High-throughput molecular data are revolutionizing biology by providing massive amounts of information about gene expression and regulation. Such information is applicable both to furthering our understanding of fundamental biology and to developing new diagnostic and treatment approaches for diseases. However, novel mathematical methods are needed for extracting biological knowledge from high-dimensional, complex and noisy data sources. In this thesis, I develop and apply three novel computational approaches for this task. The common theme of these approaches is that they seek to discover meaningful groups of genes, which confer robustness to noise and compress complex information into interpretable models. I first present the GRAM algorithm, which fuses information from genome-wide expression and in vivo transcription factor-DNA binding data to discover regulatory networks of
MJ: TRICLUSTER: an effective algorithm for mining coherent clusters in 3D microarray data
- In SIGMOD ’05: Proceedings of the 2005 ACM SIGMOD international
Cited by 38 (2 self)
In this paper we introduce a novel algorithm called triCluster, for mining coherent clusters in three-dimensional (3D) gene expression datasets. triCluster can mine arbitrarily positioned and overlapping clusters, and depending on different parameter values, it can mine different types of clusters, including those with constant or similar values along each dimension, as well as scaling and shifting expression patterns. triCluster relies on a graph-based approach to mine all valid clusters. For each time slice, i.e., a gene × sample matrix, it constructs the range multigraph, a compact representation of all similar value ranges between any two sample columns. It then searches for constrained maximal cliques in this multigraph to yield the set of biclusters for this time slice. Then triCluster constructs another graph using the biclusters (as vertices) from each time slice; mining cliques from this graph yields the final set of triclusters. Optionally, triCluster merges or deletes some clusters having large overlaps. We present a useful set of metrics to evaluate the clustering quality, and we show that triCluster can find significant triclusters in real microarray datasets.
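The per-slice step, finding ranges of similar expression ratios between two sample columns, can be sketched roughly as below. This is a simplified illustration, not the authors' implementation: the function name `ratio_ranges`, the parameters `eps` (maximum relative spread of ratios within a range) and `min_genes`, and the sliding-window construction are assumptions. The later steps (splitting overlapping ranges, the clique search, the cross-slice graph) are omitted, and expression values are assumed to be strictly positive.

```python
from typing import List, Tuple


def ratio_ranges(col_a: List[float], col_b: List[float],
                 eps: float, min_genes: int) -> List[Tuple[float, float, List[int]]]:
    """Maximal ranges of similar ratios col_a[g] / col_b[g] between two samples.

    A range [lo, hi] is valid if hi <= lo * (1 + eps) and it covers at least
    min_genes genes. Assumes strictly positive expression values.
    """
    # Sort (ratio, gene) pairs so that similar ratios become neighbours.
    ratios = sorted((a / b, g) for g, (a, b) in enumerate(zip(col_a, col_b)))
    n = len(ratios)
    ranges = []
    j = 0
    prev_j = -1  # right end of the window that started at the previous index
    for i in range(n):
        j = max(j, i)
        # Grow the window [i, j] while all ratios stay within a factor (1 + eps).
        while j + 1 < n and ratios[j + 1][0] <= ratios[i][0] * (1 + eps):
            j += 1
        # A window is maximal only if it reaches further right than the previous one.
        if j > prev_j and j - i + 1 >= min_genes:
            genes = sorted(g for _, g in ratios[i:j + 1])
            ranges.append((ratios[i][0], ratios[j][0], genes))
        prev_j = j
    return ranges
```

In a range-multigraph view, each returned range would become one edge between the two sample columns, labelled with its gene set; because window ends only move right as the start advances, the scan is linear after sorting.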
Analyzing Gene Expression Time-Courses
- IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 2, NO. 3, JULY-SEPTEMBER 2005
, 2005
Cited by 29 (6 self)
Measuring gene expression over time can provide important insights into basic cellular processes. Identifying groups of genes with similar expression time-courses is a crucial first step in the analysis. As biologically relevant groups frequently overlap, due to genes having several distinct roles in those cellular processes, this is a difficult problem for classical clustering methods. We use a mixture model to circumvent this principal problem, with hidden Markov models (HMMs) as effective and flexible components. We show that the ensuing estimation problem can be addressed with additional labeled data—partially supervised learning of mixtures—through a modification of the Expectation-Maximization (EM) algorithm. Good starting points for the mixture estimation are obtained through a modification to Bayesian model merging, which allows us to learn a collection of initial HMMs. We infer groups from mixtures with a simple information-theoretic decoding heuristic, which quantifies the level of ambiguity in group assignment. The effectiveness is shown with high-quality annotation data. As the HMMs we propose capture asynchronous behavior by design, the groups we find are also asynchronous. Synchronous subgroups are obtained from a novel algorithm based on Viterbi paths. We show the suitability of our HMM mixture approach on biological and simulated data and through the favorable comparison with previous approaches. Software implementing the method is freely available under the GPL from
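A rough sketch of partially supervised EM follows, with one deliberate substitution: the components are spherical Gaussians with a fixed, shared variance rather than the HMMs used in the paper, to keep the example short. Labeled profiles keep their responsibilities clamped to the known group, while unlabeled ones receive posterior responsibilities. The normalized entropy of each posterior is used here as a stand-in for the paper's information-theoretic ambiguity measure, whose exact form the abstract does not give. All names (`partially_supervised_em`, `decode`, `sigma`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sq_dist(x: Sequence[float], m: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, m))


def partially_supervised_em(data: List[List[float]], k: int, labels: Dict[int, int],
                            init_means: List[List[float]], sigma: float = 1.0,
                            iters: int = 50):
    """EM for a k-component spherical Gaussian mixture (fixed variance sigma**2).

    labels maps a data index to its known component; those points keep a one-hot
    responsibility, while unlabeled points get posterior responsibilities.
    """
    means = [list(m) for m in init_means]
    weights = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: clamp labeled points, compute posteriors for the rest.
        resp = []
        for i, x in enumerate(data):
            if i in labels:
                r = [0.0] * k
                r[labels[i]] = 1.0
            else:
                logp = [math.log(weights[c]) - sq_dist(x, means[c]) / (2 * sigma ** 2)
                        for c in range(k)]
                top = max(logp)  # subtract the max for numerical stability
                p = [math.exp(v - top) for v in logp]
                s = sum(p)
                r = [v / s for v in p]
            resp.append(r)
        # M-step: re-estimate mixing weights and means from all responsibilities.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = max(nc / len(data), 1e-12)  # floor avoids log(0)
            if nc > 0:
                means[c] = [sum(r[c] * x[d] for r, x in zip(resp, data)) / nc
                            for d in range(len(data[0]))]
    return means, weights, resp


def decode(resp: List[List[float]]) -> List[Tuple[int, float]]:
    """Most likely group plus normalized entropy (0 = unambiguous, 1 = uniform); assumes k >= 2."""
    k = len(resp[0])
    out = []
    for r in resp:
        h = -sum(p * math.log(p) for p in r if p > 0)
        out.append((max(range(k), key=lambda c: r[c]), h / math.log(k)))
    return out
```

Swapping the Gaussian density for an HMM likelihood (computed with the forward algorithm) would bring the sketch closer to the described method, at the cost of a Baum-Welch M-step.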
Automated discovery of functional generality of human gene expression programs. PLoS Computational Biology, to appear
, 2007
Cited by 20 (4 self)
An important research problem in computational biology is the identification of expression programs, sets of coexpressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and
Identification of regulatory modules in time-series gene expression data using a linear time biclustering algorithm
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
Cited by 18 (2 self)
Several non-supervised machine learning methods have been used in the analysis of gene expression data obtained from microarray experiments. Recently, biclustering, a non-supervised approach that performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of experimental conditions, where the genes exhibit highly correlated behaviors. These correlated behaviors correspond to coherent expression patterns and can be used to identify potential regulatory modules possibly involved in regulatory mechanisms. Many specific versions of the biclustering problem have been shown to be NP-complete. However, when we are interested in identifying biclusters in time series expression data, we can restrict the problem by finding only maximal biclusters with contiguous columns. This restriction leads to a tractable problem. Its motivation is the fact that biological processes start and finish in an identifiable contiguous period of time, leading to increased (or decreased) activity of sets of genes forming biclusters with contiguous
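The contiguous-column restriction can be illustrated with a brute-force sketch. Assumptions: each gene's profile is first discretized into one symbol per transition between consecutive time points (`U`, `D`, `N`, using a hypothetical `threshold`), and a bicluster is reported only if it contains every gene sharing its pattern and cannot be extended one column to the left or right without losing genes. This enumeration is roughly cubic in the number of columns and does not reproduce the paper's linear-time algorithm; the function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def discretize(profile: Sequence[float], threshold: float) -> str:
    """Encode each transition between consecutive time points as U (up), D (down) or N (no change)."""
    out = []
    for a, b in zip(profile, profile[1:]):
        d = b - a
        out.append("U" if d > threshold else "D" if d < -threshold else "N")
    return "".join(out)


def maximal_ccc_biclusters(symbols: List[str], min_genes: int = 2
                           ) -> List[Tuple[Tuple[int, ...], int, int, str]]:
    """Enumerate maximal biclusters with contiguous columns (brute force, not linear time).

    Returns (genes, first_col, last_col, pattern) tuples; columns index transitions.
    """
    n_cols = len(symbols[0])
    found = []
    for start in range(n_cols):
        for end in range(start + 1, n_cols + 1):
            # Group genes by their exact symbol pattern on columns start..end-1,
            # so every group already contains all genes sharing that pattern.
            groups = defaultdict(list)
            for g, row in enumerate(symbols):
                groups[row[start:end]].append(g)
            for pattern, genes in groups.items():
                if len(genes) < min_genes:
                    continue
                # Not maximal if all genes also agree on the column to the left or right.
                left = start > 0 and len({symbols[g][start - 1] for g in genes}) == 1
                right = end < n_cols and len({symbols[g][end] for g in genes}) == 1
                if not (left or right):
                    found.append((tuple(genes), start, end - 1, pattern))
    return found
```

For example, the discretized rows `"UUD"`, `"UUN"` and `"DUD"` yield three maximal biclusters: genes 0 and 1 sharing `UU` on columns 0-1, all three genes sharing `U` on column 1, and genes 0 and 2 sharing `UD` on columns 1-2.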
Narayanan A: Discovering gene networks with a neural-genetic hybrid
- Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Cited by 17 (0 self)
Abstract—Recent advances in biology (namely, DNA arrays) allow an unprecedented view of the biochemical mechanisms contained within a cell. However, this technology raises new challenges for computer scientists and biologists alike, as the data created by these arrays is often highly complex. One of the challenges is the elucidation of the regulatory connections and interactions between genes, proteins and other gene products. In this paper, a novel method is described for determining gene interactions in temporal gene expression data using genetic algorithms combined with a neural network component. Experiments conducted on real-world temporal gene expression data sets confirm that the approach is capable of finding gene networks that fit the data. A further repeated approach shows that those genes significantly involved in interaction with other genes can be highlighted and hypothetical gene networks and circuits proposed for further laboratory testing. Index Terms—Gene expression analysis, neural networks, genetic algorithms, reverse-engineering, gene interactions.
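A toy version of the general idea: a genetic algorithm searches over candidate regulator sets for one target gene, and each candidate is scored by how well a single linear neuron, trained by batch gradient descent, predicts the target's next value from the regulators' current values. The network form, the mutation-only GA with elitist truncation selection, and all names and parameters (`neuron_error`, `ga_select_regulators`, `lr`, `epochs`, `pop_size`) are assumptions for illustration, not the method described in the paper.

```python
import random
from typing import Dict, List, Tuple


def neuron_error(data: List[List[float]], target: int, regs: List[int],
                 epochs: int = 1000, lr: float = 0.1) -> float:
    """Fit target(t+1) ~ w . regs(t) + b with one linear neuron; return the final MSE."""
    steps = len(data[target]) - 1
    xs = [[data[r][t] for r in regs] for t in range(steps)]
    ys = [data[target][t + 1] for t in range(steps)]
    w, b = [0.0] * len(regs), 0.0

    def errors() -> List[float]:
        return [sum(wi * xi for wi, xi in zip(w, x)) + b - y for x, y in zip(xs, ys)]

    for _ in range(epochs):
        errs = errors()  # full-batch gradient of the mean squared error
        for j in range(len(w)):
            w[j] -= lr * 2 * sum(e * x[j] for e, x in zip(errs, xs)) / steps
        b -= lr * 2 * sum(errs) / steps
    return sum(e * e for e in errors()) / steps


def ga_select_regulators(data: List[List[float]], target: int, k: int,
                         pop_size: int = 8, generations: int = 15, seed: int = 0) -> List[int]:
    """Mutation-only GA over k-regulator sets; assumes k < number of other genes."""
    rng = random.Random(seed)
    candidates = [g for g in range(len(data)) if g != target]
    cache: Dict[Tuple[int, ...], float] = {}

    def fitness(ind: List[int]) -> float:
        key = tuple(sorted(ind))
        if key not in cache:  # each regulator set is trained only once
            cache[key] = neuron_error(data, target, list(key))
        return cache[key]

    pop = [rng.sample(candidates, k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # elitism: keep the best half unchanged
        children = []
        for parent in parents:
            child = list(parent)
            # Mutation: swap one regulator for a gene not already in the set.
            child[rng.randrange(k)] = rng.choice([c for c in candidates if c not in child])
            children.append(child)
        pop = parents + children
    return sorted(min(pop, key=fitness))
```

Because the best half always survives, the best fitness never gets worse from one generation to the next; a fuller version would add crossover and a multi-layer network.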
BATS: a Bayesian user-friendly software for Analyzing Time Series microarray experiments.
Cited by 14 (1 self)
BATS is user-friendly software for Bayesian Analysis of Time Series microarray experiments based on the novel, truly functional and fully Bayesian approach proposed in Angelini et al. (2006). The software is specifically designed for time series data. It allows a user to automatically identify and rank differentially expressed genes and to estimate their expression profiles. BATS successfully manages various technical difficulties which arise in microarray time-course experiments, such as a small number of observations, non-uniform sampling intervals, and presence of missing or multiple data. BATS can carry out analysis with both simulated and real experimental data. It also handles data from different platforms. Availability: BATS is written in Matlab and executable in Windows (Macintosh and Linux versions are currently under development). It is freely available upon request from the authors.
A (2009) Model-based redesign of global transcription regulation. Nucleic acids research 37: e38–e38
- GRNs Inference Based on DC PLOS ONE | www.plosone.org 7 February 2014 | Volume 9 | Issue 2 | e87446
doi:10.1093/nar/gkp022
Bar-Joseph Z: Identifying cycling genes by combining sequence homology and expression data
- Bioinformatics
Cited by 13 (7 self)
Vol. 22 no. 14 2006, pages e314–e322, doi:10.1093/bioinformatics/btl229. BIOINFORMATICS. Identifying cycling genes by combining sequence homology and expression data
Class prediction from time series gene expression profiles using dynamical systems kernels
- PROCEEDINGS OF THE PACIFIC SYMPOSIUM OF BIOCOMPUTING 2006, PAGES 547–558, MAUI HAWAII
, 2006
Cited by 12 (1 self)
We present a kernel-based approach to the classification of a time series of gene expression profiles. Our method takes into account the dynamic evolution over time as well as the temporal characteristics of the data. More specifically, we model the evolution of the gene expression profiles as a Linear Time Invariant (LTI) dynamical system and estimate its model parameters. A kernel on dynamical systems is then used to classify these time series. We successfully test our approach on a published dataset to predict response to drug therapy in Multiple Sclerosis patients. For pharmacogenomics, our method offers huge potential for advanced computational tools in disease diagnosis, and disease and drug therapy outcome prognosis.
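A minimal sketch of the two stages, assuming a first-order model x_{t+1} ≈ A x_t whose transition matrix A is estimated by least squares. The abstract does not say which kernel on dynamical systems is used, so `system_kernel` substitutes a simple Gaussian (RBF) kernel on the Frobenius distance between estimated transition matrices; the function names and `gamma` are hypothetical.

```python
import numpy as np


def fit_lti(X: np.ndarray) -> np.ndarray:
    """Least-squares estimate of A in x_{t+1} ~ A x_t from a (T, d) array of profiles."""
    X0, X1 = X[:-1], X[1:]
    # Rows satisfy x_{t+1}^T = x_t^T A^T, so solve X0 @ B = X1 and transpose B.
    B, *_ = np.linalg.lstsq(X0, X1, rcond=None)
    return B.T


def system_kernel(A1: np.ndarray, A2: np.ndarray, gamma: float = 1.0) -> float:
    """Gaussian kernel on the Frobenius distance between two transition matrices."""
    return float(np.exp(-gamma * np.linalg.norm(A1 - A2, "fro") ** 2))
```

Kernel values between patients' fitted systems could then be assembled into a Gram matrix for any classifier that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. When there are fewer time points than genes, `lstsq` returns a minimum-norm solution rather than a unique one, so in practice the profiles would likely be reduced to a few dimensions first.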