Results 1 - 10 of 1,087
Modeling and simulation of genetic regulatory systems: A literature review
- Journal of Computational Biology, 2002
Cited by 738 (14 self)
In order to understand the functioning of organisms on the molecular level, we need to know which genes are expressed, when and where in the organism, and to which extent. The regulation of gene expression is achieved through genetic regulatory systems structured by networks of interactions between DNA, RNA, proteins, and small molecules. As most genetic regulatory networks of interest involve many components connected through interlocking positive and negative feedback loops, an intuitive understanding of their dynamics is hard to obtain. As a consequence, formal methods and computer tools for the modeling and simulation of genetic regulatory networks will be indispensable. This paper reviews formalisms that have been employed in mathematical biology and bioinformatics to describe genetic regulatory systems, in particular directed graphs, Bayesian networks, Boolean networks and their generalizations, ordinary and partial differential equations, qualitative differential equations, stochastic equations, and rule-based formalisms. In addition, the paper discusses how these formalisms have been used in the simulation of the behavior of actual regulatory systems.
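Of the formalisms listed in this abstract, a Boolean network is the simplest to illustrate. The following is a minimal sketch, not taken from the review: the three genes `a`, `b`, `c` and their rules are hypothetical, and a synchronous update scheme is assumed.

```python
# Hypothetical three-gene Boolean network; the rules are illustrative only.
RULES = {
    "a": lambda s: not s["c"],         # c represses a
    "b": lambda s: s["a"],             # a activates b
    "c": lambda s: s["a"] and s["b"],  # a and b jointly activate c
}


def step(state):
    """Synchronously apply every gene's Boolean rule to the current state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}


def find_attractor(state):
    """Iterate until a state repeats; return the repeating cycle of states."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


start = {"a": True, "b": False, "c": False}
attractor = find_attractor(start)
for s in attractor:
    print(s)
```

Because the negative feedback from `c` to `a` is delayed by the activation chain, this toy network settles into an oscillation rather than a fixed point, which is the kind of dynamics that is hard to predict by intuition alone.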
Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks
- 2002
Cited by 391 (59 self)
Motivation: Our goal is to construct a model for genetic regulatory networks such that the model class: (i ) incorporates rule-based dependencies between genes; (ii ) allows the systematic study of global network dynamics; (iii ) is able to cope with uncertainty, both in the data and the model selection; and (iv ) permits the quantification of the relative influence and sensitivity of genes in their interactions with other genes.
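One way to read goals (i) and (iii) together is that each gene keeps several candidate Boolean rules, and one of them is drawn at random at every update. The sketch below works under that reading; the two genes, the rules, and the selection probabilities in `PREDICTORS` are all hypothetical and not taken from the paper.

```python
import random

# Hypothetical two-gene network. Each gene maps to a list of
# (selection probability, Boolean rule) pairs; probabilities sum to 1.
PREDICTORS = {
    "x": [(0.7, lambda s: s["y"]), (0.3, lambda s: not s["y"])],
    "y": [(1.0, lambda s: s["x"] and s["y"])],
}


def pbn_step(state, rng):
    """Draw one rule per gene according to its probability, then update synchronously."""
    nxt = {}
    for gene, candidates in PREDICTORS.items():
        weights = [p for p, _ in candidates]
        rules = [f for _, f in candidates]
        rule = rng.choices(rules, weights=weights, k=1)[0]
        nxt[gene] = bool(rule(state))
    return nxt


rng = random.Random(0)  # fixed seed so the run is repeatable
state = {"x": True, "y": True}
trajectory = [state]
for _ in range(10):
    state = pbn_step(state, rng)
    trajectory.append(state)
```

Repeating such runs many times gives an empirical view of the global dynamics, for example how often the network ends up in each state.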
Being Bayesian about network structure
- Machine Learning, 2000
Cited by 299 (3 self)
In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model selection attempts to find the most likely (MAP) model, and uses its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want to compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed order over network variables. This allows us to compute, for a given order, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov Chain Monte Carlo (MCMC) method, but over orders rather than over network structures. The space of orders is smaller and more regular than the space of structures, and has a much smoother posterior “landscape”. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.
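The efficient sum over networks consistent with a fixed order rests on a factorization: when the structure score is a product of per-node family scores, the sum over all consistent structures equals a product of independent per-node sums. The sketch below checks this numerically on four variables. `local_score` is a hypothetical stand-in for a real decomposable score, and no limit on the number of parents is assumed.

```python
from itertools import combinations, product
from math import isclose, prod


def subsets(items):
    """Yield every subset of items as a tuple."""
    items = list(items)
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def local_score(node, parents):
    """Hypothetical family score; any positive function of (node, parents) works here."""
    return 1.0 / (1 + node + 2 * len(parents) + sum(parents))


order = [0, 1, 2, 3]

# Allowed parent sets for each node: subsets of its predecessors in the order.
choices = [list(subsets(order[:pos])) for pos in range(len(order))]

# Brute force: enumerate every structure consistent with the order.
structures = list(product(*choices))
brute = sum(
    prod(local_score(order[pos], ps) for pos, ps in enumerate(structure))
    for structure in structures
)

# Factored form: one independent sum per node, then a product.
factored = prod(
    sum(local_score(order[pos], ps) for ps in choices[pos])
    for pos in range(len(order))
)

print(len(structures), brute, factored)
```

The brute-force enumeration grows exponentially with the number of variables, while the factored form only needs one pass over each node's candidate parent sets.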
Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles
- PLoS Biol, 2007
Cited by 255 (6 self)
Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental validation of the performance of these methods at the genome scale has remained elusive. Here we assess the global performance of four existing classes of inference algorithms using 445 Escherichia coli Affymetrix arrays and 3,216 known E. coli regulatory interactions from RegulonDB. We also developed and applied the context likelihood of relatedness (CLR) algorithm, a novel extension of the relevance networks class of algorithms. CLR demonstrates an average precision gain of 36% relative to the next-best performing algorithm. At a 60% true positive rate, CLR identifies 1,079 regulatory interactions, of which 338 were in the previously known network and 741 were novel predictions. We tested the predicted interactions for three transcription factors with chromatin immunoprecipitation, confirming 21 novel interactions and verifying our RegulonDB-based performance estimates. CLR also identified a regulatory link providing central metabolic control of iron transport, which we confirmed with real-time quantitative PCR. The compendium of expression data compiled in this study, coupled with RegulonDB, provides a valuable model system for further improvement of network inference
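The abstract does not spell out how CLR scores an interaction. The sketch below follows the commonly described form of the method, in which each pairwise relatedness value is compared against the background distribution of both genes involved. Treat the exact formula, the function name `clr_scores`, and the toy matrix as assumptions rather than the authors' implementation.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_scores(mi):
    """Rescore a symmetric relatedness matrix (list of lists) against per-gene backgrounds."""
    n = len(mi)
    mu, sd = [], []
    for i in range(n):
        # Background for gene i: its relatedness to every other gene.
        row = [mi[i][j] for j in range(n) if j != i]
        mu.append(mean(row))
        sd.append(pstdev(row))

    def z(i, j):
        # One-sided z-score of pair (i, j) within gene i's background.
        if sd[i] < 1e-12:
            return 0.0
        return max(0.0, (mi[i][j] - mu[i]) / sd[i])

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            scores[i][j] = scores[j][i] = sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return scores


# Toy relatedness values (e.g. mutual information) for four hypothetical genes.
mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
scores = clr_scores(mi)
```

In the toy matrix, the pair (1, 3) has a low raw value but still receives a nonzero score because it stands out against gene 3's weak background, which is the effect that distinguishes this approach from thresholding raw relatedness values.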
Computational Discovery of Gene Modules, Regulatory Networks and Expression Programs
- 2007
Cited by 236 (17 self)
High-throughput molecular data are revolutionizing biology by providing massive amounts of information about gene expression and regulation. Such information is applicable both to furthering our understanding of fundamental biology and to developing new diagnostic and treatment approaches for diseases. However, novel mathematical methods are needed for extracting biological knowledge from high-dimensional, complex and noisy data sources. In this thesis, I develop and apply three novel computational approaches for this task. The common theme of these approaches is that they seek to discover meaningful groups of genes, which confer robustness to noise and compress complex information into interpretable models. I first present the GRAM algorithm, which fuses information from genome-wide expression and in vivo transcription factor-DNA binding data to discover regulatory networks of
Inferring Subnetworks from Perturbed Expression Profiles
- 2001
Cited by 204 (14 self)
Genome-wide expression profiles of genetic mutants provide a wide variety of measurements of cellular responses to perturbations. Typical analysis of such data identifies genes affected by perturbation and uses clustering to group genes of similar function. In this paper we discover a finer structure of interactions between genes, such as causality, mediation, activation, and inhibition by using a Bayesian network framework. We extend this framework to correctly handle perturbations, and to identify significant subnetworks of interacting genes. We apply this method to expression data of S. cerevisiae mutants and uncover a variety of structured metabolic, signaling and regulatory pathways. Contact: danab@cs.huji.ac.il
Reverse engineering of regulatory networks in human B cells.
- Nat. Genet., 2005
Cited by 178 (2 self)
Cellular phenotypes are determined by the differential activity of networks linking coregulated genes. Available methods for the reverse engineering of such networks from genome-wide expression profiles have been successful only in the analysis of lower eukaryotes with simple genomes. Using a new method called ARACNe (algorithm for the reconstruction of accurate cellular networks), we report the reconstruction of regulatory networks from expression profiles of human B cells. The results are suggestive of a hierarchical, scale-free network, where a few highly interconnected genes (hubs) account for most of the interactions. Validation of the network against available data led to the identification of MYC as a major hub, which controls a network comprising known target genes as well as new ones, which were biochemically validated. The newly identified MYC targets include some major hubs. This approach can be generally useful for the analysis of normal and pathologic networks in mammalian cells.
Cell phenotypes are determined by the concerted activity of thousands of genes and their products. This activity is coordinated by a complex network that regulates the expression of genes controlling common functions, such as the formation of a transcriptional complex or the availability of a signaling pathway. Understanding this organization is crucial to elucidate normal cell physiology as well as to dissect complex pathologic phenotypes. Studies in lower organisms indicate that the structure of both protein-protein interaction and metabolic networks is of a hierarchical scale-free nature [1,2], characterized by an inverse relationship between the number of nodes and their connectivity (scale-free) and by a preferential interaction among highly connected genes, called hubs (hierarchical).
Although scale-free networks may represent a common blueprint for all cellular constituents, evidence of scale-free topology in higher-order eukaryotic cells is currently limited to coexpression networks [3,4], which tend to identify entire subpathways rather than individual interactions. Identifying the organizational network of eukaryotic cells is still a key goal in understanding cell physiology and disease. Genome-wide clustering of gene-expression profiles has provided an initial step towards the elucidation of cellular networks. But the organization of gene-expression profile data into functionally meaningful genetic information has proven difficult and so far has fallen short of uncovering the intricate structure of cellular interactions. This challenge, called network reverse engineering or deconvolution, has led to an entirely new class of methods aimed at producing high-fidelity representations of cellular networks as graphs, where nodes represent genes and edges between them represent interactions, either between the encoded proteins or between the encoded proteins and the genes (we use 'genetic interaction' to refer to both types of mechanisms). Available methods fall into four broad categories: optimization methods [5-7], which maximize a scoring function over alternative network models; regression techniques [...]
Here we present the successful reverse engineering of gene-expression profile data from human B cells. Our study is based on ARACNe (algorithm for the reconstruction of accurate cellular networks), a new approach for the reverse engineering of cellular networks from microarray expression profiles. ARACNe first identifies statistically significant gene-gene coregulation by mutual information, an information-theoretic measure of relatedness. It then eliminates indirect relationships, in which two genes are coregulated through one or more intermediaries, by applying a well-known staple of data
Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks
- Bioinformatics, 2003
Cited by 174 (5 self)
Motivation: Bayesian networks have been applied to infer genetic regulatory interactions from microarray gene expression data. This inference problem is particularly hard in that interactions between hundreds of genes have to be learned from very small data sets, typically containing only a few dozen time points during a cell cycle. Most previous studies have assessed the inference results on real gene expression data by comparing predicted genetic regulatory interactions with those known from the biological literature. This approach is controversial due to the absence of known gold standards, which renders the estimation of the sensitivity and specificity, that is, the true and (complementary) false detection rate, unreliable and difficult. The objective of the present study is to test the viability of the Bayesian network paradigm in a realistic simulation study. First, gene expression data are simulated from a realistic biological network involving DNAs, mRNAs, inactive protein monomers and active protein dimers. Then, interaction networks are inferred from these data in a reverse engineering approach, using Bayesian networks and Bayesian learning with Markov chain Monte Carlo.
Results: The simulation results are presented as receiver operator characteristics curves. This allows estimating the proportion of spurious gene interactions incurred for a specified target proportion of recovered true interactions. The findings demonstrate how the network inference performance varies with the training set size, the degree of inadequacy of prior assumptions, the experimental sampling strategy and the inclusion of further, sequence-based information.
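The ROC construction mentioned in the results can be sketched as follows, assuming every candidate edge has been given a confidence score and the true network is known, as it is in a simulation study. The function names, gene labels, and scores are hypothetical, and tied scores are not treated specially.

```python
def roc_points(scored_edges, true_edges):
    """Return (false positive rate, true positive rate) pairs, sweeping the score threshold down.

    scored_edges: list of (edge, score) covering every candidate edge.
    true_edges: set of edges present in the known network.
    """
    ranked = sorted(scored_edges, key=lambda item: item[1], reverse=True)
    n_pos = sum(1 for edge, _ in ranked if edge in true_edges)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for edge, _score in ranked:
        if edge in true_edges:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical inferred edge scores and ground truth.
scored = [
    (("g1", "g2"), 0.9),
    (("g2", "g3"), 0.8),
    (("g1", "g3"), 0.4),
    (("g3", "g1"), 0.2),
]
truth = {("g1", "g2"), ("g1", "g3")}

points = roc_points(scored, truth)
auc = area_under_curve(points)
```

Reading the curve at a chosen true positive rate gives the corresponding false positive rate, which is the trade-off between recovered and spurious interactions described in the abstract.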
The max-min hill-climbing Bayesian network structure learning algorithm
- 2006
Cited by 156 (8 self)
We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation MMHC outperforms on average and in terms of various metrics several prototypical and state-of-the-art
A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression
- Bioinformatics, 2004
Cited by 143 (5 self)
This paper studies the problem of building multiclass classifiers for tissue classification based on gene expression. The recent development of microarray technologies has enabled biologists to quantify gene expression of tens of thousands of genes in a single experiment. Biologists have begun collecting gene expression for a large number of samples. One of the urgent issues in the use of microarray data is to develop methods for characterizing samples based on their gene expression. The most basic step in the research direction is binary sample classification, which has been studied extensively over the past few years. This paper investigates the next step—multiclass classification of samples based on gene expression. The characteristics of expression data (e.g., large number of genes with small sample size)