From Promoter Sequence to Expression: A Probabilistic Framework, 2002
Cited by 44 (5 self)
We present a probabilistic framework that models the process by which transcriptional binding explains the mRNA expression of different genes. Our joint probabilistic model unifies the two key components of this process: the prediction of gene regulation events from sequence motifs in the gene's promoter region, and the prediction of mRNA expression from combinations of gene regulation events in different settings. Our approach has several advantages. By learning promoter sequence motifs that are directly predictive of expression data, it can improve the identification of binding site patterns. It is also able to identify combinatorial regulation via interactions of different transcription factors. Finally, the general framework allows us to integrate additional data sources, including data from the recent binding localization assays. We demonstrate our approach on the cell cycle data of Spellman et al., combined with the binding localization information of Simon et al. We show that the learned model predicts expression from sequence, and that it identifies coherent co-regulated groups with significant transcription factor motifs. It also provides valuable biological insight into the domain via these co-regulated "modules" and the combinatorial regulation effects that govern their behavior.
Context-Specific Bayesian Clustering for Gene Expression Data, 2002
Cited by 41 (5 self)
The recent growth in genomic data and measurements of genome-wide expression patterns allows us to apply computational tools to examine gene regulation by transcription factors.
Ab initio prediction of transcription factor targets using structural knowledge - PLoS Comput Biol, 2005
Cited by 19 (1 self)
Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid–nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We demonstrate our approach on the Cys2His2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with experimental results. We use these preferences to perform a genome-wide scan for direct targets of Drosophila melanogaster Cys2His2 transcription factors. By analyzing the predicted targets along with gene annotation and expression data we infer the function and activity of these proteins. Citation: Kaplan T, Friedman N, Margalit H (2005) Ab initio prediction of transcription factor targets using structural knowledge. PLoS Comp Biol 1(1): e1.
A discriminative model for identifying spatial cis-regulatory modules - In Proc. RECOMB’04, 2004
Cited by 18 (1 self)
Transcriptional regulation is mediated by the coordinated binding of transcription factors to the upstream regions of genes. In higher eukaryotes, the binding sites of cooperating transcription factors are organized into short sequence units, called cis-regulatory modules. In this paper, we propose a method for identifying modules of transcription factor binding sites in a set of co-regulated genes, using only the raw sequence data as input. Our method is based on a novel probabilistic model that describes the mechanism of cis-regulation, including the binding sites of cooperating transcription factors, the organization of these binding sites into short sequence modules, and the regulation of a gene by its modules. We show that our method is successful in discovering planted modules in simulated data and known modules in yeast. More importantly, we applied our method to a large collection of human gene sets and found 83 significant cis-regulatory modules, which included 36 known motifs and many novel ones. Thus, our results provide one of the first comprehensive compendiums of putative cis-regulatory modules in human. Key words: cis-regulatory module, probabilistic model, transcriptional regulation.
Petri net based model validation in systems biology - In 25th International Conference on Application and Theory of Petri Nets, 2004
Cited by 16 (8 self)
This paper describes the thriving application of Petri net theory for model validation of different types of molecular biological systems. After a short introduction to systems biology, we demonstrate how to develop and validate qualitative models of biological pathways in a systematic manner using the well-established Petri net analysis technique of place and transition invariants. We discuss special properties that are characteristic of biological pathways, and give three representative case studies, which we model and analyse in more detail. The examples used in this paper cover signal transduction pathways as well as metabolic pathways.
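As an illustration of the invariant analysis mentioned in this abstract, the sketch below checks candidate place and transition invariants against an incidence matrix. It uses the standard definitions (a place invariant y satisfies y·C = 0, a transition invariant x satisfies C·x = 0); the function names and the two-place toy net are assumptions made for this example and are not taken from the paper.

```python
def is_place_invariant(incidence, weights):
    """Check y . C == 0 for every transition (column).

    incidence[p][t] is the net token change in place p when transition t fires.
    weights[p] is the candidate invariant weight of place p.
    """
    n_places = len(incidence)
    n_transitions = len(incidence[0])
    return all(
        sum(weights[p] * incidence[p][t] for p in range(n_places)) == 0
        for t in range(n_transitions)
    )


def is_transition_invariant(incidence, firing_counts):
    """Check C . x == 0 for every place (row).

    firing_counts[t] is how often transition t fires in the candidate cycle.
    """
    return all(
        sum(row[t] * firing_counts[t] for t in range(len(row))) == 0
        for row in incidence
    )


# Hypothetical toy net: t0 converts A into B, t1 converts B back into A.
TOY_INCIDENCE = [
    [-1, 1],   # place A
    [1, -1],   # place B
]

print(is_place_invariant(TOY_INCIDENCE, [1, 1]))       # total of A + B is conserved
print(is_transition_invariant(TOY_INCIDENCE, [1, 1]))  # firing t0 then t1 restores the marking
```

In the toy net, the weight vector [1, 1] expresses conservation of the total amount of A and B, the kind of property one might expect to verify when validating a pathway model.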
Probabilistic discovery of overlapping cellular processes and their regulation - J Comput Biol, 2004
Cited by 16 (1 self)
Many of the functions carried out by a living cell are regulated at the transcriptional level, to ensure that genes are expressed when they are needed. Thus, to understand biological processes, it is necessary to understand the cell’s transcriptional network. In this paper, we propose a novel probabilistic model of gene regulation for the task of identifying overlapping biological processes and the regulatory mechanism controlling their activation. A key feature of our approach is that we allow genes to participate in multiple processes, thus providing a more biologically plausible model for the process of gene regulation. We present an algorithm to learn this model automatically from data, using only genome-wide measurements of gene expression as input. We compare our results to those obtained by other approaches, and show that significant benefits can be gained by modeling both the organization of genes into overlapping cellular processes and the regulatory programs of these processes. Moreover, our method successfully grouped genes known to function together, recovered many regulatory relationships that are known in the literature, and suggested novel hypotheses regarding the regulatory role of previously uncharacterized proteins.
Finding Motifs in Promoter Regions - J. Comp. Biol, 2005
Cited by 15 (2 self)
A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The availability of whole genome sequences opens the way for computational methods to search for the key elements in transcription regulation. These include methods for discovering the binding sites of DNA-binding proteins, such as transcription factors. A common representation of transcription factor binding sites is a position specific score matrix (PSSM). We developed a probabilistic approach for searching for putative binding sites. Given a promoter sequence and a PSSM, we scan the promoter and find the position with the maximal score. Then we calculate the probability of obtaining such a maximal score or higher on a random promoter. This is the p-value of the putative binding site. In this way, we searched for putative binding sites in the upstream sequences of Saccharomyces cerevisiae, where some binding sites are known (according to the Saccharomyces cerevisiae Promoters Database, SCPD). Our method produces either exact p-values, or a better estimate for them than other methods, and this improves the results of the search. For each gene we found its statistically significant putative binding sites. We measured the rates of true positives, by a comparison to the known binding sites, and also compared our results to those of MatInspector, a commercially available software package that looks for putative binding sites in DNA sequences according to PSSMs. Our results were significantly better. Unlike our method, MatInspector does not calculate the exact statistical significance of its results. Key words: promoters, transcription factors, binding sites, PSSM.
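The scan-then-score idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the paper reports exact or closely estimated p-values, whereas the sketch substitutes a simple Monte Carlo estimate over uniformly random sequences. The function names, the list-of-dicts PSSM layout, the uniform background, and the toy matrix are all assumptions made for the example.

```python
import random


def max_pssm_score(seq, pssm):
    """Return (best_score, best_start) over all windows of seq.

    pssm is a list of dicts, one per motif position, mapping base -> score.
    """
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(seq) - width + 1):
        score = sum(pssm[i][seq[start + i]] for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def empirical_p_value(seq, pssm, trials=1000, seed=0):
    """Monte Carlo estimate of P(max score on a random sequence >= observed).

    Assumption: random promoters are uniform i.i.d. over A, C, G, T.
    """
    rng = random.Random(seed)
    observed, _ = max_pssm_score(seq, pssm)
    hits = 0
    for _ in range(trials):
        rand_seq = "".join(rng.choice("ACGT") for _ in range(len(seq)))
        if max_pssm_score(rand_seq, pssm)[0] >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


# Hypothetical 3-position matrix that favours the motif "ACG".
TOY_PSSM = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]

promoter = "TTTTACGTTTT"
print(max_pssm_score(promoter, TOY_PSSM))
print(empirical_p_value(promoter, TOY_PSSM, trials=200))
```

A sampling estimate like this cannot resolve p-values much smaller than 1/trials, which is one reason an exact calculation, as the abstract describes, is attractive.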
Distribution Patterns of Over-Represented k-mers in Non-Coding Yeast DNA
Cited by 14 (3 self)
Motivation: Over-represented k-mers in genomic DNA regions are often of particular biological interest. For example, over-represented k-mers in co-regulated families of genes are associated with the DNA binding sites of transcription factors. To measure over-representation, we introduce a statistical background model based on single-mismatches, and apply it to the pooled 500bp ORF upstream regions of yeast. More importantly, we investigate the context and spatial distribution of over-represented k-mers in yeast upstream regions.
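To make the idea of a single-mismatch background concrete, here is a rough sketch that counts k-mers in pooled sequences and compares each k-mer's count with the mean count of its single-mismatch neighbours. The abstract does not give the paper's actual statistic, so the ratio below, the pseudocount, and all function names are assumptions chosen for illustration only.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every overlapping k-mer across a pool of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def single_mismatch_neighbors(kmer, alphabet="ACGT"):
    """All k-mers that differ from kmer at exactly one position."""
    return [
        kmer[:i] + base + kmer[i + 1:]
        for i in range(len(kmer))
        for base in alphabet
        if base != kmer[i]
    ]


def mismatch_ratio(kmer, counts, pseudocount=1.0):
    """Hypothetical over-representation score: the k-mer's count relative
    to the mean count of its single-mismatch neighbours (smoothed)."""
    neighbors = single_mismatch_neighbors(kmer)
    mean_neighbor = sum(counts[n] for n in neighbors) / len(neighbors)
    return (counts[kmer] + pseudocount) / (mean_neighbor + pseudocount)


pooled = ["ACGACG", "TACG"]  # toy stand-in for pooled upstream regions
counts = count_kmers(pooled, 3)
print(counts["ACG"], mismatch_ratio("ACG", counts))
```

A ratio well above 1 would flag a k-mer that is far more common than its near-identical variants, under the stated assumptions.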
PlantProm: a database of plant promoter sequences - Nucleic Acids Res, 2003
Cited by 12 (1 self)
PlantProm DB, a plant promoter database, is an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s), TSS, from various plant species. The first release (2002.01) of PlantProm DB contains 305 entries including 71, 220 and 14 promoters from monocot, dicot and other plants, respectively. It provides DNA sequence of the promoter regions (-200:+51) with TSS on the fixed position +201, taxonomic/promoter type classification of promoters and Nucleotide Frequency Matrices (NFM) for promoter elements: TATA-box, CCAAT-box and TSS-motif (Inr). Analysis of TSS-motifs revealed that their composition is different in dicots and monocots, as well as for TATA and TATA-less promoters. The database serves as learning set in developing plant promoter prediction programs. One such program (TSSP) based on discriminant analysis has been created by Softberry Inc. and the application of a support vector machine approach for promoter identification is under development.

