Combining phylogenetic data with co-regulated genes to identify regulatory motifs
- Bioinformatics, 2003
Cited by 136 (11 self)
Motivation: Discovery of regulatory motifs in unaligned DNA sequences remains a fundamental problem in computational biology. Two categories of algorithms have been developed to identify common motifs from a set of DNA sequences. The first can be called a ‘multiple genes, single species’ approach. It proposes that a degenerate motif is embedded in some or all of the otherwise unrelated input sequences and tries to describe a consensus motif and identify its occurrences. It is often used for co-regulated genes identified through experimental approaches. The second approach can be called ‘single gene, multiple species’. It requires orthologous input sequences and tries to identify unusually well conserved regions by phylogenetic footprinting. Both approaches perform well, but each has some limitations. It is tempting to combine the knowledge of co-regulation among different genes and conservation among orthologous genes to improve our ability to identify motifs. Results: Based on the Consensus algorithm previously established by our group, we introduce a new algorithm called PhyloCon (Phylogenetic Consensus) that takes into account both conservation among orthologous genes and co-regulation of genes within a species. This algorithm first aligns conserved regions of orthologous sequences into multiple sequence alignments, or profiles, then compares profiles representing non-orthologous sequences. Motifs emerge as common regions in these profiles. Here we present a novel statistic to compare profiles of DNA sequences and a greedy approach to search for common subprofiles. We demonstrate that PhyloCon performs well on both synthetic and biological data. Availability: Software available upon request from the authors.
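The profile-then-compare idea in this abstract can be illustrated with a small sketch. It is not the PhyloCon implementation: the abstract does not spell out the authors' statistic, so Jensen-Shannon divergence is swapped in as a stand-in column comparison, and an exhaustive window search replaces the greedy search. The sequences, function names, and pseudocount are all hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def profile(alignment, pseudocount=0.5):
    """Column-wise base frequencies for equal-length aligned sequences."""
    columns = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        columns.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return columns

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (stand-in statistic, an assumption)."""
    m = {b: 0.5 * (p[b] + q[b]) for b in BASES}

    def kl(a, ref):
        return sum(a[x] * math.log2(a[x] / ref[x]) for x in BASES)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def best_common_subprofile(prof_a, prof_b, width):
    """Exhaustively find the window pair with the lowest summed divergence."""
    best = None
    for i in range(len(prof_a) - width + 1):
        for j in range(len(prof_b) - width + 1):
            score = sum(
                js_divergence(prof_a[i + k], prof_b[j + k]) for k in range(width)
            )
            if best is None or score < best[0]:
                best = (score, i, j)
    return best

# Hypothetical pre-aligned orthologous regions for two co-regulated genes.
gene_a_orthologs = ["TTGACATT", "ATGACACT", "CTGACAGG"]
gene_b_orthologs = ["GGTGACA", "CATGACA", "ACTGACA"]

prof_a = profile(gene_a_orthologs)
prof_b = profile(gene_b_orthologs)
score, i, j = best_common_subprofile(prof_a, prof_b, width=5)
print(f"shared subprofile: gene A offset {i}, gene B offset {j}, divergence {score:.3f}")
```

The two profiles come from non-orthologous genes, so a window that is well conserved in both is a candidate shared motif. A real tool would also need a significance measure for the match, which this sketch omits.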
Gibbs recursive sampler: finding transcription factor binding sites
- Nucleic Acids Res, 2003
Cited by 92 (7 self)
The Gibbs Motif Sampler is a software package for locating common elements in collections of biopolymer sequences. In this paper we describe a new variation of the Gibbs Motif Sampler, the Gibbs Recursive Sampler, which has been developed specifically for locating multiple transcription factor binding sites for multiple transcription factors simultaneously in unaligned DNA sequences that may be heterogeneous in DNA composition. Here we describe the basic operation of the web-based version of this sampler. The sampler may be accessed at
STAMP: a web tool for exploring DNA-binding motif similarities
- Nucleic Acids Res, 35(Web Server issue), 2007
doi:10.1093/nar/gkm272
Sigma70 promoters in Escherichia coli: specific transcription in dense regions of overlapping promoter-like signals
- J Mol Biol, 2003
Cited by 68 (11 self)
We present here a computational analysis showing that sigma70 house-keeping promoters are located within zones with high densities of promoter-like signals in Escherichia coli, and we introduce strategies that allow for the correct computer prediction of sigma70 promoters. Based on 599 experimentally verified promoters of E. coli K-12, we generated and evaluated more than 200 weight matrices optimizing different criteria to obtain the best recognition matrices. The alignments generating the best statistical models did not fully correspond with the canonical sigma70 model. However, matrices that correspond to such a canonical model performed better as tools for prediction. We tested the predictive capacity of these matrices on 250 bp long regions upstream of gene starts, where 90% of the known promoters occur. The computational matrix models generated an average of 38 promoter-like signals within each 250 bp region. In more than 50% of the cases, the true promoter does not have the best score within the region. We observed, in fact, that real promoters
Ab initio prediction of transcription factor targets using structural knowledge
- PLoS Comput Biol, 2005
Cited by 68 (1 self)
Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid–nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We demonstrate our approach on the Cys2His2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with experimental results. We use these preferences to perform a genome-wide scan for direct targets of Drosophila melanogaster Cys2His2 transcription factors. By analyzing the predicted targets along with gene annotation and expression data we infer the function and activity of these proteins. Citation: Kaplan T, Friedman N, Margalit H (2005) Ab initio prediction of transcription factor targets using structural knowledge. PLoS Comput Biol 1(1): e1.
Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes
- 2005
Cited by 63 (11 self)
A new online framework for the accurate and integrative prediction of transcription factor binding sites (TFBSs) in prokaryotes was developed. The system consists of three interconnected modules: 1. The PRODORIC database as a comprehensive data source and extensive collection of TFBSs with corresponding position weight matrices (PWMs). 2. The pattern matching tool Virtual Footprint for the prediction of genome-based regulons and for the analysis of individual promoter regions. 3. The interactive genome browser GBPro for the visualization of TFBS search results in their genomic context and links to gene and regulator-specific information in PRODORIC. The aim of this service is to provide researchers with a free and easy-to-use collection of interconnected tools in the field of molecular microbiology, infection and systems biology.
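For readers unfamiliar with PWM-based pattern matching, the general technique can be sketched as follows. This is a minimal illustration of log-odds PWM scoring, not the Virtual Footprint algorithm or PRODORIC data. The binding sites, sequence, threshold, uniform background, and function names are all hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length binding sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (position, score) for each window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        # Assumes the sequence contains only A, C, G, T (no ambiguity codes).
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((pos, score))
    return hits

# Hypothetical known sites and promoter region, for illustration only.
sites = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]
promoter_region = "GCGCTTGACAGCGC"

pwm = build_pwm(sites)
hits = scan(promoter_region, pwm, threshold=5.0)
for pos, score in hits:
    print(f"hit at {pos}: {promoter_region[pos:pos + len(pwm)]} (score {score:.2f})")
```

The threshold trades sensitivity against false positives. Scanning a whole genome this way yields the candidate sites from which a regulon prediction would be assembled.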
What is bioinformatics? A proposed definition and overview of the field.
- Methods of Information in Medicine, 2001
Computational identification of transcriptional regulatory elements in DNA sequence
- 2006
Cited by 55 (0 self)
Identification and annotation of all the functional elements in the genome, including genes and the regulatory sequences, is a fundamental challenge in genomics and computational biology. Since regulatory elements are frequently short and variable, their identification and discovery using computational algorithms is difficult. However, significant advances have been made in the computational methods for modeling and detection of DNA regulatory elements. The availability of complete genome sequence from multiple organisms, as well as mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, have contributed to the development of methods that utilize these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in the identification of cis-regulatory modules and higher order structures of the regulatory sequences, which is essential to the understanding of transcription regulation in the metazoan genomes. This article reviews the computational approaches for modeling and identification of genomic regulatory elements, with an emphasis on the recent developments, and current challenges.
Bayesian sparse hidden components analysis for transcription regulation networks
- 2005
Cited by 40 (1 self)
Motivation: In systems like E. coli, the abundance of sequence information, gene expression array studies, and small scale experiments allows one to reconstruct the regulatory network and to quantify the effects of transcription factors on gene expression. However, this goal can only be achieved if all information sources are used in concert. Results: Our method integrates literature information, DNA sequences, and expression arrays. A set of relevant transcription factors is defined on the basis of literature. Sequence data is used to identify potential target genes and the results are used to define a prior distribution on the topology of the regulatory network. A Bayesian hidden component model for the expression array data allows us to identify which of the potential binding sites are actually used by the regulatory proteins in the studied cell conditions, the strength of their control, and their activation profile in a series of experiments. We apply our methodology to 35 expression studies in E. coli with convincing results. Availability: www.genetics.ucla.edu/labs/sabatti/software.html