Modeling within-motif dependence for transcription factor binding site predictions. (2004)

by Q Zhou, J S Liu
Venue: Bioinformatics

Results 1 - 10 of 70

Computational identification of transcriptional regulatory elements in DNA sequence

by Debraj GuhaThakurta, 2006
Cited by 55 (0 self)
Identification and annotation of all the functional elements in the genome, including genes and the regulatory sequences, is a fundamental challenge in genomics and computational biology. Since regulatory elements are frequently short and variable, their identification and discovery using computational algorithms is difficult. However, significant advances have been made in the computational methods for modeling and detection of DNA regulatory elements. The availability of complete genome sequence from multiple organisms, as well as mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, have contributed to the development of methods that utilize these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in the identification of cis-regulatory modules and higher order structures of the regulatory sequences, which is essential to the understanding of transcription regulation in the metazoan genomes. This article reviews the computational approaches for modeling and identification of genomic regulatory elements, with an emphasis on the recent developments, and current challenges.

On the detection and refinement of transcription factor binding sites using ChIP-Seq data

by Ming Hu, Jindan Yu, Jeremy M G Taylor, Arul M Chinnaiyan, Zhaohui S Qin - Nucleic Acids Research, 2010
Cited by 29 (2 self)
ABSTRACT Coupling chromatin immunoprecipitation (ChIP) with recently developed massively parallel sequencing technologies has enabled genome-wide detection of protein-DNA interactions with unprecedented sensitivity and specificity. This new technology, ChIP-Seq, presents opportunities for in-depth analysis of transcription regulation. In this study, we explore the value of using ChIP-Seq data to better detect and refine transcription factor binding sites (TFBS). We introduce a novel computational algorithm named Hybrid Motif Sampler (HMS), specifically designed for TFBS motif discovery in ChIP-Seq data. We propose a Bayesian model that incorporates sequencing depth information to aid motif identification. Our model also allows intra-motif dependency to describe more accurately the underlying motif pattern. Our algorithm combines stochastic sampling and deterministic 'greedy' search steps into a novel hybrid iterative scheme. This combination accelerates the computation process. Simulation studies demonstrate favorable performance of HMS compared to other existing methods. When applying HMS to real ChIP-Seq datasets, we find that (i) the accuracy of existing TFBS motif patterns can be significantly improved; and (ii) there is significant intra-motif dependency inside all the TFBS motifs we tested; modeling these dependencies further improves the accuracy of these TFBS motif patterns. These findings may offer new biological insights into the mechanisms of transcription factor regulation.

Citation Context

...epresentation of motifs that allows arbitrary dependencies among positions. Barash et al. (43) suggested multiple Bayesian network models to represent dependencies among motif positions. Zhou and Liu (44) proposed a generalized weight matrix model in which a 16-component multinomial model is used to model two dependent positions jointly. ...
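
The 16-component multinomial mentioned in this context can be illustrated with a minimal sketch. This is not Zhou and Liu's implementation: the function names, the pseudocount, and the toy sites are assumptions made purely for illustration. The idea is that two dependent positions are modeled jointly with one probability for each of the 4 x 4 = 16 base pairs, instead of multiplying two independent 4-component column estimates.

```python
from itertools import product

BASES = "ACGT"

def joint_pair_model(sites, i, j, pseudocount=1.0):
    """Estimate a 16-component multinomial over the base pair at columns (i, j)."""
    counts = {a + b: pseudocount for a, b in product(BASES, repeat=2)}
    for site in sites:
        counts[site[i] + site[j]] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

def independent_pair_model(sites, i, j, pseudocount=1.0):
    """Product of two separate 4-component column models (independence assumption)."""
    def column(k):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[k]] += 1
        total = sum(counts.values())
        return {b: c / total for b, c in counts.items()}
    p_i, p_j = column(i), column(j)
    return {a + b: p_i[a] * p_j[b] for a, b in product(BASES, repeat=2)}

# Hypothetical toy sites in which columns 1 and 2 (0-indexed) co-vary: AC or TG only.
toy_sites = ["AACG", "AACG", "ATGG", "ATGG"]
joint = joint_pair_model(toy_sites, 1, 2)
indep = independent_pair_model(toy_sites, 1, 2)
```

On these toy sites the joint model gives the unobserved pair "AG" less probability than the independent product does, which is the kind of difference a dependence-aware motif model is meant to capture.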

GAME: detecting cis-regulatory elements using a genetic algorithm

by Zhi Wei, Shane T Jensen - Bioinformatics, 2006
Cited by 20 (0 self)
ABSTRACT Motivation: Identification of transcription factor binding sites is an important aspect of the analysis of genetic regulation. Many programs have been developed for the de novo discovery of a binding motif (collection of binding sites). Recently, a scoring function formulation was derived that allows for the comparison of discovered motifs from different programs. A simple program, BioOptimizer, was proposed that improved discovered motifs by optimizing a scoring function. However, BioOptimizer is a very simple algorithm that can only make local improvements upon an already discovered motif and so BioOptimizer can only be used in conjunction with other motif-finding software. Results: We introduce software, GAME, which utilizes a genetic algorithm to find optimal motifs in DNA sequences. GAME evolves motifs with high fitness from a population of randomly-generated starting motifs, which eliminates the reliance on additional motif-finding programs. In addition to using standard genetic operations, GAME also incorporates two additional operators that are specific to the motif discovery problem. We demonstrate the superior performance of GAME compared to MEME, BioProspector and BioOptimizer in simulation studies as well as several real data applications where we use an extended version of the GAME algorithm that allows the motif width to be unknown.

Extracting sequence features to predict protein-DNA interactions: A comparative study

by Qing Zhou, Jun S. Liu - Nucleic Acids Research, 2008
Cited by 15 (3 self)
Predicting how and where proteins, especially transcription factors (TFs), interact with DNA is an important problem in biology. We present here a systematic study of predictive modeling approaches to the TF-DNA binding problem, which have been frequently shown to be more efficient than those methods only based on position-specific weight matrices (PWMs). In these approaches, a statistical relationship between genomic sequences and gene expression or ChIP-binding intensities is inferred through a regression framework; and influential sequence features are identified by variable selection. We examine a few state-of-the-art learning methods including stepwise linear regression, multivariate adaptive regression splines (MARS), neural networks, support vector machines, boosting, and Bayesian additive regression trees (BART). These methods are applied to both simulated datasets and two whole-genome ChIP-chip datasets on the TFs Oct4 and Sox2, respectively, in human embryonic stem cells. We find that, with proper learning methods, predictive modeling approaches can significantly improve the predictive power and identify more biologically interesting features, such as TF-TF interactions, than the PWM approach. In particular, BART and boosting show the best and the most robust overall performance among all the methods.

CIS: Compound importance sampling method for protein-DNA binding site p-value estimation

by Y. Barash, G. Elidan, T. Kaplan, N. Friedman - Bioinformatics, 2004
Cited by 9 (1 self)
Motivation: A key aspect of transcriptional regulation is the binding of transcription factors to sequence-specific binding sites that allow them to modulate the expression of nearby genes. Given models of such binding sites, one can scan regulatory regions for putative binding sites and construct a genome-wide regulatory network. In such genome-wide scans, it is crucial to control the amount of false positive predictions. Recently, several works demonstrated the benefits of modeling dependencies between positions within the binding site. Yet, computing the statistical significance of putative binding sites in this scenario remains a challenge. Results: We present a general, accurate and efficient method for computing p-values of putative binding sites that is applicable to a large class of probabilistic binding site and background models. We demonstrate the accuracy of the method on synthetic and real-life data. Availability: The procedure for scanning DNA sequences and computing the statistical significance of putative binding site scores is available upon request at

Citation Context

...mportance of modeling transcription factor binding sites using probabilistic models that allow for inner-dependencies within the positions of a binding site (Barash et al., 2003; King and Roth, 2003; Zhou and Liu, 2004). Such models provide richer representations of the binding preferences of a transcription factor, and consequently can better discriminate sites that match these preferences. This can lead to more a...

Generalizations of Markov model to characterize biological sequences

by J Wang, S Hannenhalli - BMC Bioinformatics
Cited by 7 (2 self)
Abstract not found

Citation Context

... algorithm significantly improved protein sequence alignment [3]. Similarly one can better predict the transcription factor binding sites by considering the interdependence between nucleotides [4,5]. Markov model (MM) is a statistical technique to model sequences such that the probability of a sequence element is based on a limited context preceding the element [6,7]. In other words, MM is a way...
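
The Markov model described in this context can be sketched as follows. This is an illustrative fixed-order model, not the generalized models proposed in the cited paper; the function names, the pseudocount, and the uniform fallback for unseen contexts are all assumptions. Each symbol is scored by its probability given the `order` symbols immediately preceding it, and the first `order` symbols are treated as given.

```python
import math
from collections import defaultdict

def train_markov(seqs, order=1, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(symbol | preceding `order` symbols) from training sequences."""
    counts = defaultdict(lambda: {b: pseudocount for b in alphabet})
    for s in seqs:
        for t in range(order, len(s)):
            counts[s[t - order:t]][s[t]] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values())
        model[context] = {b: c / total for b, c in row.items()}
    return model

def log_prob(seq, model, order=1, alphabet="ACGT"):
    """Log-probability of seq, conditional on its first `order` symbols."""
    uniform = 1.0 / len(alphabet)  # assumed fallback for contexts never seen in training
    total = 0.0
    for t in range(order, len(seq)):
        row = model.get(seq[t - order:t])
        p = row[seq[t]] if row is not None else uniform
        total += math.log(p)
    return total

# Hypothetical toy training data: a strictly alternating sequence.
mm = train_markov(["ACACAC"], order=1)
```

With this toy model, `log_prob("ACAC", mm)` is higher than `log_prob("AAAA", mm)`, because the alternating pattern matches the contexts seen in training. The sketch assumes every symbol belongs to `alphabet`.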

Finding Regulatory Motifs with Maximum Density Subgraphs

by Eugene Fratkin, Brian Naughton, Douglas L. Brutlag, Serafim Batzoglou
Cited by 3 (0 self)
The identification of over-represented but imperfectly conserved motifs in genomic DNA is a problem with important biological applications, such as the discovery of regulatory elements that determine the timing, location, and level of gene transcription. Experimental techniques such as ChIP-chip and gene expression

Citation Context

... nucleotide at each position in the motif is independent of the nucleotides at other positions. Recent work has shown evidence for dependencies between positions in transcription factor binding sites [19,23,42]. Zhou and Liu found evidence for statistically significant dependencies in 25% of TRANSFAC motifs [19]. Our analysis of yeast and multicellular eukaryotic motifs confirms this (results not shown). Fi...

Hybrid Gibbs-Sampling Algorithm for Challenging Motif Discovery: GibbsDST

by Kazuhito Shida
Cited by 1 (0 self)
The difficulties of computational discovery of transcription factor binding sites (TFBS) are well represented by (l, d) planted motif challenge problems. Large d problems are difficult, particularly for profile-based motif discovery algorithms. Their local search in the profile space is apparently incompatible with subtle motifs and large mutational distances between the motif occurrences. Herein, an improved profile-based method called GibbsDST is described and tested on (15,4), (12,3), and (18,6) challenging problems. For the first time for a profile-based method, its performance in motif challenge problems is comparable to that of Random Projection. It is noteworthy that GibbsDST outperforms a pattern-based algorithm, WINNOWER, in some cases. Effectiveness of GibbsDST using a biological dataset as an example and its possible extension to more realistic evolution models are also introduced.

Citation Context

...e original score function. Many features of alignments can be discouraged or encouraged in the sampling phase and its score function. One good example of such a feature is the within-motif dependence [16]: the sites in a motif showing correlated patterns of mutations. Of course, detailed and precise description of within-motif dependence demands a more elaborate model such as a Bayesian network. Howev...

KBERG: KnowledgeBase for Estrogen Responsive Genes

by Suisheng Tang, Zhuo Zhang, Sin Lam Tan, Man-hung Eric Tang, Arun Prashanth Kumar, Suresh Kumar Ramadoss, Vladimir B. Bajic, 2006 - doi:10.1093/nar/gkl816
Cited by 1 (0 self)
Estrogen has a profound impact on human physiology affecting transcription of numerous genes. To decipher functional characteristics of estrogen responsive genes, we developed KnowledgeBase for Estrogen Responsive Genes (KBERG). Genes in KBERG were derived from Estrogen Responsive Gene Database (ERGDB) and were analyzed from multiple aspects. We explored the possible transcription regulation mechanism by capturing highly conserved promoter motifs across orthologous genes, using promoter regions that cover the range of [-1200, +500] relative to the transcription start sites. The motif detection is based on ab initio discovery of common cis-elements from the orthologous gene cluster from human, mouse and rat, thus reflecting a degree of promoter sequence preservation during evolution. The identified motifs are linked to transcription factor binding sites based on the TRANSFAC database. In addition, KBERG uses two established ontology systems, GO and eVOC, to associate genes with their function. Users may assess gene functionality through the description terms in GO. Alternatively, they can gain gene co-expression information through evidence from human EST libraries via eVOC. KBERG is a user-friendly system that provides links to other relevant resources such as ERGDB, UniGene, Entrez Gene, HomoloGene, GO, eVOC and GenBank, and thus offers a platform for functional exploration and potential annotation of genes responsive to estrogen. KBERG database can be accessed at

Citation Context

...ort motif length, high degeneration and flexible distance location. Position weight matrix (PWM) approach has been widely used in TFBS detection although high false positive rate is the main drawback (9). Comparative genomic approach that searches for highly preserved motifs across different species has been applied to improve the accuracy of prediction (10). The method has been corroborated in a gen...

Bayesian Modeling for High Throughput Genomic Data

by Ming Hu, 2010
Abstract not found

Citation Context

...epresentation of motifs that allows arbitrary dependencies among positions. Barash et al. (92) suggested multiple Bayesian network models to represent dependencies among motif positions. Zhou and Liu (93) proposed a generalized weight matrix model in which a 16-component multinomial model is used to model two dependent positions jointly. Here we extend the generalized weight matrix model of Zhou and L...
