Computational Identification of Cis-regulatory Elements Associated with Groups of Functionally Related Genes in . . .
J. Mol. Biol., 2000
Cited by 153 (7 self)
... runs on randomly selected sets of genes and on sets of genes whose upstream regions contain known transcription factor binding sites serve as controls.
Regulatory Sequence Analysis Tools
Nucleic Acids Res, 2003
Cited by 74 (6 self)
The web resource Regulatory Sequence Analysis Tools (RSAT)
Finding composite regulatory patterns in DNA sequences
Bioinformatics, 2002
Cited by 59 (3 self)
Pattern discovery in unaligned DNA sequences is a fundamental problem in computational biology with important applications in finding regulatory signals. Current approaches to pattern discovery focus on monad patterns that correspond to relatively short contiguous strings. However, many of the actual regulatory signals are composite patterns that are groups of monad patterns that occur near each other. A difficulty in discovering composite patterns is that one or both of the component monad patterns in the group may be “too weak”. Since the traditional monad-based motif finding algorithms usually output one (or a few) high scoring patterns, they often fail to find composite regulatory signals consisting of weak monad parts. In this paper, we present a MITRA (MIsmatch TRee Algorithm) approach for discovering composite signals. We demonstrate that MITRA performs well for both monad and composite patterns by presenting experiments over biological and synthetic data. Availability: MITRA is available at
A Gibbs Sampling Method to Detect Over-Represented Motifs in the Upstream Regions of Co-Expressed Genes
2002
Cited by 53 (7 self)
Microarray experiments can reveal important information about transcriptional regulation.
Gibbs recursive sampler: finding transcription factor binding sites
Nucleic Acids Res, 2003
Cited by 37 (4 self)
The Gibbs Motif Sampler is a software package for locating common elements in collections of biopolymer sequences. In this paper we describe a new variation of the Gibbs Motif Sampler, the Gibbs Recursive Sampler, which has been developed specifically for locating multiple transcription factor binding sites for multiple transcription factors simultaneously in unaligned DNA sequences that may be heterogeneous in DNA composition. Here we describe the basic operation of the web-based version of this sampler. The sampler may be accessed at
Mining for Putative Regulatory Elements in the Yeast Genome Using Gene Expression Data
2000
Cited by 34 (2 self)
We have developed a set of methods and tools for automatic discovery of putative regulatory signals in genome sequences. The analysis pipeline consists of gene expression data clustering, sequence pattern discovery from upstream sequences of genes, a control experiment for pattern significance threshold limit detection, selection of interesting patterns, grouping of these patterns, representing the pattern groups in a concise form, and evaluating the discovered putative signals against existing databases of regulatory signals. Pattern discovery is the most computationally expensive and crucial step. Our tool performs a rapid exhaustive search for a priori unknown statistically significant sequence patterns of unrestricted length. The statistical significance is determined for a set of sequences in each cluster with respect to a set of background sequences, allowing the detection of subtle regulatory signals specific to each cluster. The potentially large number of significant patterns is reduced to a small number of groups by clustering them by mutual similarity. Automatically derived consensus patterns of these groups represent the results in a comprehensive way for a human investigator. We have performed a systematic analysis for the yeast Saccharomyces cerevisiae. We created a large number of independent clusterings of expression data, simultaneously assessing the "goodness" of each cluster. For each of the over 52000 clusters acquired in this way we discovered significant patterns in the upstream sequences of the respective genes. We selected nearly 1500 significant patterns by formal criteria and matched them against the experimentally mapped transcription factor binding sites in the SCPD database. We clustered the 1500 patterns into 62 groups, for which we automatically derived alignments and consensus patterns. Of these 62 groups, 48 had patterns that have matching sites in the SCPD database.
Markovian Structures in Biological Sequence Alignments
Journal of the American Statistical Association, 1999
Cited by 16 (7 self)
In this article, we provide a coherent view of the two recent models used for multiple sequence alignment, the hidden Markov model (HMM) and the block-based motif model, in order to develop a set of new algorithms that enjoy both the sensitivity of the block-based model and the flexibility of the HMM. In particular, we decompose the standard HMM into two components: the insertion component, which is captured by the so-called "propagation model," and the deletion component, which is described by a deletion vector. Such a decomposition serves as a basis for rational compromise between biological specificity and model flexibility. Furthermore, we introduce a Bayesian model selection criterion that, in combination with the propagation model, genetic algorithm, and other computational aspects, forms the core of PROBE, a multiple alignment and database search methodology (software available via anonymous ftp at ftp://ncbi.nlm.nih.gov/pub/neuwald/probe1.0). The application of our method to a GTPase family of protein sequences yields an alignment that is confirmed by comparison with known tertiary structures.
Methods and Statistics for Combining Motif Match Scores
Journal of Computational Biology, 1998
Cited by 15 (0 self)
Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores, and we evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME. Keywords: protein sequence motifs, profiles,...
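As a rough illustration of method 3), the sketch below combines per-motif p-values by taking their product and then converting that product into a single p-value. It assumes the individual p-values are independent and uniformly distributed under the null hypothesis, and it uses the standard closed form for the distribution of a product of n independent uniform variables. The function name `combined_p_value` is hypothetical, and the code is not taken from the paper or from the MAST software.

```python
import math


def combined_p_value(p_values):
    """Significance of the product of n independent p-values.

    Assumption (not from the abstract): each p-value is Uniform(0, 1)
    under the null hypothesis and the p-values are independent. Then
        P(product <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p < 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in p_values):
        return 0.0

    n = len(p_values)
    # Work with -ln(product) so that many small p-values do not underflow
    # before the series is evaluated.
    neg_log_product = -sum(math.log(p) for p in p_values)

    # Accumulate sum_{i=0}^{n-1} (-ln x)^i / i! term by term.
    term = 1.0
    series = 1.0
    for i in range(1, n):
        term *= neg_log_product / i
        series += term

    return min(1.0, math.exp(-neg_log_product) * series)


if __name__ == "__main__":
    # Two motifs, each matching with p = 0.1: the raw product is 0.01,
    # but the combined p-value is larger because a product of several
    # p-values is small more often than a single p-value is.
    print(combined_p_value([0.1, 0.1]))
```

With a single motif the function returns the input p-value unchanged, which is a quick sanity check on the formula.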
Whole-genome discovery of transcription factor binding sites by network-level conservation
Genome Res, 2004

