Results 1 - 10
of
328
Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm
- BIOINFORMATICS
, 1998
"... Motivation: The discovery of motifs in biological sequences is an important problem. Results: This paper presents a new algorithm for the discovery of rigid patterns (motifs) in biological sequences. Our method is combinatorial in nature and able to produce all patterns that appear in at least a (us ..."
Abstract
-
Cited by 231 (14 self)
- Add to MetaCart
Motivation: The discovery of motifs in biological sequences is an important problem. Results: This paper presents a new algorithm for the discovery of rigid patterns (motifs) in biological sequences. Our method is combinatorial in nature and able to produce all patterns that appear in at least a (user-defined) minimum number of sequences, yet it manages to be very efficient by avoiding the enumeration of the entire pattern space. Furthermore, the reported patterns are maximal: any reported pattern cannot be made more specific and still keep on appearing at the exact same positions within the input sequences. The effectiveness of the proposed approach is showcased on a number of test cases which aim to: (i) validate the approach through the discovery of previously reported patterns; (ii) demonstrate the capability to identify automatically highly selective patterns particular to the sequences under consideration. Finally, experimental analysis indicates that the algorithm is output sensitive, i.e. its running time is quasi-linear to the size of the generated output.
Approaches to the Automatic Discovery of Patterns in Biosequences
, 1995
"... This paper is a survey of approaches and algorithms used for the automatic discovery of patterns in biosequences. Patterns with the expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering those patterns which a ..."
Abstract
-
Cited by 174 (21 self)
- Add to MetaCart
This paper is a survey of approaches and algorithms used for the automatic discovery of patterns in biosequences. Patterns with the expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering those patterns which are the most frequently used in molecular bioinformatics. A formulation is given of the problem of the automatic discovery of such patterns from a set of sequences, and an analysis presented of the ways in which an assessment can be made of the significance and usefulness of the discovered patterns. It is shown that this problem is related to problems studied in the field of machine learning. The largest part of this paper comprises a review of a number of existing methods developed to solve this problem and how these relate to each other, focusing on the algorithms underlying the approaches. A comparison is given of the algorithms, and examples are given of patterns that have been discovered...
3DCoffee: Combining protein sequences and structures within multiple sequence alignments
- J Mol Biol
, 2004
"... It has long been assumed that using structural information can increase the accuracy of multiple protein sequence alignments (MSA). 1 Recent results 2,3 suggest that accurate MSAs obtained this ..."
Abstract
-
Cited by 86 (12 self)
- Add to MetaCart
It has long been assumed that using structural information can increase the accuracy of multiple protein sequence alignments (MSA). 1 Recent results 2,3 suggest that accurate MSAs obtained this
Improving the Practical Space and Time Efficiency of the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment
, 1996
"... The MSA program, written and distributed in 1989, is one of the few existing programs that attempts to find optimal alignments of multiple protein or DNA sequences. The MSA program implements a branch-and-bound technique together with a variant of Dijkstra's shortest paths algorithm to prune th ..."
Abstract
-
Cited by 75 (4 self)
- Add to MetaCart
The MSA program, written and distributed in 1989, is one of the few existing programs that attempts to find optimal alignments of multiple protein or DNA sequences. The MSA program implements a branch-and-bound technique together with a variant of Dijkstra's shortest paths algorithm to prune the basic dynamic programming graph. We have made substantial improvements in the time and space usage of MSA. The improvements make feasible a variety of problem instances that were not feasible previously. On some runs we achieve an order of magnitude reduction in space usage and a significant multiplicative factor speedup in running time. To explain that these improvements work, we give a much more detailed description of MSA than has been previously available. In practice, MSA rarely produces a provably optimal alignment and we explain why.
Pure multiple RNA secondary structure alignments: a progressive profile approach
- IEEE/ACM Trans. Comput. Biol. Bioinform
"... Abstract—In functional, noncoding RNA, structure is often essential to function. While the full 3D structure is very difficult to determine, the 2D structure of an RNA molecule gives good clues to its 3D structure, and for molecules of moderate length, it can be predicted with good reliability. Stru ..."
Abstract
-
Cited by 73 (3 self)
- Add to MetaCart
(Show Context)
Abstract—In functional, noncoding RNA, structure is often essential to function. While the full 3D structure is very difficult to determine, the 2D structure of an RNA molecule gives good clues to its 3D structure, and for molecules of moderate length, it can be predicted with good reliability. Structure comparison is, in analogy to sequence comparison, the essential technique to infer related function. We provide a method for computing multiple alignments of RNA secondary structures under the tree alignment model, which is suitable to cluster RNA molecules purely on the structural level, i.e., sequence similarity is not required. We give a systematic generalization of the profile alignment method from strings to trees and forests. We introduce a tree profile representation of RNA secondary structure alignments which allows reasonable scoring in structure comparison. Besides the technical aspects, an RNA profile is a useful data structure to represent multiple structures of RNA sequences. Moreover, we propose a visualization of RNA consensus structures that is enriched by the full sequence information. Index Terms—Alignment of trees, RNA secondary structures, noncoding RNAs.
Algorithms for Phylogenetic Footprinting
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2001
"... Phylogenetic footprinting is a technique that identifies regulatory elements by finding unusually well conserved regions in a set of orthologous non-coding DNA sequences from multiple species. In an earlier paper, we presented an exact algorithm that identifies the most conserved region of a set of ..."
Abstract
-
Cited by 67 (3 self)
- Add to MetaCart
(Show Context)
Phylogenetic footprinting is a technique that identifies regulatory elements by finding unusually well conserved regions in a set of orthologous non-coding DNA sequences from multiple species. In an earlier paper, we presented an exact algorithm that identifies the most conserved region of a set of sequences. Here, we present a number of algorithmic improvements that produce a 1000 fold speedup over the original algorithm. We also show how prior knowledge can be used to identify weaker motifs, and how to handle data sets in which only an unknown subset of the sequences contain the regulatory element. Each technique is implemented and successfully identifies a large number of known binding sites, as well as several highly conserved but uncharacterized regions.
Web Data Mining
, 2006
"... Vol. 53: Soft Computing Approach to Pattern Recognition and Image Processing ..."
Abstract
-
Cited by 50 (3 self)
- Add to MetaCart
Vol. 53: Soft Computing Approach to Pattern Recognition and Image Processing
A novel method for multiple alignment of sequences with repeated and shuffled elements
, 2004
"... ..."
Matt: local flexibility aids protein multiple structure alignment
- PLoS Comput. Biol
, 2008
"... Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins ..."
Abstract
-
Cited by 45 (4 self)
- Add to MetaCart
(Show Context)
Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these ‘‘bent’ ’ alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt’s global performance is competitive with the other programs on Homstrad, but outperforms the other programs on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of a-helices and b-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding