Results 1-10 of 88
mreps: efficient and flexible detection of tandem repeats in DNA
Nucleic Acids Res, 2003
Cited by 95 (3 self)
The presence of repeated sequences is a fundamental feature of genomes. Tandemly repeated DNA appears in both eukaryotic and prokaryotic genomes; it is associated with various regulatory mechanisms and plays an important role in genomic fingerprinting. In this paper, we describe mreps, a powerful software tool for the fast identification of tandemly repeated structures in DNA sequences. mreps is able to identify all types of tandem repeats within a single run on a whole genomic sequence. It has a resolution parameter that allows the program to identify ‘fuzzy’ repeats. We introduce the main algorithmic solutions behind mreps, describe its usage, give some execution-time benchmarks and present several case studies to illustrate its capabilities. The mreps web interface is accessible through
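To make "tandemly repeated structure" concrete, here is a minimal brute-force sketch (not mreps's algorithm) that reports exact maximal repetitions: for each candidate period p it finds maximal stretches where s[i] == s[i + p], and keeps those whose length is at least `min_exponent` times p. The function name and the `min_exponent` parameter are hypothetical; the sketch handles exact repeats only (there is no counterpart to mreps's resolution parameter for fuzzy repeats) and runs in O(n²) time, far too slow for whole genomes.

```python
from typing import Dict, List, Tuple


def maximal_repetitions(s: str, min_exponent: float = 2.0) -> List[Tuple[int, int, int]]:
    """Hypothetical brute-force sketch: return (start, end, period) for every exact
    maximal repetition s[start:end] with (end - start) / period >= min_exponent.
    Each interval is reported once, with the smallest period that produced it."""
    n = len(s)
    found: Dict[Tuple[int, int], int] = {}
    for p in range(1, n // 2 + 1):  # periods are tried in increasing order
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            start = i
            while i + p < n and s[i] == s[i + p]:
                i += 1  # extend the stretch where s[x] == s[x + p]
            end = i + p  # s[start:end] has period p and cannot be extended either way
            if (end - start) / p >= min_exponent and (start, end) not in found:
                found[(start, end)] = p  # first (smallest) period wins
    return sorted((start, end, p) for (start, end), p in found.items())
```

For example, `maximal_repetitions("xabababy")` reports `(1, 7, 2)`, the run `ababab` with period 2; the longer period 4 is not reported because its exponent is below 2.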
Incremental String Comparison
SIAM Journal on Computing, 1995
Cited by 49 (2 self)
The problem of comparing two sequences A and B to determine their LCS or the edit distance between them has been much studied. In this paper we consider the following incremental version of these problems: given an appropriate encoding of a comparison between A and B, can one incrementally compute the answer for A and bB, and the answer for A and Bb, with equal efficiency, where b is an additional symbol? Our main result is a theorem exposing a surprising relationship between the dynamic programming solutions for two such "adjacent" problems. Given a threshold k on the number of differences to be permitted in an alignment, the theorem leads directly to an O(k) algorithm for incrementally computing a new solution from an old one, as contrasted with the O(k²) time required to compute a solution from scratch. We further show with a series of applications that this algorithm is indeed more powerful than its non-incremental counterpart by solving the applications with greater asymptotic ef...
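For context on the from-scratch computation the abstract contrasts against, here is a minimal sketch of a Landau-Vishkin-style threshold computation: for each number of differences e, it records the furthest row reachable on each diagonal d = j - i. This is not the paper's incremental algorithm, and the function name is hypothetical. It extends diagonals character by character, so it does not achieve O(k²) in the worst case; that bound requires constant-time longest-common-extension queries.

```python
from typing import Dict, Optional


def k_differences(a: str, b: str, k: int) -> Optional[int]:
    """Return the unit-cost edit distance between a and b if it is <= k, else None.

    Sketch: prev[d] / cur[d] hold the furthest row i (index into a) reachable on
    diagonal d = j - i with e - 1 / e differences.
    """
    n, m = len(a), len(b)
    target = m - n  # the diagonal that ends at cell (n, m)
    if abs(target) > k:
        return None

    def slide(i: int, d: int) -> int:
        # Follow matching characters along diagonal d (naive extension).
        j = i + d
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        return i

    prev: Dict[int, int] = {}
    for e in range(k + 1):
        cur: Dict[int, int] = {}
        for d in range(-e, e + 1):
            if d < -n or d > m:
                continue  # diagonal lies outside the comparison matrix
            if e == 0:
                row = 0
            else:
                candidates = []
                if d in prev:
                    candidates.append(prev[d] + 1)      # substitution
                if d + 1 in prev:
                    candidates.append(prev[d + 1] + 1)  # deletion of a[i]
                if d - 1 in prev:
                    candidates.append(prev[d - 1])      # insertion of b[j]
                if not candidates:
                    continue
                row = min(max(candidates), n, m - d)    # stay inside the matrix
            cur[d] = slide(row, d)
        if cur.get(target) == n:
            return e  # (n, m) reached with e differences
        prev = cur
    return None
```

For example, `k_differences("kitten", "sitting", 3)` returns 3, while a threshold of 2 yields `None`.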
Linear Time Algorithms for Finding and Representing all the Tandem Repeats in a String
Trees, and Sequences: Computer Science and Computational Biology, 1998
Cited by 42 (2 self)
A tandem repeat (or square) is a string αα, where α is a nonempty string. We present an O(|S|)-time algorithm that operates on the suffix tree T(S) for a string S, finding and marking the endpoint in T(S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in S. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats.
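The definition of a square αα can be checked directly. The following minimal sketch is a brute-force baseline, not the paper's linear-time suffix-tree method: it tests every start position and every half-length α, taking O(n³) time overall, and collects both the occurrences and the set of distinct squares. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def tandem_repeats(s: str) -> Tuple[List[Tuple[int, int]], Set[str]]:
    """Hypothetical brute-force sketch: return every occurrence (start, half_length)
    of a square alpha+alpha in s, plus the set of distinct squares."""
    n = len(s)
    occurrences: List[Tuple[int, int]] = []
    distinct: Set[str] = set()
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):  # |alpha| = half, nonempty
            if s[i:i + half] == s[i + half:i + 2 * half]:
                occurrences.append((i, half))
                distinct.add(s[i:i + 2 * half])
    return occurrences, distinct
```

Here `tandem_repeats("aaa")` finds two occurrences of the single distinct square `aa`. This gap between many occurrences and few distinct squares is part of what a compact representation such as the decorated suffix tree exploits.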
Reconstructing a History of Recombinations From a Set of Sequences
Discrete Appl. Math, 1998
Cited by 38 (6 self)
One of the classic problems in computational biology is the reconstruction of evolutionary history. A recent trend in the area is to increase the explanatory power of the models that are considered by incorporating higher-order evolutionary events that more accurately reflect the mechanisms of mutation at the level of the chromosome. We take a step in this direction by considering the problem of reconstructing an evolutionary history for a set of genetic sequences that have evolved by recombination. Recombination is a non-tree-like event that produces a child sequence by crossing two parent sequences. We present polynomial-time algorithms for reconstructing a parsimonious history of such events for several models of recombination when all sequences, including those of ancestors, are present in the input. We also show that these models appear to be near the limit of what can be solved in polynomial time, in that several natural generalizations are NP-complete. Keywords Computational bio...
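As an illustration of "a child sequence produced by crossing two parent sequences", here is a minimal sketch under the simplest assumed model: all sequences have equal length, there is a single crossover, and there are no other mutations. A child is then a prefix of one parent followed by the matching suffix of the other. This is not one of the paper's algorithms, and all names are hypothetical. It checks every ordered pair of other input sequences as candidate parents.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def single_crossover_points(child: str, p: str, q: str) -> List[int]:
    """Return every k with child == p[:k] + q[k:] (assumed single-crossover model)."""
    n = len(child)
    if len(p) != n or len(q) != n:
        return []
    prefix_ok = [True] * (n + 1)  # prefix_ok[k]: child[:k] == p[:k]
    for k in range(1, n + 1):
        prefix_ok[k] = prefix_ok[k - 1] and child[k - 1] == p[k - 1]
    suffix_ok = [True] * (n + 1)  # suffix_ok[k]: child[k:] == q[k:]
    for k in range(n - 1, -1, -1):
        suffix_ok[k] = suffix_ok[k + 1] and child[k] == q[k]
    return [k for k in range(n + 1) if prefix_ok[k] and suffix_ok[k]]


def explain_by_recombination(seqs: Dict[str, str]) -> Dict[str, List[Tuple[str, str, int]]]:
    """For each named sequence, list (prefix parent, suffix parent, crossover point)
    triples drawn from the other input sequences; only proper crossovers 0 < k < n count."""
    result: Dict[str, List[Tuple[str, str, int]]] = {}
    for name, child in seqs.items():
        others = [other for other in seqs if other != name]
        triples = []
        for a, b in permutations(others, 2):
            for k in single_crossover_points(child, seqs[a], seqs[b]):
                if 0 < k < len(child):
                    triples.append((a, b, k))
        result[name] = triples
    return result
```

With the input `{"P": "0000", "Q": "1111", "C": "0011"}`, only `C` is explained, as `("P", "Q", 2)`.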
The emergence of pattern discovery techniques in computational biology
Metabolic Engineering, 2000
Cited by 36 (5 self)
In the past few years, pattern discovery has been emerging as a generic tool of choice for tackling problems from the computational biology domain. In this presentation, after defining the problem in its generality, we review some of the algorithms that have appeared in the literature and describe several applications of pattern discovery to problems from computational biology. © 2000 Academic Press.
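One of the simplest instances of pattern discovery is to find every pattern that occurs in at least a given number of input sequences. The minimal sketch below assumes the patterns are exact substrings of a fixed length k. Richer pattern languages and scoring schemes are outside its scope, and the function name and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def frequent_kmers(seqs: List[str], k: int, min_support: int) -> Dict[str, Set[int]]:
    """Return each length-k substring that occurs in at least min_support of the
    input sequences, mapped to the indices of the sequences containing it."""
    support: Dict[str, Set[int]] = defaultdict(set)
    for idx, s in enumerate(seqs):
        for i in range(len(s) - k + 1):
            support[s[i:i + k]].add(idx)  # count each sequence at most once per k-mer
    return {kmer: ids for kmer, ids in support.items() if len(ids) >= min_support}
```

For the sequences `["ACGTAC", "TTACGA", "GACGTT"]` with k = 3, the only 3-mer shared by all three is `ACG`.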
Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE
Proc. CPM, volume 4009 of LNCS, 2006
Cited by 34 (9 self)
The Range-Minimum-Query-Problem is to preprocess an array such that the position of the minimum element between two specified indices can be obtained efficiently. We present a direct algorithm for the general RMQ-problem with linear preprocessing time and constant query time, without making use of any dynamic data structure. It consumes less than half of the space that is needed by the method by Berkman and Vishkin. We use our new algorithm for RMQ to improve on LCA-computation for binary trees, and further give a constant-time LCE-algorithm solely based on arrays. Both LCA and LCE have important applications, e.g., in computational biology. Experimental studies show that our new method is almost twice as fast in practice as previous approaches, and asymptotically slower variants of the constant-time algorithms perform even better for today’s common problem sizes.
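To make the query interface concrete, here is a sparse table, a standard and simpler alternative to the paper's method. It answers the same position-of-minimum queries in O(1) time, but needs O(n log n) preprocessing time and space rather than linear. The class name is hypothetical. Ties resolve to the leftmost minimum.

```python
from typing import List, Sequence


class SparseTableRMQ:
    """Range-minimum queries returning the position of the (leftmost) minimum.

    table[j][i] holds the position of the minimum of values[i : i + 2**j].
    """

    def __init__(self, values: Sequence[int]) -> None:
        self.values = list(values)
        n = len(self.values)
        self.table: List[List[int]] = [list(range(n))]
        j = 1
        while (1 << j) <= n:
            prev, half = self.table[j - 1], 1 << (j - 1)
            row = []
            for i in range(n - (1 << j) + 1):
                a, b = prev[i], prev[i + half]  # two halves of the 2**j window
                row.append(a if self.values[a] <= self.values[b] else b)
            self.table.append(row)
            j += 1

    def query(self, lo: int, hi: int) -> int:
        """Position of the minimum in values[lo..hi] (inclusive), in O(1) time."""
        if not 0 <= lo <= hi < len(self.values):
            raise IndexError("invalid query range")
        j = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
        a = self.table[j][lo]
        b = self.table[j][hi - (1 << j) + 1]  # two overlapping windows cover [lo, hi]
        return a if self.values[a] <= self.values[b] else b
```

For example, on `[5, 2, 4, 2, 7, 1]`, `query(0, 3)` returns 1 (the leftmost 2) and `query(0, 5)` returns 5.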
An Algorithm For Locating Non-Overlapping Regions Of Maximum Alignment Score
SIAM J. Comput, 1993
Cited by 29 (2 self)
In this paper we present an O(N² log² N) algorithm for finding the two non-overlapping substrings of a given string of length N which have the highest-scoring alignment between them. This significantly improves the previously best known bound of O(N³) for the worst-case complexity of this problem. One of the central ideas in the design of this algorithm is that of partitioning a matrix into pieces in such a way that all submatrices of interest for this problem can be put together as the union of very few of these pieces. Other ideas include the use of candidate lists, an application of the ideas of Apostolico et al. [1] to our problem domain, and divide-and-conquer techniques. 1. Introduction. Let A = a_1 a_2 … a_N be a sequence of length N, and let A[p..q] denote the substring a_p a_{p+1} … a_q of A. The problem we consider is that of finding the score of the best alignment between two substrings A[p..q] and A[r..s] under the generalized Levenshtein model of alignmen...
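One simple way to reach an O(N³) bound works as follows. Any two non-overlapping substrings A[p..q] and A[r..s] with q < r lie on opposite sides of some split point, so we can run an O(N²) local-alignment (Smith-Waterman) score computation for each of the N - 1 splits. The minimal sketch below assumes a simple scoring scheme (match +2, mismatch -1, linear gap -1) in place of the general Levenshtein model. The function names are hypothetical, and this is not the paper's O(N² log² N) algorithm.

```python
def local_alignment_score(x: str, y: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman score of the best local alignment of x and y (linear gap cost)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)  # 0 = start afresh
            best = max(best, cur[j])
        prev = cur
    return best


def best_nonoverlapping_score(a: str, **scoring: int) -> int:
    """Best alignment score between two non-overlapping substrings of a:
    try every split point m and locally align a[:m] against a[m:]."""
    return max(
        (local_alignment_score(a[:m], a[m:], **scoring) for m in range(1, len(a))),
        default=0,
    )
```

For `ACGTACGT`, the best pair is the two copies of `ACGT` (split point 4), scoring 8 under these assumed weights.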
Finding maximal pairs with bounded gap
Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1645 of Lecture Notes in Computer Science, 1999
Cited by 26 (4 self)
A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left and right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n, we can find all maximal pairs with gap in an upper- and lower-bounded interval in time O(n log n + z), where z is the number of reported pairs. If the upper bound is removed, the time reduces to O(n + z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running time of our methods equals the running time of well-known methods for finding tandem repeats.
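The definitions above translate directly into a brute-force sketch, far slower than the paper's methods at O(n³) worst case. Take every pair of positions i < j whose characters match and whose preceding characters differ (left-maximality), and extend the match to the right as far as possible (right-maximality). Each such (i, j) gives exactly one maximal pair, with gap j - i - length. The function name is hypothetical, and so is one convention of this sketch: overlapping occurrences are reported with a negative gap.

```python
from typing import List, Tuple


def maximal_pairs_with_gap(s: str, min_gap: int, max_gap: int) -> List[Tuple[int, int, int]]:
    """Return (i, j, length) for every maximal pair s[i:i+length] == s[j:j+length]
    with min_gap <= gap <= max_gap, where gap = j - i - length."""
    n = len(s)
    pairs: List[Tuple[int, int, int]] = []
    for i in range(n):
        for j in range(i + 1, n):
            if s[i] != s[j]:
                continue  # the pair would be empty
            if i > 0 and s[i - 1] == s[j - 1]:
                continue  # not left-maximal: both occurrences extend leftward
            length = 0
            while j + length < n and s[i + length] == s[j + length]:
                length += 1  # extend right until a mismatch or the end of s
            gap = j - i - length
            if min_gap <= gap <= max_gap:
                pairs.append((i, j, length))
    return pairs
```

With the gap fixed at zero, the sketch finds tandem repeats, as the abstract notes: `maximal_pairs_with_gap("abab", 0, 0)` returns `[(0, 2, 2)]`, the square `abab`.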
Sequence Alignment with Tandem Duplication
J. Comp. Biol, 1997
Cited by 26 (2 self)
Algorithm development for comparing and aligning biological sequences has, until recently, been based on the SI model of mutational events, which assumes that modification of sequences proceeds through any of the operations of substitution, insertion or deletion (the latter two collectively termed indels).
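Under the SI model, the distance between two sequences is usually computed with the classic dynamic program over substitutions and indels. The minimal sketch below assumes unit costs for every operation, and its function name is hypothetical. Tandem duplication, the operation named in the title, is deliberately not modeled.

```python
def si_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance using only substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                cur[j - 1] + 1,                        # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitute (free if equal)
            )
        prev = cur
    return prev[-1]
```

Under these assumptions, a single tandem duplication of `ACG` (turning `ACGT` into `ACGACGT`) costs three separate insertions: `si_edit_distance("ACGT", "ACGACGT")` is 3.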