Improvement of the A* Algorithm for Multiple Sequence Alignment
Cited by 21 (1 self)
The alignment problem for DNA or protein sequences is widely applicable and important in various fields of molecular biology. This problem can be reduced to the shortest path problem, and Ikeda and Imai [4] showed that the A* algorithm works efficiently with the estimator utilizing all 2-dimensional sub-alignments. In this paper we present new powerful estimators utilizing k-dimensional sub-alignments (k ≥ 3), and propose a new bounding technique using V1, a set of vertices in the paths whose lengths are at most 1 longer than the shortest path. We also extend our algorithm to a recursive-estimate version. These algorithms become more efficient when the number of sequences increases or the similarity among sequences is lower.
Approximation Algorithms for Multiple Sequence Alignment Under a Fixed Evolutionary Tree
, 1995
Cited by 19 (3 self)
We consider the problem of aligning sequences related by a given evolutionary tree: given a fixed tree with its leaves labeled with sequences, find ancestral sequences to label the internal nodes so as to minimize the total cost of all the edges in the tree. The cost of an edge is the edit distance between the sequences labeling its endpoints. In this paper, we consider the case when the given tree is a regular d-ary tree for some fixed d and provide a $(d+1)/(d-1)$-approximation algorithm for this problem that runs in time $O(d(2kn)^d + n^2 k^{2d})$, where k is the number of leaves in the tree and n is the maximum length of any of the sequences labeling the leaves. We also consider a new bottleneck objective in labeling the internal nodes. In this version, we wish to find the labeling of the internal nodes that minimizes the maximum cost of any edge in the tree. For this problem we provide a simple $(2\delta + 1)$-approximation algorithm, where $\delta$ is the depth of the given undirected tree def...
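The two objectives described in this abstract can be illustrated with a small sketch. It does not implement the approximation algorithms themselves; it only evaluates a given labeling of the tree. The function names, the edge-list representation, and the unit-cost edit distance are assumptions made for illustration and do not come from the paper.

```python
def edit_distance(a, b):
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb),  # substitute or match
                           prev[j] + 1,               # delete from a
                           cur[j - 1] + 1))           # insert into a
        prev = cur
    return prev[-1]


def tree_costs(edges, labels):
    """Return (total cost, bottleneck cost) of a fully labeled tree.

    edges  -- iterable of (u, v) node pairs (hypothetical representation)
    labels -- dict mapping every node to its sequence
    """
    costs = [edit_distance(labels[u], labels[v]) for u, v in edges]
    return sum(costs), max(costs, default=0)


labels = {"root": "ACGT", "x": "ACG", "y": "TTGT", "z": "ACGTT"}
edges = [("root", "x"), ("root", "y"), ("root", "z")]
total, bottleneck = tree_costs(edges, labels)
```

The first value is the sum over all edges that the total-cost objective minimizes; the second is the maximum edge cost that the bottleneck objective minimizes.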
A polyhedral approach to sequence alignment problems
- Discrete Appl. Math
, 2000
Cited by 15 (1 self)
We study two new problems in sequence alignment both from a practical and a theoretical view, using tools from combinatorial optimization to develop branch-and-cut algorithms. The Generalized Maximum Trace formulation captures several forms of multiple sequence alignment problems in a common framework, among them the original formulation of Maximum Trace. The RNA Sequence Alignment Problem captures the comparison of RNA molecules on the basis of their primary sequence and their secondary structure. Both problems have a characterization in terms of graphs which we reformulate in terms of integer linear programming. We then study the polytopes (or convex hulls of all feasible solutions) associated with the integer linear program for both problems. For each polytope we derive several classes of facet-defining inequalities and show that for some of these classes the corresponding separation problem can be solved in polynomial time. This leads to a polynomial time algorithm for pairwise sequence alignment that is not based on dynamic programming. Moreover, for multiple sequences the branch-and-cut algorithms for both sequence alignment problems are able to solve to optimality instances that are beyond the range of present dynamic programming approaches.
The Practical Use of the A* Algorithm for Exact Multiple Sequence Alignment
- Journal of Computational Biology
, 1997
Cited by 15 (3 self)
Multiple alignment is an important problem in computational biology. It is well known that it can be solved exactly by a dynamic programming algorithm which in turn can be interpreted as a shortest path computation in a directed acyclic graph. The A* algorithm (or goal directed unidirectional search) is a technique that speeds up the computation of a shortest path by transforming the edge lengths without losing the optimality of the shortest path. We implemented the A* algorithm in a computer program similar to MSA [GKS95] and FMA [SI97b]. We incorporated in this program new bounding strategies for both lower and upper bounds and show that the A* algorithm, together with our improvements, can speed up computations considerably. Additionally we show that the A* algorithm together with a standard bounding technique is superior to the well known Carrillo-Lipman bounding since it excludes more nodes from consideration. 1 Introduction One of the most prominent problems in computational mo...
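The shortest-path view of alignment can be sketched for the simplest case of two sequences. This is a minimal illustration, not the program described in the abstract: the function name, the unit costs, and the length-difference lower bound are assumptions chosen for brevity, and the bounding strategies of the paper are not reproduced.

```python
import heapq


def astar_alignment_cost(a, b, mismatch=1, gap=1):
    """Cost of a cheapest path from (0, 0) to (len(a), len(b)) in the edit graph."""
    goal = (len(a), len(b))

    def h(i, j):
        # Admissible lower bound: the remaining suffixes differ in length by
        # this many characters, and each of them must be paired with a gap.
        return gap * abs((len(a) - i) - (len(b) - j))

    best = {(0, 0): 0}
    frontier = [(h(0, 0), 0, (0, 0))]  # entries are (g + h, g, node)
    while frontier:
        _, g, (i, j) = heapq.heappop(frontier)
        if (i, j) == goal:
            return g
        if g > best[(i, j)]:
            continue  # stale queue entry, a cheaper path was found later
        steps = []
        if i < len(a) and j < len(b):  # align a[i] with b[j]
            steps.append((i + 1, j + 1, 0 if a[i] == b[j] else mismatch))
        if i < len(a):                 # a[i] against a gap
            steps.append((i + 1, j, gap))
        if j < len(b):                 # b[j] against a gap
            steps.append((i, j + 1, gap))
        for ni, nj, w in steps:
            ng = g + w
            if ng < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = ng
                heapq.heappush(frontier, (ng + h(ni, nj), ng, (ni, nj)))
    raise ValueError("goal unreachable")
```

Ordering the queue by g + h is equivalent to running Dijkstra's algorithm on edge lengths transformed by the estimator, which is why the optimal path is preserved as long as the estimator never overestimates the remaining cost.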
Evaluation Measures of Multiple Sequence Alignments
- J Comput Biol
, 1996
Cited by 13 (2 self)
Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of structure, functionality and ultimately evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS measure yields a direct connection be...
Aligning Alignments
- In 9-th Annual Symp. Combinatorial Pattern Matching, Volume 1448 of LNCS
, 1998
Cited by 13 (1 self)
While the area of sequence comparison has a rich collection of results on the alignment of two sequences, and even the alignment of multiple sequences, there is little known about the alignment of two alignments. The problem becomes interesting when the alignment objective function counts gaps, as is common when aligning biological sequences, and has the form of the sum-of-pairs objective. We begin a thorough investigation of aligning two alignments under the sum-of-pairs objective with general linear gap costs when either of the two alignments is given in the form of a sequence (a degenerate alignment containing a single sequence), a multiple alignment (containing two or more sequences), or a profile (a representation of a multiple alignment often used in computational biology). This leads to five problem variations, some of which arise in widely-used heuristics for multiple sequence alignment, and in assessing the relatedness of a sequence to a sequence family. For variations in w...
Near-Optimal Sequence Alignment
- Curr. Opin. Struct. Biol
, 1996
Cited by 13 (0 self)
Introduction. Aligning two short, similar protein sequences by eye is an easy task. Teaching a machine the same skill turns out to be astonishingly difficult. Starting with the famous paper by Needleman and Wunsch [1], people have used dynamic programming algorithms to maximize the score associated with an alignment. While this algorithm is sometimes perceived as complicated, the definition of an optimal sequence alignment is not. It is, in fact, simple. For proteins, pairs of residues are attributed a similarity score. With the aid of gaps (modeling insertions and deletions) the sum of these scores has to be maximized while the number and length of the gaps have to remain within biologically reasonable limits. This is achieved by subtracting penalties for gaps from the score for the matches. In a sense, it is astonishing how much biological information can be obtained using such a simple approach. A high alignment score between two proteins usually implies that sequences are homo
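The scoring scheme described here (similarity scores for aligned residues minus penalties for gaps, maximized by dynamic programming) can be sketched as follows. This is a simplified illustration in the style of Needleman and Wunsch: the match, mismatch, and gap values are arbitrary assumptions, a real protein aligner would use a substitution matrix instead of two constants, and only the optimal score is returned, not the alignment.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b with a per-position gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with b[:j]; the first row aligns the empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                           prev[j] + gap,     # a[i-1] against a gap
                           cur[j - 1] + gap)) # b[j-1] against a gap
        prev = cur
    return prev[-1]
```

Keeping only two rows of the table is enough when just the score is needed; recovering the alignment itself requires the full table or a divide-and-conquer traceback.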
Multiple sequence alignment with arbitrary gap costs: Computing an optimal solution using polyhedral combinatorics
, 2002
Probabilistic ancestral sequences and Multiple alignments
- In Fifth Scandinavian Workshop on Algorithm Theory, Reykjavik
, 1996
Cited by 8 (5 self)
An evolutionary configuration (EC) is a set of aligned sequences of characters (possibly representing amino acids, DNA, RNA or natural language). We define the probability of an EC, based on a given phylogenetic tree, and give an algorithm to compute this probability efficiently. From these probabilities, we can compute the most likely sequence at any place in the phylogenetic tree, or its probability profile. The probability profile at the root of the tree is called the probabilistic ancestral sequence. By computing the probability of an EC, we can use dynamic programming to find alignments over two subtrees. This gives an algorithm for computing multiple alignments. These multiple alignments are maximum likelihood, and are a compatible generalization of two sequence alignments. 1 Introduction In our view the main contribution of this paper is to introduce the computation of probabilities of a given evolutionary configuration (EC), that is a phylogenetic tree with sequences at its leaves...
Improving the Divide-and-Conquer Approach to Sum-of-Pairs Multiple Sequence Alignment
- Appl. Math. Lett
, 1997
Cited by 6 (4 self)
We consider the problem of multiple sequence alignment: given k sequences of length at most n and a certain scoring function, find an alignment that minimizes the corresponding "sum of pairs" distance score. We generalize the divide-and-conquer technique described in [1,2], and present new ideas on how to use efficient search strategies for saving computer memory and accelerating the procedure for three or more sequences. Resulting running times and memory usage are shown for several test cases. Keywords: Multiple sequence alignment, Dynamic programming, Divide-and-conquer. 1. INTRODUCTION Multiple sequence alignment is an important problem in computational molecular biology, and many algorithms have been presented in this area of research (for a recent comparison see [3]). Since the problem of computing optimal alignments with respect to the "sum-of-pairs" criterion and most of its variants are NP-hard [4], many approximative algorithms have been proposed (e.g., [5--8]), Unfo...
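The "sum of pairs" distance score mentioned in this abstract can be made concrete with a short sketch that evaluates a given alignment; it does not search for the optimal one. The function name, the use of "-" as the gap symbol, and the unit mismatch and gap costs are assumptions made for illustration, since the abstract leaves the scoring function open.

```python
from itertools import combinations


def sum_of_pairs_cost(rows, mismatch=1, gap=1):
    """Sum-of-pairs distance of a multiple alignment given as equal-length rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all alignment rows must have the same length")
    total = 0
    for x, y in combinations(rows, 2):      # every pair of sequences
        for p, q in zip(x, y):              # every alignment column
            if p == q:
                continue                    # match, or gap against gap
            total += gap if "-" in (p, q) else mismatch
    return total


alignment = ["AC-T",
             "A-GT",
             "ACGT"]
cost = sum_of_pairs_cost(alignment)
```

The score is the sum of the pairwise distances induced by the multiple alignment, so its evaluation takes time proportional to the number of sequence pairs times the alignment length.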

