Results 1 - 10 of 75
Divide-and-conquer frontier search applied to optimal sequence alignment
In National Conference on Artificial Intelligence (AAAI), 2000
Cited by 53 (6 self)
We present a new algorithm that reduces the space complexity of heuristic search. It is most effective for problem spaces that grow polynomially with problem size, but contain large numbers of short cycles. For example, the problem of finding an optimal global alignment of several DNA or amino-acid sequences can be solved by finding a lowest-cost corner-to-corner path in a d-dimensional grid. A previous algorithm, called divide-and-conquer bidirectional search (Korf 1999), saves memory by storing only the Open lists and not the Closed lists. We show that this idea can be applied in a unidirectional search as well. This extends the technique to problems where bidirectional search is not applicable, and is more efficient in both time and space than the bidirectional version. If n is the length of the strings, and d is the number of strings, this algorithm can reduce the memory requirement from O(n^d) to O(n^{d-1}). While our current implementation of DCFS is somewhat slower than existing dynamic programming approaches for optimal alignment of multiple gene sequences, DCFS is a more general algorithm.
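The memory saving described in this abstract can be illustrated for the simplest case, d = 2. The sketch below is not DCFS itself; it is an ordinary rolling-row dynamic program, shown only to make the O(n^d) to O(n^{d-1}) reduction concrete (here O(n^2) to O(n)). The function name and the unit mismatch and gap costs are assumptions, not taken from the paper.

```python
def alignment_cost_linear_space(a: str, b: str, mismatch: int = 1, gap: int = 1) -> int:
    """Cost of the cheapest corner-to-corner path in the 2-D alignment grid.

    Only one previous row is kept, so memory is O(len(b)) instead of
    O(len(a) * len(b)). The path itself is not recovered; recovering it in
    reduced space is what divide-and-conquer methods add on top of this idea.
    """
    # prev[j] is the best cost of reaching grid node (i - 1, j).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = min(diagonal, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]
```

With unit costs this is the edit distance, so aligning "kitten" with "sitting" costs 3.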
Improvement of the A* Algorithm for Multiple Sequence Alignment
Cited by 31 (1 self)
The alignment problem of DNA or protein sequences is widely applicable and important in various fields of molecular biology. This problem can be reduced to the shortest path problem, and Ikeda and Imai [4] showed that the A* algorithm works efficiently with the estimator utilizing all 2-dimensional subalignments. In this paper we present new powerful estimators utilizing k-dimensional (k ≥ 3) subalignments, and propose a new bounding technique using V1, a set of vertices in the paths whose lengths are at most 1 longer than the shortest path. We also extend our algorithm to a recursive-estimate version. These algorithms become more efficient when the number of sequences increases, or the similarity among sequences is lower.
RASCAL: rapid scanning and correction of multiple sequence alignments
Bioinformatics, 2003
Cited by 30 (5 self)
Motivation: Most multiple sequence alignment programs use heuristics that sometimes introduce errors into the alignment. The most commonly used methods to correct these errors use iterative techniques to maximize an objective function. We present here an alternative, knowledge-based approach that combines a number of recently developed methods into a two-step refinement process. The alignment is divided horizontally and vertically to form a ‘lattice’ in which well-aligned regions can be differentiated. Alignment correction is then restricted to the less reliable regions, leading to a more reliable and efficient refinement strategy. Results: The accuracy and reliability of RASCAL is demonstrated using: (i) alignments from the BAliBASE benchmark database, where significant improvements were often observed, with no deterioration of the existing high-quality regions; (ii) a large-scale study involving 946 alignments from the ProDom protein domain database, where alignment quality was increased in 68% of the cases; and (iii) an automatic pipeline to obtain a high-quality alignment of 695 full-length nuclear receptor proteins, which took 11 min on a DEC Alpha 6100 computer. Availability: RASCAL is available at ftp://ftp-igbmc.u-strasbg.fr/pub/RASCAL.
The Practical Use of the A* Algorithm for Exact Multiple Sequence Alignment
Journal of Computational Biology, 1997
Cited by 22 (4 self)
Multiple alignment is an important problem in computational biology. It is well known that it can be solved exactly by a dynamic programming algorithm which in turn can be interpreted as a shortest path computation in a directed acyclic graph. The A* algorithm (or goal-directed unidirectional search) is a technique that speeds up the computation of a shortest path by transforming the edge lengths without losing the optimality of the shortest path. We implemented the A* algorithm in a computer program similar to MSA [GKS95] and FMA [SI97b]. We incorporated in this program new bounding strategies for both lower and upper bounds and show that the A* algorithm, together with our improvements, can speed up computations considerably. Additionally we show that the A* algorithm together with a standard bounding technique is superior to the well-known Carrillo-Lipman bounding since it excludes more nodes from consideration.
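The shortest-path view in this abstract can be sketched for two sequences. The code below is a textbook A* search over the pairwise alignment grid, not the program described in the paper; the function name, the unit costs, and the simple length-difference heuristic are all assumptions chosen for illustration. Ordering nodes by g + h is equivalent to running Dijkstra's algorithm on edge lengths transformed by the heuristic, which is the formulation the abstract mentions.

```python
import heapq


def astar_alignment_cost(a: str, b: str, mismatch: int = 1, gap: int = 1) -> int:
    """Optimal pairwise alignment cost found by A* on the alignment grid."""
    n, m = len(a), len(b)

    def h(i: int, j: int) -> int:
        # Lower bound on the remaining cost: the two suffixes differ in
        # length by this many symbols, and each of them must face a gap.
        return gap * abs((n - i) - (m - j))

    best = {(0, 0): 0}               # cheapest known cost to each grid node
    heap = [(h(0, 0), 0, 0, 0)]      # entries are (g + h, g, i, j)
    while heap:
        _, g, i, j = heapq.heappop(heap)
        if (i, j) == (n, m):
            return g
        if g > best.get((i, j), float("inf")):
            continue                 # stale queue entry
        successors = []
        if i < n and j < m:
            successors.append((i + 1, j + 1, 0 if a[i] == b[j] else mismatch))
        if i < n:
            successors.append((i + 1, j, gap))
        if j < m:
            successors.append((i, j + 1, gap))
        for ni, nj, step in successors:
            ng = g + step
            if ng < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = ng
                heapq.heappush(heap, (ng + h(ni, nj), ng, ni, nj))
    raise ValueError("goal node is unreachable")
```

Because the heuristic never overestimates and changes by at most one gap cost per edge, the first time the goal corner is popped its cost is optimal.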
Approximation Algorithms for Multiple Sequence Alignment Under a Fixed Evolutionary Tree
1995
Cited by 21 (4 self)
We consider the problem of aligning sequences related by a given evolutionary tree: given a fixed tree with its leaves labeled with sequences, find ancestral sequences to label the internal nodes so as to minimize the total cost of all the edges in the tree. The cost of an edge is the edit distance between the sequences labeling its endpoints. In this paper, we consider the case when the given tree is a regular d-ary tree for some fixed d and provide a (d+1)/(d-1)-approximation algorithm for this problem that runs in time O(d(2kn)^d + n^2 k^{2d}), where k is the number of leaves in the tree and n is the maximum length of any of the sequences labeling the leaves. We also consider a new bottleneck objective in labeling the internal nodes. In this version, we wish to find the labeling of the internal nodes that minimizes the maximum cost of any edge in the tree. For this problem we provide a simple (2δ + 1)-approximation algorithm, where δ is the depth of the given undirected tree def...
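The two objectives named in this abstract, total edge cost and bottleneck (maximum) edge cost, can be evaluated for any fully labeled tree. The sketch below only scores a given labeling; it does not implement the approximation algorithms. The function names, the edge-list representation, and the example tree are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between two sequences (rolling-row DP)."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            curr[j] = min(substitute, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[len(b)]


def tree_alignment_costs(edges, labels):
    """Return (total cost, bottleneck cost) of a fully labeled tree.

    edges  -- iterable of (node, node) pairs
    labels -- dict mapping every node, leaf or internal, to its sequence
    """
    costs = [edit_distance(labels[u], labels[v]) for u, v in edges]
    return sum(costs), max(costs, default=0)


# Hypothetical example: one internal node joined to three leaves.
edges = [("root", "x"), ("root", "y"), ("root", "z")]
labels = {"root": "ACGT", "x": "ACGT", "y": "ACGA", "z": "TCGT"}
total, bottleneck = tree_alignment_costs(edges, labels)
```

Here the edge costs are 0, 1 and 1, so the total cost is 2 and the bottleneck cost is 1.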
A polyhedral approach to sequence alignment problems
Discrete Appl. Math., 2000
Cited by 21 (2 self)
We study two new problems in sequence alignment both from a practical and a theoretical view, using tools from combinatorial optimization to develop branch-and-cut algorithms. The Generalized Maximum Trace formulation captures several forms of multiple sequence alignment problems in a common framework, among them the original formulation of Maximum Trace. The RNA Sequence Alignment Problem captures the comparison of RNA molecules on the basis of their primary sequence and their secondary structure. Both problems have a characterization in terms of graphs which we reformulate in terms of integer linear programming. We then study the polytopes (or convex hulls of all feasible solutions) associated with the integer linear program for both problems. For each polytope we derive several classes of facet-defining inequalities and show that for some of these classes the corresponding separation problem can be solved in polynomial time. This leads to a polynomial-time algorithm for pairwise sequence alignment that is not based on dynamic programming. Moreover, for multiple sequences the branch-and-cut algorithms for both sequence alignment problems are able to solve to optimality instances that are beyond the range of present dynamic programming approaches.
A General Method for Fast Multiple Sequence Alignment
1996
Cited by 20 (10 self)
We have developed a fast heuristic algorithm for multiple sequence alignment which provides near-to-optimal results for sufficiently homologous sequences. The algorithm makes use of the standard dynamic programming procedure by applying it to all pairs of sequences. The resulting score matrices for pairwise alignment give rise to secondary matrices containing the additional charges imposed by forcing the alignment path to run through a particular vertex. Such a constraint corresponds to slicing the sequences at the positions defining that vertex, and aligning the remaining pairs of prefix and suffix sequences separately. From these secondary matrices, one can compute, for any given family of sequences, suitable positions for cutting all of these sequences simultaneously, thus reducing the problem of aligning a family of n sequences of average length l in a divide-and-conquer fashion to aligning two families of n sequences of approximately half that length. In this pape...
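The secondary matrix of additional charges described in this abstract can be sketched for one pair of sequences. The code below assumes a cost-minimizing formulation with unit mismatch and gap costs, whereas the abstract speaks of score matrices; the function names are hypothetical and the multi-sequence cut-position search is not shown.

```python
def prefix_cost_matrix(a: str, b: str, mismatch: int = 1, gap: int = 1):
    """F[i][j] = optimal cost of aligning the prefixes a[:i] and b[:j]."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                step = 0 if a[i - 1] == b[j - 1] else mismatch
                candidates.append(F[i - 1][j - 1] + step)
            if i > 0:
                candidates.append(F[i - 1][j] + gap)
            if j > 0:
                candidates.append(F[i][j - 1] + gap)
            F[i][j] = min(candidates)
    return F


def additional_charges(a: str, b: str):
    """C[i][j] = extra cost of forcing the alignment path through vertex (i, j).

    Forcing the path through (i, j) means aligning a[:i] with b[:j] and
    a[i:] with b[j:] separately; C[i][j] is how much worse that is than the
    unconstrained optimum.
    """
    n, m = len(a), len(b)
    F = prefix_cost_matrix(a, b)
    # Suffix costs are prefix costs of the reversed sequences.
    R = prefix_cost_matrix(a[::-1], b[::-1])
    optimum = F[n][m]
    return [[F[i][j] + R[n - i][m - j] - optimum for j in range(m + 1)]
            for i in range(n + 1)]
```

Vertices with a charge of zero lie on some optimal pairwise path, so low-charge vertices are natural candidates for cut positions.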
Aligning Alignments
In 9th Annual Symp. Combinatorial Pattern Matching, Volume 1448 of LNCS, 1998
Cited by 18 (2 self)
While the area of sequence comparison has a rich collection of results on the alignment of two sequences, and even the alignment of multiple sequences, there is little known about the alignment of two alignments. The problem becomes interesting when the alignment objective function counts gaps, as is common when aligning biological sequences, and has the form of the sum-of-pairs objective. We begin a thorough investigation of aligning two alignments under the sum-of-pairs objective with general linear gap costs when either of the two alignments is given in the form of a sequence (a degenerate alignment containing a single sequence), a multiple alignment (containing two or more sequences), or a profile (a representation of a multiple alignment often used in computational biology). This leads to five problem variations, some of which arise in widely used heuristics for multiple sequence alignment, and in assessing the relatedness of a sequence to a sequence family. For variations in w...
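The sum-of-pairs objective mentioned in this abstract adds up the cost of every pairwise alignment induced by a multiple alignment. The sketch below is a simplified version that charges a fixed cost per gap symbol; it does not count gaps as runs, so it is not the general linear gap cost model the paper studies. The function name, the "-" gap character, and the unit costs are assumptions.

```python
from itertools import combinations


def sum_of_pairs_cost(rows, mismatch: int = 1, gap: int = 1) -> int:
    """Sum-of-pairs cost of a multiple alignment given as equal-length rows."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all alignment rows must have the same length")
    total = 0
    for x, y in combinations(rows, 2):
        for cx, cy in zip(x, y):
            if cx == "-" and cy == "-":
                continue          # column vanishes in the induced pair
            if cx == "-" or cy == "-":
                total += gap      # one symbol aligned against a gap
            elif cx != cy:
                total += mismatch
    return total
```

For the rows "AC-T", "A-GT" and "ACGT" the three induced pairs cost 2, 1 and 1, giving a sum-of-pairs cost of 4.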
Evaluation Measures of Multiple Sequence Alignments
J Comput Biol, 1996
Cited by 18 (2 self)
Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of structure, functionality and ultimately evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS measure yields a direct connection be...