Linear programming for phylogenetic reconstruction based on gene rearrangements
 In Proc. 16th Ann. Symp. Combin. Pattern Matching (CPM’05), Lecture
Cited by 7 (4 self)
Abstract. Phylogenetic reconstruction from gene rearrangements has attracted increasing attention from biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but has been limited to small genomes because the running time of its scoring algorithm grows exponentially with the number of genes in the genome. We report here on a new method to compute a tight lower bound on the score of a given tree, using a set of linear constraints generated through selective applications of the triangle inequality (in the spirit of GESTALT). Our method generates an integer linear program with a carefully limited number of constraints, rapidly solves its relaxed version, and uses the result to provide a tight lower bound. Since this bound is very close to the optimal tree score, it can be used directly as a selection criterion, thereby enabling us to bypass entirely the expensive scoring procedure. We have implemented this method within our GRAPPA software and run several series of experiments on both biological and simulated datasets to assess its accuracy. Our results show that using the bound as a selection criterion yields excellent trees, with error rates below 5% up to very large evolutionary distances, consistently beating the baseline Neighbor-Joining. Our new method enables us to extend the range of applicability of the direct optimization method to chromosomes of size comparable to those of bacteria, as well as to datasets with complex combinations of evolutionary events.
Lower Bounds for Maximum Parsimony with Gene Order Data
 RECOMB Comparative Genomics
Cited by 7 (0 self)
Abstract. In this paper, we study lower bound techniques for branch-and-bound algorithms for maximum parsimony, with a focus on gene order data. We give a simple O(n^3) time dynamic programming algorithm for computing the maximum circular ordering lower bound. The well-known gene order phylogeny program, GRAPPA, currently implements the brute-force exponential time algorithm and the Swap-as-you-go heuristic. Our experiments show a significant improvement over both these methods in practice. Next, we show that the linear programming-based lower bound of Tang and Moret can be greatly simplified, allowing us to solve the LP very efficiently. Finally, we formalize the problem of computing the circular ordering lower bound, when the tree topologies are generated bottom-up, as a Path-Constrained Travelling Salesman Problem, and give a 3-approximation algorithm for it. This is a special case of the more general Precedence-Constrained Travelling Salesman Problem and has not previously been studied, to the best of our knowledge.
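To make the circular ordering lower bound concrete, here is a minimal Python sketch of the brute-force variant the abstract mentions, not the paper's O(n^3) dynamic program. It assumes a rooted binary tree given as nested tuples with integer leaves and a symmetric distance matrix that satisfies the triangle inequality; the names leaf_orderings, circular_bound, and max_circular_bound are hypothetical. The idea is that a tour visiting the leaves in an order consistent with a planar drawing of the tree crosses every edge exactly twice, so half the tour length cannot exceed the tree score.

```python
def leaf_orderings(tree):
    """Yield every leaf order obtainable by swapping children at internal nodes."""
    if not isinstance(tree, tuple):
        yield [tree]          # a leaf is its own (only) ordering
        return
    left, right = tree
    for a in leaf_orderings(left):
        for b in leaf_orderings(right):
            yield a + b       # children in the given order
            yield b + a       # children swapped


def circular_bound(order, dist):
    """Half the length of the closed tour that visits the leaves in this order."""
    n = len(order)
    tour = sum(dist[order[i]][order[(i + 1) % n]] for i in range(n))
    return tour / 2


def max_circular_bound(tree, dist):
    """Best (largest) circular ordering bound; exponential in the number of leaves."""
    return max(circular_bound(order, dist) for order in leaf_orderings(tree))


# Illustrative data only: four leaves, tree ((0,1),(2,3)).
dist = [
    [0, 3, 4, 5],
    [3, 0, 5, 4],
    [4, 5, 0, 3],
    [5, 4, 3, 0],
]
tree = ((0, 1), (2, 3))
print(max_circular_bound(tree, dist))
```

The enumeration doubles at every internal node, which is why a polynomial-time dynamic program such as the one described in the abstract matters for larger trees.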
A Divide-and-Conquer Implementation of Three Sequence Alignment and Ancestor Inference
Cited by 7 (0 self)
In this paper, we present an algorithm to simultaneously align three biological sequences with an affine gap model and infer their common ancestral sequence. Our algorithm can be further extended to perform tree alignment for more sequences, and eventually unify the two procedures of phylogenetic reconstruction and sequence alignment. The novelty of our algorithm is that it applies the divide-and-conquer strategy so that the memory usage is reduced from O(n^3) to O(n^2), while at the same time it is based on dynamic programming, so an optimal alignment is guaranteed. Traditionally, three-sequence alignment is limited by its huge demand for memory and can only handle sequences shorter than two hundred characters. With the new improved algorithm, we can produce the optimal alignment of sequences several thousand characters long. We implemented our algorithm as a C program package, MSAM. It has been extensively tested with BAliBASE, a real manually refined multiple sequence alignment database, as well as simulated datasets generated by Rose (Random Model of Sequence Evolution). We compared our results with those of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. The experiment shows that MSAM produces not only better alignments, but also better ancestral sequences. The software can be downloaded for free at
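The memory-saving idea can be illustrated on the simpler two-sequence case. The Python sketch below is Hirschberg's classic divide-and-conquer alignment with a linear gap penalty, which reduces memory from O(n^2) to O(n) while still returning an optimal alignment; it is an analogue for illustration, not the three-sequence affine-gap algorithm of MSAM. The scoring constants and function names are assumptions.

```python
MATCH, MISMATCH, GAP = 1, -1, -1   # assumed scoring scheme (maximised)


def sub(x, y):
    return MATCH if x == y else MISMATCH


def nw_score_row(a, b):
    """Last row of the Needleman-Wunsch table for a vs b, keeping two rows only."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(prev[j - 1] + sub(a[i - 1], b[j - 1]),
                         prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev


def nw_align(a, b):
    """Full-table alignment with traceback; only called on tiny base cases."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        t[i][0] = i * GAP
    for j in range(1, m + 1):
        t[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t[i][j] = max(t[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          t[i - 1][j] + GAP, t[i][j - 1] + GAP)
    ra, rb, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and t[i][j] == t[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and t[i][j] == t[i - 1][j] + GAP:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return "".join(reversed(ra)), "".join(reversed(rb))


def hirschberg(a, b):
    """Optimal global alignment of a and b using linear working memory."""
    if len(a) <= 1 or len(b) <= 1:
        return nw_align(a, b)
    mid = len(a) // 2
    left = nw_score_row(a[:mid], b)                 # scores of prefixes
    right = nw_score_row(a[mid:][::-1], b[::-1])    # scores of suffixes
    m = len(b)
    split = max(range(m + 1), key=lambda j: left[j] + right[m - j])
    la, lb = hirschberg(a[:mid], b[:split])
    ra, rb = hirschberg(a[mid:], b[split:])
    return la + ra, lb + rb


x, y = hirschberg("AGTACGCA", "TATGC")
print(x)
print(y)
```

The split column is chosen so that the best prefix score plus the best suffix score is maximal, which is what lets each half be solved independently without storing the full table.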
Evolutionary inference via the Poisson indel process
 Proc. Natl. Acad. Sci., 10.1073/pnas.1220450110
, 2012
Cited by 7 (2 self)
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classical evolutionary process, the TKF91 model [62], is a continuous-time Markov chain model comprised of insertion, deletion and substitution events. Unfortunately this model gives rise to an intractable computational problem: the computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa [39]. In this work, we present a new stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The new model is closely related to the TKF91 model, differing only in its treatment of insertions, but the new model has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared to separate inference of phylogenies and alignments.
Barking Up the Wrong Treelength: The Impact of Gap Penalty on Alignment and Tree Accuracy
, 2009
Cited by 3 (0 self)
The current technique for estimating phylogenies from sequence data uses two phases: first, the sequences are aligned, and then the tree is estimated using the obtained alignment. More recently, however, several computational methods have been developed for simultaneous estimation of the alignment and the tree, of which POY (a heuristic for the NP-hard “minimum treelength” problem, which extends maximum parsimony (MP) so that gaps contribute to the cost) is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to the very simple two-phase method of estimating the alignment using ClustalW and then analyzing the resultant alignment using MP. They found that in the overwhelming majority of the cases, ClustalW + MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques (collectively referred to as “Direct Optimization”) are not competitive with two-phase techniques. Our paper presents a simulation study in which we take a closer look at the points raised by Ogden and Rosenberg. Instead of focusing specifically on POY, we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Since this optimization depends upon the specific edit distance criterion used to score a tree, our study considers the impact of the gap penalty (in particular, affine versus simple) on the accuracy of the resultant alignment and tree that optimizes the treelength for that gap penalty function. Our study suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs, but also suggests the intriguing possibility that optimizing under an affine gap penalty might produce alignments that are not only better than ClustalW alignments, but competitive with (or
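The difference between a simple and an affine gap penalty can be shown with a small Python sketch. It assumes a simple penalty charges a fixed cost per gap character, while an affine penalty adds a one-time opening cost for each run of consecutive gap characters; the function name gap_cost and the cost values are hypothetical and are not taken from the study.

```python
def gap_cost(x, y, gap_open, gap_extend):
    """Total gap cost of a pairwise alignment given as two equal-length rows.

    Every '-' costs gap_extend; the first '-' of each run also costs gap_open.
    Setting gap_open to 0 gives the simple (per-character) penalty.
    """
    if len(x) != len(y):
        raise ValueError("aligned rows must have equal length")
    cost = 0
    for row in (x, y):
        in_gap = False
        for ch in row:
            if ch == "-":
                cost += gap_extend + (0 if in_gap else gap_open)
            in_gap = ch == "-"
    return cost


# One indel of length 3 versus three scattered single-character indels.
one_run = ("AC---GT", "ACTTTGT")
scattered = ("A-C-G-T", "ATCTGTT")

print(gap_cost(*one_run, 0, 1), gap_cost(*scattered, 0, 1))   # simple penalty
print(gap_cost(*one_run, 3, 1), gap_cost(*scattered, 3, 1))   # affine penalty
```

Under the simple penalty both alignments cost the same, so the optimizer has no reason to prefer one contiguous indel; under the affine penalty the scattered gaps cost twice as much, which is the kind of behavioral difference the gap penalty choice introduces.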
An Efficient Heuristic for the Tree Alignment Problem
Abstract. Phylogeny and alignment estimation are two important and closely related biological problems. One approach to the alignment question is known in combinatorial optimization as the Tree Alignment Problem (TAP). A number of heuristic solutions exist, with the most competitive algorithms using iterative methods to find solutions. However, these methods are not scalable, give bounds that are too loose, or have time and memory complexity that is difficult to predict. For the simplest distance functions, a widely used quick heuristic to find a solution to an instance of the problem is Direct Optimization (DO). Unfortunately, this algorithm has not been formally described, and it is currently unable to find correct solutions under an edit distance function called affine indel cost, which is of great importance for biologists analyzing DNA sequences. In this paper, we describe DO formally, and present a new algorithm, Affine-DO, which heuristically bounds an instance of the TAP under the affine indel cost. Affine-DO shows superior results with real biological sequences, a point that we illustrate using a published data set example. Draft 8, December 18, 2008
On Improving Heuristics for Maximum Likelihood and Multiple . . .
, 2006
In the construction of phylogenetic trees, exact methods for maximum likelihood and multiple sequence alignment are very computationally intensive, even on a fixed tree. Since anything above cubic time is too computationally complex to be practical in phylogenetics, heuristic methods are used to approximate a solution. We attempt to optimize maximum likelihood heuristic methods to improve performance and accuracy by eliminating “fast evolving sites.” In addition, we compare various multiple sequence alignment heuristic methods, with the overall goal of improving generalized tree alignment heuristic methods.