Fixed parameter tractability of binary near-perfect phylogenetic tree reconstruction
 International Colloquium on Automata, Languages and Programming
, 2006
Cited by 18 (7 self)
Abstract. We consider the problem of finding a Steiner minimum tree in a hypercube. Specifically, given n terminal vertices in an m-dimensional cube and a parameter q, we compute the Steiner minimum tree in time $O(72^q + 8^q nm^2)$, under the assumption that the length of the minimum Steiner tree is at most m + q. This problem has extensive applications in taxonomy and biology. The Steiner tree problem in hypercubes is equivalent to the phylogeny (evolutionary tree) reconstruction problem under the maximum parsimony criterion, when each taxon is defined over binary states. The taxa, character set and mutation of a phylogeny correspond to terminal vertices, dimensions and traversal of a dimension in a Steiner tree. Phylogenetic trees that mutate each character exactly once are called perfect phylogenies and their size is bounded by the number of characters. When a perfect phylogeny consistent with the data set exists, it can be constructed in linear time. However, real data sets often do not admit perfect phylogenies. In this paper, we consider the problem of reconstructing near-perfect phylogenetic trees (referred to as BNPP). A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most q additional mutations. We show for the first time that the BNPP problem is fixed-parameter tractable (FPT) and significantly improve the previous asymptotic bounds.
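To make the hypercube formulation concrete, here is a minimal sketch (not the paper's algorithm) that brackets the number of extra mutations q for a small binary data set. It assumes taxa are given as equal-length 0/1 strings. The function names `hamming`, `mst_length` and `varying_characters` are hypothetical. The sketch relies on two simple facts: any tree must mutate each varying character at least once, and a minimum spanning tree over the terminals under Hamming distance is itself a valid (possibly non-optimal) Steiner tree, so its length is an upper bound.

```python
def hamming(u, v):
    """Number of dimensions (characters) in which two taxa differ."""
    return sum(a != b for a, b in zip(u, v))


def mst_length(terminals):
    """Length of a minimum spanning tree over the terminals under
    Hamming distance (Prim's algorithm on the complete graph).
    This is only an upper bound on the Steiner minimum tree length,
    because no Steiner (non-terminal) vertices are introduced."""
    n = len(terminals)
    if n == 0:
        return 0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge into the tree
    best[0] = 0
    total = 0
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        i = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
        in_tree[i] = True
        total += best[i]
        for j in range(n):
            if not in_tree[j]:
                best[j] = min(best[j], hamming(terminals[i], terminals[j]))
    return total


def varying_characters(terminals):
    """Number of characters that take both states; every tree needs
    at least this many mutations, so it is a lower bound on length."""
    return sum(len(set(column)) > 1 for column in zip(*terminals))


taxa = ["000", "011", "101", "110"]
lower = varying_characters(taxa)   # each varying character mutates at least once
upper = mst_length(taxa)           # spanning tree without Steiner vertices
print(f"tree length is between {lower} and {upper}; q is at most {upper - lower}")
```

If the two bounds coincide, the spanning tree is already a perfect phylogeny for the data; otherwise the gap only limits how large q can be and says nothing about the exact optimum.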
Topological rearrangements and local search method for tandem duplication trees
 IEEE TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
, 2005
Cited by 11 (3 self)
The problem of reconstructing the duplication history of a set of tandemly repeated sequences was first introduced by Fitch [4]. Many recent studies deal with this problem, showing the validity of the unequal recombination model proposed by Fitch, describing numerous inference algorithms, and exploring the combinatorial properties of these new mathematical objects, which are duplication trees. In this paper, we deal with the topological rearrangement of these trees. Classical rearrangements used in phylogeny (NNI, SPR, TBR, ...) cannot be applied directly to duplication trees. We show that restricting the neighborhood defined by the SPR (Subtree Pruning and Regrafting) rearrangement to valid duplication trees allows exploring the whole duplication tree space. We use these restricted rearrangements in a local search method which improves an initial tree via successive rearrangements. This method is applied to the optimization of parsimony and minimum evolution criteria. We show through simulations that this method improves on all existing programs both for reconstructing the topology of the true tree and for recovering its duplication events. We apply this approach to tandemly repeated human zinc finger genes and observe that our method obtains a much better duplication tree than any other program.
The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI-Based Local Searches
Cited by 9 (1 self)
Abstract. The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene duplication events. This problem is NP-complete and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. A classical local search problem is the NNI search problem, which is based on the nearest neighbor interchange operation. In this work we (i) provide a novel near-linear time algorithm for the NNI search problem, (ii) introduce extensions that significantly enlarge the search space of the NNI search problem, and (iii) present algorithms for these extended versions that are asymptotically just as efficient as our algorithm for the NNI search problem. The substantially extended NNI search problem, along with the exceptional speedup achieved, makes the gene-duplication problem more tractable for large-scale phylogenetic analyses.
OPTIMAL IMPERFECT PHYLOGENY RECONSTRUCTION AND HAPLOTYPING (IPPH)
Cited by 6 (4 self)
The production of large quantities of diploid genotype data has created a need for computational methods for large-scale inference of haplotypes from genotypes. One promising approach to the problem has been to infer possible phylogenies explaining the observed genotypes in terms of putative descendants of some common ancestral haplotype. The first attempts at this problem proceeded on the restrictive assumption that observed sequences could be explained by a perfect phylogeny, in which each variant locus is presumed to have mutated exactly once over the sampled population’s history. Recently, the perfect phylogeny model was relaxed and the problem of reconstructing an imperfect phylogeny (IPPH) from genotype data was considered. A polynomial time algorithm was developed for the case when a single site is allowed to mutate twice, but the general problem remained open. In this work, we solve the general IPPH problem and show for the first time that it is possible to infer optimal q-near-perfect phylogenies from diploid genotype data in polynomial time for any constant q, where q is the number of “extra” mutations required in the phylogeny beyond what would be present in a perfect phylogeny. This work has application to the haplotype phasing problem as well as to various related problems in phylogenetic inference, analysis of sequence variability in populations, and association study design. Empirical studies on human data of known phase show this method to be competitive with the leading phasing methods and provide strong support for the value of continued research into algorithms for general phylogeny construction from diploid data.
Algorithms for Efficient Near-Perfect Phylogenetic Tree Reconstruction in Theory and Practice
Cited by 5 (2 self)
Abstract. We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed-parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved, algorithm for the problem is fixed-parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and have shown them to be extremely efficient in practice on biologically significant data sets. This work proves that the BNPP problem is fixed-parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity. Index Terms: Computations on discrete structures, trees, biology and genetics.
FPT Algorithms for Binary Near-Perfect Phylogenetic Trees
, 2005
Cited by 2 (0 self)
We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. In this paper, we present a simple lower bound for the size of an optimal phylogeny, develop two algorithms for constructing optimal phylogenies and show experimental results for one of the variants. The first algorithm is intuitive and reconstructs an optimal near-perfect phylogenetic tree in time $(q + \kappa)^{O(q)} nm + O(nm^2)$, where κ is the number of characters that share four gametes with some other character. A second, more involved algorithm shows the problem to be fixed-parameter tractable in q by solving it in time $q^{O(q)} nm + O(nm^2)$, where n is the number of taxa and m is the number of characters. This is a significant improvement over the previous best result of $nm^{O(q)} 2^{O(q^2 s^2)}$, where s is the number of states per character (2 for binary). We implement the first algorithm and show that it finds the optimal solution quickly for a selection of population datasets.
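The parameter κ above can be computed directly from the input matrix. The following is a minimal sketch, assuming rows are taxa and columns are binary characters given as 0/1 strings. The helper names `has_four_gametes` and `conflicting_characters` are hypothetical and are not taken from the paper. It uses the standard four-gamete condition: two binary characters conflict when all four state pairs 00, 01, 10 and 11 occur among the taxa.

```python
from itertools import combinations


def has_four_gametes(col_a, col_b):
    """True if the two binary characters exhibit all four state pairs
    (00, 01, 10, 11) across the taxa."""
    return len(set(zip(col_a, col_b))) == 4


def conflicting_characters(matrix):
    """Indices of characters that share four gametes with at least one
    other character; the size of this set plays the role of kappa."""
    columns = list(zip(*matrix))  # transpose: one tuple per character
    conflicted = set()
    for i, j in combinations(range(len(columns)), 2):
        if has_four_gametes(columns[i], columns[j]):
            conflicted.update((i, j))
    return conflicted


taxa = ["000", "011", "101", "110"]
kappa = len(conflicting_characters(taxa))
print("kappa =", kappa)
```

An empty result means no pair of characters conflicts, which for binary data is the classical condition for a perfect phylogeny to exist. The pairwise scan takes O(nm^2) time for n taxa and m characters.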
Algorithms for Analyzing Intraspecific Sequence Variation
, 2007
Cited by 1 (0 self)
the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity
The Imperfect Ancestral Recombination Graph Reconstruction Problem: Upper Bounds For Recombination and Homoplasy
, 2010
One of the central problems in computational biology is the reconstruction of evolutionary histories. While models incorporating recombination and homoplasy have been studied separately, a missing component in the theory is a robust and flexible unifying model which incorporates both of these major biological events shaping genetic diversity. In this article, we introduce the first such unifying model and develop algorithms to find the optimal ancestral recombination graph incorporating recombinations and homoplasy events. The power of our framework is the connection between our formulation and the Directed Steiner Arborescence Problem in combinatorial optimization. We implement linear programming techniques as well as heuristics for the Directed Steiner Arborescence Problem, and use our methods to construct evolutionary histories for both simulated and real data sets.
A New Algorithm for the Reconstruction of Near-Perfect Binary Phylogenetic Trees
, 2005
In this paper, we consider the problem of reconstructing near-perfect phylogenetic trees using binary characters. A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree. The algorithm for reconstructing a perfect phylogeny for binary characters is computationally efficient but impractical in most real settings. A near-perfect phylogeny relaxes this assumption by allowing characters to mutate a constant number of times. We show that if the number of additional mutations required by the near-perfect phylogeny is bounded by q, then we can reconstruct the optimal near-perfect phylogenetic tree in time 2 where n is the number of taxa and m is the number of characters. This is a significant improvement over the previous best result of $nm^{O(q)} 2^{O(q^2 r^2)}$, where r is the number of states per character (2 for binary). This improvement could lead to the first practical phylogenetic tree reconstruction algorithm that is both computationally feasible and biologically meaningful. We finally outline a method to improve the bound to q .
Representation in stochastic search for phylogenetic tree reconstruction
, 2005
Phylogenetic tree reconstruction is a process in which the ancestral relationships among a group of organisms are inferred from their DNA sequences. For all but trivially sized data sets, finding the optimal tree is computationally intractable. Many heuristic algorithms exist, but the branch-swapping algorithm used in the software package PAUP* is the most popular. This method performs a stochastic search over the space of trees, using a branch-swapping operation to construct neighboring trees in the search space. This study introduces a new stochastic search algorithm that operates over an alternative representation of trees, namely as permutations of taxa giving the order in which they are processed during stepwise addition. Experiments on several data sets suggest that this algorithm for generating an initial tree, when followed by branch-swapping, can produce better trees for a given total amount of time.