Results 1-10 of 40
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
Cited by 2182 (27 self)
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
General Time-Reversible Distances with Unequal Rates across Sites: Mixing Γ and Inverse Gaussian Distributions with Invariant Sites
, 1997
Cited by 49 (3 self)
This paper aims to explain to biologists the assumptions of these distances and to clarify some earlier misconceptions. Importantly, nearly all of the currently used distance estimates (including those of Tamura, 1992; Tamura and Nei, 1994) are special cases (restrictions) of the general time-reversible distance (see Zharkikh, 1994; Swofford et al., 1996).
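To make the idea of a distance correction with unequal rates across sites concrete, here is a minimal Python sketch for the simplest restriction of the general time-reversible family, the Jukes-Cantor model. It is not the general formula from the paper: the function name `jc_distance` and its parameters are hypothetical, and the Gamma case assumes rates with mean 1 and shape `alpha`.

```python
import math


def jc_distance(p, alpha=None):
    """Jukes-Cantor distance from the observed proportion p of differing sites.

    alpha=None assumes equal rates across sites.
    Otherwise rates are assumed Gamma distributed with mean 1 and shape alpha.
    (Hypothetical helper for illustration only.)
    """
    x = 1.0 - 4.0 * p / 3.0
    if x <= 0.0:
        # At or beyond saturation the correction is undefined.
        raise ValueError("p is too large: distance undefined")
    if alpha is None:
        # Equal rates: d = -(3/4) ln(1 - 4p/3)
        return -0.75 * math.log(x)
    # Gamma rates: d = (3/4) * alpha * ((1 - 4p/3)^(-1/alpha) - 1)
    return 0.75 * alpha * (x ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    for shape in (None, 2.0, 0.5):
        print(shape, round(jc_distance(0.3, shape), 4))
```

For the same observed difference, a smaller `alpha` (stronger rate heterogeneity) yields a larger corrected distance, and the Gamma form approaches the equal-rates form as `alpha` grows.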
Efficient Algorithms for Inverting Evolution
 Proceedings of the ACM Symposium on the Foundations of Computer Science
, 1999
Cited by 44 (4 self)
Evolution can be mathematically modelled by a stochastic process that operates on the DNA of species. Such models are based on the established theory that the DNA sequences, or genomes, of all extant species have been derived from the genome of the common ancestor of all species by a process of random mutation and natural selection. A stochastic model...
Data exploration in phylogenetic inference: scientific, heuristic, or neither
 CLADISTICS
, 2003
Outgroup misplacement and phylogenetic inaccuracy under a molecular clock – a simulation study
, 2003
Cited by 32 (6 self)
We conducted a simulation study of the phylogenetic methods UPGMA, neighbor joining, maximum parsimony, and maximum likelihood for a five-taxon tree under a molecular clock. The parameter space included a small region where maximum parsimony is inconsistent, so we tested inconsistency correction for parsimony and distance correction for neighbor joining. As expected, corrected parsimony was consistent. For these data, maximum likelihood with the clock assumption outperformed each of the other methods tested. The distance-based methods performed marginally better than did maximum parsimony and maximum likelihood without the clock assumption. Data correction was generally detrimental to accuracy, especially for short sequence lengths. We identified another region of the parameter space where, although consistent for a given method, some incorrect trees were each selected with up to twice the frequency of the correct (generating) tree for sequences of bounded length. These incorrect trees are those where the outgroup has been incorrectly placed. In addition to this problem, the placement of the outgroup sequence can have a confounding effect on the ingroup tree, whereby the ingroup is correct when using the ingroup sequences alone, but with the inclusion of the outgroup the ingroup tree becomes incorrect. [Equal rates; inconsistency; maximum likelihood; maximum parsimony; misleading zones; neighbor joining; outgroup rooting; UPGMA.] Any tree-estimation method can be inconsistent if incorrect assumptions are made about the mechanism of
Bootstrapping phylogenetic trees: theory and methods
 Statist. Sci
, 2003
Cited by 26 (3 self)
This is a survey of the use of the bootstrap in the area of systematic and evolutionary biology. I present the current usage by biologists of the bootstrap as a tool both for making inferences and for evaluating robustness, and propose a framework for thinking about these problems in terms of mathematical statistics.
Key words and phrases: Bootstrap, phylogenetic trees, confidence regions, nonpositive curvature.
1. AN INTRODUCTION TO SYSTEMATICS
The objects of study in systematics are binary rooted semilabeled trees that link species or families by their coancestral relationships. For example, Figure 1 shows a tree with seven strains of HIV.
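As a rough illustration of the resampling step behind the bootstrap as biologists commonly apply it to trees, the Python sketch below draws alignment columns with replacement to form one pseudo-replicate data set. It assumes the standard nonparametric bootstrap over sites; the survey itself may frame the procedure differently. The function name `bootstrap_alignment` and the toy alignment are hypothetical, and the tree-building step applied to each replicate is deliberately left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so results are reproducible.
    Columns (sites) are sampled with replacement, and the same sampled
    columns are used for every taxon so each site pattern stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    toy = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACCTTC"}  # made-up data
    rng = random.Random(0)
    replicates = [bootstrap_alignment(toy, rng) for _ in range(3)]
    for rep in replicates:
        print(rep)
```

In practice one would estimate a tree from each replicate and record how often each clade recurs; that part depends on the chosen tree-estimation method and is not shown here.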
Statistics for phylogenetic trees
 THEORETICAL POPULATION BIOLOGY 63 (2003) 17–32
, 2003
Cited by 20 (4 self)
This paper poses the problem of estimating and validating phylogenetic trees in statistical terms. The problem is hard enough to warrant several tacks: we reason by analogy to rounding real numbers, and dealing with ranking data. These are both cases where, as in phylogeny, the parameters of interest are not real numbers. Then we pose the problem in geometrical terms, using distances and measures on a natural space of trees. We do not solve the problems of inference on tree space, but suggest some coherent ways of tackling them.
Maximum likelihood Jukes-Cantor triplets: analytic solutions
, 2004
Cited by 17 (1 self)
Complex systems of polynomial equations have to be set up and solved algebraically in order to obtain analytic solutions for maximum likelihood on phylogenetic trees. This has restricted the types of systems previously resolved to the simplest models: three and four taxa under a molecular clock, with just two-state characters. In this work we give, for the first time, analytic solutions for a family of trees with four-state characters, like normal DNA or RNA. The model of substitution we use is the Jukes-Cantor model, and the trees are on three taxa under a molecular clock, namely rooted triplets. We employ a number of approaches and tools to solve this system: spectral methods (Hadamard conjugation), a new representation of variables (the pathset spectrum), and algebraic geometry tools (the resultant of two polynomials). All these, combined with heavy application of computer algebra packages (Maple), let us derive the desired solution.
Analytic solutions for three-taxon MLMC trees with variable rates across sites
 In WABI 2001
, 2001
Cited by 15 (5 self)
We consider the problem of finding the maximum likelihood rooted tree under a molecular clock (MLMC), with three species and 2-state characters under a symmetric model of substitution. For identically distributed rates per site this is probably the simplest phylogenetic estimation problem, and it is readily solved numerically. Analytic solutions, on the other hand, were obtained only recently (Yang, 2000). In this work we provide analytic solutions for any distribution of rates across sites, provided the moment generating function of the distribution is strictly increasing over the negative real numbers. This class of distributions includes, among others, identical rates across sites, as well as the Gamma, the uniform, and the inverse Gaussian distributions. Therefore, our work generalizes Yang’s solution. In addition, our derivation of the analytic solution is substantially simpler. We use the Hadamard conjugation (Hendy and Penny, 1993) to prove a general statement about the edge lengths of any neighboring pair of leaves in any phylogenetic tree (on three or more taxa). We then employ this relation, in conjunction with the convexity of an entropy-like function, to derive the analytic solution.