A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
Cited by 2182
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
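The abstract above describes hill climbing on the likelihood. As a much smaller illustration of the same idea, the sketch below hill-climbs a single branch length between two aligned sequences under the Jukes-Cantor (JC69) model and compares the result with the known closed-form estimate. This is not PHYML's algorithm: it optimizes no topology, and the function names, the step-halving rule, and the default values are assumptions made for illustration only.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of branch length d (substitutions/site) for two
    aligned sequences with n_diff mismatches out of n_sites, under JC69."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # P(same base at both ends)
    p_diff = 0.25 - 0.25 * decay   # P(one specific different base)
    # Each site contributes (1/4) * P(observed pair | d).
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))

def hill_climb_branch_length(n_sites, n_diff, start=0.1, step=0.1, tol=1e-8):
    """Hypothetical step-halving hill climber: try d + step and d - step,
    keep any move that raises the likelihood, otherwise halve the step."""
    d = start
    best = jc69_log_likelihood(d, n_sites, n_diff)
    while step > tol:
        improved = False
        for cand in (d + step, d - step):
            if cand <= 0.0:
                continue  # branch lengths must stay positive
            ll = jc69_log_likelihood(cand, n_sites, n_diff)
            if ll > best:
                d, best, improved = cand, ll, True
                break
        if not improved:
            step /= 2.0
    return d

def jc69_closed_form(n_sites, n_diff):
    """Closed-form JC69 distance; assumes n_diff / n_sites < 0.75."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    print(hill_climb_branch_length(1000, 100), jc69_closed_form(1000, 100))
```

For a single branch the closed form makes the search unnecessary; the search is shown only because the same accept-if-better loop extends to parameters that have no closed form.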
A likelihood approach to estimating phylogeny from discrete morphological character data
 Systematic Biology
, 2001
Cited by 156
Abstract. Evolutionary biologists have adopted simple likelihood models for purposes of estimating ancestral states and evaluating character independence on specified phylogenies; however, for purposes of estimating phylogenies by using discrete morphological data, maximum parsimony remains the only option. This paper explores the possibility of using standard, well-behaved Markov models for estimating morphological phylogenies (including branch lengths) under the likelihood criterion. An important modification of standard Markov models involves making the likelihood conditional on characters being variable, because constant characters are absent in morphological data sets. Without this modification, branch lengths are often overestimated, resulting in potentially serious biases in tree topology selection. Several new avenues of research are opened by an explicitly model-based approach to phylogenetic analysis of discrete morphological data, including combined-data likelihood analyses (morphology + sequence data), likelihood ratio tests, and Bayesian analyses. [Discrete morphological character; Markov model; maximum likelihood; phylogeny.] The increased availability of nucleotide and protein sequences from a diversity of both organisms and genes has stimu
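The conditioning on variable characters described above can be illustrated in the simplest possible setting. The sketch below assumes a k-state symmetric Markov model with equal state frequencies and only two tips joined by one branch of length t; a real analysis would sum over ancestral states on a full tree. All function names are hypothetical. It divides each pattern likelihood by the probability that a character is variable, so that the variable patterns again sum to one.

```python
import math
from itertools import product

def mk_transition(i, j, t, k):
    """Probability of state j after branch length t (expected changes per
    character), starting in state i, under a symmetric k-state model."""
    decay = math.exp(-k * t / (k - 1))
    if i == j:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k

def pattern_likelihood(pattern, t, k):
    """Unconditional likelihood of states (a, b) at the two tips,
    assuming equal stationary frequencies 1/k."""
    a, b = pattern
    return (1.0 / k) * mk_transition(a, b, t, k)

def conditional_pattern_likelihood(pattern, t, k):
    """Likelihood of a pattern given that the character is variable:
    L(pattern) / (1 - P(constant)). Requires t > 0."""
    p_constant = sum(pattern_likelihood((s, s), t, k) for s in range(k))
    return pattern_likelihood(pattern, t, k) / (1.0 - p_constant)

def variable_patterns(k):
    """All two-tip patterns in which the tips differ."""
    return [(a, b) for a, b in product(range(k), repeat=2) if a != b]

if __name__ == "__main__":
    k, t = 4, 0.3
    raw = sum(pattern_likelihood(p, t, k) for p in variable_patterns(k))
    cond = sum(conditional_pattern_likelihood(p, t, k)
               for p in variable_patterns(k))
    print(raw, cond)
```

Without the division, the variable patterns carry less than the full probability mass, which is the mismatch the abstract identifies when constant characters are never recorded.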
Phylogenetic Tree Construction Using Markov Chain Monte Carlo
, 1999
Cited by 86
We describe a Bayesian method based on Markov chain simulation to study the phylogenetic relationship in a group of DNA sequences. Under simple models of mutational events, our method produces a Markov chain whose stationary distribution is the conditional distribution of the phylogeny given the observed sequences. Our algorithm strikes a reasonable balance between the desire to move globally through the space of phylogenies and the need to make computationally feasible moves in areas of high probability. Since phylogenetic information is described by a tree, we have created new diagnostics to handle this type of data structure. An important byproduct of the Markov chain Monte Carlo phylogeny building technique is that it provides estimates and corresponding measures of variability for any aspect of the phylogeny under study.
PHYML Online: a web server for fast maximum likelihood-based phylogenetic inference
 Nucleic Acids Res
, 2005
Cited by 84
PHYML Online is a web interface to PHYML, a software that implements a fast and accurate heuristic for estimating maximum likelihood phylogenies from DNA and protein sequences. This tool provides the user with a number of options, e.g. nonparametric bootstrap and estimation of various evolutionary parameters, in order to perform comprehensive phylogenetic analyses on large datasets in reasonable computing time. The server and its documentation are available at
Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference
 Bioinformatics
, 2004
Cited by 73
Motivation: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. Metropolis coupled MCMC [(MC)³], a variant of MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. Results: This paper presents a parallel algorithm for (MC)³. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets. Availability: MrBayes v3.0 is available at
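The Metropolis-coupled scheme summarized above can be sketched serially on a toy problem. The example below is not the MrBayes implementation and is not parallel: it runs several chains on a one-dimensional density with two peaks instead of on trees, heats chain i by raising the target to a power, and proposes state swaps between adjacent chains. The target, the heating formula 1 / (1 + heat * i), and every name and default value are assumptions chosen for illustration.

```python
import math
import random

def log_target(x):
    """Toy unnormalized log-density with peaks near -3 and +3."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def swap_log_ratio(beta_i, beta_j, logp_i, logp_j):
    """Log acceptance ratio for exchanging the states of two chains with
    inverse temperatures beta_i, beta_j and log-densities logp_i, logp_j."""
    return (beta_i - beta_j) * (logp_j - logp_i)

def mc3(n_iter, n_chains=4, heat=0.5, step=1.0, seed=0):
    """Return the states visited by the cold chain (index 0)."""
    rng = random.Random(seed)
    # Assumed incremental heating: chain 0 is cold (beta = 1).
    betas = [1.0 / (1.0 + heat * i) for i in range(n_chains)]
    states = [-3.0] * n_chains
    samples = []
    for _ in range(n_iter):
        # Metropolis update within each chain on its heated target.
        for c in range(n_chains):
            prop = states[c] + rng.gauss(0.0, step)
            log_acc = betas[c] * (log_target(prop) - log_target(states[c]))
            if math.log(1.0 - rng.random()) < log_acc:
                states[c] = prop
        # Propose a swap between one random pair of adjacent chains.
        i = rng.randrange(n_chains - 1)
        j = i + 1
        log_acc = swap_log_ratio(betas[i], betas[j],
                                 log_target(states[i]), log_target(states[j]))
        if math.log(1.0 - rng.random()) < log_acc:
            states[i], states[j] = states[j], states[i]
        samples.append(states[0])
    return samples

if __name__ == "__main__":
    draws = mc3(5000)
    print(sum(1 for x in draws if x > 0) / len(draws))
```

In this sketch the within-chain updates are independent between swap proposals, so they are the part that a parallel implementation could distribute across processes.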
Cited by 39
Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood
 BIOINFORMATICS
, 2005
Cited by 31
Motivation: Maximum likelihood methods have become very popular for constructing phylogenetic trees from sequence data. However, despite noticeable recent progress, with large and difficult data sets (e.g. multiple genes with conflicting signals) current ML programs still require huge computing times and can become trapped in bad local optima of the likelihood function. When this occurs, the resulting trees may still show some of the defects (e.g. long branch attraction) of starting trees obtained using fast distance or parsimony programs. Methods: Subtree Pruning and Regrafting (SPR) topological rearrangements are usually sufficient to intensively search the tree space. Here, we propose two new methods to make SPR moves more efficient. The first method uses a fast distance-based approach to detect the least promising candidate SPR moves, which are then simply discarded. The second method locally estimates the change in likelihood for any remaining potential SPRs, as opposed to globally evaluating the entire tree for each possible move. These two methods are implemented in a new algorithm with a sophisticated filtering strategy, which efficiently selects potential SPRs and concentrates most of the likelihood computation on the promising moves. Results: Experiments with real data sets comprising 35 to 250 taxa show that, while indeed greatly reducing the amount of computation, our approach provides likelihood values at least as good as those of the best known ML methods so far, and is very robust to poor starting trees. Furthermore, combining our new SPR algorithm with local moves such as PHYML’s nearest neighbor interchanges, the time needed to find good solutions can sometimes be reduced even more. Availability: Executables of our SPR program and the used data sets are available for download at
Parallel Metropolis-Coupled Markov Chain Monte Carlo for Bayesian Phylogenetic Inference
, 2003
Cited by 29
Motivation: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov Chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. A variant of MCMC, known as Metropolis-Coupled MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. Results: This paper presents a parallel algorithm for Metropolis-Coupled MCMC. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets. Availability: MrBayes v3.0 is available at http://morphbank.ebc.uu.se/mrbayes3/.
An efficient program for phylogenetic inference using simulated annealing
 In Proceedings of 19th IEEE/ACM International Parallel and Distributed Processing Symposium
, 2005
Cited by 18
Inference of phylogenetic trees comprising thousands of organisms based on the maximum likelihood method is computationally expensive. A new program, RAxML-SA (Randomized Axelerated Maximum Likelihood with Simulated Annealing), is presented that combines simulated annealing and hill-climbing techniques to improve the quality of final trees. In addition to the ability to perform backward steps and potentially escape local maxima provided by simulated annealing, a large number of “good” alternative topologies is generated, which can be used to build a consensus tree on the fly. Though slower than some of the fastest hill-climbing programs such as RAxML-III and PHYML, RAxML-SA finds better trees for large real data alignments containing more than 250 sequences. Furthermore, the performance on 40 simulated 500-taxon alignments is reasonable in comparison to PHYML. Finally, a straightforward and efficient OpenMP parallelization of RAxML is presented.
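The "backward steps" mentioned above come from the simulated annealing acceptance rule, which a pure hill climber lacks. The following is a generic sketch of that rule for a maximization problem, together with a geometric cooling schedule; it is not taken from RAxML-SA, and the function names, the schedule, and the parameter values are assumptions for illustration.

```python
import math
import random

def accept_move(delta_lnl, temperature, rng):
    """Generic annealing rule for maximization: always accept a move that
    does not lower the log-likelihood, and accept a worse move with
    probability exp(delta_lnl / temperature)."""
    if delta_lnl >= 0.0:
        return True
    return rng.random() < math.exp(delta_lnl / temperature)

def geometric_cooling(t_start, alpha, n_steps):
    """Hypothetical schedule: temperature shrinks by a factor alpha
    (0 < alpha < 1) at every step."""
    return [t_start * alpha ** k for k in range(n_steps)]

if __name__ == "__main__":
    rng = random.Random(0)
    for temp in geometric_cooling(1.0, 0.5, 4):
        # The same worsening move becomes less acceptable as it cools.
        print(temp, math.exp(-0.5 / temp), accept_move(-0.5, temp, rng))
```

As the temperature approaches zero the rule rejects nearly every worsening move, so the search behaves more and more like hill climbing.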
Estimation of phylogeny using a general Markov model
 Evol. Bioinform. Online
, 2005
Cited by 18
Abstract: The nonhomogeneous model of nucleotide substitution proposed by Barry and Hartigan (Stat Sci, 2: 191-210) is the most general model of DNA evolution assuming an independent and identical process at each site. We present a computational solution for this model, and use it to analyse two data sets, each violating one or more of the assumptions of stationarity, homogeneity, and reversibility. The log likelihood values returned by programs based on the F84 model (J Mol Evol, 29: 170-179), the general time reversible model (J Mol Evol, 20: 86-93), and Barry and Hartigan’s model are compared to determine the validity of the assumptions made by the first two models. In addition, we present a method for assessing whether sequences have evolved under reversible conditions and discover that this is not so for the two data sets. Finally, we determine the most likely tree under the three models of DNA evolution and compare these with the one favoured by the tests for symmetry.