Results 1  10
of
223
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
"... The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximumlikelihood principle, which clearly satisfies these requirements. The ..."
Abstract

Cited by 2176 (27 self)
 Add to MetaCart
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximumlikelihood principle, which clearly satisfies these requirements. The core of this method is a simple hillclimbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distancebased method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximumlikelihood programs and much higher than the performance of distancebased and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximumlikelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distancebased and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
Model Selection and Model Averaging in Phylogenetics: Advantages of Akaike Information Criterion and Bayesian Approaches Over Likelihood Ratio Tests
, 2004
"... Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the sel ..."
Abstract

Cited by 404 (8 self)
 Add to MetaCart
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (modelaveraged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AICbased model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
Bayesian phylogenetic analysis of combined data
 Syst. Biol
, 2004
"... Abstract. — The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameterrich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new typ ..."
Abstract

Cited by 198 (11 self)
 Add to MetaCart
(Show Context)
Abstract. — The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameterrich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new types of data, such as morphology. Based on this foundation, we developed a Bayesian MCMC approach to the analysis of combined data sets and explored its utility in inferring relationships among gall wasps based on data from morphology and four genes (nuclear and mitochondrial, ribosomal and protein coding). Examined models range in complexity from those recognizing only a morphological and a molecular partition to those having complex substitution models with independent parameters for each gene. Bayesian MCMC analysis deals efficiently with complex models: convergence occurs faster and more predictably for complex models, mixing is adequate for all parameters even under very complex models, and the parameter update cycle is virtually unaffected by model partitioning across sites. Morphology contributed only 5 % of the characters in the data set but nevertheless influenced the combineddata tree, supporting the utility of morphological data in multigene analyses. We used Bayesian criteria (Bayes factors) to show that process heterogeneity across data partitions is a significant model component, although not as important as amongsite rate variation. More complex evolutionary models are associated with more topological uncertainty and less conflict between morphology and molecules. Bayes factors sometimes favor simpler models over considerably more
Bayesian phylogenetic inference via Markov chain Monte Carlo methods
 Biometrics
, 1999
"... SUMMARY. We derive a Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms, a stochastic model for these data, and a prior distribution on the space of trees. A transformation of the tree into a canonical cop ..."
Abstract

Cited by 159 (6 self)
 Add to MetaCart
(Show Context)
SUMMARY. We derive a Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms, a stochastic model for these data, and a prior distribution on the space of trees. A transformation of the tree into a canonical cophenetic matrix form suggests a simple and effective proposal distribution for selecting candidate trees close to the current tree in the chain. We illustrate the algorithm with restriction site data on 9 plant species, then extend to DNA sequences from 32 species of fish. The algorithm mixes well in both examples from random starting trees, generating reproducible estimates and credible sets for the path of evolution.
Selecting the bestfit model of nucleotide substitution
 Syst
, 2001
"... Abstract.—Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best ts the data at hand have been proposed, but their absolute and relative performance has not yet ..."
Abstract

Cited by 134 (2 self)
 Add to MetaCart
Abstract.—Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best ts the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of Akaike and Bayesian information methods, for selecting bestt models of nucleotide substitution. We specically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution andrecording howoften this truemodel is recovered by thedifferentmethods.Our results suggest thatmodel selection is reasonablyaccurateandindicate that some likelihood ratio testmethods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not inuence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, inuence model selection in some cases. Model tting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented
Molecular systematics of the eastern fence lizard (Sceloporus undulatus): A comparison of parsimony, likelihood, and Bayesian approaches
 Syst. Biol
, 2002
"... Abstract.—Phylogenetic analysis of large datasets using complex nucleotide substitution models under a maximum likelihood framework can be computationally infeasible, especially when attempting to infer con�dence values by way of nonparametric bootstrapping. Recent developments in phylogenetics sugg ..."
Abstract

Cited by 90 (8 self)
 Add to MetaCart
Abstract.—Phylogenetic analysis of large datasets using complex nucleotide substitution models under a maximum likelihood framework can be computationally infeasible, especially when attempting to infer con�dence values by way of nonparametric bootstrapping. Recent developments in phylogenetics suggest the computational burden can be reduced by using Bayesian methods of phylogenetic inference. However, few empirical phylogenetic studies exist that explore the ef�ciency of Bayesian analysis of large datasets. To this end, we conducted an extensive phylogenetic analysis of the wideranging and geographically variable Eastern Fence Lizard (Sceloporus undulatus). Maximum parsimony, maximum likelihood, and Bayesian phylogenetic analyses were performed on a combined mitochondrial DNA dataset (12S and 16S rRNA, ND1 proteincoding gene, and associated tRNA; 3,688 bp total) for 56 populations of S. undulatus (78 total terminals including other S. undulatus group species and outgroups). Maximum parsimony analysis resulted in numerous equally parsimonious trees (82,646 from equally weighted parsimony and 335 from weighted parsimony). The majority rule consensus tree derived from the Bayesian analysis was topologically identical to the single best phylogeny inferred from the maximum likelihood analysis, but required �80 % less computational time. The mtDNA data provide strong support for the monophyly of the S. undulatus group and
A.C., Toward a comprehensive phylogeny for mammalian and avian
, 2000
"... These include: This article cites 20 articles, 10 of which can be accessed free at: ..."
Abstract

Cited by 70 (4 self)
 Add to MetaCart
(Show Context)
These include: This article cites 20 articles, 10 of which can be accessed free at:
Phylogenetic models of rate heterogeneity: A high performance computing perspective
 In Proceedings of the 20th Internationational Parallel and Distributed Processing Symposium (IPDPS
, 2006
"... Inference of phylogenetic trees using the maximum likelihood (ML) method is NPhard. Furthermore, the computation of the likelihood function for huge trees of more than 1,000 organisms is computationally intensive due to a large amount of floating point operations and high memory consumption. Within ..."
Abstract

Cited by 42 (9 self)
 Add to MetaCart
(Show Context)
Inference of phylogenetic trees using the maximum likelihood (ML) method is NPhard. Furthermore, the computation of the likelihood function for huge trees of more than 1,000 organisms is computationally intensive due to a large amount of floating point operations and high memory consumption. Within this context, the present paper compares two competing mathematical models that account for evolutionary rate heterogeneity: the Γ and CAT models. The intention of this paper is to show that—from a purely empirical point of view—CAT can be used instead of Γ. The main advantage of CAT over Γ consists in significantly lower memory consumption and faster inference times. An experimental study using RAxML has been performed on 19 realworld datasets comprising 73 up to 1,663 DNA sequences. Results show that CAT is on average 5.5 times faster than Γ and—surprisingly enough—also yields trees with slightly superior Γ likelihood values. The usage of the CAT model decreases the amount of average L2 and L3 cache misses by factor 8.55. 1.