Results 1  10
of
133
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
"... The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximumlikelihood principle, which clearly satisfies these requirements. The ..."
Abstract

Cited by 2182 (27 self)
 Add to MetaCart
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximumlikelihood principle, which clearly satisfies these requirements. The core of this method is a simple hillclimbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distancebased method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximumlikelihood programs and much higher than the performance of distancebased and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximumlikelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distancebased and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
Model Selection and Model Averaging in Phylogenetics: Advantages of Akaike Information Criterion and Bayesian Approaches Over Likelihood Ratio Tests
, 2004
"... Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the sel ..."
Abstract

Cited by 407 (8 self)
 Add to MetaCart
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (modelaveraged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AICbased model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
Selecting the bestfit model of nucleotide substitution
 Syst
, 2001
"... Abstract.—Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best ts the data at hand have been proposed, but their absolute and relative performance has not yet ..."
Abstract

Cited by 135 (2 self)
 Add to MetaCart
Abstract.—Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best ts the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of Akaike and Bayesian information methods, for selecting bestt models of nucleotide substitution. We specically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution andrecording howoften this truemodel is recovered by thedifferentmethods.Our results suggest thatmodel selection is reasonablyaccurateandindicate that some likelihood ratio testmethods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not inuence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, inuence model selection in some cases. Model tting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented
Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol
, 2002
"... Abstract.—Several authors have argued recently that extensive taxon sampling has a positive and important effect on the accuracy of phylogenetic estimates. However, other authors have argued that there is little bene�t of extensive taxon sampling, and so phylogenetic problems can or should be reduce ..."
Abstract

Cited by 106 (2 self)
 Add to MetaCart
Abstract.—Several authors have argued recently that extensive taxon sampling has a positive and important effect on the accuracy of phylogenetic estimates. However, other authors have argued that there is little bene�t of extensive taxon sampling, and so phylogenetic problems can or should be reduced to a few exemplar taxa as a means of reducing the computational complexity of the phylogenetic analysis. In this paper we examined �ve aspects of study design that may have led to these different perspectives. First, we considered the measurement of phylogenetic error across a wide range of taxon sample sizes, and conclude that the expected error based on randomly selecting trees (which varies by taxon sample size) must be considered in evaluating error in studies of the effects of taxon sampling. Second, we addressed the scope of the phylogenetic problems de�ned by different samples of taxa, and argue that phylogenetic scope needs to be considered in evaluating the importance of taxonsampling strategies. Third, we examined the claim that fast and simple tree searches are as effective as more thorough searches at �nding nearoptimal trees that minimize error. We show that a more complete search of tree space reduces phylogenetic error, especially as the taxon sample size increases. Fourth, we examined the effects of simple versus complex simulation models on taxonomic sampling studies. Although bene�ts of taxon sampling are apparent for all models,
Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models
 SYST. BIOL
, 2004
"... What does die posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the pt>sterkir probability ot'a tree is the probability that the tree is corwct, assuming th>.it the mo ..."
Abstract

Cited by 101 (7 self)
 Add to MetaCart
What does die posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the pt>sterkir probability ot'a tree is the probability that the tree is corwct, assuming th>.it the model is correct. At the same time, the BayLsian method can be sensitive to model misspecification, and the sensitivity of the Bayesian method appears to be greater than the sensitivity ot " the nonparametric bootstrap method (using maximum likelihood to estimate trees). Although the estimatLs of phylogeny obtained by use of the method of maximum likelihood or the Bayesian method are Ukely to be similar, the assessment of the uncertainty of inferred trees via either bootstriipping (t"or maximum likelihood estimates) or petsterior probabilities (for Bayesian estimates) is not likely to be the same. We suggest that the Bayesian method be implemented with the most complex models of those currently avaiiable, as tliis should reduce the chance that the metliod will concentrate too much probability on tuo few trees. [Bayesian estimation; Markov ch^iin Monte Carlo; posterior probability; prior probability.] Quantify ing the uncertainty of a phylogcneticesti mil te is at least as important a goal as obtaining the phylogenetic estimate itself. Measures of phylogenetic reliability not only point out what parts of a tree can be trusted when interpreting the evolution of a group, but can guide
Diskcovering, a fastconverging method for phylogenetic tree reconstruction
 JOURNAL OF COMPUTATIONAL BIOLOGY
, 1999
"... The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaflabeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and diverg ..."
Abstract

Cited by 92 (10 self)
 Add to MetaCart
(Show Context)
The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaflabeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the DiskCovering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCMboosted distance methods under the Jukes–Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.
Inferring Evolutionary Trees with Strong Combinatorial Evidence
 THEORETICAL COMPUTER SCIENCE
, 1997
"... We consider the problem of inferring the evolutionary tree of a set of n species. We propose a quartet reconstruction method which specifically produces trees whose edges have strong combinatorial evidence. Let Q be a set of resolved quartets defined on the studied species, the method computes th ..."
Abstract

Cited by 73 (13 self)
 Add to MetaCart
We consider the problem of inferring the evolutionary tree of a set of n species. We propose a quartet reconstruction method which specifically produces trees whose edges have strong combinatorial evidence. Let Q be a set of resolved quartets defined on the studied species, the method computes the unique maximum subset Q of Q which is equivalent to a tree and outputs the corresponding tree as an estimate of the species' phylogeny. We use a characterization of the subset Q due to [6] to provide an O(n 4 ) incremental algorithm for this variant of the NPhard quartet consistency problem. Moreover, when chosing the resolution of the quartets by the FourPoint Method (FPM) and considering the CavenderFarris model of evolution, we show that the convergence rate of the Q method is at worst polynomial when the maximum evolutive distance between two species is bounded. We complete these theoretical results by an experimental study on real and simulated data sets. The results ...
The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics. Syst. Biol
, 2007
"... Abstract.—As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic ac ..."
Abstract

Cited by 70 (6 self)
 Add to MetaCart
(Show Context)
Abstract.—As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses. We also used Bayes factors to test empirical data for the need to divide data in a manner that has no expected biological meaning. Posterior probability estimates are misleading when an incorrect partitioning strategy is assumed. The error was greatest when the assumed model was underpartitioned. These results suggest that model partitioning is important for large data sets. Bayes factors performed well, giving a 5 % type I error rate, which is remarkably consistent with standard frequentist hypothesis tests. The sensitivity of Bayes factors was found to be quite high when the acrossclass model heterogeneity reflected that of empirical data. These results suggest that Bayes factors represent a robust method of choosing among partitioning strategies. Lastly, results of tests for the inclusion of unexpected divisions in empirical data mirrored the simulation results, although the outcome of such tests is highly dependent on accounting for rate variation among classes. We conclude by discussing other approaches for partitioning data, as well as other applications of Bayes factors. [Bayes factors; Bayesian phylogenetic inference; data partitioning; model choice; posterior probabilities.] Maximum likelihood (ML) and Bayesian methods of
Comparing Bootstrap and Posterior Probability Values in the FourTaxon Case
, 2003
"... Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bay ..."
Abstract

Cited by 61 (4 self)
 Add to MetaCart
Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bayesian methods for phylogenetic inference problems has resulted in clade support being expressed in terms of posterior probabilities. We used simulated data and the fourtaxon case to explore the relationship between nonparametric bootstrap values (as inferred by maximum likelihood) and posterior probabilities (as inferred by Bayesian analysis). The results suggest a complex association between the two measures. Three general regions of tree space can be identified: (1) the neutral zone, where differences between mean bootstrap and mean posterior probability values are not significant, (2) near the twobranch corner, and (3) deep in the twobranch corner. In the last two regions, significant differences occur between mean bootstrap and mean posterior probability values. Whether bootstrap or posterior probability values are higher depends on the data in support of alternative topologies. Examination of star topologies revealed that both bootstrap and posterior probability values differ significantly from theoretical expectations;