Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models
 SYST. BIOL
, 2004
Cited by 101 (7 self)
What does the posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the posterior probability of a tree is the probability that the tree is correct, assuming that the model is correct. At the same time, the Bayesian method can be sensitive to model misspecification, and the sensitivity of the Bayesian method appears to be greater than the sensitivity of the nonparametric bootstrap method (using maximum likelihood to estimate trees). Although the estimates of phylogeny obtained by use of the method of maximum likelihood or the Bayesian method are likely to be similar, the assessment of the uncertainty of inferred trees via either bootstrapping (for maximum likelihood estimates) or posterior probabilities (for Bayesian estimates) is not likely to be the same. We suggest that the Bayesian method be implemented with the most complex models of those currently available, as this should reduce the chance that the method will concentrate too much probability on too few trees. [Bayesian estimation; Markov chain Monte Carlo; posterior probability; prior probability.] Quantifying the uncertainty of a phylogenetic estimate is at least as important a goal as obtaining the phylogenetic estimate itself. Measures of phylogenetic reliability not only point out what parts of a tree can be trusted when interpreting the evolution of a group, but can guide
The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics. Syst. Biol
, 2007
Cited by 70 (6 self)
Abstract.—As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses. We also used Bayes factors to test empirical data for the need to divide data in a manner that has no expected biological meaning. Posterior probability estimates are misleading when an incorrect partitioning strategy is assumed. The error was greatest when the assumed model was underpartitioned. These results suggest that model partitioning is important for large data sets. Bayes factors performed well, giving a 5% type I error rate, which is remarkably consistent with standard frequentist hypothesis tests. The sensitivity of Bayes factors was found to be quite high when the across-class model heterogeneity reflected that of empirical data. These results suggest that Bayes factors represent a robust method of choosing among partitioning strategies. Lastly, results of tests for the inclusion of unexpected divisions in empirical data mirrored the simulation results, although the outcome of such tests is highly dependent on accounting for rate variation among classes. We conclude by discussing other approaches for partitioning data, as well as other applications of Bayes factors. [Bayes factors; Bayesian phylogenetic inference; data partitioning; model choice; posterior probabilities.] Maximum likelihood (ML) and Bayesian methods of
Phylogenomics and the reconstruction of the tree of life
 Nat Rev Genet
, 2005
Cited by 54 (2 self)
As more complete genomes are sequenced, phylogenetic analysis is entering a new era — that of phylogenomics. One branch of this expanding field aims to reconstruct the evolutionary history of organisms based on the analysis of their genomes. Recent studies have demonstrated the power of this approach, which has the potential to provide answers to a number of fundamental evolutionary questions. However, challenges for the future have also been revealed. The very nature of the evolutionary history of organisms and the limitations of current phylogenetic reconstruction methods mean that part of the tree of life may prove difficult, if not impossible, to resolve with confidence. Introductory paragraph Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. The notion of phylogeny follows directly from the theory of evolution presented by Charles Darwin in “The Origin of Species”¹: the only illustration in his famous book is the first representation of evolutionary relationships among species, in the form of a
Accurate branch length estimation in partitioned Bayesian analyses requires accommodation of among-partition rate variation and attention to branch length priors. Syst Biol
, 2006
Cited by 33 (0 self)
Molecular phylogenetic studies are making increasing use of partitioned Bayesian analyses via software tools like MrBayes, version 3 (Ronquist and Huelsenbeck, 2003). Data partitioning is important because, as long as the same topology/history underlies all of the partitions, it addresses some of the problems associated with the combination of data sets with heterogeneous rates (Bull et al., 1993) and eliminates the need to argue the validity of tests that have been used to judge data combinability (e.g., Huelsenbeck et al., 1994; Huelsenbeck
Does choice in model selection affect maximum likelihood analysis?
 Systematic Biology
Cited by 28 (0 self)
Abstract.—In order to have confidence in model-based phylogenetic analysis, the model of nucleotide substitution adopted must be selected in a statistically rigorous manner. Several model-selection methods are applicable to maximum likelihood (ML) analysis, including the hierarchical likelihood-ratio test (hLRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), and decision theory (DT), but their performance relative to empirical data has not been investigated thoroughly. In this study, we use 250 phylogenetic data sets obtained from TreeBASE to examine the effects that choice in model selection has on ML estimation of phylogeny, with an emphasis on optimal topology, bootstrap support, and hypothesis testing. We show that the use of different methods leads to the selection of two or more models for ∼80% of the data sets and that the AIC typically selects more complex models than alternative approaches. Although ML estimation with different best-fit models results in incongruent tree topologies ∼50% of the time, these differences are primarily attributable to alternative resolutions of poorly supported nodes. Furthermore, topologies and bootstrap values estimated with ML using alternative statistically supported models are more similar to each other than to topologies and bootstrap values estimated with ML under the Kimura two-parameter (K2P) model or maximum parsimony (MP). In addition, Swofford-Olsen-Waddell-Hillis (SOWH) tests indicate that ML trees estimated with alternative best-fit models are usually not significantly different from each other when evaluated with the same model. However, ML trees estimated with statistically supported models are often significantly suboptimal to ML trees made with the K2P model when both are evaluated with
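As an illustration of how two of the criteria named in this abstract can disagree, the sketch below scores three substitution models with the standard AIC and BIC formulas. The log-likelihoods, the alignment length, and the choice to count only substitution-model parameters (not branch lengths) and to use the number of sites as the BIC sample size are all assumptions made for the example; none of these numbers come from the study.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical alignment length, used here as the BIC sample size (an assumption).
N_SITES = 1000

# Hypothetical maximized log-likelihoods and free substitution-model parameters:
# K2P has 1 (kappa), HKY has 4 (kappa + 3 base frequencies),
# GTR+G has 9 (5 exchange rates + 3 base frequencies + gamma shape).
CANDIDATES = {
    "K2P": (-5250.0, 1),
    "HKY": (-5200.0, 4),
    "GTR+G": (-5190.0, 9),
}

aic_scores = {name: aic(ll, k) for name, (ll, k) in CANDIDATES.items()}
bic_scores = {name: bic(ll, k, N_SITES) for name, (ll, k) in CANDIDATES.items()}

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)

print("AIC picks:", best_aic)
print("BIC picks:", best_bic)
```

With these made-up inputs the AIC prefers the richest model while the BIC, whose per-parameter penalty grows with ln(n), prefers a simpler one. That is consistent with the abstract's observation that the AIC typically selects more complex models, but it is only a toy case.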
The Effect of Ambiguous Data on Phylogenetic Estimates Obtained by Maximum Likelihood and Bayesian Inference
, 2009
Cited by 26 (2 self)
Abstract.—Although an increasing number of phylogenetic data sets are incomplete, the effect of ambiguous data on phylogenetic accuracy is not well understood. We use 4-taxon simulations to study the effects of ambiguous data (i.e., missing characters or gaps) in maximum likelihood (ML) and Bayesian frameworks. By introducing ambiguous data in a way that removes confounding factors, we provide the first clear understanding of 1 mechanism by which ambiguous data can mislead phylogenetic analyses. We find that in both ML and Bayesian frameworks, among-site rate variation can interact with ambiguous data to produce misleading estimates of topology and branch lengths. Furthermore, within a Bayesian framework, priors on branch lengths and rate heterogeneity parameters can exacerbate the effects of ambiguous data, resulting in strongly misleading bipartition posterior probabilities. The magnitude and direction of the ambiguous data bias are a function of the number and taxonomic distribution of ambiguous characters, the strength of topological support, and whether or not the model is correctly specified. The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic
Identifying and removing fast-evolving sites using compatibility analysis: an example from the Arthropoda. Syst. Biol
Cited by 20 (0 self)
Felsenstein (1978) first recognized that long branch attraction (LBA) can seriously affect the accuracy of phylogenetic reconstruction. Although Felsenstein specifically addressed this problem in the cases of parsimony and clique analyses, it is now known that LBA can affect any tree reconstruction method, including maximum likelihood (ML) and Bayesian approaches. However, LBA is only a problem for distance, ML, and Bayesian methods when the assumed substitution model is underparameterized, i.e., when it is unrealistically simple (Swofford et al., 2001; Lemmon and Moriarty, 2004). LBA should therefore be avoidable by analyzing the data using ML, Bayesian, or distance methods under the best-fitting substitution model (providing this is a good approximation of the true substitution model). However,
Bayesian mixed models and the phylogeny of pitvipers (Viperidae: Serpentes). Mol. Phylogenet. Evol.
, 2006
Cited by 18 (2 self)
Abstract The subfamily Crotalinae (pitvipers) contains over 190 species of venomous snakes distributed in both the Old and New World. We incorporated an extensive sampling of taxa (including 28 of 29 genera), and sequences of four mitochondrial gene fragments (2.3 kb) per individual, to estimate the phylogeny of pitvipers based on maximum parsimony and Bayesian phylogenetic methods. Our Bayesian analyses incorporated complex mixed models of nucleotide evolution that allocated independent models to various partitions of the dataset within combined analyses. We compared results of unpartitioned versus partitioned Bayesian analyses to investigate how much unpartitioned (versus partitioned) models were forced to compromise estimates of model parameters, and whether complex models substantially alter phylogenetic conclusions to the extent that they appear to extract more phylogenetic signal than simple models. Our results indicate that complex models do extract more phylogenetic signal from the data. We also address how differences in phylogenetic results (e.g., bipartition posterior probabilities) obtained from simple versus complex models may be interpreted in terms of relative credibility. Our estimates of pitviper phylogeny suggest that nearly all recently proposed generic reallocations appear valid, although certain Old and New World genera (Ovophis, Trimeresurus, and Bothrops) remain poly- or paraphyletic and require further taxonomic revision. While a majority of nodes were resolved, we could not confidently estimate the basal relationships among New World genera and which lineage of Old World species is most closely related to this New World group.
Inferring complex DNA substitution processes on phylogenies using uniformization and data augmentation
 Syst. Biol
, 2006
Cited by 13 (1 self)
Abstract.—A new method is developed for calculating sequence substitution probabilities using Markov chain Monte Carlo (MCMC) methods. The basic strategy is to use uniformization to transform the original continuous time Markov process into a Poisson substitution process and a discrete Markov chain of state transitions. An efficient MCMC algorithm for evaluating substitution probabilities by this approach using a continuous gamma distribution to model site-specific rates is outlined. The method is applied to the problem of inferring branch lengths and site-specific rates from nucleotide sequences under a general time-reversible (GTR) model and a computer program BYPASSR is developed. Simulations are used to examine the performance of the new program relative to an existing program BASEML that uses a discrete approximation for the gamma distributed prior on site-specific rates. It is found that BASEML and BYPASSR are in close agreement when inferring branch lengths, regardless of the number of rate categories used, but that BASEML tends to underestimate high site-specific substitution rates, and to overestimate intermediate rates, when fewer than 50 rate categories are used. Rate estimates obtained using BASEML agree more closely with those of BYPASSR as the number of rate categories increases. Analyses of the posterior distributions of site-specific rates from BYPASSR suggest that a large number of taxa are needed to obtain precise estimates of site-specific rates, especially when rates are very high or very low. The method is applied to analyze 45 sequences of the alpha 2B adrenergic receptor gene (A2AB) from a sample of eutherian taxa. In general, the pattern expected for regions under negative selection is observed with third codon positions having the highest inferred rates, followed by
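The uniformization step mentioned in this abstract rests on a standard identity: for a rate matrix Q and any mu at least as large as the fastest exit rate, P(t) = sum over n of Poisson(n; mu t) R^n, where R = I + Q/mu is a discrete-time transition matrix. The sketch below evaluates that sum deterministically and checks it against the closed-form Jukes-Cantor result. It is not the BYPASSR algorithm, which uses MCMC and data augmentation; the function name, the truncation at 60 terms, and the Jukes-Cantor test matrix are assumptions chosen for the example.

```python
import numpy as np


def uniformized_transition_matrix(Q, t, n_terms=60):
    """Approximate P(t) = exp(Qt) by a truncated uniformization series."""
    size = Q.shape[0]
    mu = np.max(-np.diag(Q))        # uniformization rate: the fastest exit rate
    R = np.eye(size) + Q / mu       # discrete-time chain of (possibly virtual) jumps
    P = np.zeros((size, size))
    weight = np.exp(-mu * t)        # Poisson probability of zero jumps
    R_power = np.eye(size)          # R^0
    for n in range(n_terms):
        P += weight * R_power       # add Poisson(n; mu t) * R^n
        R_power = R_power @ R
        weight *= mu * t / (n + 1)  # step to the Poisson probability of n + 1 jumps
    return P


# Jukes-Cantor rate matrix scaled to one expected substitution per unit time:
# off-diagonal entries are 1/3 and diagonal entries are -1.
Q_jc = np.full((4, 4), 1.0 / 3.0) - (4.0 / 3.0) * np.eye(4)

t = 0.5
P = uniformized_transition_matrix(Q_jc, t)

# Closed-form Jukes-Cantor probability of observing the same base after time t.
expected_same = 0.25 + 0.75 * np.exp(-4.0 * t / 3.0)
print(P[0, 0], expected_same)
```

Truncating the series is safe here because the Poisson weights decay factorially. For long branches or fast rates, mu t grows and more terms would be needed.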
Are unequal clade priors problematic for Bayesian phylogenetics? Syst. Biol
, 2006
Cited by 10 (1 self)
Although Bayesian phylogenetic methodologies were first developed in the 1960s (Felsenstein, 1968, 2004), the approach remained relatively obscure until the initial release of the software application MrBayes (Huelsenbeck and Ronquist, 2001). Since that time, the popularity of Bayesian phylogenetics has increased tremendously, and it now must be considered a primary method of analysis on par with maximum likelihood, parsimony, and distance methods. The popularity of Bayesian analysis can be attributed to computational efficiencies that allow for explicit model-based analyses of large data sets in real time with simultaneous estimation of nodal support in the form of posterior probability values. Despite the initial enthusiasm generated by the availability of a fast likelihood-based approach, Bayesian phylogenetic