A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data
 Syst. Biol
, 2004
Cited by 136 (3 self)
Abstract.—We describe a general likelihood-based ‘mixture model’ for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites “pattern-heterogeneity” to distinguish it from both a homogenous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate-variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program. [Bayesian inference; MCMC; mixture model; phylogeny; rate-heterogeneity; secondary structure; sequence evolution] The conventional likelihood-based approach to inferring phylogenetic trees from aligned gene-sequence or other data is to apply a single substitutional model to
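The mixture idea in this abstract can be illustrated with a minimal sketch: each site's likelihood is a weighted sum of its likelihoods under several candidate patterns, and no site is assigned to a pattern in advance. The function name, the weights, and the per-site likelihood values below are hypothetical placeholders, not the authors' program or data; in a real analysis each per-site likelihood would come from a substitution model evaluated on a tree.

```python
import math

def mixture_log_likelihood(site_likelihoods, weights):
    """Return sum over sites of log(sum_j w_j * L_ij).

    site_likelihoods[i][j] is the likelihood of site i under pattern j
    (hypothetical inputs); weights[j] is the mixture weight of pattern j.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for per_pattern in site_likelihoods:
        total += math.log(sum(w * l for w, l in zip(weights, per_pattern)))
    return total

# Made-up likelihoods for three sites under two patterns.
site_likelihoods = [
    (0.020, 0.001),  # site fits pattern 0 better
    (0.002, 0.030),  # site fits pattern 1 better
    (0.015, 0.012),  # site fits both about equally
]

mixture = mixture_log_likelihood(site_likelihoods, (0.5, 0.5))
# Putting all weight on one pattern recovers the homogeneous model.
homogeneous = mixture_log_likelihood(site_likelihoods, (1.0, 0.0))
print(mixture, homogeneous)
```

Setting one weight to 1 shows how the homogeneous model arises as a special case of the mixture, which is the nesting the abstract refers to.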
Combining phylogenetic and hidden Markov models in biosequence analysis
 J. Comput. Biol
, 2004
Cited by 135 (13 self)
A few models have appeared in recent years that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way the process changes from one site to the next. These models combine phylogenetic models of molecular evolution, which apply to individual sites, and hidden Markov models, which allow for changes from site to site. Besides improving the realism of ordinary phylogenetic models, they are potentially very powerful tools for inference and prediction—for gene finding, for example, or prediction of secondary structure. In this paper, we review progress on combined phylogenetic and hidden Markov models and present some extensions to previous work. Our main result is a simple and efficient method for accommodating higher-order states in the HMM, which allows for context-sensitive models of substitution—that is, models that consider the effects of neighboring bases on the pattern of substitution. We present experimental results indicating that higher-order states, autocorrelated rates, and multiple functional categories all lead to significant improvements in the fit of a combined phylogenetic and hidden Markov model, with the effect of higher-order states being particularly pronounced.
Selecting the best-fit model of nucleotide substitution
 Syst
, 2001
Cited by 135 (2 self)
Abstract.—Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best fits the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of Akaike and Bayesian information methods, for selecting best-fit models of nucleotide substitution. We specifically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution and recording how often this true model is recovered by the different methods. Our results suggest that model selection is reasonably accurate and indicate that some likelihood ratio test methods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not influence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, influence model selection in some cases. Model fitting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented
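The criteria compared in this abstract have simple textbook forms, sketched below: AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L, and the likelihood ratio statistic 2(ln L_alt - ln L_null). The log-likelihood values and the alignment length are invented for illustration only; the parameter counts are the free substitution parameters usually quoted for JC69, HKY85, and GTR (branch lengths excluded). This is not the procedure or data of the study itself.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion; n is the sample size (here, sites)."""
    return k * math.log(n) - 2 * log_lik

def lrt_statistic(log_lik_null, log_lik_alt):
    """Likelihood ratio statistic for a null model nested in the alternative."""
    return 2 * (log_lik_alt - log_lik_null)

N_SITES = 1000  # hypothetical alignment length

# model name -> (hypothetical maximized log-likelihood, free parameters)
models = {
    "JC69": (-2500.0, 0),
    "HKY85": (-2450.0, 4),
    "GTR": (-2447.0, 8),
}

aic_scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
bic_scores = {name: bic(ll, k, N_SITES) for name, (ll, k) in models.items()}

# Lower scores are preferred under both criteria.
best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)

# JC69 is nested in HKY85, so the two can be compared with an LRT.
jc_vs_hky = lrt_statistic(models["JC69"][0], models["HKY85"][0])
print(best_aic, best_bic, jc_vs_hky)
```

The LRT statistic is conventionally referred to a chi-square distribution with degrees of freedom equal to the difference in free parameters (4 here); that step is left out to keep the sketch free of external dependencies.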
Relaxed Phylogenetics and Dating with Confidence
Cited by 125 (4 self)
In phylogenetics, the unrooted model of phylogeny and the strict molecular clock model are two extremes of a continuum. Despite their dominance in phylogenetic inference, it is evident that both are biologically unrealistic and that the real evolutionary process lies between these two extremes. Fortunately, intermediate models employing relaxed molecular clocks have been described. These models open the gate to a new field of “relaxed phylogenetics.” Here we introduce a new approach to performing relaxed phylogenetic analysis. We describe how it can be used to estimate phylogenies and divergence times in the face of uncertainty in evolutionary rates and calibration times. Our approach also provides a means for measuring the clocklikeness of datasets and comparing this measure between different genes and phylogenies. We find no significant rate autocorrelation among branches in three large datasets, suggesting that autocorrelated models are not necessarily suitable for these data. In addition, we place these datasets on the continuum of clocklikeness between a strict molecular clock and the alternative unrooted extreme. Finally, we
Computing Bayes factors using thermodynamic integration
 Syst Biol
Cited by 112 (7 self)
Abstract.—In the Bayesian paradigm, a common method for comparing two models is to compute the Bayes factor, defined as the ratio of their respective marginal likelihoods. In recent phylogenetic works, the numerical evaluation of marginal likelihoods has often been performed using the harmonic mean estimation procedure. In the present article, we propose to employ another method, based on an analogy with statistical physics, called thermodynamic integration. We describe the method, propose an implementation, and show on two analytical examples that this numerical method yields reliable estimates. In contrast, the harmonic mean estimator leads to a strong overestimation of the marginal likelihood, which is all the more pronounced as the model is higher dimensional. As a result, the harmonic mean estimator systematically favors more parameter-rich models, an artefact that might explain some recent puzzling observations, based on harmonic mean estimates, suggesting that Bayes factors tend to overscore complex models. Finally, we apply our method to the comparison of several alternative models of amino-acid replacement. We confirm our previous observations, indicating that modeling pattern heterogeneity across sites tends to yield better models than standard empirical matrices. [Bayes factor; harmonic mean; mixture model; path sampling; phylogeny; thermodynamic integration.] Bayesian methods have become popular in molecular phylogenetics over the recent years. The simple and intuitive interpretation of the concept of probabilities
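Thermodynamic integration (path sampling) rests on the identity ln Z = ∫₀¹ E_β[ln L] dβ, where the expectation is taken under the "power posterior" proportional to prior × likelihood^β. The sketch below applies it to a toy conjugate model chosen here for illustration, not taken from the article: one observation y ~ N(θ, 1) with prior θ ~ N(0, 1). In that model the power posterior is N(βy/(1+β), 1/(1+β)), so it can be sampled directly, and the exact answer is known because marginally y ~ N(0, 2). A phylogenetic application would draw the samples by MCMC instead. All function names and settings are assumptions of this sketch.

```python
import math
import random

def log_likelihood(theta, y):
    """Log density of y under N(theta, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2

def sample_power_posterior(beta, y, n, rng):
    """Draw n values of theta from prior * likelihood**beta.

    For the N(0, 1) prior and N(theta, 1) likelihood assumed here, this
    distribution is N(beta*y/(1+beta), 1/(1+beta)) in closed form.
    """
    mean = beta * y / (1.0 + beta)
    sd = math.sqrt(1.0 / (1.0 + beta))
    return [rng.gauss(mean, sd) for _ in range(n)]

def thermodynamic_integration(y, n_betas=21, n_samples=2000, seed=0):
    """Estimate the log marginal likelihood by integrating E_beta[ln L]."""
    rng = random.Random(seed)
    betas = [i / (n_betas - 1) for i in range(n_betas)]
    means = []
    for beta in betas:
        draws = sample_power_posterior(beta, y, n_samples, rng)
        means.append(sum(log_likelihood(t, y) for t in draws) / n_samples)
    # Trapezoidal rule over the beta grid from 0 (prior) to 1 (posterior).
    total = 0.0
    for i in range(n_betas - 1):
        total += 0.5 * (means[i] + means[i + 1]) * (betas[i + 1] - betas[i])
    return total

y = 1.5
estimate = thermodynamic_integration(y)
# Exact log marginal likelihood: y ~ N(0, 2) after integrating out theta.
exact = -0.5 * math.log(2 * math.pi * 2.0) - y ** 2 / 4.0
print(estimate, exact)
```

The grid size and sample count trade accuracy for cost; an evenly spaced grid is the simplest choice and is used here only for clarity.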
Partitioned Bayesian analyses, partition choice, and the phylogenetic relationships of scincid lizards
 Syst
, 2005
Cited by 112 (7 self)
Abstract. Partitioned Bayesian analyses of ∼2.2 kb of nucleotide sequence data (mtDNA) were used to elucidate phylogenetic relationships among 30 scincid lizard genera. Few partitioned Bayesian analyses exist in the literature, resulting in a lack of methods to determine the appropriate number of and identity of partitions. Thus, a criterion, based on the Bayes factor, for selecting among competing partitioning strategies is proposed and tested. Improvements in both mean −lnL and estimated posterior probabilities were observed when specific models and parameter estimates were assumed for partitions of the total data set. This result is expected given that the 95% credible intervals of model parameter estimates for numerous partitions do not overlap and it reveals that different data partitions may evolve quite differently. We further demonstrate that how one partitions the data (by gene, codon position, etc.) is shown to be a greater concern than simply the overall number of partitions. Using the criterion of the 2ln Bayes factor >10, the phylogenetic analysis employing the largest number of partitions was decisively better than all other strategies. Strategies that partitioned the ND1 gene by codon position performed better than other partition strategies, regardless of the overall number of partitions. Scincidae, Acontinae, Lygosominae, east Asian and North American "Eumeces" + Neoseps; North African Eumeces, Scincus, and Scincopus, and a large group primarily from sub-Saharan Africa, Madagascar, and neighboring islands are monophyletic. Feylinia, a limbless group of previously uncertain relationships, is nested within a "scincine" clade from sub-Saharan Africa. We reject the hypothesis that the nearly limbless dibamids are derived from within the Scincidae, but cannot reject the hypothesis that they represent the sister taxon to skinks. Amphiglossus, Chalcides, the acontines Acontias and Typhlosaurus, and Scincinae are paraphyletic. The globally widespread "Eumeces" is polyphyletic and we make necessary taxonomic changes.
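The decision rule quoted in this abstract, 2ln Bayes factor > 10, amounts to a comparison of log marginal likelihoods. A minimal sketch follows; the strategy names and log marginal likelihood values are hypothetical, and how those values are estimated is outside the scope of the sketch.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """Return 2 ln(BF) for strategy A over strategy B."""
    return 2.0 * (log_ml_a - log_ml_b)

def decisively_better(log_ml_a, log_ml_b, threshold=10.0):
    """Apply the 2 ln(BF) > threshold criterion quoted in the abstract."""
    return two_ln_bayes_factor(log_ml_a, log_ml_b) > threshold

# Hypothetical log marginal likelihoods for three partitioning strategies.
strategies = {
    "unpartitioned": -21000.0,
    "by_gene": -20850.0,
    "by_gene_and_codon": -20600.0,
}

best = max(strategies, key=strategies.get)
# The best strategy is "decisive" only if it beats every alternative.
best_is_decisive = all(
    decisively_better(strategies[best], log_ml)
    for name, log_ml in strategies.items()
    if name != best
)
print(best, best_is_decisive)
```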
FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments
, 2010
Cited by 103 (0 self)
Background: We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability. Methodology/Principal Findings: Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the “CAT” approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100–1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory. Conclusions/Significance: FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments.
Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models
 SYST. BIOL
, 2004
Cited by 101 (7 self)
What does the posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the posterior probability of a tree is the probability that the tree is correct, assuming that the model is correct. At the same time, the Bayesian method can be sensitive to model misspecification, and the sensitivity of the Bayesian method appears to be greater than the sensitivity of the nonparametric bootstrap method (using maximum likelihood to estimate trees). Although the estimates of phylogeny obtained by use of the method of maximum likelihood or the Bayesian method are likely to be similar, the assessment of the uncertainty of inferred trees via either bootstrapping (for maximum likelihood estimates) or posterior probabilities (for Bayesian estimates) is not likely to be the same. We suggest that the Bayesian method be implemented with the most complex models of those currently available, as this should reduce the chance that the method will concentrate too much probability on too few trees. [Bayesian estimation; Markov chain Monte Carlo; posterior probability; prior probability.] Quantifying the uncertainty of a phylogenetic estimate is at least as important a goal as obtaining the phylogenetic estimate itself. Measures of phylogenetic reliability not only point out what parts of a tree can be trusted when interpreting the evolution of a group, but can guide