H: Computing Bayes factors using thermodynamic integration
 Syst Biol
Cited by 111 (7 self)
Abstract.—In the Bayesian paradigm, a common method for comparing two models is to compute the Bayes factor, defined as the ratio of their respective marginal likelihoods. In recent phylogenetic works, the numerical evaluation of marginal likelihoods has often been performed using the harmonic mean estimation procedure. In the present article, we propose to employ another method, based on an analogy with statistical physics, called thermodynamic integration. We describe the method, propose an implementation, and show on two analytical examples that this numerical method yields reliable estimates. In contrast, the harmonic mean estimator leads to a strong overestimation of the marginal likelihood, which is all the more pronounced as the model is higher dimensional. As a result, the harmonic mean estimator systematically favors more parameter-rich models, an artefact that might explain some recent puzzling observations, based on harmonic mean estimates, suggesting that Bayes factors tend to overscore complex models. Finally, we apply our method to the comparison of several alternative models of amino-acid replacement. We confirm our previous observations, indicating that modeling pattern heterogeneity across sites tends to yield better models than standard empirical matrices. [Bayes factor; harmonic mean; mixture model; path sampling; phylogeny; thermodynamic integration.] Bayesian methods have become popular in molecular phylogenetics over the recent years. The simple and intuitive interpretation of the concept of probabilities
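The contrast drawn in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the toy model (standard normal prior on θ in d dimensions, likelihood L(θ) = exp(-|θ|²/(2v))) is chosen because its marginal likelihood is known in closed form, the power posteriors are sampled exactly rather than by MCMC, and the integral over β is approximated with a plain trapezoid rule on a uniform grid. The function names are hypothetical.

```python
import numpy as np


def log_likelihood(theta, v):
    """Toy log-likelihood log L(theta) = -|theta|^2 / (2 v); one value per row."""
    return -np.sum(theta ** 2, axis=-1) / (2.0 * v)


def exact_log_marginal(d, v):
    """Closed form under a standard normal prior: log Z = (d / 2) log(v / (1 + v))."""
    return 0.5 * d * np.log(v / (1.0 + v))


def sample_power_posterior(beta, d, v, n, rng):
    """Exact draws from p_beta ∝ L^beta * prior, which here is N(0, v / (v + beta) I)."""
    return rng.normal(0.0, np.sqrt(v / (v + beta)), size=(n, d))


def thermodynamic_log_marginal(d, v, n_beta=51, n=2000, seed=0):
    """log Z = integral over beta in [0, 1] of E_beta[log L], by the trapezoid rule."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_beta)
    means = np.array([
        log_likelihood(sample_power_posterior(b, d, v, n, rng), v).mean()
        for b in betas
    ])
    return float(np.sum(0.5 * (means[1:] + means[:-1]) * np.diff(betas)))


def harmonic_mean_log_marginal(d, v, n=2000, seed=1):
    """Harmonic mean estimator: log Z ≈ -log mean(1 / L) over posterior draws."""
    rng = np.random.default_rng(seed)
    neg_ll = -log_likelihood(sample_power_posterior(1.0, d, v, n, rng), v)
    shift = neg_ll.max()  # log-sum-exp shift for numerical stability
    return float(-(shift + np.log(np.mean(np.exp(neg_ll - shift)))))


if __name__ == "__main__":
    v = 0.1
    for d in (1, 10, 50):
        print(d, exact_log_marginal(d, v),
              thermodynamic_log_marginal(d, v),
              harmonic_mean_log_marginal(d, v))
```

In this toy setting the quantity 1/L has infinite variance under the posterior whenever v ≤ 1, which is one way to see why a finite-sample harmonic mean estimate can sit well above the true log marginal likelihood, and increasingly so as d grows.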
Phylogenetic inference via sequential Monte Carlo. Systematic Biology
, 2012
Cited by 12 (2 self)
Abstract.—Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov Chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets, and show how to apply this framework—which we refer to as PosetSMC—to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at
Searching for convergence in phylogenetic Markov chain Monte Carlo, Syst
 Biol
Cited by 10 (0 self)
Abstract.—Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, δ and ε, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a "metachain" to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as
Unifying vertical and nonvertical evolution: a stochastic ARG-based framework
 Syst
, 2010
Cited by 9 (0 self)
Abstract.—Evolutionary biologists have introduced numerous statistical approaches to explore nonvertical evolution, such as horizontal gene transfer, recombination, and genomic reassortment, through collections of Markov-dependent gene trees. These tree collections allow for inference of nonvertical evolution, but only indirectly, making findings difficult to interpret and models difficult to generalize. An alternative approach to explore nonvertical evolution relies on phylogenetic networks. These networks provide a framework to model nonvertical evolution but leave unanswered questions such as the statistical significance of specific nonvertical events. In this paper, we begin to correct the shortcomings of both approaches by introducing the “stochastic model for reassortment and transfer events” (SMARTIE) drawing upon ancestral recombination graphs (ARGs). ARGs are directed graphs that allow for formal probabilistic inference on vertical speciation events and nonvertical evolutionary events. We apply SMARTIE to phylogenetic data. Because of this, we can typically infer a single most probable ARG, avoiding coarse population dynamic summary statistics. In addition, a focus on phylogenetic data suggests novel probability distributions on ARGs. To make inference with our model, we develop a reversible jump Markov chain Monte Carlo sampler to approximate the posterior distribution of SMARTIE. Using the BEAST phylogenetic
Large Maximum Likelihood Trees
 In Proc. of the NIC Symposium 2006
, 2006
Cited by 9 (1 self)
Importance tempering
, 2007
Cited by 8 (1 self)
Simulated tempering (ST) is an established Markov Chain Monte Carlo (MCMC) methodology for sampling from a multimodal density π(θ). The technique involves introducing an auxiliary variable k taking values in a finite subset of [0,1] and indexing a set of tempered distributions, say π_k(θ) ∝ π(θ)^k. Small values of k encourage better mixing, but samples from π are only obtained when the joint chain for (θ,k) reaches k = 1. However, the entire chain can be used to estimate expectations under π of functions of interest, provided that importance sampling (IS) weights are calculated. Unfortunately this method, which we call importance tempering (IT), has tended not to work well in practice. This is partly because the most immediately obvious implementation is naïve and can lead to high variance estimators. We derive a new optimal method for combining multiple IS estimators and prove that this optimal combination has a highly desirable property related to the notion of effective sample size. The methodology is applied in two modelling scenarios requiring reversible-jump MCMC, where the naïve approach to IT fails spectacularly: model averaging in treed models, and model selection for mark–recapture data. Key words: simulated tempering, importance sampling, Markov chain Monte
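The reweighting idea in this abstract can be sketched as follows. This is a hedged illustration under several assumptions of my own: the target is a standard normal, so each tempered distribution π_k is N(0, 1/k) and can be sampled exactly (a real simulated tempering run would draw (θ, k) from a joint chain); the weights are w ∝ π(θ)/π_k(θ) = π(θ)^(1-k); and the per-temperature estimators are combined with coefficients proportional to their effective sample sizes. The abstract does not state the paper's combination rule, so that last choice is only one plausible option, and the function name is hypothetical.

```python
import numpy as np


def importance_tempering_estimate(ks, n_per_level, h, rng):
    """Estimate E_pi[h(theta)] for pi = N(0, 1) from draws at several temperatures.

    Returns the combined estimate, the per-temperature estimates, and the
    per-temperature effective sample sizes (ESS).
    """
    estimates, ess = [], []
    for k in ks:
        # Assumption: exact draws from pi_k ∝ pi^k, i.e. N(0, 1/k), with k > 0.
        theta = rng.normal(0.0, 1.0 / np.sqrt(k), size=n_per_level)
        # Unnormalized log IS weight: log pi(theta) - log pi_k(theta) = -(1 - k) theta^2 / 2.
        log_w = -(1.0 - k) * theta ** 2 / 2.0
        w = np.exp(log_w - log_w.max())  # rescaling cancels after self-normalization
        estimates.append(np.sum(w * h(theta)) / np.sum(w))
        ess.append(np.sum(w) ** 2 / np.sum(w ** 2))
    estimates = np.array(estimates)
    ess = np.array(ess)
    # Assumed combination rule: convex weights proportional to each level's ESS.
    lam = ess / ess.sum()
    return float(np.dot(lam, estimates)), estimates, ess


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    combined, per_level, ess = importance_tempering_estimate(
        [0.25, 0.5, 1.0], 5000, lambda t: t ** 2, rng
    )
    print(combined, per_level, ess)
```

At k = 1 the weights are constant, so the ESS equals the number of draws; at smaller k the ESS drops, and a combination that ignores this would let the noisier low-temperature estimators dominate.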
Building the Tree of Life on Terascale Systems
 In Proc. of the 21st International Parallel and Distributed Processing Symposium
, 2007
Cited by 5 (1 self)
Bayesian phylogenetic inference is an important alternative to maximum likelihood-based phylogenetic methods. However, inferring large trees using the Bayesian approach is computationally demanding, requiring huge amounts of memory and months of computational time. With a combination of novel parallel algorithms and the latest system technology, terascale phylogenetic tools will provide biologists the computational power necessary to conduct experiments on very large datasets, and thus aid construction of the tree of life. In this work we evaluate the performance of PBPI, a parallel application that reconstructs phylogenetic trees using MCMC-based Bayesian methods, on two terascale systems, Blue Gene/L at IBM Rochester and System X at Virginia Tech. Our results confirm that for a benchmark dataset with 218 taxa and 10000 characters, PBPI can achieve linear speedup on 1024 or more processors for both systems.
Bayesian Phylogeny Analysis via Stochastic Approximation Monte
, 2008
Cited by 4 (3 self)
Monte Carlo methods have received much attention in the recent literature of phylogeny analysis. However, the conventional Markov chain Monte Carlo algorithms, such as the Metropolis-Hastings algorithm, tend to get trapped in a local energy minimum in simulating from the posterior distribution of phylogenetic trees, rendering the inference ineffective. In this paper, we apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo algorithm, to Bayesian phylogeny analysis. Our method is compared with two popular Bayesian phylogeny software packages, BAMBE and MrBayes, on simulated and real datasets. The numerical results favor our method, which tends to produce better consensus trees and more accurate estimates for the parameters of the sequence evolutionary model, while using less CPU time than the methods under comparison.
Molecular Phylogenetics among Three Families of Bats (Chiroptera: Rhinolophidae, Hipposideridae, and Vespertilionidae) Based on Partial Sequences of the Mitochondrial 12S and 16S rRNA Genes
, 2007
"... species, making up more than 20 % of extant ..."
Gogarten J: Evolution of acetoclastic methanogenesis in Methanosarcina via horizontal gene transfer from cellulolytic Clostridia
 Journal of Bacteriology
Cited by 3 (2 self)