Species trees from gene trees: Reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions
 SYSTEMATIC BIOLOGY
, 2007
Cited by 107 (11 self)
The estimation of species trees has become popular as a considerable amount of multilocus molecular data is available for inferring the evolutionary history of species. However, the current phylogenetic paradigm, which reconstructs gene trees to represent the species tree, suggests that commonly used methods such as the concatenation method, the consensus tree method, or the gene tree parsimony method may be either inconsistent or highly biased. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics, but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a stochastic model to estimate gene trees, species trees, ancestral population sizes and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus DNA sequence data sets. The estimates of the
The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics. Syst. Biol
, 2007
Cited by 67 (6 self)
Abstract.—As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses. We also used Bayes factors to test empirical data for the need to divide data in a manner that has no expected biological meaning. Posterior probability estimates are misleading when an incorrect partitioning strategy is assumed. The error was greatest when the assumed model was underpartitioned. These results suggest that model partitioning is important for large data sets. Bayes factors performed well, giving a 5% type I error rate, which is remarkably consistent with standard frequentist hypothesis tests. The sensitivity of Bayes factors was found to be quite high when the across-class model heterogeneity reflected that of empirical data. These results suggest that Bayes factors represent a robust method of choosing among partitioning strategies. Lastly, results of tests for the inclusion of unexpected divisions in empirical data mirrored the simulation results, although the outcome of such tests is highly dependent on accounting for rate variation among classes. We conclude by discussing other approaches for partitioning data, as well as other applications of Bayes factors. [Bayes factors; Bayesian phylogenetic inference; data partitioning; model choice; posterior probabilities.] Maximum likelihood (ML) and Bayesian methods of
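The Bayes factor comparison described in this abstract can be illustrated with a small sketch. It is not the authors' procedure: the function names and the log marginal likelihood values are hypothetical, and the evidence categories are the widely cited Kass and Raftery (1995) scale for 2 ln BF, used here as an assumption about how a cutoff might be chosen.

```python
def two_ln_bf(log_ml_m1: float, log_ml_m0: float) -> float:
    """Return 2 ln BF for model M1 over M0 from their log marginal likelihoods."""
    return 2.0 * (log_ml_m1 - log_ml_m0)


def evidence_for_m1(stat: float) -> str:
    """Map 2 ln BF onto the Kass and Raftery (1995) evidence categories."""
    if stat <= 0:
        return "none (M0 favored or tied)"
    if stat < 2:
        return "not worth more than a bare mention"
    if stat < 6:
        return "positive"
    if stat <= 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods for two partitioning strategies.
log_ml_partitioned = -1000.0   # illustrative value, not from the paper
log_ml_unpartitioned = -1010.0  # illustrative value, not from the paper
stat = two_ln_bf(log_ml_partitioned, log_ml_unpartitioned)
print(stat, evidence_for_m1(stat))
```

Working on the log scale avoids underflow, since phylogenetic marginal likelihoods are typically far too small to represent directly.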
Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Syst Biol
, 2011
Cited by 38 (1 self)
Abstract.—The marginal likelihood is commonly used for comparing different evolutionary models in Bayesian phylogenetics and is the central quantity used in computing Bayes Factors for comparing model fit. A popular method for estimating marginal likelihoods, the harmonic mean (HM) method, can be easily computed from the output of a Markov chain Monte Carlo analysis but often greatly overestimates the marginal likelihood. The thermodynamic integration (TI) method is much more accurate than the HM method but requires more computation. In this paper, we introduce a new method, stepping-stone sampling (SS), which uses importance sampling to estimate each ratio in a series (the “stepping stones”) bridging the posterior and prior distributions. We compare the performance of the SS approach to the TI and HM methods in simulation and using real data. We conclude that the greatly increased accuracy of the SS and TI methods argues for their use instead of the HM method, despite the extra computation needed. [Bayes factor; harmonic mean; phylogenetics, marginal likelihood;
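The contrast between the HM and SS estimators can be sketched on a toy problem. This is not a phylogenetic model and not the authors' software: it assumes a normal likelihood with known unit variance and a normal prior on the mean, chosen so that the marginal likelihood has a closed form and each power posterior can be sampled exactly without MCMC. The power schedule (evenly spaced quantiles of a Beta(0.3, 1) distribution) and all function names are assumptions made for illustration.

```python
import math
import random


def log_likelihood(mu, ys):
    """Log-likelihood of data ys under N(mu, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - mu) ** 2 for y in ys)


def exact_log_marginal(ys, tau):
    """Closed-form log marginal likelihood with prior mu ~ N(0, tau^2)."""
    n, s, ss = len(ys), sum(ys), sum(y * y for y in ys)
    quad = ss - tau**2 * s**2 / (1 + n * tau**2)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n * tau**2) - 0.5 * quad)


def sample_power_posterior(beta, ys, tau, rng):
    """Exact draw from likelihood^beta * prior, which is normal in this toy model."""
    precision = beta * len(ys) + 1 / tau**2
    return rng.gauss(beta * sum(ys) / precision, math.sqrt(1 / precision))


def log_mean_exp(xs):
    """Numerically stable log of the mean of exp(x)."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def stepping_stone(ys, tau, rng, n_steps=32, n_draws=1000):
    """Sum importance-sampling estimates of each ratio between adjacent powers."""
    betas = [(k / n_steps) ** (1 / 0.3) for k in range(n_steps + 1)]
    total = 0.0
    for b_prev, b_next in zip(betas, betas[1:]):
        logls = [log_likelihood(sample_power_posterior(b_prev, ys, tau, rng), ys)
                 for _ in range(n_draws)]
        total += log_mean_exp([(b_next - b_prev) * ll for ll in logls])
    return total


def harmonic_mean(ys, tau, rng, n_draws=1000):
    """Harmonic mean of likelihoods evaluated at draws from the posterior."""
    logls = [log_likelihood(sample_power_posterior(1.0, ys, tau, rng), ys)
             for _ in range(n_draws)]
    return -log_mean_exp([-ll for ll in logls])


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(1.0, 1.0) for _ in range(20)]  # simulated toy data
    print("exact:", exact_log_marginal(data, 10.0))
    print("SS:   ", stepping_stone(data, 10.0, rng))
    print("HM:   ", harmonic_mean(data, 10.0, rng))
```

In a real analysis each power posterior would be sampled by MCMC rather than drawn exactly, which is the extra computation the abstract refers to.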
Lewis P: Choosing among partition models in Bayesian phylogenetics
 Mol Biol Evol
Cited by 22 (0 self)
Bayesian phylogenetic analyses often depend on Bayes factors (BFs) to determine the optimal way to partition the data. The marginal likelihoods used to compute BFs, in turn, are most commonly estimated using the harmonic mean (HM) method, which has been shown to be inaccurate. We describe a new, more accurate method for estimating the marginal likelihood of a model and compare it with the HM method on both simulated and empirical data. The new method generalizes our previously described stepping-stone (SS) approach by making use of a reference distribution parameterized using samples from the posterior distribution. This avoids one challenging aspect of the original SS method, namely the need to sample from distributions that are close (in the Kullback–Leibler sense) to the prior. We specifically address the choice of partition models and find that using the HM method can lead to a strong preference for an overpartitioned model. In contrast to the HM method and the original SS method, we show using simulated data that the generalized SS method is strikingly more precise (repeatable BF values of the same data and partition model) and yields BF values that are much more reasonable than those produced by the HM method. Comparisons of HM and generalized SS methods on an empirical data set demonstrate that the generalized SS method tends to choose simpler partition schemes that are more in line with expectation based on inferred patterns of molecular evolution. The generalized SS method shares with thermodynamic integration the need to sample from a series of distributions in addition to the posterior. Such dedicated path-based Markov chain Monte Carlo analyses appear to be a cost of estimating marginal likelihoods accurately.
Inferring speciation and extinction rates under different sampling schemes
, 2011
Cited by 18 (8 self)
The birth–death process is widely used in phylogenetics to model speciation and extinction. Recent studies have shown that the inferred rates are sensitive to assumptions about the sampling probability of lineages. Here, we examine the effect of the method used to sample lineages. Whereas previous studies have assumed random sampling (RS), we consider two extreme cases of biased sampling: “diversified sampling” (DS), where tips are selected to maximize diversity, and “cluster sampling” (CS), where sample diversity is minimized. DS appears to be standard practice, for example, in analyses of higher taxa, whereas CS may occur under special circumstances, for example, in studies of geographically defined floras or faunas. Using both simulations and analyses of empirical data, we show that inferred rates may be heavily biased if the sampling strategy is not modeled correctly. In particular, when a diversified sample is treated as if it were a random or complete sample, the extinction rate is severely underestimated, often close to 0. Such dramatic errors may lead to serious consequences, for example, if estimated rates are used in assessing the vulnerability of threatened species to extinction. Using Bayesian model testing across 18 empirical data sets, we show that DS is commonly a better fit to the data than complete, random, or cluster sampling (CS). Inappropriate modeling of the sampling method may at least partly explain anomalous results that have previously been attributed to variation over time in birth and death rates.
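The birth–death process with random sampling (RS) mentioned in this abstract can be sketched as a simple lineage-count simulation. This is a minimal illustration under stated assumptions, not the authors' inference method: rates are constant, only the number of lineages is tracked (no tree topology), and the function names and parameter values are hypothetical. DS and CS are not implemented here because both depend on the tree shape.

```python
import math
import random


def simulate_lineage_count(birth, death, t_end, rng):
    """Gillespie simulation of a constant-rate birth-death process from one lineage."""
    n, t = 1, 0.0
    while n > 0:
        # Waiting time to the next event across all n lineages.
        t += rng.expovariate(n * (birth + death))
        if t > t_end:
            break
        # The event is a speciation with probability birth / (birth + death).
        n += 1 if rng.random() < birth / (birth + death) else -1
    return n


def random_sample(n_lineages, rho, rng):
    """Random sampling (RS): each extant lineage is kept independently with probability rho."""
    return sum(1 for _ in range(n_lineages) if rng.random() < rho)


if __name__ == "__main__":
    rng = random.Random(1)
    birth, death, t_end, rho = 1.0, 0.5, 3.0, 0.5  # illustrative values
    counts = [simulate_lineage_count(birth, death, t_end, rng) for _ in range(20000)]
    sampled = [random_sample(n, rho, rng) for n in counts]
    print("mean extant lineages:", sum(counts) / len(counts))
    print("theoretical mean:    ", math.exp((birth - death) * t_end))
    print("mean sampled tips:   ", sum(sampled) / len(sampled))
```

The simulated mean can be compared against the known expectation exp((birth - death) * t) for a process started from a single lineage, which gives a quick sanity check on the simulator.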
Testing hypotheses about the sister group of the Passeriformes using an independent 30-locus data set
 Mol. Biol. Evol
, 2012
Cited by 14 (3 self)
Although many phylogenetic studies have focused on developing hypotheses about relationships, advances in data collection and computation have increased the feasibility of collecting large independent data sets to rigorously test controversial hypotheses or carefully assess artifacts that may be misleading. One such relationship in need of independent evaluation is the position of Passeriformes (perching birds) in avian phylogeny. This order comprises more than half of all extant birds, and it includes one of the most important avian model systems (the zebra finch). Recent large-scale studies using morphology, mitochondrial, and nuclear sequence data have generated very different hypotheses about the sister group of Passeriformes, and all conflict with an older hypothesis generated using DNA–DNA hybridization. We used novel data from 30 nuclear loci, primarily introns, for 28 taxa to evaluate five major a priori hypotheses regarding the phylogenetic position of Passeriformes. Although previous studies have suggested that nuclear introns are ideal for the resolution of ancient avian relationships, introns have also been criticized because of the potential for alignment ambiguities and the loss of signal due to saturation. To examine these issues, we generated multiple alignments using several alignment programs, varying alignment parameters, and using guide trees that reflected the different a priori hypotheses. Although different alignments and analyses yielded slightly different results, our analyses excluded all but one of the five a priori hypotheses. In many cases, the passerines were sister to the Psittaciformes (parrots), and taxa were members of a larger clade that includes Falconidae (falcons) and Cariamidae
Discovery of nuclear-encoded genes for the neurotoxin saxitoxin in dinoflagellates. PLoS One
, 2011
Cited by 12 (1 self)
Saxitoxin is a potent neurotoxin that occurs in aquatic environments worldwide. Ingestion of vector species can lead to paralytic shellfish poisoning, a severe human illness that may lead to paralysis and death. In freshwaters, the toxin is produced by prokaryotic cyanobacteria; in marine waters, it is associated with eukaryotic dinoflagellates. However, several studies suggest that saxitoxin is not produced by dinoflagellates themselves, but by cocultured bacteria. Here, we show that genes required for saxitoxin synthesis are encoded in the nuclear genomes of dinoflagellates. We sequenced >1.2 × 10^6 mRNA transcripts from the two saxitoxin-producing dinoflagellate strains Alexandrium fundyense CCMP1719 and A. minutum CCMP113 using high-throughput sequencing technology. In addition, we used in silico transcriptome analyses, RACE, qPCR and conventional PCR coupled with Sanger sequencing. These approaches successfully identified genes required for saxitoxin synthesis in the two transcriptomes. We focused on sxtA, the unique starting gene of saxitoxin synthesis, and show that the dinoflagellate transcripts of sxtA have the same domain structure as the cyanobacterial sxtA genes. But, in contrast to the bacterial homologs, the dinoflagellate transcripts are monocistronic, have a higher GC content, occur in multiple copies, contain typical dinoflagellate spliced-leader sequences and eukaryotic polyA-tails. Further, we investigated 28 saxitoxin-producing and non-producing dinoflagellate strains from six different genera for the presence of genomic sxtA homologs. Our results show very good agreement between the presence of sxtA and saxitoxin synthesis, except in three
Phylogenetic inference via sequential Monte Carlo. Systematic Biology
, 2012
Cited by 12 (2 self)
Abstract.—Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov Chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets, and show how to apply this framework—which we refer to as PosetSMC—to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at