Results 1-10 of 26
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
Abstract

Cited by 2182 (27 self)
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
AWTY (Are We There Yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics
, 2007
Abstract

Cited by 108 (5 self)
Summary: A key element to a successful Markov chain Monte Carlo (MCMC) inference is the programming and run performance of the Markov chain. However, the explicit use of quality assessments of the MCMC simulations—convergence diagnostics—in phylogenetics is still uncommon. Here we present a simple tool that uses the output from MCMC simulations and visualizes a number of properties of primary interest in a Bayesian phylogenetic analysis, such as convergence rates of posterior split probabilities and branch lengths. Graphical exploration of the output from phylogenetic MCMC simulations gives intuitive and often crucial information on the success and reliability of the analysis. The tool presented here complements convergence diagnostics already available in other software packages primarily designed for other applications of MCMC. Importantly, the common practice of using traceplots of a single parameter or summary statistic, such as the likelihood score of sampled trees, can be misleading for assessing the success of a phylogenetic MCMC simulation.
Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood
 BIOINFORMATICS
, 2005
Abstract

Cited by 31 (9 self)
Motivation: Maximum likelihood methods have become very popular for constructing phylogenetic trees from sequence data. However, despite noticeable recent progress, with large and difficult data sets (e.g. multiple genes with conflicting signals) current ML programs still require huge computing times and can become trapped in bad local optima of the likelihood function. When this occurs, the resulting trees may still show some of the defects (e.g. long branch attraction) of starting trees obtained using fast distance or parsimony programs. Methods: Subtree Pruning and Regrafting (SPR) topological rearrangements are usually sufficient to intensively search the tree space. Here, we propose two new methods to make SPR moves more efficient. The first method uses a fast distance-based approach to detect the least promising candidate SPR moves, which are then simply discarded. The second method locally estimates the change in likelihood for any remaining potential SPRs, as opposed to globally evaluating the entire tree for each possible move. These two methods are implemented in a new algorithm with a sophisticated filtering strategy, which efficiently selects potential SPRs and concentrates most of the likelihood computation on the promising moves. Results: Experiments with real data sets comprising 35 to 250 taxa show that, while indeed greatly reducing the amount of computation, our approach provides likelihood values at least as good as those of the best known ML methods so far, and is very robust to poor starting trees. Furthermore, combining our new SPR algorithm with local moves such as PHYML’s nearest neighbor interchanges, the time needed to find good solutions can sometimes be reduced even more. Availability: Executables of our SPR program and the used data sets are available for download at
Detecting correlation between characters in a comparative analysis with uncertain phylogeny. Evolution
Abstract

Cited by 17 (0 self)
Abstract. The importance of accommodating the phylogenetic history of a group when performing a comparative analysis is now widely recognized. The typical approaches either assume the tree is known without error, or they base inferences on a collection of well-supported trees or on a collection of trees generated under a stochastic model of cladogenesis. However, these approaches do not adequately account for the uncertainty of phylogenetic trees in a comparative analysis, especially when data relevant to the phylogeny of a group are available. Here, we develop a method for performing comparative analyses that is based on an extension of Felsenstein's independent contrasts method. Uncertainties in the phylogeny, branch lengths, and other parameters are accommodated by averaging over all possible trees, weighting each by the probability that the tree is correct. We do this in a Bayesian framework and use Markov chain Monte Carlo to perform the high-dimensional summations and integrations required by the analysis. We illustrate the method using comparative characters sampled from Anolis lizards.
PBPI: a High Performance Implementation of Bayesian Phylogenetic Inference
 In Proc. of Supercomputing’2006
, 2006
Abstract

Cited by 15 (3 self)
This paper describes the implementation and performance of PBPI, a parallel implementation of a Bayesian phylogenetic inference method for DNA sequence data. By combining the Markov Chain Monte Carlo (MCMC) method with likelihood-based assessment of phylogenies, Bayesian phylogenetic inferences can incorporate complex statistical models into the process of phylogenetic tree estimation. However, Bayesian analyses are extremely computationally expensive. PBPI uses algorithmic improvements and parallel processing to achieve significant performance improvement over comparable Bayesian phylogenetic inference programs. We evaluated the performance and accuracy of PBPI using a simulated dataset on System X, a terascale supercomputer at Virginia Tech. Our results show that PBPI identifies equivalent tree estimates 1424 times faster on 256 processors than a widely-used, best-available (albeit sequential), Bayesian phylogenetic inference program. PBPI also achieves linear speedup with the number of processors for large problem sizes. Most importantly, the PBPI framework enables Bayesian phylogenetic analysis of large datasets previously impracticable.
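A rough reading of the speedup figure in this abstract: the baseline program is described as sequential, so a 1424-fold gain on 256 processors cannot come from parallelism alone. The Python sketch below only works through that arithmetic. The function name is hypothetical, and treating the leftover factor as the share due to algorithmic improvements is an interpretation of the abstract, not a measurement reported in it.

```python
def per_processor_gain(total_speedup: float, processors: int) -> float:
    """Return the speedup factor left after dividing by the processor count.

    Assumption (not from the paper): with ideal linear scaling, parallelism
    contributes at most a factor equal to the number of processors, so the
    remainder is credited to everything else (e.g. algorithmic changes).
    """
    if processors <= 0:
        raise ValueError("processors must be positive")
    return total_speedup / processors


# Figures quoted in the abstract: 1424x faster on 256 processors.
gain = per_processor_gain(1424, 256)
print(f"Residual factor beyond ideal parallel scaling: {gain:.2f}x")
```

Under this reading, a factor of about 5.6 would remain even if the 256 processors scaled perfectly.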
Limitations of Markov Chain Monte Carlo Algorithms for Bayesian Inference
 of Phylogeny, Science
, 2005
Abstract

Cited by 14 (2 self)
Markov Chain Monte Carlo algorithms play a key role in the Bayesian approach to phylogenetic inference. In this paper, we present the first theoretical work analyzing the rate of convergence of several Markov chains widely used in phylogenetic inference. We analyze simple, realistic examples where these Markov chains fail to converge quickly. In particular, the studied data is generated from a pair of trees, under a standard evolutionary model. We prove that many of the popular Markov chains take exponentially long to reach their stationary distribution. Our construction is pertinent since it is well known that phylogenetic trees for genes may differ within a single organism. Our results are a cautionary light for phylogenetic analysis using Bayesian inference, and highlight future directions for potential theoretical work.
An examination of the monophyly of morning glory taxa using Bayesian phylogenetic inference
 Syst. Biol
, 2002
Abstract

Cited by 13 (1 self)
Abstract. The objective of this study was to obtain a quantitative assessment of the monophyly of morning glory taxa, specifically the genus Ipomoea and the tribe Argyreieae. Previous systematic studies of morning glories intimated the paraphyly of Ipomoea by suggesting that the genera within the tribe Argyreieae are derived from within Ipomoea; however, no quantitative estimates of statistical support were developed to address these questions. We applied a Bayesian analysis to provide quantitative estimates of monophyly in an investigation of morning glory relationships using DNA sequence data. We also explored various approaches for examining convergence of the Markov chain Monte Carlo (MCMC) simulation of the Bayesian analysis by running 18 separate analyses varying in length. We found convergence of the important components of the phylogenetic model (the tree with the maximum posterior probability, branch lengths, the parameter values from the DNA substitution model, and the posterior probabilities for clade support) for these data after one million generations of the MCMC simulations. In the process, we identified a run where the parameter values obtained were often outside the range of values obtained from the other runs, suggesting an aberrant result. In addition, we compared the Bayesian method of phylogenetic analysis to maximum likelihood and maximum parsimony. The results from the Bayesian analysis and the maximum likelihood analysis
STATISTICAL APPROACH TO TESTS INVOLVING PHYLOGENIES
Abstract

Cited by 13 (1 self)
This chapter reviews statistical testing involving phylogenies. We present both the classical framework with the use of sampling distributions involving the bootstrap and permutation tests and the Bayesian approach using posterior distributions. We give some examples of direct tests for deciding whether the data support a given tree or trees that share a particular property; comparative analyses using tests that condition on the phylogeny being known are also discussed. We introduce a continuous parameter space that enables one to avoid the delicate problem of comparing exponentially many possible models with a finite amount of data. This chapter contains a review of the literature on parametric tests in phylogenetics and some suggestions of nonparametric tests. We also present some open questions that have to be solved by mathematical statisticians to provide the theoretical justification of both current testing strategies and as yet underdeveloped areas of statistical testing in
Identifying conflicting signal in a multigene analysis reveals a highly resolved tree: the phylogeny of Rodentia (Mammalia)
 Syst. Biol
, 2003
Abstract

Cited by 12 (0 self)
Abstract. Homoplasy among morphological characters has hindered inference of higher level rodent phylogeny for over 100 years. Initial molecular studies, based primarily on single genes, likewise produced little resolution of the deep relationships among rodent families. Two recent molecular studies (Huchon et al., 2002, Mol. Biol. Evol. 19:1053–1065; Adkins et al., 2003, Mol. Phylogenet. Evol. 26:409–420), using larger samples from the nuclear genome, have produced phylogenies that are generally concordant with each other, but many of the deep superfamilial nodes were still lacking substantial statistical support. Data are presented here for a total of approximately 3,600 base pairs from portions of three different nuclear protein-coding genes, CB1, IRBP, and RAG2, from 19 rodents and three outgroups. Separate analyses, with data partitioned according to both genes and codon position, produced conflicting results. Trees obtained from all partitions of CB1 and RAG2 and those obtained from the first- plus second-position sites of IRBP were generally concordant with each other and the trees from the two recent studies, whereas trees obtained from the third-position sites of IRBP were not. Although the IRBP third-position sites represent only 1/9 of the total data set, combined analyses using either parsimony or likelihood resulted in trees in agreement with the IRBP third-position sites and in disagreement with the remaining 8/9 of the sites from this data set and the two recent multigene studies. In contrast, maximum-likelihood analysis using a site-specific rates model did recover a tree that is highly congruent with the trees in the two recent studies. If the IRBP third-position sites are removed from the current data set, then combined likelihood analyses obtain a tree that is highly
Building the Tree of Life on Terascale Systems
 In Proc. of the 21st International Parallel and Distributed Processing Symposium
, 2007
Abstract

Cited by 5 (1 self)
Bayesian phylogenetic inference is an important alternative to maximum likelihood-based phylogenetic methods. However, inferring large trees using the Bayesian approach is computationally demanding, requiring huge amounts of memory and months of computational time. With a combination of novel parallel algorithms and the latest system technology, terascale phylogenetic tools will provide biologists the computational power necessary to conduct experiments on very large datasets, and thus aid construction of the tree of life. In this work we evaluate the performance of PBPI, a parallel application that reconstructs phylogenetic trees using MCMC-based Bayesian methods, on two terascale systems, Blue Gene/L at IBM Rochester and System X at Virginia Tech. Our results confirm that for a benchmark dataset with 218 taxa and 10000 characters, PBPI can achieve linear speedup on 1024 or more processors for both systems.