A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
A structural EM algorithm for phylogenetic inference
 Journal of Computational Biology
, 2002
Head of the school for Engineering and Computer Science
Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference
 Mol Biol
, 2012
Recent studies have observed that Bayesian analyses of sequence data sets using the program MrBayes sometimes generate extremely large branch lengths, with posterior credibility intervals for the tree length (sum of branch lengths) excluding the maximum likelihood estimates. Suggested explanations for this phenomenon include the existence of multiple local peaks in the posterior, lack of convergence of the chain in the tail of the posterior, mixing problems, and misspecified priors on branch lengths. Here, we analyze the behavior of Bayesian Markov chain Monte Carlo algorithms when the chain is in the tail of the posterior distribution and note that all these phenomena can occur. In Bayesian phylogenetics, the likelihood function approaches a constant instead of zero when the branch lengths increase to infinity. The flat tail of the likelihood can cause poor mixing and undue influence of the prior. We suggest that the main cause of the extreme branch length estimates produced in many Bayesian analyses is the poor choice of a default prior on branch lengths in current Bayesian phylogenetic programs. The default prior in MrBayes assigns independent and identical distributions to branch lengths, imposing strong (and unreasonable) assumptions about the tree length. The problem is exacerbated by the strong correlation between the branch lengths and parameters in models of variable rates among sites or among site partitions. To resolve the problem, we suggest two multivariate priors for the branch lengths (called compound Dirichlet priors) that are fairly diffuse and demonstrate their
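The flat-tail behavior described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible setting, which is not taken from the paper: two aligned sequences, the Jukes-Cantor (JC69) substitution model, and a single branch of length t. The function names and the example counts are hypothetical. As t grows, each site pattern probability tends to 1/16, so the log-likelihood levels off at a finite constant instead of decreasing without bound.

```python
import math


def jc69_log_likelihood(n_same, n_diff, t):
    """Log-likelihood of a two-sequence alignment under JC69 (illustrative assumption).

    n_same: number of sites where both sequences carry the same base
    n_diff: number of sites where the bases differ
    t:      branch length in expected substitutions per site (must be > 0)
    """
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay  # P(base unchanged after time t)
    p_diff = 0.25 - 0.25 * decay  # P(change to one specific other base)
    # Each site pattern also carries the stationary frequency 1/4 of the first base.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def flat_tail_limit(n_same, n_diff):
    """Limit of the log-likelihood as t goes to infinity: every pattern has probability 1/16."""
    return (n_same + n_diff) * math.log(1.0 / 16.0)


if __name__ == "__main__":
    # Hypothetical alignment: 90 matching sites, 10 mismatching sites.
    for t in (0.1, 1.0, 10.0, 50.0, 100.0):
        print(t, jc69_log_likelihood(90, 10, t))
    print("limit", flat_tail_limit(90, 10))
```

Because the curve is essentially flat for large t, the data carry almost no information there, and the shape of the posterior in that region is governed by the branch-length prior.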
Mitochondrial DNA-based phylogeography of North American rubber boas, Charina bottae (Blainville). Molecular Phylogenetics and Evolution 18
, 2001
We used 783 bp of mitochondrial DNA sequences to study the phylogeography of Charina bottae (rubber boa) in western North America, with an emphasis on populations from California (U.S.A.). Maximum-parsimony and maximum-likelihood methods identified a basal divergence within C. bottae that corresponds to southern and northern segments of its current distribution. These clades coincide with the ranges of the two recognized subspecies, C. b. umbratica in the south and C. b. bottae to the north. A subsequent cladogenetic event in the C. b. bottae clade resulted in two groupings, which we refer to as the Sierra Nevada and the Northwestern subclades, based on the geographic distribution of their constituent populations. The two subclades have completely allopatric distributions, with a genetic break in the vicinity of Lassen Volcanic National Park in northeastern California, an area that was subjected to glaciation during the Pleistocene and that has been volcanically active in the past 100 years. An earlier genetic study documented fixed differences between populations of bottae and umbratica in four of seven allozymes surveyed, and despite noticeable variation and overlap in the characters that define C. b. bottae and C. b. umbratica, the two forms still can be separated in most cases using a suite of morphological traits. All available evidence thus indicates that C. b. umbratica is a genetically cohesive, allopatric taxon that is morphologically diagnosable, and we conclude that it is an independent evolutionary unit that should be recognized as a distinct species, Charina umbratica. © 2001 Academic Press
DOI: 10.1080/10635150390196993 Accelerated Likelihood Surface Exploration: The Likelihood Ratchet
Abstract. The existence of multiple likelihood maxima necessitates algorithms that explore a large part of the tree space. However, because of computational constraints, stepwise addition-based tree-searching methods do not allow for this exploration in reasonable time. Here, I present an algorithm that increases the speed at which the likelihood landscape can be explored. The iterative algorithm combines the computational speed of distance-based tree construction methods to arrive at approximations of the global optimum with the accuracy of optimality criterion-based branch-swapping methods to improve on the result of the starting tree. The algorithm moves between local optima by iteratively perturbing the tree landscape through a process of reweighting randomly drawn samples of the underlying sequence data set. Tests on simulated and real data sets demonstrated that the optimal solution obtained using stepwise addition-based heuristic searches was found faster using the algorithm presented here. Tests on a previously published data set that established the presence of tree islands under maximum likelihood demonstrated that the algorithm identifies the same tree islands in a shorter amount of time than that needed using stepwise addition. The algorithm can be readily applied using standard software for phylogenetic inference. [Heuristics; maximum likelihood; parsimony ratchet; phylogenetic inference; tree landscapes.] Algorithms for finding maximum-likelihood trees can sometimes find multiple solutions (Steel, 1994; Rogers and Swofford, 1999; Chor et al., 2000; Salter, 2001), necessitating
Frequent Simian Foamy Virus Infection in Persons Occupationally
, 2003
The recognition that AIDS originated as a zoonosis heightens public health concerns associated with human infection by simian retroviruses endemic in nonhuman primates (NHPs). These retroviruses include simian immunodeficiency virus (SIV), simian T-cell lymphotropic virus (STLV), simian type D retrovirus (SRV), and simian foamy virus (SFV). Although occasional infection with SIV, SRV, or SFV in persons occupationally exposed to NHPs has been reported, the characteristics and significance of these zoonotic infections are not fully defined. Surveillance for simian retroviruses at three research centers and two zoos identified no SIV, SRV, or STLV infection in 187 participants. However, 10 of 187 persons (5.3%) tested positive for SFV antibodies by Western blot (WB) analysis. Eight of the 10 were males, and 3 of the 10 worked at zoos. SFV integrase gene (int) and gag sequences were PCR amplified from the peripheral blood lymphocytes available from 9 of the 10 persons. Phylogenetic analysis showed SFV infection originating from chimpanzees (n = 8) and baboons (n = 1). SFV seropositivity for periods of 8 to 26 years (median, 22 years) was documented for six workers for whom archived serum samples were available, demonstrating long-standing SFV infection. All 10 persons reported general good health, and secondary transmission of SFV was not observed in three wives
Centers for Disease Control and Prevention
, 2003