Results 1-10 of 131
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
Cited by 2182 (27 self)
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
Likelihood-based tests of topologies in phylogenetics
 Syst. Biol
, 2000
Cited by 225 (3 self)
Abstract.—Likelihood-based statistical tests of competing evolutionary hypotheses (tree topologies) have been available for approximately a decade. By far the most commonly used is the Kishino–Hasegawa test. However, the assumptions that have to be made to ensure the validity of the Kishino–Hasegawa test place important restrictions on its applicability. In particular, it is only valid when the topologies being compared are specified a priori. Unfortunately, this means that the Kishino–Hasegawa test may be severely biased in many cases in which it is now commonly used: for example, in any case in which one of the competing topologies has been selected for testing because it is the maximum likelihood topology for the data set at hand. We review the theory of the Kishino–Hasegawa test and contend that for the majority of popular applications this test should not be used. Previously published results from invalid applications of the Kishino–Hasegawa test should be treated extremely cautiously, and future applications should use appropriate alternative tests instead. We review such alternative tests, both nonparametric and parametric, and give two examples which illustrate the importance of our contentions. [Kishino–Hasegawa test; maximum likelihood; phylogeny; Shimodaira–Hasegawa test; statistical tests; tree topology.] Hasegawa and Kishino (1989) and Kishino and Hasegawa (1989) developed methods for estimating the standard error and confidence intervals for the difference in log-likelihoods between two topologically distinct phylogenetic trees representing hypotheses that might explain particular aligned sequence data sets. The method initially was introduced to compute confidence intervals on posterior probabilities for topologies in a
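As a rough illustration of the quantity this entry discusses, the following Python sketch computes the difference in log-likelihoods between two trees and a standard error for it from per-site log-likelihoods, using the usual sample-variance estimate over sites. The function name `kh_statistic` and the three-site input values are hypothetical, and this is not the authors' code. As the abstract stresses, interpreting the result with a normal approximation is only appropriate when both topologies were specified a priori.

```python
import math


def kh_statistic(site_lnl_a, site_lnl_b):
    """Return (delta, standard_error) for two trees' per-site log-likelihoods.

    delta is the total log-likelihood difference (tree A minus tree B).
    The standard error uses the variance estimate
    n / (n - 1) * sum((d_i - mean_d)^2) over the per-site differences d_i.
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both trees need one log-likelihood per site")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two sites are required")
    delta = sum(diffs)
    mean = delta / n
    variance = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    return delta, math.sqrt(variance)


# Hypothetical per-site log-likelihoods for a three-site alignment.
delta, se = kh_statistic([-1.0, -2.0, -3.0], [-1.5, -2.0, -2.0])
print(f"delta = {delta:.3f}, standard error = {se:.3f}")
```

A difference that is small relative to its standard error gives little reason to prefer one topology over the other.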
Phylogenetic Tree Construction Using Markov Chain Monte Carlo
, 1999
Cited by 86 (0 self)
We describe a Bayesian method based on Markov chain simulation to study the phylogenetic relationship in a group of DNA sequences. Under simple models of mutational events, our method produces a Markov chain whose stationary distribution is the conditional distribution of the phylogeny given the observed sequences. Our algorithm strikes a reasonable balance between the desire to move globally through the space of phylogenies and the need to make computationally feasible moves in areas of high probability. Since phylogenetic information is described by a tree, we have created new diagnostics to handle this type of data structure. An important byproduct of the Markov chain Monte Carlo phylogeny building technique is that it provides estimates and corresponding measures of variability for any aspect of the phylogeny under study.
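The idea of a Markov chain whose stationary distribution is the posterior over phylogenies can be sketched on a toy problem. The Python example below runs a plain Metropolis sampler over three hypothetical topology labels with made-up unnormalized posterior weights. It is a generic illustration and not the tree-proposal mechanism of the paper. The function name `metropolis`, the labels, and the weights are all assumptions.

```python
import random


def metropolis(weights, steps, seed=0):
    """Sample states in proportion to their unnormalized weights.

    weights maps each state to a positive unnormalized posterior weight.
    Proposals are drawn uniformly from all states, which is symmetric,
    so the acceptance probability reduces to min(1, w_new / w_old).
    Returns the fraction of steps spent in each state.
    """
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    counts = {state: 0 for state in states}
    for _ in range(steps):
        proposal = rng.choice(states)
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        counts[current] += 1
    return {state: count / steps for state, count in counts.items()}


# Hypothetical unnormalized posterior weights for three topologies.
toy_weights = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}
freqs = metropolis(toy_weights, steps=50_000)
for topology, freq in freqs.items():
    print(topology, round(freq, 3))
```

The visit frequencies should approach the normalized weights (here 0.6, 0.3, and 0.1) as the number of steps grows. Real tree space is far too large to propose uniformly, which is why the abstract emphasizes balancing global moves against computationally feasible ones.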
A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history.
 Genome Res.
, 2002
Cited by 64 (3 self)
It has been claimed that complete genome sequences would clarify phylogenetic relationships between organisms, but up to now, no satisfying approach has been proposed to use efficiently these data. For instance, if the coding of presence or absence of genes in complete genomes gives interesting results, it does not take into account the phylogenetic information contained in sequences and ignores hidden paralogies by using a BLAST reciprocal best hit definition of orthology. In addition, concatenation of sequences of different genes as well as building of consensus trees only consider the few genes that are shared among all organisms. Here we present an attempt to use a supertree method to build the phylogenetic tree of 45 organisms, with special focus on bacterial phylogeny. This led us to perform a phylogenetic study of congruence of tree topologies, which allows the identification of a core of genes supporting similar species phylogeny. We then used this core of genes to infer a tree. This phylogeny presents several differences with the rRNA phylogeny, notably for the position of hyperthermophilic bacteria.
Hierarchical Latent Class Models for Cluster Analysis
 Journal of Machine Learning Research
, 2002
Cited by 62 (13 self)
Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.
Phylogenomics and the reconstruction of the tree of life
 Nat Rev Genet
, 2005
Cited by 54 (2 self)
As more complete genomes are sequenced, phylogenetic analysis is entering a new era — that of phylogenomics. One branch of this expanding field aims to reconstruct the evolutionary history of organisms based on the analysis of their genomes. Recent studies have demonstrated the power of this approach, which has the potential to provide answers to a number of fundamental evolutionary questions. However, challenges for the future have also been revealed. The very nature of the evolutionary history of organisms and the limitations of current phylogenetic reconstruction methods mean that part of the tree of life may prove difficult, if not impossible, to resolve with confidence. Introductory paragraph Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. The notion of phylogeny follows directly from the theory of evolution presented by Charles Darwin in “The Origin of Species”¹: the only illustration in his famous book is the first representation of evolutionary relationships among species, in the form of a
FastTree: computing large minimum evolution trees with profiles instead of a distance matrix
 Mol. Biol. Evol
, 2009
Cited by 52 (0 self)
As DNA sequencing accelerates, gene families are growing rapidly, but standard methods for inferring phylogenies become computationally demanding for alignments of thousands of sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining, and uses heuristics to quickly identify candidate joins. FastTree then refines the topology with nearest-neighbor interchanges according to the minimum-evolution criterion. Compared to using a distance matrix, FastTree reduces the memory required from O(N²) to O(NLa + N√N) and reduces the computation time from O(N²L) to O(N√N log(N)La), where N is the number of sequences, L is the width of the alignment, and a is the size of the alphabet. To estimate the tree’s reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over distance matrix approaches. FastTree constructed trees, including support values, for biological alignments with 39,092 or 158,022 distinct sequences in less time than it takes to compute a distance matrix and in a fraction of the space. Traditional neighbor joining with 100 bootstraps would be 10,000 times slower and would require 50 gigabytes of memory. In simulations, FastTree is slightly more accurate than other minimum-evolution methods such as neighbor joining, BIONJ, or FastME, and on genuine alignments, FastTree produces topologies with higher likelihoods. FastTree is available at
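To get a feel for the memory bounds in this abstract, O(N²) for a distance matrix versus O(NLa + N√N) for profiles, the Python sketch below evaluates the leading terms for the larger alignment size quoted (N = 158,022). The alignment width and alphabet size are hypothetical values chosen for illustration, and constant factors are ignored, so the result indicates only the order of magnitude, not FastTree's actual memory use.

```python
import math


def distance_matrix_cells(n: int) -> int:
    """Leading term for a full N x N distance matrix: N^2."""
    return n * n


def profile_cells(n: int, width: int, alphabet: int) -> float:
    """Leading terms for the profile-based approach: N*L*a + N*sqrt(N)."""
    return n * width * alphabet + n * math.sqrt(n)


n = 158_022      # number of sequences, as quoted in the abstract
width = 1_000    # hypothetical alignment width L
alphabet = 4     # hypothetical alphabet size a (nucleotides)

ratio = distance_matrix_cells(n) / profile_cells(n, width, alphabet)
print(f"distance matrix needs roughly {ratio:.0f}x more cells")
```

Because the matrix term grows quadratically in N while the profile term grows only slightly faster than linearly, the gap widens as more sequences are added at a fixed alignment width.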
Models of molecular evolution and phylogeny
 Genome Res
, 1998
Cited by 51 (0 self)
Phylogenetic reconstruction is a fast-growing field that is enriched by different statistical approaches and by findings and applications in a broad range of biological areas. Fundamental to these are the mathematical models used to describe the patterns of DNA base substitution and amino acid replacement. These may become some of the basic models for comparative genome research. We discuss these models, including the analysis of observed DNA base and amino acid mutation patterns, the concept of site heterogeneity, and the incorporation of structural biology data, all of which have become particularly important in recent years. We also describe the use of such models in phylogenetic reconstruction and statistical methods for the comparison of different models. PCR has deeply transformed and boosted phylogenetic studies. At the same time, the statistical analysis of evolutionary relationships among species has recently revealed important biotechnological uses. For example, the understanding of viral quasispecies variation allows us to trace routes of infectious disease transmission. The analysis of the host–
Phylogenomics of eukaryotes: impact of missing data on large alignments.
 Molecular Biology and Evolution
, 2004
Cited by 50 (0 self)
Resolving the relationships between Metazoa and other eukaryotic groups as well as between metazoan phyla is central to the understanding of the origin and evolution of animals. The current view is based on limited data sets, either a single gene with many species (e.g., ribosomal RNA) or many genes but with only a few species. Because a reliable phylogenetic inference simultaneously requires numerous genes and numerous species, we assembled a very large data set containing 129 orthologous proteins (~30,000 aligned amino acid positions) for 36 eukaryotic species. Included in the alignments are data from the choanoflagellate Monosiga ovata, obtained through the sequencing of about 1,000 cDNAs. We provide conclusive support for choanoflagellates as the closest relative of animals and for fungi as the second closest. The monophyly of Plantae and chromalveolates was recovered but without strong statistical support. Within animals, in contrast to the monophyly of Coelomata observed in several recent large-scale analyses, we recovered a paraphyletic Coelomata, with nematodes and platyhelminths nested within. To include a diverse sample of organisms, data from EST projects were used for several species, resulting in a large amount of missing data in our alignment (about 25%). By using different approaches, we verify that the inferred phylogeny is not sensitive to these missing data. Therefore, this large data set provides a reliable phylogenetic framework for studying eukaryotic and animal evolution and will be easily extendable when large amounts of sequence information become available from a broader taxonomic range.