A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
Abstract

Cited by 2182 (27 self)
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees.
 Mol Biol Evol
, 1993
Abstract

Cited by 925 (4 self)
Examining the pattern of nucleotide substitution for the control region of mitochondrial DNA (mtDNA) in humans and chimpanzees, we developed a new mathematical method for estimating the number of transitional and transversional substitutions per site, as well as the total number of nucleotide substitutions. In this method, excess transitions, unequal nucleotide frequencies, and variation of substitution rate among different sites are all taken into account. Application of this method to human and chimpanzee data suggested that the transition/transversion ratio for the entire control region was 15 and nearly the same for the two species. The 95% confidence interval of the age of the common ancestral mtDNA was estimated to be 80,000-480,000 years in humans and 0.57-2.72 Myr in common chimpanzees.
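For reference, the Tamura-Nei distance without its gamma correction can be sketched in Python as below. This is a minimal sketch, assuming the base frequencies and the observed proportions of purine transitions (`p1`), pyrimidine transitions (`p2`) and transversions (`q`) have already been counted from an alignment; the function name and arguments are illustrative. The method described above also corrects for rate variation among sites (via a gamma distribution), which this sketch deliberately omits.

```python
import math


def tn93_distance(pi, p1, p2, q):
    """Tamura-Nei (1993) distance, equal rates across sites (no gamma correction).

    pi: dict of base frequencies for 'A', 'G', 'C', 'T'
    p1: proportion of sites differing by an A<->G (purine) transition
    p2: proportion of sites differing by a C<->T (pyrimidine) transition
    q:  proportion of sites differing by a transversion
    """
    pa, pg, pc, pt = pi["A"], pi["G"], pi["C"], pi["T"]
    pr, py = pa + pg, pc + pt  # purine and pyrimidine frequencies
    w1 = 1 - pr * p1 / (2 * pa * pg) - q / (2 * pr)
    w2 = 1 - py * p2 / (2 * pt * pc) - q / (2 * py)
    w3 = 1 - q / (2 * pr * py)
    if min(w1, w2, w3) <= 0:
        raise ValueError("divergence too large: distance is undefined")
    return (
        -2 * pa * pg / pr * math.log(w1)
        - 2 * pt * pc / py * math.log(w2)
        - 2 * (pr * py - pa * pg * py / pr - pt * pc * pr / py) * math.log(w3)
    )
```

With equal base frequencies and `p1 == p2`, the expression collapses to Kimura's two-parameter distance, which gives a convenient sanity check.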
A Hidden Markov Model approach to variation among sites in rate of evolution.
 Mol Biol Evol
, 1996
Abstract

Cited by 244 (1 self)
The method of hidden Markov models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences. Rates of evolution at different sites are assumed to be drawn from a set of possible rates, with a finite number of possibilities. The overall likelihood of a phylogeny is calculated as a sum of terms, each term being the probability of the data given a particular assignment of rates to sites, times the prior probability of that particular combination of rates. The probabilities of different rate combinations are specified by a stationary Markov chain that assigns rate categories to sites. While there will be a very large number of possible ways of assigning rates to sites, a simple recursive algorithm allows the contributions to the likelihood from all possible combinations of rates to be summed, in a time proportional to the number of different rates at a single site. Thus with 3 rates, the effort involved is no greater than 3 times that for a single rate. This "hidden Markov model" method allows for rates to differ between sites, and for correlations between the rates of neighboring sites. By summing over all possibilities it does not require us to know the rates at individual sites. However it does not allow for correlation of rates at nonadjacent sites, nor does it allow for a continuous distribution of rates over sites. It is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny, and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability. An example is given using β-hemoglobin DNA sequences in 8 mammal species; the regions of high and low evolutionary rates are inferred and also the average length of patches of similar rates.
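The recursive summation described above is the forward algorithm for hidden Markov models. The Python sketch below shows it under stated assumptions: the per-site likelihoods for each rate category (`site_liks`) are taken as already computed on the tree, and a general rate-category transition matrix is used rather than the specific chain of the paper; all names are hypothetical. The expensive part, one tree-likelihood evaluation per site per rate, is what scales with the number of rates; the recursion itself adds only a small per-site overhead.

```python
def hmm_rate_likelihood(site_liks, stationary, transition):
    """Sum the likelihood over every assignment of rate categories to sites.

    site_liks[i][k]:   probability of the data at site i given rate category k
                       (hypothetical input, assumed precomputed on the tree)
    stationary[k]:     prior probability of category k at the first site
    transition[j][k]:  probability that site i+1 is in category k given that
                       site i is in category j
    """
    n_rates = len(stationary)
    # forward[k] = P(data at sites 0..i, and site i is in category k)
    forward = [stationary[k] * site_liks[0][k] for k in range(n_rates)]
    for lik in site_liks[1:]:
        forward = [
            lik[k] * sum(forward[j] * transition[j][k] for j in range(n_rates))
            for k in range(n_rates)
        ]
    # Summing over the last site's category gives the total likelihood.
    return sum(forward)
```

When every row of `transition` equals `stationary`, neighboring rates are independent and the result reduces to the product over sites of the prior-weighted per-site likelihoods.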
Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior.
 Mol Biol Evol
, 2004
Abstract

Cited by 65 (11 self)
The degree to which an amino acid site is free to vary is strongly dependent on its structural and functional importance. An amino acid that plays an essential role is unlikely to change over evolutionary time. Hence, the evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluation of its importance in maintaining the structure/function of the protein. When using probabilistic methods for site-specific rate inference, few alternatives are possible. In this study we use simulations to compare the maximum-likelihood and Bayesian paradigms. We study the dependence of inference accuracy on such parameters as number of sequences, branch lengths, the shape of the rate distribution, and sequence length. We also study the possibility of simultaneously estimating branch lengths and site-specific rates. Our results show that a Bayesian approach is superior to maximum-likelihood under a wide range of conditions, indicating that the prior that is incorporated into the Bayesian computation significantly improves performance. We show that when branch lengths are unknown, it is better first to estimate branch lengths and then to estimate site-specific rates. This procedure was found to be superior to estimating both the branch lengths and site-specific rates simultaneously. Finally, we illustrate the difference between maximum-likelihood and Bayesian methods when analyzing site conservation for the apoptosis regulator protein Bcl-xL.
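To make the contrast concrete, the sketch below computes both kinds of estimate for a single site over a discrete set of rate categories. It is an illustration under assumptions, not the study's implementation: the rate values, prior weights and per-category site likelihoods are made-up inputs (in practice the categories would typically come from a discretized gamma prior and the likelihoods from the tree), and the maximum-likelihood estimate is restricted to the same grid of rates rather than optimized continuously.

```python
def site_rate_estimates(rates, prior, site_liks):
    """Return (empirical Bayes posterior mean, grid ML estimate) of one site's rate.

    rates[k]:     rate of category k (hypothetical values)
    prior[k]:     prior probability of category k
    site_liks[k]: likelihood of the site's data if its rate is rates[k]
    """
    # Posterior weight of each category is proportional to prior * likelihood.
    weights = [p * lik for p, lik in zip(prior, site_liks)]
    total = sum(weights)
    posterior_mean = sum(r * w for r, w in zip(rates, weights)) / total
    # The ML estimate ignores the prior and picks the best-scoring rate.
    best = max(range(len(rates)), key=lambda k: site_liks[k])
    return posterior_mean, rates[best]
```

The posterior mean shrinks the estimate toward values the prior considers plausible, whereas the ML choice can sit at an extreme category when a site carries little information; this is one intuition for why the prior helps.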
General Time-Reversible Distances with Unequal Rates across Sites: Mixing Γ and Inverse Gaussian Distributions with Invariant Sites
, 1997
Abstract

Cited by 49 (3 self)
This paper aims to explain to biologists the assumptions of these distances and to clarify some earlier misconceptions. Importantly, nearly all of the currently used distance estimates (including those of Tamura, 1992; Tamura and Nei, 1994) are special cases (restrictions) of the general time-reversible distance (see Zharkikh, 1994; Swofford et al., 1996).
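The special-case relationship can be checked numerically with the standard equal-rates GTR distance, d = -tr(Π log(Π⁻¹F)), where F is the symmetrized 4×4 divergence matrix of a sequence pair and Π is the diagonal matrix of base frequencies. The pure-Python sketch below is an assumption-laden illustration: it omits the Γ / inverse Gaussian rate mixtures and invariant sites that the paper adds, and it uses a small Jacobi eigen-solver only to stay dependency-free (a linear-algebra library would normally be used). Function names are hypothetical.

```python
import math


def _jacobi_eigh(a, tol=1e-24, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(n)], v


def gtr_distance(f):
    """Equal-rates GTR distance from a 4x4 divergence matrix f (entries sum to 1)."""
    n = len(f)
    fs = [[(f[i][j] + f[j][i]) / 2 for j in range(n)] for i in range(n)]  # symmetrize
    freqs = [sum(row) for row in fs]  # base frequencies = marginals of F
    # M = Pi^(-1/2) F Pi^(-1/2) is symmetric and similar to Pi^(-1) F.
    m = [[fs[i][j] / math.sqrt(freqs[i] * freqs[j]) for j in range(n)] for i in range(n)]
    eigvals, v = _jacobi_eigh(m)
    if min(eigvals) <= 0:
        raise ValueError("divergence too large: matrix logarithm undefined")
    # tr(Pi log(Pi^-1 F)) = sum_i pi_i * log(M)[i][i], log(M) = V diag(log lambda) V^T
    trace = sum(
        freqs[i] * sum(v[i][k] ** 2 * math.log(eigvals[k]) for k in range(n))
        for i in range(n)
    )
    return -trace
```

Feeding in a divergence matrix that follows the Jukes-Cantor or Kimura two-parameter pattern returns exactly the corresponding closed-form distance, which is the sense in which those estimators are restrictions of the general time-reversible one.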
Statistical properties of the maximum likelihood method of phylogenetic estimation and comparison with distance matrix methods.
 Syst. Biol
, 1994
Abstract

Cited by 42 (10 self)
A proof is presented that the maximum likelihood (ML) method of phylogenetic estimation from DNA sequences (Felsenstein, 1981, J. Mol. Evol. 17:368-376) is statistically consistent despite the irregularity of the parameter space of the estimation problem. The distance matrix method using the least squares (LS) criterion is also consistent, but disconnection of two steps in the method, i.e., estimation of sequence divergence and construction of the tree topology, appears to lead to both theoretical contradictions and practical problems. Comparison of the ML and LS methods shows that the ML method is much more, indeed extremely, tolerant to violation of its assumptions and also has smaller sampling errors caused by limited data. This conclusion should be general, independent of particular models, tree topologies, and number of species in the data set. The problem of evaluating the reliability of the estimated tree topology was examined. The test of positivity of the interior branch length in the estimated tree is not a test of the significance of the ML tree but may be taken as a test of the significance of the LS tree. [Phylogenetic estimation; maximum likelihood; least squares; consistency; sampling error; robustness; parameter space; molecular systematics.] Under regularity conditions, maximum ...
Performance of a New Invariants Method on Homogeneous and Nonhomogeneous Quartet Trees
, 2006
"... ..."
Probability Models for Genome Rearrangement and Linear Invariants for Phylogenetic Inference
 In Proc. of COCOON
, 1999
Abstract

Cited by 22 (4 self)
We review the combinatorial optimization problems in calculating edit distances between genomes and phylogenetic inference based on minimizing gene order changes. With a view to avoiding the computational cost and the "long branches attract" artifact of some tree-building methods, we explore the probabilization of genome rearrangement models prior to developing a methodology based on branch-length invariants. We characterize probabilistically the evolution of the structure of the gene adjacency set for inversions on unsigned circular genomes and, using a nontrivial recurrence relation, inversions on signed genomes. Concepts from the theory of invariants developed for the phylogenetics of homologous gene sequences can be used to derive a complete set of linear invariants for unsigned inversions, as well as for a mixed rearrangement model for signed genomes, though not for pure transposition nor pure signed inversion models. The invariants are based on an extended Jukes-Cantor semigroup. ...
Arlequin ver 3.1: An Integrated Software Package for Population Genetics Data Analysis
, 2006
"... ..."