Results 1  10
of
1,886
MEGA5: Molecular evolutionary genetics analysis using maximum . . .
, 2011
"... Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version ..."
Abstract

Cited by 7284 (25 self)
 Add to MetaCart
(Show Context)
Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a userfriendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting bestfit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates sitebysite. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
"... The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximumlikelihood principle, which clearly satisfies these requirements. The ..."
Abstract

Cited by 2182 (27 self)
 Add to MetaCart
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximumlikelihood principle, which clearly satisfies these requirements. The core of this method is a simple hillclimbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distancebased method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximumlikelihood programs and much higher than the performance of distancebased and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximumlikelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distancebased and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
Learning in graphical models
 STATISTICAL SCIENCE
, 2004
"... Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for ..."
Abstract

Cited by 806 (10 self)
 Add to MetaCart
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in largescale data analysis problems. We also present examples of graphical models in bioinformatics, errorcontrol coding and language processing.
Model Selection and Model Averaging in Phylogenetics: Advantages of Akaike Information Criterion and Bayesian Approaches Over Likelihood Ratio Tests
, 2004
"... Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the sel ..."
Abstract

Cited by 407 (8 self)
 Add to MetaCart
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (modelaveraged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AICbased model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution.
 Mol. Biol. Evol.
, 1998
"... An excess of nonsynonymous substitutions over synonymous ones is an important indicator of positive selection at the molecular level. A lineage that underwent Darwinian selection may have a nonsynonymous/synonymous rate ratio (d N /d S ) that is different from those of other lineages or greater tha ..."
Abstract

Cited by 309 (17 self)
 Add to MetaCart
An excess of nonsynonymous substitutions over synonymous ones is an important indicator of positive selection at the molecular level. A lineage that underwent Darwinian selection may have a nonsynonymous/synonymous rate ratio (d N /d S ) that is different from those of other lineages or greater than one. In this paper, several codonbased likelihood models that allow for variable d N /d S ratios among lineages were developed. They were then used to construct likelihood ratio tests to examine whether the d N /d S ratio is variable among evolutionary lineages, whether the ratio for a few lineages of interest is different from the background ratio for other lineages in the phylogeny, and whether the d N /d S ratio for the lineages of interest is greater than one. The tests were applied to the lysozyme genes of 24 primate species. The d N /d S ratios were found to differ significantly among lineages, indicating that the evolution of primate lysozymes is episodic, which is incompatible with the neutral theory. Maximumlikelihood estimates of parameters suggested that about nine nonsynonymous and zero synonymous nucleotide substitutions occurred in the lineage leading to hominoids, and the d N /d S ratio for that lineage is significantly greater than one. The corresponding estimates for the lineage ancestral to colobine monkeys were nine and one, and the d N /d S ratio for the lineage is not significantly greater than one, although it is significantly higher than the background ratio. The likelihood analysis thus confirmed most, but not all, conclusions Messier and Stewart reached using reconstructed ancestral sequences to estimate synonymous and nonsynonymous rates for different lineages.
Raxmliii: a fast program for maximum likelihoodbased inference of large phylogenetic trees
 Bioinformatics
, 2005
"... Motivation: The computation of large phylogenetic trees with statistical models such as maximum likelihood or bayesian inference is computationally extremely intensive. It has repeatedly been demonstrated that these models are able to recover the true tree or a tree which is topologically closer to ..."
Abstract

Cited by 259 (17 self)
 Add to MetaCart
(Show Context)
Motivation: The computation of large phylogenetic trees with statistical models such as maximum likelihood or bayesian inference is computationally extremely intensive. It has repeatedly been demonstrated that these models are able to recover the true tree or a tree which is topologically closer to the true tree more frequently than less elaborate methods such as parsimony or neighbor joining. Due to the combinatorial and computational complexity the size of trees which can be computed on a Biologist’s PC workstation within reasonable time is limited to trees containing approximately 100 taxa. Results: In this paper we present the latest release of our program RAxMLIII for rapid maximum likelihoodbased inference of large evolutionary trees which allows for computation of 1.000taxon trees in less than 24 hours on a single PC processor. We compare RAxMLIII to the currently fastest implementations for maximum likelihood and bayesian inference: PHYML and MrBayes. Whereas RAxMLIII performs worse than PHYML and MrBayes on synthetic data it clearly outperforms both programs on all real data alignments used in terms of speed and final likelihood values. Availability & Supplementary Information: RAxMLIII including all alignments and final trees mentioned in this paper is freely available as open source code at
A Hidden Markov Model approach to variation among sites in rate of evolution.
 Mol Biol Evol
, 1996
"... Abstract The method of hidden Markov models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences. Rates of evolution at different sites are assumed to be drawn from a set of possible rates, with a finite number of possibilities. The overall likelihoo ..."
Abstract

Cited by 244 (1 self)
 Add to MetaCart
(Show Context)
Abstract The method of hidden Markov models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences. Rates of evolution at different sites are assumed to be drawn from a set of possible rates, with a finite number of possibilities. The overall likelihood of a phylogeny is calculated as a sum of terms, each term being the probability of the data given a particular assignment of rates to sites, times the prior probability of that particular combination of rates. The probabilities of different rate combinations are specified by a stationary Markov chain that assigns rate categories to sites. While there will be a very large number of possible ways of assigning rates to sites, a simple recursive algorithm allows the contributions to the likelihood from all possible combinations of rates to be summed, in a time proportional to the number of different rates at a single site. Thus with 3 rates, the effort involved is no greater than 3 times that for a single rate. This "hidden Markov model" method allows for rates to differ between sites, and for correlations between the rates of neighboring sites. By summing over all possibilities it does not require us to know the rates at individual sites. However it does not allow for correlation of rates at nonadjacent sites, nor does it allow for a continuous distribution of rates over sites. It is shown how to use the NewtonRaphson method to estimate branch lengths of a phylogeny, and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability. An example is given using βhemoglobin DNA sequences in 8 mammal species; the regions of high and low evolutionary rates are inferred and also the average length of patches of similar rates.
Likelihoodbased tests of topologies in phylogenetics. Syst. Biol
, 2000
"... Abstract.—Likelihoodbased statistical tests of competing evolutionary hypotheses (tree topologies) have been available for approximately a decade. By far the most commonly used is the Kishino–Hasegawa test. However, the assumptions that have to be made to ensure the validity of the Kishino–Hasegawa ..."
Abstract

Cited by 225 (3 self)
 Add to MetaCart
Abstract.—Likelihoodbased statistical tests of competing evolutionary hypotheses (tree topologies) have been available for approximately a decade. By far the most commonly used is the Kishino–Hasegawa test. However, the assumptions that have to be made to ensure the validity of the Kishino–Hasegawa test place important restrictions on its applicability. In particular, it is only valid when the topologies being compared are speci�ed a priori. Unfortunately, this means that the Kishino–Hasegawa test may be severely biased in many cases in which it is now commonly used: for example, in any case in which one of the competing topologies has been selected for testing because it is the maximum likelihood topology for the data set at hand. We review the theory of the Kishino–Hasegawa test and contend that for the majority of popular applications this test should not be used. Previously published results from invalid applications of the Kishino–Hasegawa test should be treated extremely cautiously, and future applications should use appropriate alternative tests instead. We review such alternative tests, both nonparametric and parametric, and give two examples which illustrate the importance of our contentions. [Kishino– Hasegawa test; maximum likelihood; phylogeny; Shimodaira–Hasegawa test; statistical tests; tree topology.] Hasegawa and Kishino (1989) and Kishino and Hasegawa(1989)developed methods for estimating the standard error and con�dence intervals for the difference in loglikelihoods between two topologically distinct phylogenetic trees representing hypotheses that might explain particular aligned sequence data sets. The method initially was introduced to compute con�dence intervals on posterior probabilities for topologies in a
Pfold: RNA secondary structure prediction using stochastic contextfree grammars
 Nucleic Acids Res
, 2003
"... RNA secondary structures are important in many biological processes and efficient structure prediction can give vital directions for experimental investigations. Many available programs for RNA secondary structure prediction only use a single sequence at a time. This may be sufficient in some applic ..."
Abstract

Cited by 213 (11 self)
 Add to MetaCart
(Show Context)
RNA secondary structures are important in many biological processes and efficient structure prediction can give vital directions for experimental investigations. Many available programs for RNA secondary structure prediction only use a single sequence at a time. This may be sufficient in some applications, but often it is possible to obtain related RNA sequences with conserved secondary structure. These should be included in structural analyses to give improved results. This work presents a practical way of predicting RNA secondary structure that is especially useful when related sequences can be obtained. The method improves a previous algorithm based on an explicit evolutionary model and a probabilistic model of structures. Predictions can be done on a web server at
RNA secondary structure prediction using stochastic contextfree grammars and evolutionary history
, 1999
"... Motivation: Many computerized methods for RNA secondary structure prediction have been developed. Few of these methods, however, employ an evolutionary model, thus relevant information is often left out from the structure determination. This paper introduces a method which incorporates evolutionary ..."
Abstract

Cited by 180 (16 self)
 Add to MetaCart
Motivation: Many computerized methods for RNA secondary structure prediction have been developed. Few of these methods, however, employ an evolutionary model, thus relevant information is often left out from the structure determination. This paper introduces a method which incorporates evolutionary history into RNA secondary structure prediction. The method reported here is based on stochastic contextfree grammars (SCFGs) to give a prior probability distribution of structures.