Results 1-10 of 52
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
, 2003
Cited by 2182 (27 self)
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:
Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees.
 Mol Biol Evol
, 1993
Cited by 925 (4 self)
Examining the pattern of nucleotide substitution for the control region of mitochondrial DNA (mtDNA) in humans and chimpanzees, we developed a new mathematical method for estimating the number of transitional and transversional substitutions per site, as well as the total number of nucleotide substitutions. In this method, excess transitions, unequal nucleotide frequencies, and variation of substitution rate among different sites are all taken into account. Application of this method to human and chimpanzee data suggested that the transition/transversion ratio for the entire control region was 15 and nearly the same for the two species. The 95% confidence interval of the age of the common ancestral mtDNA was estimated to be 80,000-480,000 years in humans and 0.57-2.72 Myr in common chimpanzees.
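As a rough illustration of the quantities this abstract refers to, the sketch below counts observed transitions (purine to purine or pyrimidine to pyrimidine) and transversions between two aligned sequences. It is not the estimator described in the abstract: it applies no correction for multiple hits, unequal nucleotide frequencies, or rate variation among sites. The function name `count_ts_tv` and the example sequences are hypothetical.

```python
# Raw counts only; assumed helper, not the method from the abstract.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Return (transitions, transversions) observed between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites and anything that is not an unambiguous base (gaps, N, ...).
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # A change within the purines or within the pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


ts, tv = count_ts_tv("AACGT", "GATGA")  # A>G and C>T are transitions, T>A is a transversion
```

The observed ratio `ts / tv` generally underestimates the true transition/transversion ratio when sequences are divergent, which is one reason model-based corrections like the one in the abstract are used.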
BEAST: Bayesian evolutionary analysis by sampling trees.
 BMC Evol. Biol.
, 2007
Heterotachy and long-branch attraction in phylogenetics.
 BMC Evol. Biol.
, 2005
Arlequin ver 3.1 - An Integrated Software Package for Population Genetics Data Analysis
, 2006
A branch-and-bound algorithm for the inference of ancestral amino-acid sequences when the replacement rate varies among sites: Application to the . . .
, 2002
Cited by 19 (1 self)
Motivation: We developed an algorithm to reconstruct ancestral sequences, taking into account the rate variation among sites of the protein sequences. Our algorithm maximizes the joint probability of the ancestral sequences, assuming that the rate is gamma distributed among sites. Our algorithm provably finds the global maximum. The use of 'joint' reconstruction is motivated by studies that use the sequences at all the internal nodes in a phylogenetic tree, such as, for instance, the inference of patterns of amino-acid replacement, or tracing the biochemical changes that occurred during the evolution of a given protein family...
From complete genomes to measures of substitution rate variability within and between proteins
 Genome Res
, 2000
Cited by 19 (11 self)
Accumulation of complete genome sequences of diverse organisms creates new possibilities for evolutionary inferences from whole-genome comparisons. In the present study, we analyze the distributions of substitution rates among proteins encoded in 19 complete genomes (the interprotein rate distribution). To estimate these rates, it is necessary to employ another fundamental distribution, that of the substitution rates among sites in proteins (the intraprotein distribution). Using two independent approaches, we show that intraprotein substitution rate variability appears to be significantly greater than generally accepted. This yields more realistic estimates of evolutionary distances from amino-acid sequences, which is critical for evolutionary-tree construction. We demonstrate that the interprotein rate distributions inferred from the genome-to-genome comparisons are similar to each other and can be approximated by a single distribution with a long exponential shoulder. This suggests that a generalized version of the molecular clock hypothesis may be valid on genome scale. We also use the scaling parameter of the obtained interprotein rate distribution to construct a rooted whole-genome phylogeny. The topology of the resulting tree is largely compatible with those of global rRNA-based trees and trees produced by other approaches to genome-wide comparison. Multiple, complete genome sequences from taxonomically diverse species create unprecedented opportuni...
Evolution and phylogeny of the Diptera: a molecular phylogenetic analysis using 28S rDNA
 Syst. Biol
, 1997
Cited by 13 (0 self)
Portions of the large ribosomal subunit RNA gene (28S rDNA) encompassing the D1 and the D7 region were obtained from 16 dipteran species and families to reconstruct early phylogenetic events in the order Diptera. For outgroup comparison, the corresponding sequences were used from representative taxa of the Siphonaptera, Mecoptera, and Lepidoptera. A subset of 488 unambiguously alignable sites was analyzed with respect to important sequence evolution parameters. We found (1) sequence variability is significantly higher in double-stranded sites than in single-stranded sites, (2) transitions are close to saturation in most pairwise sequence comparisons, (3) significant substitution rate heterogeneity exists across sites, and (4) significant substitution rate heterogeneity exists among lineages. Tree reconstruction was carried out with the neighbor joining, maximum parsimony, and maximum likelihood methods. Four major subgroups are consistently and robustly supported: the Brachycera, the Culicomorpha, the Tipulomorpha sensu stricto, and the hitherto controversial Bibionomorpha sensu lato, which includes the families Sciaridae, Mycetophilidae, Cecidomyiidae, Bibionidae, Scatopsidae, and Anisopodidae. The phylogenetic relationships within or among these subclades and the positions of the families Psychodidae and Trichoceridae were not robustly resolved. These results support the view that the...
Unidentifiable divergence times in rates-across-sites models
 IEEE/ACM Trans. Comput. Biol. Bioinform
, 2004
Cited by 11 (4 self)
The rates-across-sites assumption in phylogenetic inference posits that the rate matrix governing the Markovian evolution of a character on an edge of the putative phylogenetic tree is the product of a character-specific scale factor and a rate matrix that is particular to that edge. Thus, evolution follows basically the same process for all characters, except that it occurs faster for some characters than others. To allow estimation of tree topologies and edge lengths for such models, it is commonly assumed that the scale factors are not arbitrary unknown constants, but rather unobserved, independent, identically distributed draws from a member of some parametric family of distributions. A popular choice is the gamma family. We consider an example of a clock-like tree with three taxa, one unknown edge length, a known root state, and a parametric family of scale factor distributions that contains the gamma family. This model has the property that, for a generic choice of unknown edge length and scale factor distribution, there is another edge length and scale factor distribution which generates data with exactly the same distribution, so that even with infinitely many data it will be typically impossible to make correct inferences about the unknown edge length.
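The rates-across-sites assumption stated in this abstract can be written compactly as follows. The notation is assumed here for illustration and is not taken from the paper: $s_c$ is the scale factor of character $c$, $Q_e$ the rate matrix of edge $e$, $t_e$ its length, and $f$ the density of the scale-factor distribution.

```latex
% Rate matrix for character c on edge e:
% a character-specific scale factor times an edge-specific rate matrix
Q_{c,e} = s_c \, Q_e

% Transition probabilities along edge e of length t_e
P_{c,e} = \exp\!\left( s_c \, t_e \, Q_e \right)

% Scale factors treated as i.i.d. draws with density f:
% the probability of a site pattern x averages over the unobserved scale factor
\Pr(x) = \int_0^{\infty} \Pr(x \mid s) \, f(s) \, ds
```

In this form $s_c$ and $t_e$ enter the transition probabilities only through the product $s_c t_e$, which gives some intuition for why a change in edge length might be offset by a change in the scale-factor distribution. This is a heuristic reading only, not the argument made in the paper.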
Mathematical modeling for functional divergence after gene duplication
 J Comput Biol
Cited by 9 (0 self)
In this paper, I present a statistical framework for modeling the functional divergence after gene duplication. A rate-component model to describe the rate covariation among homologous genes of a gene family is implemented when a phylogenetic tree is known. The Markov chain model is rigorous but may require a huge amount of computational time when the number of sequences is large. On the other hand, the Poisson-based model is mathematically analytical so that computation is very fast even for a large dataset. Moreover, under the posterior framework, we have developed a site-specific profile for predicting important amino acid residues responsible for these functional differences between member genes of a gene family. Our study may have great potential for functional genomics because it is cost-effective, and these predictions can be further tested by biological experimentation. Key words: functional divergence, gene duplication, Markov chain model, Poisson-based model, posterior prediction.