PAML 4: phylogenetic analysis by maximum likelihood. (2007)

by Z Yang
Venue: Mol Biol Evol

Results 1 - 10 of 1,201

Choosing BLAST options for better detection of orthologs as reciprocal best hits

by Gabriel Moreno-Hagelsieb, Kristen Latimer, 2008
Cited by 83 (6 self)
Motivation: The analyses of the increasing number of genome sequences require shortcuts for the detection of orthologs, such as Reciprocal Best Hits (RBH), where two genes in different genomes are assumed to be orthologs if each finds the other as its best hit in the other genome. Two BLAST options seem to affect alignment scores the most, and thus the choice of a best hit: the filtering of low-information sequence segments and the algorithm used to produce the final alignment. Thus, we decided to test whether such options would help better detect orthologs. Results: Using Escherichia coli K12 as an example, we compared the number and quality of orthologs detected as RBH. We tested four different conditions derived from two options: filtering of low-information segments, hard (default) versus soft; and alignment algorithm, default (based on matching words) versus Smith-Waterman. All options resulted in significant differences in the number of orthologs detected, with the highest numbers obtained with the combination of soft filtering with Smith-Waterman alignments. We compared these results with those of Reciprocal Shortest Distances (RSD), which is supposed to be superior to RBH because it uses an evolutionary measure of distance, rather than BLAST statistics, to rank homologs and thus detect orthologs. RSD barely increased the number of orthologs detected over those found with RBH. Error estimates, based on analyses of conservation of gene order, found small differences in the quality of orthologs detected using RBH. However, RSD showed the highest error rates. Thus, RSD has no advantages over RBH.
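Once both BLAST searches have been run, the RBH criterion reduces to a simple lookup. The sketch below assumes the searches (genome A against B, and B against A) have already been parsed into dictionaries mapping each query gene to (subject gene, bit score) pairs; the data layout and the function names are hypothetical, and ranking by bit score is one common choice rather than necessarily the authors' exact procedure. The filtering and Smith-Waterman options discussed above would change the scores fed in, not this pairing step.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: query gene -> list of (subject gene, bit score) pairs,
# e.g. parsed from BLAST tabular output.
Hits = Dict[str, List[Tuple[str, float]]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Return the top-scoring subject for each query (ties: first one listed)."""
    best = {}
    for query, subjects in hits.items():
        if subjects:
            best[query] = max(subjects, key=lambda pair: pair[1])[0]
    return best


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


a_vs_b = {"geneA1": [("geneB1", 250.0), ("geneB2", 120.0)],
          "geneA2": [("geneB2", 90.0)]}
b_vs_a = {"geneB1": [("geneA1", 248.0)],
          "geneB2": [("geneA1", 130.0), ("geneA2", 88.0)]}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('geneA1', 'geneB1')]
```

In this toy input, geneA2 picks geneB2 as its best hit, but geneB2's best hit is geneA1, so that pair is not reciprocal and is dropped.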

Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage

by Ziheng Yang, Rasmus Nielsen - Molecular Biology and Evolution, 2008
Cited by 56 (2 self)

Citation Context

...el and FMutSel0 models developed in this paper are implemented independently by the two authors for error checking. All models described in this paper are implemented in the CODEML program in PAML 4 (Yang 2007). Acknowledgements. We thank three referees for many useful comments. This study is supported by a grant from the Biotechnology and Biological Sciences Research Council (BBSRC) to Z.Y., and grants...

The Population Genetics of dN/dS

by Sergey Kryazhimskiy, Joshua B. Plotkin - PLoS Genet
Cited by 48 (0 self)
Evolutionary pressures on proteins are often quantified by the ratio of substitution rates at non-synonymous and synonymous sites. The dN/dS ratio was originally developed for application to distantly diverged sequences, the differences among which represent substitutions that have fixed along independent lineages. Nevertheless, the dN/dS measure is often applied to sequences sampled from a single population, the differences among which represent segregating polymorphisms. Here, we study the expected dN/dS ratio for samples drawn from a single population under selection, and we find that in this context, dN/dS is relatively insensitive to the selection coefficient. Moreover, the hallmark signature of positive selection over divergent lineages, dN/dS > 1, is violated within a population. For population samples, the relationship between selection and dN/dS does not follow a monotonic function, and so it may be impossible to infer selection pressures from dN/dS. These results have significant implications for the interpretation of dN/dS measurements among population-genetic samples.
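For readers new to the ratio itself, a minimal counting-style estimate is sketched below: the proportions of differing nonsynonymous and synonymous sites are each corrected with the Jukes-Cantor formula and then divided. It assumes the sites and differences have already been counted (Nei-Gojobori style); the function names and counts are hypothetical, and this is not the likelihood method used by codeml in PAML.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance undefined (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ratio of corrected nonsynonymous (dN) to synonymous (dS) distances."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        raise ValueError("no synonymous differences: dN/dS undefined")
    return d_n / d_s


# Hypothetical counts: 10 differences over 100 nonsynonymous sites,
# 10 differences over 50 synonymous sites.
print(round(dn_ds(10, 100, 10, 50), 3))
```

With these made-up counts the ratio is about 0.46, i.e. below 1; as the abstract stresses, reading such a value as a selection signal assumes the sequences come from divergent lineages rather than one population.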

Citation Context

...ical work on the relationship between dN/dS and selection likewise assumes that sequences are sampled from independent, divergent species [3], as do computer packages used to estimate dN/dS from data [6,7]. Nonetheless, the dN/dS ratio test is frequently applied to data that may represent samples from a single population, particularly in the case of microbes (e.g. [3,8–18]). In such cases, the differen...

The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection

by William Fletcher, Ziheng Yang
Cited by 42 (2 self)
The detection of positive Darwinian selection affecting protein-coding genes remains a topic of great interest and importance. The “branch-site” test is designed to detect localized episodic bouts of positive selection that affect only a few amino acid residues on particular lineages and has been shown to have reasonable power and low false-positive rates for a wide range of selection schemes. Previous simulations examining the performance of the test, however, were conducted under idealized conditions without insertions, deletions, or alignment errors. As the test is sometimes used to analyze divergent sequences, the impact of indels and alignment errors is a major concern. Here, we used a recently developed indel-simulation program to examine the false-positive rate and power of the branch-site test. We find that insertions and deletions do not cause excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false positives. Of the alignment methods evaluated, PRANK consistently outperformed MUSCLE, MAFFT, and ClustalW, mostly because the latter programs tend to place nonhomologous codons (or amino acids) into the same column, producing shorter and less accurate alignments and giving the false impression that many amino acid substitutions have occurred at those sites. Our examination of two previous studies suggests that alignment errors may impact the analysis of mammalian and vertebrate genes by the branch-site test, and it is important to use reliable alignment methods.

Citation Context

...stationary codon frequencies are those calculated from the base compositions at the three codon positions in a data set of five mammalian α- and β-globin gene sequences (data set abglobin.nuc in PAML; Yang 2007). As in Zhang et al. (2005), the simulation model assumes ten site classes. The foreground branch always uses ... (FIG. 1. Two model trees used in computer sim...)

Hotspots of biased nucleotide substitutions in human genes

by Jonas Berglund, Katherine S. Pollard, Matthew T. Webster - PLoS Biol, 2009
Cited by 32 (0 self)
Genes that have experienced accelerated evolutionary rates on the human lineage during recent evolution are candidates for involvement in human-specific adaptations. To determine the forces that cause increased evolutionary rates in certain genes, we analyzed alignments of 10,238 human genes to their orthologues in chimpanzee and macaque. Using a likelihood ratio test, we identified protein-coding sequences with an accelerated rate of base substitutions along the human lineage. Exons evolving at a fast rate in humans have a significant tendency to contain clusters of AT-to-GC (weak-to-strong) biased substitutions. This pattern is also observed in noncoding sequence flanking rapidly evolving exons. Accelerated exons occur in regions with elevated male recombination rates and exhibit an excess of nonsynonymous substitutions relative to the genomic average. We next analyzed genes with significantly elevated ratios of nonsynonymous to synonymous rates of base substitution (dN/dS) along the human lineage, and those with an excess of amino acid replacement substitutions relative to human polymorphism. These genes also show evidence of clusters of weak-to-strong biased substitutions. These findings indicate that a recombination-associated process, such as biased gene conversion (BGC), is driving fixation of GC alleles in the human genome. This process can lead to accelerated evolution in coding sequences and excess amino acid replacement substitutions, thereby generating significant results for tests of positive selection.
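The weak-to-strong (AT to GC) bias described above starts from classifying each lineage-specific substitution by direction. A minimal sketch, assuming the human sequence has already been aligned to a reconstructed ancestral sequence (for example, inferred with the chimpanzee and macaque orthologues); the function name is hypothetical, and the sketch only counts substitutions. It does not test for clustering or run the LRT used in the study.

```python
WEAK = set("AT")
STRONG = set("GC")


def count_ws_substitutions(ancestral: str, derived: str) -> dict:
    """Count weak-to-strong (A/T -> G/C) and strong-to-weak (G/C -> A/T)
    changes between two aligned sequences. W<->W and S<->S changes, gaps and
    ambiguous bases are ignored."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"W->S": 0, "S->W": 0}
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a in WEAK and d in STRONG:
            counts["W->S"] += 1
        elif a in STRONG and d in WEAK:
            counts["S->W"] += 1
    return counts


# Hypothetical toy alignment: A->G and T->C are W->S; T->A is W->W (ignored).
print(count_ws_substitutions("ATGCAT", "GCGCAA"))
```

An excess of W->S over S->W changes in a window, beyond what the genomic background predicts, is the kind of pattern the abstract attributes to biased gene conversion.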

Citation Context

...i.e., has an accelerated substitution rate). We used an LRT to identify exons with statistically significant substitution rate acceleration on the human branch [1]. We used the codeml program of PAML [62] with F3x4 codon frequencies and the Goldman and Yang [53] model of codon substitution to infer the pattern of synonymous and nonsynonymous nucleotide substitutions at each gene on the human and chimp...
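Under F3x4, the equilibrium frequency of a codon is the product of the observed nucleotide frequencies at its three positions, renormalised over the sense codons. The sketch below assumes the standard genetic code (stop codons TAA, TAG, TGA) and skips codons containing ambiguous bases; the function name is hypothetical, and codeml's own handling of stop codons and missing data in the input may differ.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
BASES = "TCAG"


def f3x4(sequences):
    """Codon frequencies from position-specific nucleotide frequencies,
    renormalised over the 61 sense codons."""
    counts = [{b: 0 for b in BASES} for _ in range(3)]
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if all(b in BASES for b in codon):
                for pos, base in enumerate(codon):
                    counts[pos][base] += 1
    freqs = []
    for pos_counts in counts:
        total = sum(pos_counts.values())
        if total == 0:
            raise ValueError("no unambiguous codons found")
        freqs.append({b: c / total for b, c in pos_counts.items()})
    raw = {}
    for b1, b2, b3 in product(BASES, repeat=3):
        codon = b1 + b2 + b3
        if codon not in STOP_CODONS:
            raw[codon] = freqs[0][b1] * freqs[1][b2] * freqs[2][b3]
    norm = sum(raw.values())
    return {codon: f / norm for codon, f in raw.items()}


pi = f3x4(["TCACAGAGTGTC"])  # hypothetical sequence, uniform at every position
print(len(pi), round(pi["TTT"], 5))
```

Because this toy sequence has every base exactly once at each codon position, all 61 sense codons come out equal at 1/61.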

De novo analysis of transcriptome dynamics in the migratory locust during the development of phase traits

by Shuang Chen, Pengcheng Yang, Feng Jiang, Yuanyuan Wei, Zongyuan Ma, Le Kang - PLoS ONE
Cited by 28 (2 self)
Locusts exhibit remarkable density-dependent phenotype (phase) changes from the solitary to the gregarious, making them one of the most destructive agricultural pests. This phenotype polyphenism arises from a single genome and diverse transcriptomes in different conditions. Here we report a de novo transcriptome for the migratory locust and a comprehensive, representative core gene set. We carried out assembly of 21.5 Gb Illumina reads, generated 72,977 transcripts with N50 2,275 bp and identified 11,490 locust protein-coding genes. Comparative genomics analysis with eight other sequenced insects was carried out to identify, for the first time, the genomic divergence between hemimetabolous and holometabolous insects, and 18 genes relevant to development were found. We further utilized the quantitative feature of RNA-seq to measure and compare gene expression among libraries. We first discovered how divergence in gene expression between two phases progresses as locusts develop and identified 242 transcripts as candidates for phase marker genes. Together with the detailed analysis of deep sequencing data of the 4th instar, we discovered a phase-dependent divergence of biological investment at the molecular level. Solitary locusts have higher activity in biosynthetic pathways while gregarious locusts show higher activity in environmental interaction, in which genes and pathways associated with regulation of neurotransmitter activities, such as neurotransmitter receptors, synthetase, transporters, and GPCR signaling pathways, are strongly involved. Our study, as the largest de novo transcriptome to date, ...
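The assembly statistic quoted above, N50, is the length L such that transcripts of length L or longer contain at least half of all assembled bases. A minimal sketch of that standard definition; the function name and the example lengths are hypothetical, not data from the study.

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least
    half of the total assembled length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty list of lengths")


print(n50([2, 3, 4, 5, 6]))  # 6 + 5 = 11 >= 20 / 2, so N50 is 5
```

Comparing `2 * running` with `total` avoids a fractional half for odd totals.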

Citation Context

...alignment). A Bayesian Markov chain Monte Carlo method [63] was implemented using a CAT-BP model to construct the phylogenetic tree (see Methods S1). Positive selection. An improved branch-site model A [64,65] was used to identify sites under positive selection (dN/dS > 1 in one lineage and dN/dS ≤ 1 in other lineages). The outgroup was removed prior to analysis. Positive selection criteria: p-value...

The Accuracy of Species Tree Estimation under Simulation: A Comparison of Methods

by Adam D. Leaché, Bruce Rannala, 2010
Cited by 28 (3 self)
Numerous simulation studies have investigated the accuracy of phylogenetic inference of gene trees under maximum parsimony, maximum likelihood, and Bayesian techniques. The relative accuracy of species tree inference methods under simulation has received less study. The number of analytical techniques available for inferring species trees is increasing rapidly, and in this paper, we compare the performance of several species tree inference techniques at estimating recent species divergences using computer simulation. Simulating gene trees within species trees of different shapes and with varying tree lengths (T) and population sizes (θ), and evolving sequences on those gene trees, allows us to determine how phylogenetic accuracy changes in relation to different levels of deep coalescence and phylogenetic signal. When the probability of discordance between the gene trees and the species tree is high (i.e., T is small and/or θ is large), Bayesian species tree inference using the multispecies coalescent (BEST) outperforms other methods. The performance of all methods improves as the total length of the species tree is increased, which reflects the combined benefits of decreasing the probability of discordance between species trees and gene trees and gaining more accurate estimates for gene trees. Decreasing the probability of deep coalescences by reducing θ also leads to accuracy gains for most methods. Increasing the number of loci from 10 to 100 improves accuracy under difficult demographic scenarios (i.e., coalescent units 4Ne), but 10 loci are adequate for estimating the correct species tree in cases where deep coalescence is limited or absent. In general, the ...
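Why discordance rises when T is small or θ is large can be seen from the textbook three-species case: for a species tree ((A,B),C) whose internal branch has length T in coalescent units, a gene tree matches the species topology with probability 1 - (2/3)e^(-T). The sketch below uses that standard result and assumes a diploid population with θ = 4Nμ and the internal branch τ measured in expected substitutions per site, so that T = 2τ/θ. The function names are hypothetical, and this illustrates the coalescent background, not any of the inference methods compared in the paper.

```python
import math


def coalescent_units(tau: float, theta: float) -> float:
    """Convert a branch length in substitutions per site (tau) to coalescent
    units, assuming theta = 4*N*mu for a diploid population: T = 2*tau/theta."""
    return 2.0 * tau / theta


def three_taxon_concordance(internal_branch: float) -> float:
    """P(gene tree topology matches species tree ((A,B),C)) given the internal
    branch length in coalescent units."""
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-internal_branch)


for t in (0.0, 0.5, 1.0, 5.0):
    print(t, round(three_taxon_concordance(t), 3))
```

At T = 0 the three gene-tree topologies are equally likely (1/3 each); shrinking τ or inflating θ both push T toward 0 and raise the chance of discordance.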

Rapid evolution and selection inferred from the transcriptomes of sympatric crater lake cichlid fishes

by K R Elmer, S Fan, H M Gunter, J C Jones, S Boekhoff, S Kuraku, A Meyer - Mol. Ecol., 2010
Cited by 27 (0 self)
Crater lakes provide a natural laboratory to study speciation of cichlid fishes by ecological divergence. Up to now, there has been a dearth of transcriptomic and genomic information that would aid in understanding the molecular basis of the phenotypic differentiation between young species. We used next-generation sequencing (Roche 454 massively parallel pyrosequencing) to characterize the diversity of expressed sequence tags between ecologically divergent, endemic and sympatric species of cichlid fishes from crater lake Apoyo, Nicaragua: benthic Amphilophus astorquii and limnetic Amphilophus zaliosus. We obtained 24 174 A. astorquii and 21 382 A. zaliosus high-quality expressed sequence tag contigs, of which 13 106 pairs are orthologous between species. Based on the ratio of nonsynonymous to synonymous substitutions, we identified six sequences exhibiting signals of strong diversifying selection (Ka/Ks > 1). These included genes involved in biosynthesis, metabolic processes and development. This transcriptome sequence variation may be reflective of natural selection acting on the genomes of these young, sympatric sister species. Based on Ks ratios and p-distances between 3′-untranslated regions (UTRs) calibrated to previously published species divergence times, we estimated a neutral transcriptome-wide substitutional mutation rate of 1.25 × 10⁻⁶ per site per year. We conclude that next-generation sequencing technologies allow us to infer natural selection acting to diversify the genomes of young species, such as crater lake cichlids, with much greater scope than previously possible.

Citation Context

... acid), then downstream of the stop codon was considered a ‘true 3′-UTR’. If the number of base pairs between the coding region and the stop codon was not divisible by three, then downstream of the ORF was considered a ‘pseudo-UTR’ and excluded from further analyses. Estimating substitution rates. We estimated the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks) between putatively orthologous coding regions using a maximum-likelihood method (Yang & Nielsen 2000) implemented by yn00 in the PAML toolkit (vers. 4.0) (Yang 2007). Orthologous ESTs with a Ks rate >0.1 were excluded from further analyses to avoid analysing paralogous genes (Bustamante et al. 2005). Estimating the overall substitutional mutation rate. We estimated an overall substitution rate for the cichlid genome based on divergence between orthologous EST pairs (entire EST, including coding region and UTRs > 50 bp long) and synonymous mutations calibrated with a maximum age of crater Lake Apoyo (Kutterolf et al. 2007). Only UTRs contiguous with orthologous coding regions were used in distance calculations to avoid including artefacts of assembly. The ra...
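The post-processing steps in this passage (discard pairs with Ks > 0.1, flag Ka/Ks > 1, convert a divergence into a per-year rate) can be sketched as below. Several points are assumptions rather than the authors' code: the (name, Ka, Ks) record layout and function names are hypothetical, pairs with Ks = 0 are also dropped here to avoid dividing by zero, and the rate conversion uses the usual strict-clock relation rate = d / (2T), since both lineages accumulate changes after the split. The numbers are made up.

```python
def filter_orthologs(pairs, max_ks=0.1):
    """Keep (name, ka, ks) records with 0 < ks <= max_ks; larger Ks values are
    treated as possible paralogs, mirroring the 0.1 cutoff quoted above."""
    return [(name, ka, ks) for name, ka, ks in pairs if 0 < ks <= max_ks]


def flag_diversifying(pairs):
    """Names of pairs whose Ka/Ks exceeds 1."""
    return [name for name, ka, ks in pairs if ka / ks > 1]


def clock_rate(distance, split_time_years):
    """Substitutions per site per year under a strict clock, assuming the
    pairwise distance accumulated along two lineages since the split."""
    return distance / (2.0 * split_time_years)


pairs = [("g1", 0.02, 0.01), ("g2", 0.01, 0.05),
         ("g3", 0.03, 0.2), ("g4", 0.0, 0.0)]  # hypothetical values
kept = filter_orthologs(pairs)
print([p[0] for p in kept], flag_diversifying(kept))
print(clock_rate(0.01, 1_000_000))
```

Here g3 is removed by the Ks cutoff, g4 by the zero-Ks guard, and only g1 (Ka/Ks = 2) is flagged.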

Species-specific activity of HIV-1 Vpu and positive selection of tetherin transmembrane domain variants

by Matthew W. McNatt, Trinity Zang, Theodora Hatziioannou, Mackenzie Bartlett, Ismael Ben Fofana, Welkin E. Johnson - PLoS Pathog, 2009
Cited by 24 (2 self)
Tetherin/BST-2/CD317 is a recently identified antiviral protein that blocks the release of nascent retrovirus, and other virus, particles from infected cells. An HIV-1 accessory protein, Vpu, acts as an antagonist of tetherin. Here, we show that positive selection is evident in primate tetherin sequences and that HIV-1 Vpu appears to have specifically adapted to antagonize variants of tetherin found in humans and chimpanzees. Tetherin variants found in rhesus macaques (rh), African green monkeys (agm) and mice were able to inhibit HIV-1 particle release, but were resistant to antagonism by HIV-1 Vpu. Notably, reciprocal exchange of transmembrane domains between human and monkey tetherins conferred sensitivity and resistance to Vpu, identifying this protein domain as a critical determinant of Vpu function. Indeed, differences between hu-tetherin and rh-tetherin at several positions in the transmembrane domain affected sensitivity to antagonism by Vpu. Two alterations in the hu-tetherin transmembrane domain, that correspond to differences found in rh- and agm-tetherin proteins, were sufficient to render hu-tetherin completely resistant to HIV-1 Vpu. Interestingly, transmembrane and cytoplasmic domain sequences in primate tetherins exhibit variation at numerous codons that is likely the result of positive selection, and some of these changes coincide with determinants of HIV-1 Vpu sensitivity. Overall, these data indicate that tetherin could impose a barrier to viral zoonosis as a consequence of positive selection that has been driven by ancient viral antagonists, and that the HIV-1 Vpu protein has specialized to target the transmembrane domains found in human/

Citation Context

...maximum likelihood scores for the two models under comparison and then finding the chi-square critical value. For a recent discussion of these models and their evaluation using the LRT, see reference [24]. d.f., degrees of freedom. doi:10.1371/journal.ppat.1000300.t001 Positive Selection in Tetherin and Vpu Activity PLoS Pathogens | www.plospathogens.org | February 2009 | Volume 5 | Issue 2 | e100030...
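The likelihood ratio test described here compares twice the log-likelihood difference between nested models with a chi-square distribution whose degrees of freedom equal the difference in free parameters. The sketch below is self-contained: it computes the chi-square upper tail for integer d.f. from the closed forms for 1 and 2 d.f. plus the standard recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Γ(k/2 + 1). The function names and log-likelihood values are hypothetical; in practice the lnL values would be read from the output of the two model fits.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1  # exact for df = 1
    else:
        q, k = math.exp(-half), 2  # exact for df = 2
    while k < df:  # step up two degrees of freedom at a time
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(max(stat, 0.0), df)


# Hypothetical fits: null lnL = -1000.0, alternative lnL = -995.0, 2 extra parameters.
stat, p = likelihood_ratio_test(-1000.0, -995.0, 2)
print(stat, p)  # 10.0 and exp(-5), about 0.0067
```

With 2 d.f. the tail probability is exactly e^(-x/2), so the statistic of 10 here gives a p-value of e^(-5), well below the 5% critical value of about 5.99.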

Dating Primate Divergences through an Integrated Analysis of Palaeontological and Molecular Data

by Richard D. Wilkinson, Michael E. Steiper, Christophe Soligo, Robert D. Martin, Ziheng Yang, Simon Tavaré
Cited by 22 (4 self)
Estimation of divergence times is usually done using either the fossil record or sequence data from modern species. We provide an integrated analysis of palaeontological and molecular data to give estimates of primate divergence times that utilize both sources of information. The number of preserved primate species discovered in the fossil record, along with their geological age distribution, is combined with the number of extant primate species to provide initial estimates of the primate and anthropoid divergence times. This is done by using a stochastic forwards-modeling approach where speciation and fossil preservation and discovery are simulated forward in time. We use the posterior distribution from the fossil analysis as a prior distribution on node ages in a molecular analysis. Sequence data from two genomic ...

Citation Context

...(G. gorilla, P. pygmaeus, C. aethiops, C. moloch, M. murinus, and O. garnettii). Estimation of Divergence Times. The two loci are analyzed jointly using the Bayesian MCMC program mcmctree in PAML 4.2 (Yang 2007). The HKY+Γ5 model (Hasegawa et al. 1985; Yang 1994) was used, with different transition/transversion rate ratios (κ), different base frequencies, and different gamma shape parameter α for the two lo...
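To make the per-locus parameters concrete, the sketch below builds the HKY85 instantaneous rate matrix from κ and the base frequencies: off-diagonal rates equal the target base's frequency, multiplied by κ for transitions (A<->G, C<->T). It assumes the common convention of scaling Q so that the expected substitution rate at equilibrium is 1. The function names and parameter values are hypothetical, and the gamma rate variation among sites (the Γ5 part) is not included.

```python
BASES = "TCAG"
PURINES = {"A", "G"}


def is_transition(x: str, y: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def hky_rate_matrix(kappa: float, freqs: dict) -> list:
    """HKY85 rate matrix Q (rows/columns ordered T, C, A, G), scaled so that
    -sum_i pi_i * Q[i][i] == 1."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = freqs[y] * (kappa if is_transition(x, y) else 1.0)
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    mean_rate = -sum(freqs[x] * q[i][i] for i, x in enumerate(BASES))
    return [[v / mean_rate for v in row] for row in q]


Q = hky_rate_matrix(4.0, {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4})
for row in Q:
    print([round(v, 3) for v in row])
```

Fitting separate κ and frequency values to each locus, as in the passage above, simply means building one such matrix per locus.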
