Choosing BLAST options for better detection of orthologs as reciprocal best hits
, 2008
Cited by 83 (6 self)
Motivation: The analysis of the increasing number of genome sequences requires shortcuts for the detection of orthologs, such as Reciprocal Best Hits (RBH), where orthologs are assumed if two genes, each in a different genome, find each other as the best hit in the other genome. Two BLAST options seem to affect alignment scores the most, and thus the choice of a best hit: the filtering of low-information sequence segments and the algorithm used to produce the final alignment. Thus, we decided to test whether such options would help better detect orthologs. Results: Using Escherichia coli K12 as an example, we compared the number and quality of orthologs detected as RBH. We tested four different conditions derived from two options: filtering of low-information segments, hard (default) versus soft; and alignment algorithm, default (based on matching words) versus Smith-Waterman. All options resulted in significant differences in the number of orthologs detected, with the highest numbers obtained with the combination of soft filtering with Smith-Waterman alignments. We compared these results with those of Reciprocal Shortest Distances (RSD), supposed to be superior to RBH because it uses an evolutionary measure of distance, rather than BLAST statistics, to rank homologs and thus detect orthologs. RSD barely increased the number of orthologs detected over those found with RBH. Error estimates, based on analyses of conservation of gene order, found small differences in the quality of orthologs detected using RBH. However, RSD showed the highest error rates. Thus, RSD has no advantages over RBH.
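The RBH criterion described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the gene identifiers, and the toy scores are all hypothetical, and the input is assumed to be a table of alignment scores (for example, bit scores parsed from BLAST output) in each search direction.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to an alignment score.
    On ties, the first pair encountered is kept (an arbitrary choice
    made for this sketch).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical toy scores: genome A searched against genome B, and back.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 50.0,
          ("a2", "b2"): 120.0, ("a3", "b1"): 90.0}
b_vs_a = {("b1", "a1"): 198.0, ("b1", "a3"): 91.0,
          ("b2", "a2"): 118.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In the toy data, a3 has b1 as its best hit, but b1's best hit is a1, so only the a1/b1 and a2/b2 pairs are reciprocal. Any option that changes alignment scores, such as the filtering and alignment choices compared in the abstract, can change which pairs pass this test.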
Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage
- Molecular Biology and Evolution
, 2008
JB: The Population Genetics of dN/dS
- PLoS Genet
Cited by 48 (0 self)
Evolutionary pressures on proteins are often quantified by the ratio of substitution rates at non-synonymous and synonymous sites. The dN/dS ratio was originally developed for application to distantly diverged sequences, the differences among which represent substitutions that have fixed along independent lineages. Nevertheless, the dN/dS measure is often applied to sequences sampled from a single population, the differences among which represent segregating polymorphisms. Here, we study the expected dN/dS ratio for samples drawn from a single population under selection, and we find that in this context, dN/dS is relatively insensitive to the selection coefficient. Moreover, the hallmark signature of positive selection over divergent lineages, dN/dS > 1, is violated within a population. For population samples, the relationship between selection and dN/dS does not follow a monotonic function, and so it may be impossible to infer selection pressures from dN/dS. These results have significant implications for the interpretation of dN/dS measurements among population-genetic samples.
The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection
Cited by 42 (2 self)
The detection of positive Darwinian selection affecting protein-coding genes remains a topic of great interest and importance. The "branch-site" test is designed to detect localized episodic bouts of positive selection that affect only a few amino acid residues on particular lineages and has been shown to have reasonable power and low false-positive rates for a wide range of selection schemes. Previous simulations examining the performance of the test, however, were conducted under idealized conditions without insertions, deletions, or alignment errors. As the test is sometimes used to analyze divergent sequences, the impact of indels and alignment errors is a major concern. Here, we used a recently developed indel-simulation program to examine the false-positive rate and power of the branch-site test. We find that insertions and deletions do not cause excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false positives. Of the alignment methods evaluated, PRANK consistently outperformed MUSCLE, MAFFT, and ClustalW, mostly because the latter programs tend to place nonhomologous codons (or amino acids) into the same column, producing shorter and less accurate alignments and giving the false impression that many amino acid substitutions have occurred at those sites. Our examination of two previous studies suggests that alignment errors may impact the analysis of mammalian and vertebrate genes by the branch-site test, and it is important to use reliable alignment methods.
Hotspots of biased nucleotide substitutions in human genes
- PLoS Biol
, 2009
Cited by 32 (0 self)
Genes that have experienced accelerated evolutionary rates on the human lineage during recent evolution are candidates for involvement in human-specific adaptations. To determine the forces that cause increased evolutionary rates in certain genes, we analyzed alignments of 10,238 human genes to their orthologues in chimpanzee and macaque. Using a likelihood ratio test, we identified protein-coding sequences with an accelerated rate of base substitutions along the human lineage. Exons evolving at a fast rate in humans have a significant tendency to contain clusters of AT-to-GC (weak-to-strong) biased substitutions. This pattern is also observed in noncoding sequence flanking rapidly evolving exons. Accelerated exons occur in regions with elevated male recombination rates and exhibit an excess of nonsynonymous substitutions relative to the genomic average. We next analyzed genes with significantly elevated ratios of nonsynonymous to synonymous rates of base substitution (dN/dS) along the human lineage, and those with an excess of amino acid replacement substitutions relative to human polymorphism. These genes also show evidence of clusters of weak-to-strong biased substitutions. These findings indicate that a recombination-associated process, such as biased gene conversion (BGC), is driving fixation of GC alleles in the human genome. This process can lead to accelerated evolution in coding sequences and excess amino acid replacement substitutions, thereby generating significant results for tests of positive selection.
Kang L: De novo analysis of transcriptome dynamics in the migratory locust during the development of phase traits
- PLoS ONE
Cited by 28 (2 self)
Locusts exhibit remarkable density-dependent phenotype (phase) changes from the solitary to the gregarious, making them one of the most destructive agricultural pests. This phenotype polyphenism arises from a single genome and diverse transcriptomes in different conditions. Here we report a de novo transcriptome for the migratory locust and a comprehensive, representative core gene set. We carried out assembly of 21.5 Gb Illumina reads, generated 72,977 transcripts with N50 2,275 bp and identified 11,490 locust protein-coding genes. Comparative genomics analysis with eight other sequenced insects was carried out to identify the genomic divergence between hemimetabolous and holometabolous insects for the first time, and 18 genes relevant to development were found. We further utilized the quantitative feature of RNA-seq to measure and compare gene expression among libraries. We first discovered how divergence in gene expression between two phases progresses as locusts develop and identified 242 transcripts as candidates for phase marker genes. Together with the detailed analysis of deep sequencing data of the 4th instar, we discovered a phase-dependent divergence of biological investment at the molecular level. Solitary locusts have higher activity in biosynthetic pathways while gregarious locusts show higher activity in environmental interaction, in which genes and pathways associated with regulation of neurotransmitter activities, such as neurotransmitter receptors, synthetase, transporters, and GPCR signaling pathways, are strongly involved. Our study, as the largest de novo transcriptome to date,
The Accuracy of Species Tree Estimation under Simulation: A Comparison of Methods
, 2010
Cited by 28 (3 self)
Numerous simulation studies have investigated the accuracy of phylogenetic inference of gene trees under maximum parsimony, maximum likelihood, and Bayesian techniques. The relative accuracy of species tree inference methods under simulation has received less study. The number of analytical techniques available for inferring species trees is increasing rapidly, and in this paper, we compare the performance of several species tree inference techniques at estimating recent species divergences using computer simulation. Simulating gene trees within species trees of different shapes and with varying tree lengths (T) and population sizes (θ), and evolving sequences on those gene trees, allows us to determine how phylogenetic accuracy changes in relation to different levels of deep coalescence and phylogenetic signal. When the probability of discordance between the gene trees and the species tree is high (i.e., T is small and/or θ is large), Bayesian species tree inference using the multispecies coalescent (BEST) outperforms other methods. The performance of all methods improves as the total length of the species tree is increased, which reflects the combined benefits of decreasing the probability of discordance between species trees and gene trees and gaining more accurate estimates for gene trees. Decreasing the probability of deep coalescences by reducing θ also leads to accuracy gains for most methods. Increasing the number of loci from 10 to 100 improves accuracy under difficult demographic scenarios (i.e., coalescent units 4Ne), but 10 loci are adequate for estimating the correct species tree in cases where deep coalescence is limited or absent. In general, the
Rapid evolution and selection inferred from the transcriptomes of sympatric crater lake cichlid fishes
- Mol. Ecol.
, 2010
Cited by 27 (0 self)
Crater lakes provide a natural laboratory to study speciation of cichlid fishes by ecological divergence. Up to now, there has been a dearth of transcriptomic and genomic information that would aid in understanding the molecular basis of the phenotypic differentiation between young species. We used next-generation sequencing (Roche 454 massively parallel pyrosequencing) to characterize the diversity of expressed sequence tags between ecologically divergent, endemic and sympatric species of cichlid fishes from crater lake Apoyo, Nicaragua: benthic Amphilophus astorquii and limnetic Amphilophus zaliosus. We obtained 24 174 A. astorquii and 21 382 A. zaliosus high-quality expressed sequence tag contigs, of which 13 106 pairs are orthologous between species. Based on the ratio of nonsynonymous to synonymous substitutions, we identified six sequences exhibiting signals of strong diversifying selection (Ka/Ks > 1). These included genes involved in biosynthesis, metabolic processes and development. This transcriptome sequence variation may be reflective of natural selection acting on the genomes of these young, sympatric sister species. Based on Ks ratios and p-distances between 3′-untranslated regions (UTRs) calibrated to previously published species divergence times, we estimated a neutral transcriptome-wide substitutional mutation rate of 1.25 × 10⁻⁶ per site per year. We conclude that next-generation sequencing technologies allow us to infer natural selection acting to diversify the genomes of young species, such as crater lake cichlids, with much greater scope than previously possible.
Species-specific activity of HIV-1 Vpu and positive selection of tetherin transmembrane domain variants
- PLoS Pathog
, 2009
Cited by 24 (2 self)
Tetherin/BST-2/CD317 is a recently identified antiviral protein that blocks the release of nascent retrovirus, and other virus, particles from infected cells. An HIV-1 accessory protein, Vpu, acts as an antagonist of tetherin. Here, we show that positive selection is evident in primate tetherin sequences and that HIV-1 Vpu appears to have specifically adapted to antagonize variants of tetherin found in humans and chimpanzees. Tetherin variants found in rhesus macaques (rh), African green monkeys (agm) and mice were able to inhibit HIV-1 particle release, but were resistant to antagonism by HIV-1 Vpu. Notably, reciprocal exchange of transmembrane domains between human and monkey tetherins conferred sensitivity and resistance to Vpu, identifying this protein domain as a critical determinant of Vpu function. Indeed, differences between hu-tetherin and rh-tetherin at several positions in the transmembrane domain affected sensitivity to antagonism by Vpu. Two alterations in the hu-tetherin transmembrane domain, that correspond to differences found in rh- and agm-tetherin proteins, were sufficient to render hu-tetherin completely resistant to HIV-1 Vpu. Interestingly, transmembrane and cytoplasmic domain sequences in primate tetherins exhibit variation at numerous codons that is likely the result of positive selection, and some of these changes coincide with determinants of HIV-1 Vpu sensitivity. Overall, these data indicate that tetherin could impose a barrier to viral zoonosis as a consequence of positive selection that has been driven by ancient viral antagonists, and that the HIV-1 Vpu protein has specialized to target the transmembrane domains found in human/
Dating Primate Divergences through an Integrated Analysis of Palaeontological and Molecular Data
Cited by 22 (4 self)
Abstract.—Estimation of divergence times is usually done using either the fossil record or sequence data from modern species. We provide an integrated analysis of palaeontological and molecular data to give estimates of primate divergence times that utilize both sources of information. The number of preserved primate species discovered in the fossil record, along with their geological age distribution, is combined with the number of extant primate species to provide initial estimates of the primate and anthropoid divergence times. This is done by using a stochastic forwards-modeling approach where speciation and fossil preservation and discovery are simulated forward in time. We use the posterior distribution from the fossil analysis as a prior distribution on node ages in a molecular analysis. Sequence data from two genomic