Delimiting Species without Monophyletic Gene Trees
Cited by 47 (3 self)
Abstract. Genetic data are frequently used to delimit species, where species status is determined on the basis of an exclusivity criterion, such as reciprocal monophyly. Not only are there numerous empirical examples of incongruence between the boundaries inferred from such data and those inferred from other sources like morphology, especially with recently derived species, but population genetic theory also clearly shows that an inevitable bias in species status results because genetic thresholds do not explicitly take into account how the timing of speciation influences patterns of genetic differentiation. This study represents a fundamental shift in how genetic data might be used to delimit species. Rather than equating gene trees with a species tree or basing species status on some genetic threshold, the relationship between the gene trees and the species history is modeled probabilistically. Here we show that the same theory that is used to calculate the probability of reciprocal monophyly can also be used to delimit species despite widespread incomplete lineage sorting. The results from a preliminary simulation study suggest that very recently derived species can be accurately identified long before the requisite time for reciprocal monophyly to be achieved following speciation. The study also indicates the importance of sampling, with regard to both loci and individuals. Notwithstanding the need for a thorough investigation into the conditions under which the coalescent-based approach will be effective, namely how the timing of divergence relative to the effective population size of species affects accurate species delimitation, the results are nevertheless consistent with other recent studies (aimed at inferring species relationships), showing that despite the lack of monophyletic gene trees, a signal of species divergence persists and can
Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting
- PLOS GENET
, 2006
Cited by 39 (0 self)
The phylogenetic relationship of the now fully sequenced species Drosophila erecta and D. yakuba with respect to the D. melanogaster species complex has been a subject of controversy. All three possible groupings of the species have been reported in the past, though recent multi-gene studies suggest that D. erecta and D. yakuba are sister species. Using the whole genomes of each of these species as well as the four other fully sequenced species in the subgenus Sophophora, we set out to investigate the placement of D. erecta and D. yakuba in the D. melanogaster species group and to understand the cause of the past incongruence. Though we find that the phylogeny grouping D. erecta and D. yakuba together is the best supported, we also find widespread incongruence in nucleotide and amino acid substitutions, insertions and deletions, and gene trees. The time inferred to span the two key speciation events is short enough that under the coalescent model, the incongruence could be the result of incomplete lineage sorting. Consistent with the lineage-sorting hypothesis, substitutions supporting the same tree were spatially clustered. Support for the different trees was found to be linked to recombination such that adjacent genes support the same tree most often in regions of low recombination and substitutions supporting the same tree are most enriched roughly on the same scale as linkage disequilibrium, also consistent with lineage sorting. The incongruence was found to be statistically significant and robust to model and species choice. No systematic biases were found. We conclude that phylogenetic incongruence in the D. melanogaster species complex is the result, at least in part, of incomplete lineage
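The lineage-sorting argument in this abstract rests on a standard coalescent result for three species: if the internal branch separating the two speciation events lasts T coalescent units, the two lineages fail to coalesce on it with probability e^(-T), and each of the three topologies is then equally likely. A minimal sketch of that calculation follows, assuming a diploid population so that T = t / (2 Ne); the function name and the numerical inputs are hypothetical illustrations, not values from the paper.

```python
import math


def gene_tree_probs(t_generations: float, ne: float) -> dict:
    """Three-taxon gene tree topology probabilities under the coalescent.

    t_generations: length of the internal species-tree branch, in generations.
    ne: effective population size of the ancestral species (diploid assumed).
    """
    # Internal branch length in coalescent units (2 * Ne generations each).
    T = t_generations / (2.0 * ne)
    # Probability that the two lineages do not coalesce on the internal branch.
    p_no_coalescence = math.exp(-T)
    return {
        # Gene tree matches the species tree.
        "concordant": 1.0 - (2.0 / 3.0) * p_no_coalescence,
        # Probability of each of the two discordant topologies.
        "discordant_each": p_no_coalescence / 3.0,
    }


# Hypothetical inputs: a short internal branch relative to population size.
probs = gene_tree_probs(t_generations=1_000_000, ne=1_000_000)
print(probs)
```

When T is small, the concordant probability approaches 1/3, which is why a short interval between speciation events can produce widespread gene tree discordance without any systematic bias.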
Comparison of species tree methods for reconstructing the phylogeny of bearded manakins (Aves: Pipridae, Manacus) from multilocus sequence data
- Syst. Biol
, 2008
Cited by 22 (0 self)
Abstract. Although the power of multi-locus data in estimating species trees is apparent, it is also clear that the analytical methodologies for doing so are still maturing. For example, of the methods currently available for estimating species trees from multilocus data, the Bayesian method introduced by Liu and Pearl (2007; BEST) is the only one that provides nodal support values. Using gene sequences from five nuclear loci, we explored two analytical methods (deep coalescence and BEST) to reconstruct the species tree of the five primary Manacus OTUs: M. aurantiacus, M. candei, M. vitellinus, populations of M. manacus from west of the Andes (M. manacus (w)), and populations of M. manacus from east of the Andes (M. manacus (e)). Both BEST and deep coalescence supported a sister relationship between M. vitellinus and M. manacus (w). A lower probability tree from the BEST analysis and one of the most parsimonious deep coalescence trees also supported a sister relationship between M. candei and M. aurantiacus. Because hybrid zones connect the distributions of most Manacus species, we examined the potential influence of post-divergence gene flow on the sister relationship of parapatrically distributed M. vitellinus and M. manacus (w). An isolation-with-migration (IM) analysis found relatively high levels of gene flow between M. vitellinus and M. manacus (w). Whether the gene flow is obscuring a true sister relationship between M. manacus (w) and M. manacus (e) remained unclear, pointing to the need for more detailed models accommodating multispecies, multilocus
Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species
- PLoS One 7:e37135
, 2012
Cited by 14 (1 self)
The ability to efficiently and accurately determine genotypes is a keystone technology in modern genetics, crucial to studies ranging from clinical diagnostics, to genotype-phenotype association, to reconstruction of ancestry and the detection of selection. To date, high capacity, low cost genotyping has been largely achieved via "SNP chip" microarray-based platforms which require substantial prior knowledge of both genome sequence and variability, and once designed are suitable only for those targeted variable nucleotide sites. This method introduces substantial ascertainment bias and inherently precludes detection of rare or population-specific variants, a major source of information for both population history and genotype-phenotype association. Recent developments in reduced-representation genome sequencing experiments on massively parallel sequencers (commonly referred to as RAD-tag or RADseq) have brought direct sequencing to the problem of population genotyping, but increased cost and procedural and analytical complexity have limited their widespread adoption. Here, we describe a complete laboratory protocol, including a custom combinatorial indexing method, and accompanying software tools to facilitate genotyping across large numbers (hundreds or more) of individuals for a range of markers (hundreds to hundreds of thousands). Our method requires no prior genomic knowledge and achieves per-site and per-individual costs below those of current SNP chip technology, while requiring similar hands-on time investment, comparable amounts of input DNA, and downstream analysis times on the order of hours. Finally, we provide empirical
2010 How robust are ‘isolation with migration’ analyses to violations of the IM model? A simulation
- Mol. Biol. Evol
, 2009
Cited by 12 (0 self)
Methods developed over the past decade have made it possible to estimate molecular demographic parameters such as effective population size, divergence time, and gene flow with unprecedented accuracy and precision. However, they make simplifying assumptions about certain aspects of the species' histories and the nature of the genetic data, and it is not clear how robust they are to violations of these assumptions. Here, we use simulated data sets to examine the effects of a number of violations of the "Isolation with Migration" (IM) model, including intralocus recombination, population structure, gene flow from an unsampled species, linkage among loci, and divergent selection, on demographic parameter estimates made using the program IMA. We also examine the effect of having data that fit a nucleotide substitution model other than the two relatively simple models available in IMA. We find that IMA estimates are generally quite robust to small to moderate violations of the IM model assumptions, comparable with what is often encountered in real-world scenarios. In particular, population structure within species, a condition encountered to some degree in virtually all species, has little effect on parameter estimates even for fairly high levels of structure. Likewise, most parameter estimates are robust to significant levels of recombination when data sets are pared down to apparently nonrecombining blocks, although substantial bias is introduced to several estimates when the entire data set with recombination is included. In contrast, a poor fit to the nucleotide substitution model can result in an increased error rate, in some cases due to a predictable bias and in other cases due to an increase in variance in parameter estimates among data sets simulated under the same conditions. Key words: historical demography, introgression, divergence time, effective population size, simulations, isolation with migration.
Updated Three-Stage Model for the Peopling of the Americas
, 2008
Cited by 9 (1 self)
Background: We re-assess support for our three-stage model for the peopling of the Americas in light of a recent report that identified nine non-Native American mitochondrial genome sequences that should not have been included in our initial analysis. Removal of these sequences results in the elimination of an early (i.e., 40,000 years ago) expansion signal we had proposed for the proto-Amerind population. Methodology/Findings: Bayesian skyline plot analysis of a new dataset of Native American mitochondrial coding genomes confirms the absence of an early expansion signal for the proto-Amerind population and allows us to reduce the variation around our estimate of the New World founder population size. In addition, genetic variants that define New World founder haplogroups are used to estimate the amount of time required between divergence of proto-Amerinds from the Asian gene pool and expansion into the New World. Conclusions/Significance: The period of population isolation required for the generation of New World mitochondrial founder haplogroup-defining genetic variants makes the existence of three stages of colonization a logical conclusion. Thus, our three-stage model remains an important and useful working hypothesis for researchers interested in the peopling of the
Gene Sampling Strategies for Multi-Locus Population Estimates of Genetic Diversity (θ)
Cited by 6 (1 self)
Background. Theoretical work suggests that data from multiple nuclear loci provide better estimates of population genetic parameters than do single loci, but just how many loci are needed and how much sequence is required from each has been little explored. Methodology/Principal Findings. To investigate how much data is required to estimate the population genetic parameter θ (4Neμ) accurately under ideal circumstances, we simulated datasets of DNA sequences under three values of θ per site (0.1, 0.01, 0.001), varying in both the total number of base pairs sequenced per individual and the number of equal-length loci. From these datasets we estimated θ using the maximum likelihood coalescent framework implemented in the computer program MIGRATE. Our results corroborated the theoretical expectation that increasing the number of loci impacted the accuracy of the estimate more than increasing the sequence length at single loci. However, when the value of θ was low (0.001), the per-locus sequence length was also important for estimating θ accurately, something that has not been emphasized in previous work. Conclusions/Significance. Accurate estimation of θ required data from at least 25 independently evolving loci. Beyond this, there was little added benefit in terms of decreasing the squared coefficient of variation of the coalescent estimates relative to the extra effort required to sample more loci.
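The diversity parameter estimated in this study is four times the effective population size times the per-site mutation rate, usually written θ = 4Neμ. The authors use the maximum likelihood coalescent framework in MIGRATE; the sketch below instead uses Watterson's estimator, a much simpler moment estimator swapped in here only to illustrate what a per-site estimate of this parameter looks like. The function name and the example counts are hypothetical.

```python
def watterson_theta(num_segregating_sites: int,
                    num_sequences: int,
                    sequence_length: int) -> float:
    """Watterson's per-site estimate of theta for one non-recombining locus.

    Not the MIGRATE likelihood method; a simple estimator used for illustration.
    """
    if num_sequences < 2:
        raise ValueError("need at least two sequences")
    if sequence_length <= 0:
        raise ValueError("sequence length must be positive")
    # Harmonic-number correction for sample size: a_n = sum_{i=1}^{n-1} 1/i.
    a_n = sum(1.0 / i for i in range(1, num_sequences))
    return num_segregating_sites / a_n / sequence_length


# Hypothetical locus: 10 sequences of 500 bp with 14 segregating sites.
print(watterson_theta(14, 10, 500))
```

Averaging such per-locus estimates across independent loci reduces the variance contributed by the coalescent process itself, which is the intuition behind the abstract's finding that adding loci helps more than lengthening a single locus.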
Experiments with the site frequency spectrum
, 2010
Cited by 4 (0 self)
Evaluating the likelihood function of parameters in highly-structured population genetic models from extant deoxyribonucleic acid (DNA) sequences is computationally prohibitive. In such cases, one may approximately infer the parameters from summary statistics of the data such as the site-frequency-spectrum (SFS) or its linear combinations. Such methods are known as approximate likelihood or Bayesian computations. Using a controlled lumped Markov chain and computational commutative algebraic methods we compute the exact likelihood of the SFS and many classical linear combinations of it at a non-recombining locus that is neutrally evolving under the infinitely-many-sites mutation model. Using a partially ordered graph of coalescent experiments around the SFS we provide a decision-theoretic framework for approximate sufficiency. We also extend a family of classical hypothesis tests of standard neutrality at a non-recombining locus based on the
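For readers unfamiliar with the summary statistic at the center of this abstract, the following is a minimal sketch of how an unfolded site frequency spectrum and two of its classical linear combinations can be tabulated from aligned haplotypes. It assumes the infinitely-many-sites model with sites coded 0 (ancestral) and 1 (derived); the function names and the toy data are hypothetical, and this is not the exact-likelihood method the paper describes.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return [xi_1, ..., xi_{n-1}], where xi_k counts sites whose derived
    allele appears in exactly k of the n sampled haplotypes."""
    n = len(haplotypes)
    # Derived-allele count at each site (column of the alignment).
    counts = Counter(sum(column) for column in zip(*haplotypes))
    return [counts.get(k, 0) for k in range(1, n)]


def segregating_sites(sfs):
    """Number of segregating sites S: the sum of all SFS entries."""
    return sum(sfs)


def mean_pairwise_differences(sfs):
    """Average pairwise differences: sum_k k(n-k) xi_k / C(n, 2)."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    return sum(k * (n - k) * xi for k, xi in enumerate(sfs, start=1)) / pairs


# Toy data: three haplotypes, four sites (one of them monomorphic).
haps = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
sfs = site_frequency_spectrum(haps)
print(sfs, segregating_sites(sfs), mean_pairwise_differences(sfs))
```

Both S and the mean pairwise difference are weighted sums of the SFS entries, which is the sense in which the abstract speaks of linear combinations of the SFS.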
Paleo-drainage basin connectivity predicts evolutionary relationships across three Southeast Asian biodiversity hotspots
- Syst. Biol
- Nugroho E., Wowor D., Ng P.K.L., Azizah M.N.S., Von Rintelen T., Hall R., Carvalho G.R
, 2013
Cited by 4 (0 self)
Abstract.—Understanding factors driving diversity across biodiversity hotspots is critical for formulating conservation priorities in the face of ongoing and escalating environmental deterioration. While biodiversity hotspots encompass a