Stick Insect Genomes Reveal Natural Selection’s Role in Parallel Speciation

See allHide authors and affiliations

Science  16 May 2014:
Vol. 344, Issue 6185, pp. 738-742
DOI: 10.1126/science.1252136

Stick to the Bush

Can the underlying genetic changes driving the divergence of populations into new species be predicted or repeated? Soria-Carrasco et al. (p. 738) investigated the genetic changes observed after one generation when stick insect (Timema cristinae) populations were transplanted from their preferred host plants to alternative hosts. Diverged genetic regions were relatively small, with most loci showing divergence in a single population pair. However, the number of loci showing parallel divergence was greater than expected by chance. Thus, selection can drive parallel phenotypic evolution via parallel genetic changes.


Natural selection can drive the repeated evolution of reproductive isolation, but the genomic basis of parallel speciation remains poorly understood. We analyzed whole-genome divergence between replicate pairs of stick insect populations that are adapted to different host plants and undergoing parallel speciation. We found thousands of modest-sized genomic regions of accentuated divergence between populations, most of which are unique to individual population pairs. We also detected parallel genomic divergence across population pairs involving an excess of coding genes with specific molecular functions. Regions of parallel genomic divergence in nature exhibited exceptional allele frequency changes between hosts in a field transplant experiment. The results advance understanding of biological diversification by providing convergent observational and experimental evidence for selection’s role in driving repeatable genomic divergence.

Whether evolution is predictable and repeatable is difficult to test yet central to our understanding of biological diversification (16). Instances of repeated, parallel evolution in response to similar environmental pressures provide evidence of evolution by natural selection and can involve repeated divergence at specific genes (79). Indeed, parallel evolution of individual phenotypic traits has been estimated to involve the same genomic regions 30 to 50% of the time (8). Parallel evolution can also result in replicate species formation (i.e., parallel speciation) (10), but the genome-wide consequences of this process are unclear (7, 8, 11). Although some genomic regions will likely diverge repeatedly during parallel speciation, many might show idiosyncratic patterns because of contingencies such as the order in which mutations arise (7, 8, 10, 11).

Even if repeated divergence occurs for some genomic regions (7, 12, 13), the underlying causes of this parallelism often remain speculative because it is difficult to disentangle the contributions of the many factors that can be involved, such as selection, mutation, recombination, gene flow, population size, and the genetic architecture of traits (2, 11, 14). We chose to study these questions with a wingless, herbivorous stick insect (Timema cristinae) endemic to California that has repeatedly evolved ecotypes adapted to different host plant species: Adenostoma fasciculatum and Ceanothus spinosus (Fig. 1). Populations of this species exhibit high gene flow between adjacent populations on different hosts, little or no gene flow between more distant geographic localities, and repeated evolution of partial reproductive isolation between different ecotype pairs (1517). Consistent with the main prediction of parallel speciation, pairs of populations in different localities on the same host exhibit weaker reproductive isolation than do different ecotypes, independent from genetic distance. Thus, the ecotypes of T. cristinae are in the early stages of parallel ecological speciation and have evolved increased sexual isolation through reinforcement.

Fig. 1 Population structure and whole-genome divergence between host-plant ecotypes of T. cristinae.

(A) A maximum likelihood tree from whole-genome data from individuals from T. cristinae populations (rooted with other species of Timema not shown). Black circles depict nodes with >80% bootstrap support (100 replicates). Populations LA, HVA, MRA, and R12A were collected from Adenostoma. Populations PRC, HVC, MRC, and R12C were collected from Ceanothus. The geographic locations of the populations are depicted in fig. S1. (B) Patterns of genetic divergence (FST) along the genome. LG indicates linkage group. Colors denote whether SNPs occur in moderately (black) or highly (red/orange) divergent genetic regions, delimited with a HMM approach (weakly differentiated regions in blue are too sparse to see). Results are shown at three different scales. (Top) FST estimates at all ~4.4 million SNPs for each of the four population pairs. (Middle) Genomic divergence for two scaffolds (scaffolds 1271 on LG7 and 2230 on LG6 from HVA × HVC). (Bottom) Magnified view of genomic divergence for one parallel HMM divergence region that harbored a SNP showing parallel allele frequency divergence between hosts upon transplantation (scaffold 556 on LG4 from R12A × R12C; see also Fig. 3). [Credit: courtesy of R. Ribas.]

We annotated the reference genome for this species (18) [assembly length of 1,027,063,217 base pairs (bp), 50% of assembly in scaffolds at least 312,000 bp long (N50)] and used sequence data from genetic crosses to assemble it into linkage groups (19). We then resequenced 160 additional whole genomes from replicate ecotype pairs and compared patterns of genomic divergence to allele frequency changes in populations experimentally transplanted to different host plants. These data allowed us to analyze the effects of adaptation on genomic divergence, experimentally test the hypothesis that accentuated and parallel genetic divergence between natural populations is driven by parallel divergent selection, rigorously quantify divergence in both coding and noncoding regions, rule out undetected genetic differences because of low coverage across the genome, and quantify the repeatability of genetic divergence for individual loci and the genome as a whole.

Our data indicate that parallel speciation can involve both repeatable and idiosyncratic divergence at the genetic level and that the repeated component involves many regions affected by divergent selection. We examined four replicate population pairs on Ceanothus versus Adenostoma (n = 8 populations) to characterize genomic divergence between ecotypes (n = 19 to 21 individuals per population). Three of these pairs were directly adjacent to one another, and the fourth was separated by 6.4 km (Table 1 and fig. S1). We discovered many rare genetic variants (e.g., 2,526,553 single-nucleotide variants with minor allele frequency, MAF, <1%) and retained 4,391,556 single-nucleotide polymorphisms (SNPs) with MAF >1% that mapped to linkage groups for further analysis (across all SNPs, mean coverage per individual per SNP is 5.0).

Table 1 Characteristics of the four pairs of natural populations of T. cristinae on different host-plant species.

Nem values are from (16).

View this table:

Genomic divergence between ecotype pairs varied geographically. A phylogenomic tree grouped T. cristinae individuals by geographic locality and rejected grouping by host [P < 0.00001, approximately unbiased test (Fig. 1 and fig. S2)] (19). This finding was supported by principal components (PC) analysis [variance partitioning of PC1 is 93.7% by geography and 0.2% by host (fig. S1)]. The average genome-wide fixation index (FST) for the geographically separated pair was twice that of the adjacent pairs, with all three adjacent pairs exhibiting similarly low average FST [median ~ 0.007; mean ~ 0.015 (Fig. 1 and Table 1)]. No fixed differences were observed between ecotype pairs. These levels of population differentiation are lower than those found in many studies that analyzed the genomic consequences of divergence (12, 20, 21), emphasizing that we are examining the early stages of speciation.

Patterns of divergence between ecotype pairs also varied across the genome, with FST for the most differentiated SNPs being about 60 times greater than the median value (Table 1 and fig. S3). A hidden Markov model (HMM) (22) classified the majority of the genome as moderately divergent, but we observed numerous small- to modest-sized regions of high divergence (hereafter referred to as HMM divergence regions) distributed across all linkage groups. Across population pairs, HMM divergence regions varied in number from 3800 to 18,135, comprising 8 to 30% of the genome with mean size from 6276 to 10,580 bp (standard deviation 9942 to 15,514) (Fig. 1 and fig. S4).

We tested whether divergence between replicate population pairs frequently involved the same genomic regions. To increase precision, we focused these analyses on SNPs. We identified SNPs that were highly divergent between each population pair (those above the 90th empirical quantile of the FST distribution; divergent SNPs hereafter). The majority of divergent SNPs (83%) were divergent only in a single population pair. Thus, accentuated divergence was largely nonparallel, supporting repeated and largely independent evolutionary divergence of ecotype pairs in different localities. Nonparallel divergence could reflect different selective pressures acting on ecotypes in different localities, geographically variable trait genetic architecture, independent genetic drift, or a combination of these factors.

Nonetheless, 17% of divergent SNPs were divergent in two or more population pairs. This degree of parallelism was greater than expected by chance (Fig. 2) (all P < 0.001, permutation test) and most pronounced for SNPs that were highly divergent in all population pairs [i.e., twofold greater than null expectations; congruent results were obtained for divergent SNPs above the 99th quantile (table S1)]. Last, these SNPs that were divergent in multiple population pairs were in HMM divergence regions that occurred between multiple population pairs more often than expected by chance [all P < 0.001, randomization test (table S2 and fig. S5)], demonstrating congruence between SNP-based and HMM results. The observed parallelism is consistent with repeatable genetic divergence by natural selection.

Fig. 2 Quantification of parallel and nonparallel divergence across population pairs.

(A) The majority of divergent SNPs (those above the 90th quantile of the empirical FST distribution) were so only in a single comparison, indicative of largely nonparallel genetic divergence across the genome (pie chart; see table S1 for full results, including those from the 99th quantile). Nonetheless, a significant enrichment of parallelism was observed (bar plots, all P < 0.0001, permutation tests of observed versus expected numbers of parallel divergent SNPs). (B) High FST in two population pairs. (C) High FST in three population pairs. (D) High FST in four population pairs. [Credit: courtesy of R. Ribas.]

Even parallel genetic divergence can be affected by factors other than divergent selection, such as background selection and reduced recombination (2, 11). We thus tested whether HMM divergence regions occurring in multiple population pairs (i.e., parallel HMM divergence regions) harbor SNPs exhibiting divergent allele-frequency changes between multiple experimental population pairs transplanted to different plant species in the field. In a paired blocks design, we transplanted T. cristinae from population LA onto nearby Ceanothus and Adenostoma plants that were devoid of T. cristinae [n = 2000 individuals and 5 blocks (Figs. 1 and 3 and fig. S6)]. There was little or no dispersal after transplantation (19). Thirty-one individuals from LA were not transplanted and preserved as representative “ancestors” of the experimental populations [genomic data from these individuals provide a good representation of genetic variation in LA, particularly given low linkage disequilibrium (16) in this population (fig. S3)]. We collected the first-generation descendants of the transplanted insects 1 year later (these insects feature a single generation per year with a long egg diapause through summer, fall, and winter).

Fig. 3 Allele frequency changes across the genome in a field transplant experiment.

(A) The experimental design, where individuals from population LA were transplanted to a nearby location of five paired blocks that contained a single plant individual of each host species. (B) An empirical quantile plot of the mean divergence in allele frequencies between hosts for all SNPs. (C) SNPs showing the largest parallel divergence between hosts (>99.5th quantile) for individual blocks (orange, greater change on Adenostoma; blue, greater change on Ceanothus). (D) SNPs showing the largest parallel divergence between hosts (>99.5th quantile) averaged across blocks. (E) SNPs showing parallel divergence between hosts in the experiment were in parallel HMM divergence regions between natural population pairs more often than expected by chance (P < 0.05, randomization test). [Credit: courtesy of R. Ribas.]

By sequencing these descendants (n = 418 F1 individuals) and their ancestors (n = 31), we observed variable allele frequency changes between hosts across the genome, with most SNPs exhibiting weak to moderate divergence between hosts across a generation (Fig. 3). Numerous SNPs exhibited larger allele frequency changes between hosts that were remarkably consistent across experimental replicates (i.e., blocks). Thirty-two of the 213 SNPs showing the greatest parallel and divergent allele frequency changes between hosts in the experiment (i.e., those above the 99.5th empirical quantile of such changes) were in parallel HMM divergence regions inferred for natural population pairs, which was more than the 23 expected by chance (P < 0.05, randomization test). These 32 SNPs were widely distributed across the genome (e.g., found on 32 different scaffolds and 12 of 13 linkage groups). These results are consistent with the hypothesis that parallel HMM divergence regions in natural populations are enriched for loci affected by host-related divergent selection. Our analyses focused on genomic responses to selection, and thus the role of recombination and correlations among loci in such responses warrants future work.

We examined the function of genomic regions exhibiting parallel divergence. We identified the 0.01% of SNPs with the greatest evidence of parallel divergence on the basis of a metric that combines information on FST levels and the direction of allele frequency differences between the four natural population pairs on Adenostoma and Ceanothus [hereafter referred to as parallel divergence SNPs, n = 439 SNPs; these results are robust to different metrics of parallelism and quantile cutoffs (table S3)]. These parallel divergence SNPs exhibited a 1.5-fold enrichment for being in coding regions of genes compared with expectations from all SNPs (13 and 8%, respectively, P < 0.001, randomization test), consistent with a role for coding changes even early in speciation.

Gene ontology (GO) analyses revealed that parallel divergence SNPs reside in regions containing genes involved in a range of putative biological processes, including metabolism and signal transduction, and exhibiting various molecular functions [we used GO annotations from the nearest gene on the same scaffold for each parallel divergence SNP (table S4)]. Metal ion binding and calcium ion binding were the two statistically overrepresented molecular functions in regions harboring the parallel divergence SNPs (seven- and threefold enrichment relative to all SNPs, P < 0.00001 and P = 0.01, respectively, randomization test). Specifically, almost half of the parallel divergence SNPs with molecular function GO annotations (20 of 41 = 48.8%) were in regions harboring genes implicated in metal ion binding, compared with only 6.6% of all SNPs. Consistent with these results from SNPs, the 32 parallel HMM divergence regions associated with parallel divergence in the experiment contained genes related to binding of metals (e.g., iron) and calcium (table S5). Local adaptation to balance the nutritional versus toxic effects of metals has been described in other insects (23), and metals are known to affect traits that differ between the T. cristinae ecotypes, such as pigmentation and mandible morphology (24, 25).

Last, we examined the function of SNPs that were divergent between only a single population pair (i.e., nonparallel divergence SNPs, n = 440) (19). These SNPs were in regions significantly enriched for genes exhibiting several functions, including one that involved metal binding [i.e., zinc ion binding, P < 0.05, randomization test (table S6)]. This finding raises the possibility that nonparallel genetic divergence is sometimes the result of adaptive divergence between host ecotypes and that it includes cases of parallelism at the functional level, although further work is required to test these hypotheses.

Our data show that early stages of parallel speciation in T. cristinae involve mostly nonparallel genetic divergence between ecotypes. However, numerous regions of the genome have diverged in parallel, and these include more coding genes than expected by chance and harbor loci showing experimentally induced allele frequency changes between hosts. These results indicate that divergent selection plays a role in repeated genomic divergence between ecotypes. Furthermore, our results suggest that, although repeated evolutionary scenarios (i.e., replaying the tape of life) would likely result in idiosyncratic outcomes, there may be a repeatable component driven by selection that can be detected, even at the genome-wide level and during the complex process of speciation.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S6

Tables S1 to S6

References (2680)

References and Notes

  1. Materials and methods are available as supplementary materials on Science Online.
  2. Acknowledgments: The work was funded by the European Research Council (grant R/129639) and Utah State University start-up funds. We thank J. Slate, A. Beckerman, D. Schluter, and two anonymous reviewers for constructive feedback and the High-Throughput Genomics Group at the Wellcome Trust Centre for Human Genetics (funded by Wellcome Trust grant reference 090532/Z/09/Z and Medical Research Council Hub grant G0900747 91070) for generation of the sequencing data. O. Simakov and A. Kapusta provided scripts for transposable elements annotation. The data reported in this paper are tabulated in the supplementary materials and archived at the following databases. The raw sequencing reads are in the NCBI short read archive (BioProject ID: PRJNA243533), and the whole genome assembly and annotation at Additional data and computer source code have been deposited in the Dryad repository,, and are also available from the authors upon request. The authors declare no conflicts of interest.
View Abstract

Navigate This Article