Research Article

Natural selection and the predictability of evolution in Timema stick insects

See allHide authors and affiliations

Science  16 Feb 2018:
Vol. 359, Issue 6377, pp. 765-770
DOI: 10.1126/science.aap9125

Estimating the predictability of evolution

Evolution results from expected effects, such as selection driving alleles toward fixation, and stochastic effects, such as unusual environmental variation and genetic drift. To determine the potential to predict evolutionary change, Nosil et al. examined three naturally occurring morphs of stick insects (see the Perspective by Reznick and Travis). They wanted to determine which selective parameters could be used to foresee changes, despite varying environmental conditions. One morph fit a model of negative frequency-dependent selection, likely owing to predation, but changes in other morph frequencies remained unpredictable. Thus, for specific cases, we can forecast short-term changes within populations, but evolution is more difficult to predict when it involves a balance between multiple selective factors and uncertainty in environmental conditions.

Science, this issue p. 765; see also p. 738


Predicting evolution remains difficult. We studied the evolution of cryptic body coloration and pattern in a stick insect using 25 years of field data, experiments, and genomics. We found that evolution is more difficult to predict when it involves a balance between multiple selective factors and uncertainty in environmental conditions than when it involves feedback loops that cause consistent back-and-forth fluctuations. Specifically, changes in color-morph frequencies are modestly predictable through time (r2 = 0.14) and driven by complex selective regimes and yearly fluctuations in climate. In contrast, temporal changes in pattern-morph frequencies are highly predictable due to negative frequency-dependent selection (r2 = 0.86). For both traits, however, natural selection drives evolution around a dynamic equilibrium, providing some predictability to the process.

Evolutionary biology is often portrayed as a descriptive rather than predictive science (1, 2). Nonetheless, the extent to which past evolution predicts future evolution can be quantified by investigating how well early subsets of a time series predict subsequent changes. However, predictability in the form of such temporal autocorrelation does not consider the underlying mechanisms driving evolutionary change and thus can be inherently low. Considering the mechanisms of evolution can lead to increased understanding of evolutionary change and its predictability, and we study such mechanisms here.

Evolutionary predictability is mediated by several factors. First, evolution can be unpredictable because of random processes such as genetic drift (3). Second, even when evolution occurs by deterministic natural selection, predictive power can be diminished if multiple, complex forms of selection act simultaneously and by uncertainties in how the ecological conditions that affect selection change through time (1, 2, 47). For example, negative frequency-dependent selection (NFDS) favoring rare alleles can enhance predictability by causing increases in allele frequency to be followed predictably by decreases (and vice versa) (8, 9). However, the extent of predictability will depend on how NFDS interacts with other evolutionary processes (e.g., directional selection stemming from climate change) and on whether the ecological conditions that affect these other processes can themselves be predicted. Third, the interaction of genes within their genomic context (i.e., dominance and epistasis) and with the environment (i.e., plasticity) may affect the anticipated trajectory of evolution (1012). For example, directional selection is expected to produce a predictable evolutionary response only if the traits affected are reasonably heritable, and even then responses can be complex and nuanced (47, 10).

Studying the predictability of evolution across different time scales is particularly challenging (1, 2). For example, the immediate impact of natural selection can be readily measured in short-term field studies or experiments. Such studies suggest that strong selection is not uncommon at the scale of one or a few generations (13, 14), especially when new environments are colonized (1517). However, short-term changes need not translate into long-term directional trends. Rather, evolution across geologic and phylogenetic time scales may be characterized by periods of relative stasis interspersed between occasional bursts of sustained directional change, consistent with Simpson’s fossil-record–inspired model of “adaptive zones” (18, 19). Indeed, strong but fluctuating selection can generate a pattern of little change when averaged over longer time periods (14). Field studies that measure patterns of evolution over many years or decades are somewhat intermediate between immediate and geologic time scales and thus have the potential to illuminate how short-term selection relates to longer-term patterns of evolution (1, 2022). Such studies are rare because they require long-term temporal monitoring that cannot be sped up with more effort. We here analyze the predictability of evolution in a long-term study of the stick insect Timema cristinae and bolster our inferences using manipulative experiments and genomic analyses.

Fig. 1 Drawings of the three morphs of T. cristinae.

Polymorphism in stick insects

T. cristinae is a univoltine, wingless, plant-feeding stick insect that exhibits three morphs that are cryptic on different plant species or tissues (Fig. 1) (2327). A green morph bearing a white dorsal stripe is cryptic on the leaves of Adenostoma fasciculatum, a green and unstriped morph is cryptic on the leaves of Ceanothus spinosus, and a melanistic (i.e., brownish/gray and unstriped) morph is cryptic on the stems of both hosts (but is conspicuous on leaves). Accordingly, the striped morph is common on Adenostoma, the unstriped morph is common on Ceanothus, and the melanistic morph is found at ~10% frequency on both hosts. We refer to the variation between green (striped plus unstriped) and melanistic individuals as “color polymorphism” and that between green-striped and green-unstriped individuals as “pattern polymorphism” (i.e., color and pattern are different “traits”). These polymorphisms are highly heritable, with strong genetic dominance (melanistic body coloration is recessive to green, and stripe pattern is recessive to unstriped; details below) (23, 24).

Fig. 2 Genomic change through time at the Mel-Stripe locus.

(A) Manhattan plots showing results for genome-wide association mapping of color. The y axes show P values, with red denoting genome-wide significance. The left-hand plot shows results genome-wide (LG, linkage group). The right-hand plot is zoomed in on LG8, which shows the bulk of association, and here numbers below the x axis delimit different genomic scaffolds. The Mel-Stripe locus is evident by the block of strong association spanning scaffolds 702.1 and 128. (B) Allele frequency change through time in the natural population FHA (2011 versus 2013). (C) Allele-frequency change through time in the within-generation experiment. (D) Allele frequency change through time in the between-generation experiment. In (B) to (D), the vertical red line shows change at the Mel-Stripe locus and the histogram shows the distribution of change across other similar-sized scaffolds in the genome (i.e., the genomic background).

Several processes maintain color and pattern polymorphism (2327). A balance between divergent selection and gene flow between populations on Adenostoma versus Ceanothus helps maintain pattern polymorphism (26, 27). However, other factors likely contribute, because even areas dominated by one host rarely fix for a single-pattern morph (26, 27). The frequency of melanism does not vary markedly among populations, such that variation in color is maintained by balancing selection within populations, potentially involving heterozygote advantage and selection that varies with microhabitat (stems versus leaves) (23, 24). Spatial and host-related aspects of evolution for these morphs are thus reasonably well understood. In contrast, whether and how morph frequencies change through time, and whether they do so predictably, is unknown.

We studied temporal dynamics in T. cristinae using 25 years of field data from the mountains around Santa Barbara, California (545 locality-by-host-by-year estimates of morph frequency on the basis of 34,383 individuals collected from 1990 to 2017; mean n per locality-by-host-by-year = 63; SD = 141) (data S1). We focus our autocorrelation analyses of predictability (28) on the locality HV (Hidden Valley), where we collected data over a continuous 18-year period, with no years of missing data (total n = 3470; mean yearly n = 193; SD = 268).


Temporal change in allele frequencies

We tested the hypothesis that temporal change in allele frequencies at the genetic region underlying the morphs is due, in part, to natural selection. Although past studies support selection on the morphs (2327), these results do not mean that all temporal changes are due to selection, because selection and drift are not mutually exclusive (29, 30). Genomic data from different time points provide a means to test for selection, because strongly selected regions are expected to show greater change through time than the more neutral genomic background (29, 31). Genomic data are further required in our specific instance because genetic dominance and heterozygote excess complicate inference of allele frequency change using phenotypic data alone (23, 24).

We used de novo genome sequencing of a melanistic and green morph, with Dovetail hi-rise scaffolding of Illumina reads (N50 = ~16 and 8 megabases, respectively) (32), linkage mapping (25), and genome-wide association (GWA) mapping to explicitly delimit a single, contiguous genomic region (~10.5 megabases in size) associated with color and pattern variation (Fig. 2 and figs. S1 and S2). Consistent with a similar study with a more fragmented reference genome, this region exhibits three core haplotypes (i.e., alleles), one corresponding to each morph, designated s, u, and m for green-striped, green-unstriped, and melanistic, respectively (i.e., in terms of diploid genotypes and phenotypes: uu, us, and um are green-unstriped; ss and sm are green-striped; mm is melanistic) (23). We refer to this region as the Mel-Stripe locus hereafter.

We quantified allele frequency changes at Mel-Stripe over time within three published data sets: (i) genotyping-by-sequencing (GBS) data collected in a natural population on Adenostoma [FHA (locality Far Hill on Adenostoma] in 2011 and 2013 (30, 33); (ii) resequenced whole genomes from individuals collected in FHA and used in an 8-day (i.e., within-generation) release and recapture field experiment (30); and (iii) GBS data in a between-year (i.e., between-generation) field transplant experiment (25).

In each case, we contrasted change through time at Mel-Stripe to that of the remainder of the genome (to all genomic scaffolds, i.e., loci, that harbored as many single-nucleotide polymorphisms as Mel-Stripe, which was 40, 16, and 39 loci, respectively, for the three data sets noted above). Due to our explicit interest in Mel-Stripe, we did not attempt to delimit other loci under selection. If such loci exist, they could upwardly bias our estimates of genome-wide change relative to a case of neutrality, making our results for Mel-Stripe conservative.

We found that Mel-Stripe showed the greatest temporal allele frequency change of all genomic regions, in all three data sets (FHA: change = 0.0273, P = 0.024; within-generation experiment: change = 0.0340, P = 0.059; between-generation experiment: change = 0.0988, P = 0.025; exact probabilities; Fisher’s combined probability test across data sets: X2 = 20.50, df = 6, P = 0.0023) (Fig. 2). Dispersal alone is unlikely to drive these observed patterns because FHA was sampled over an area that is larger (>10,000 m2) than the dispersal capacity of T. cristinae (i.e., one to a few dozen meters per generation) (30, 33, 34). Furthermore, field surveys detected essentially no dispersal off experimental bushes in the recapture study (30), and selection on pattern has been previously observed in the presence, but not the absence, of predation (with dispersal possible in both treatments) (35, 36). Thus, selection likely contributed to the genetic change we observed at the Mel-Stripe locus. We thus next turned to whether such selection was associated with weakly or strongly predictable patterns of evolution.

Predictability of the evolution of body color and complex selection regimes

We quantified the predictability of evolution using autoregressive moving average (ARMA) models (28). This analysis revealed that color morphs at HV exhibited subtle and only moderately predictable changes through time (median predictive r2 = 0.14, ARMA) (Fig. 3 and table S1). This was associated with support for multiple, complex, and counteracting sources of selection (Fig. 4). Across the 25-year study period, the frequency of melanistic morphs increased in years where spring temperatures were warmer [overall effect of temperature = 0.187, 95% equal-tail probability intervals = 0.063 to 0.309; correlation between observed and predicted frequency from cross-validation = 0.16, 95% confidence interval (CI) = 0.04 to 0.28, P = 0.0102, Bayesian hierarchical linear model] (Fig. 4 and table S2). However, laboratory experiments indicate that melanistic individuals have weaker heat tolerance, relative to green individuals (B = 3.57, 95% CIs = 1.34 to 9.51, P = 0.0111, Cox proportional hazards regression model using exact likelihood). These results imply that selection for crypsis on dry, brownish plants in warmer years may favor dark colors but that thermoregulatory selection acts in an opposing direction. However, further work is required to test this hypothesis directly and to establish how well the laboratory experiments match field conditions. Notably, melanistic individuals exhibit fewer fungal infections and greater mating success than other morphs, further suggesting that selection is multifaceted (24).

Fig. 3 Predicting evolution in Timema cristinae stick insects.

(A) Schematic of analytical approach for predicting evolution using temporal autocorrelation. Black points represent observed values in a time series. Some number of these observed points are removed (in this case, the six right-hand–most points), and the missing values for them are predicted using the remainder of observed points. Predictive power is the strength of association between observed and predicted values. (B) Color-morph frequencies through time. (C) Pattern-morph frequencies through time. (D) Change in color-morph frequencies. (E) Change in pattern-morph frequencies. (F) Predicting change in color-morph frequencies (r2). (G) Predicting change in pattern-morph frequencies (r2). (H) Predicting change in color-morph frequencies (r). The difference from (F) is that r values are not squared such that their sign is evident. Shaded areas are 95% CIs. (I) Predicting change in pattern-morph frequencies (r). The difference from (G) is that r values are not squared such that their sign is evident. Shaded areas are 95% CIs.

Fig. 4 Complex patterns of natural selection.

(A) Associations between the frequency of melanistic morphs and yearly spring temperature. Positive effects indicate increases in melanistic frequency with increased temperature. Significant effects are shown in red. The left-most data point represents the average effect across populations, and the remaining points are for individual populations. Among-population variation is high, but all significant effects are positive. (B) Morph-specific survival time in thermoregulatory (i.e., heat tolerance) laboratory experiments. (C) Genotype-specific fitness in the within-generation experiment on the host Adenostoma (s/s is most fit). Shown are the posterior probabilities from estimates of genotype-specific fitness. (D) Genotype-specific fitness in the natural population FHA on Adenostoma (s/m is most fit). Shown are the posterior probabilities from estimates of genotype-specific fitness.

Given that selection appears complex, we used our genomic data to estimate selection coefficients explicitly (for all six diploid genotypes underlying the morphs, with three alleles: s, m, and u). This analysis revealed that viability selection on Adenostoma during late life-history stages (i.e., in the within-generation experiment) favored the s/s homozygote (posterior probability that fitness of s/s > the following genotypes: m/u = 0.93, u/u = 0.81, m/m = 0.82, m/s = 0.84, u/s = 0.91). In contrast, the most-fit genotype between years at the FHA locality (also on Adenostoma) was the s/m heterozygote (posterior probability that fitness of s/m > the following genotypes: m/u = 0.90, s/s = 0.97, u/u = 0.81, m/m = 0.92, u/s = 0.94). Both s/s and s/m are green-striped in terms of phenotype and cryptic on Adenostoma. Thus, a fluctuating balance between many factors, potentially including heterozygote advantage (23), may explain why evolution involving the deterministic process of selection was not more highly predictable.

Predictability of the evolution of pattern

Our findings for color raise the question of whether evolution is ever highly predictable? We address this issue by reanalyzing the data from HV considering striped versus unstriped individuals (i.e., pattern, rather than color, polymorphism). This demonstrates that striped morph frequencies at HV exhibited consistent increases followed by decreases (i.e., up and down fluctuations) across 18 consecutive years (binomial sampling probability < 0.0001) (Fig. 3). Thus, the evolution of pattern was highly predictable (median predictive r2 = 0.86, ARMA). In fact, predictive power at the scale of 3 to 5 years was near perfect (>0.95), and remained high even after a decade (>0.80). The observed pattern of predictable up and down fluctuations could reflect a case where predators have specific search images for common prey, resulting in NFDS selection favoring rare prey phenotypes.

Morph frequency and selection

To test the NFDS hypothesis, we transplanted green-striped and green-unstriped T. cristinae to Adenostoma in either 1:4 or 4:1 ratios (n = 1000 individuals, 10 replicates per treatment) (Fig. 5). The NFDS hypothesis predicts that the striped morph, cryptic on Adenostoma, would exhibit a stronger survival advantage when rare. Supporting this prediction, the striped morph experienced strong selection when initially rare (selection coefficient, s = 0.70) and increased in frequency in all 10 experimental replicates (posterior probability that change > 0 was >0.999). In contrast, the striped morph showed idiosyncratic changes when initially common (s = –0.04, posterior probability that change > 0 = 0.43). Although our results differ from NFDS, where the sign of selection reverses very strongly, they are consistent with the strength of selection being dependent on frequency (i.e., directional selection that weakens with increasing allele frequency is akin to frequency-dependent selection). It is possible that selection against the striped morph would be more strongly negative if ratios were manipulated more extremely (e.g., to 10:1).

Fig. 5 Evidence for NFDS on pattern and resulting stability in morph frequency differences between hosts.

(A) Posterior probability estimates of the selection coefficient in each treatment. Positive values on the x axis represent selection favoring striped individuals. (B) Changes in the frequency of striped morphs in releases at 20% of the population and 80% of the population during the course of the experiment. (C) Posterior probability distributions of change in the frequency of striped morph per treatment. (D) Yearly differences between host plant species in natural populations in the frequency of striped morphs (lines denote posterior medians and shaded regions, given the 95% equal-tail probability intervals).


We observed complex and fluctuating sources of selection. Together with gene flow (26), these selection pressures likely contribute to relative stability in the difference between hosts in morph frequency over the 25-year study period (Fig. 5). Complex and fluctuating selection may thus help maintain polymorphism but prevent divergence of sufficient magnitude to strongly drive speciation (24). NFDS in particular may cause evolutionary systems to exhibit resilience, as reported in other complex ecological, social, and physical systems (3739). Such resilience increases predictability of short-term evolution, because a system returns to its former state following perturbation. However, it can make long-term predictions difficult because substantial evolution only happens when the system reaches a tipping point that pushes it more permanently to a very different state.

Finally, our results suggest that the predictability of evolution can depend on the nature of selection and our understanding of it. Thus, we predicted evolution more accurately for pattern, where selection appears to be strongly associated with frequency, than for color, where a myriad of factors, some poorly understood, affect fitness. As further illustration of this point, predictability in the form of temporal autocorrelation is modest to weak in other well-known studies of contemporary evolution: beak and body size changes in Darwin’s finches and morph frequency changes in the scarlet tiger and peppered moth (21, 40, 41) (median predictive power per study system, r2, in 3- to 10-year forecasting analyses 0.03 to 0.18, ARMA) (Fig. 6, figs. S3 to S6, and table S1). Adding climatic (i.e., rainfall) data to the Geospiza finch case, where climate is known to affect seed distributions, improves predictive power in Geospiza fortis (e.g., mean r2 increase relative to a model without rainfall = 0.08, ARMA) (table S3). Nonetheless, predictive power even considering rainfall is modest and increased only in one of the two finch species examined (table S3). It is possible that predictability remains limited because seed size itself was not modeled, because the relationship between evolution and rainfall is complex such that only extreme droughts have strong effects, and because some extreme climatic events precede the 10-year period that our forecasting is based upon.

Fig. 6 Predicting evolution in different systems.

(A) Predicting evolution for evolutionary time series in (left to right) Geospiza fortis body size, Geospiza fortis beak morphology (principal components 1 and 2), G. scandens body size, G. scandens beak morphology (principal components 1 and 2), Panaxia dominula and Biston betularia morph frequencies, and T. cristinae color and pattern-morph frequencies. Box plots show the distribution of r2 between true and predicted evolutionary change across 3- to 10-year model-based forecasts. (B) Predicting evolution in studies (r). The difference from (A) is that r values are not squared such that their sign is evident.

In conclusion, our constrained understanding of selection and environmental variation (i.e., limits on data and analysis), rather than inherent randomness, can thus limit ability to predict evolution. In turn, these limitations may affect our understanding of ecological processes, because to the extent that evolution can be predicted, perhaps so can its ecological consequences for population dynamics, community structure, and ecosystem functioning (4244).

Supplementary Materials

Materials and Methods

Figs. S1 to S6

Tables S1 to S3

Data S1

References (4563)

References and Notes

Acknowledgments: The work was funded by a grant from the European Research Council (NatHisGen R/129639) to P.N. and a Natural Sciences and Engineering Research Council grant to B.J.C. (NSERC 06505). V.S.-C. was supported by a Leverhulme Trust Early Career Fellowship. We thank C. Sandoval for contributing data on morph frequencies. C. Sandoval, A. Comeault, M. Muschick, R. Riesch, and T. Reimchen provided discussion and comments on previous versions of the manuscript. The support and resources from the Center for High Performance Computing at the University of Utah are gratefully acknowledged, as well as access to the High Performance Computing Facilities, particularly to the Iceberg HPC cluster, from the Corporate Information and Computing Services at the University of Sheffield. The genome drafts reported in this paper have been deposited in the Genomes database of the National Center for Biotechnology Information (NCBI) under BioProject Accession PRJNA417530. Other data, including custom code written for analyses, have been archived in Dryad Digital Repository (doi:10.5061/dryad.v1q13). R. Ribas drew all the figures. P.N. and Z.G. conceived the project. P.N., R.V., C.F.d.C., T.E.F., V.S.-C., and Z.G. contributed data and analyses. P.N., J.L.F., and Z.G. wrote the manuscript, with feedback from all authors.

Stay Connected to Science


Navigate This Article