Contingency and determinism in evolution: Replaying life’s tape


Science  09 Nov 2018:
Vol. 362, Issue 6415, eaam5979
DOI: 10.1126/science.aam5979

Replaying the tape of life

The evolutionary biologist Stephen Jay Gould once dreamed about replaying the tape of life in order to identify whether evolution is more subject to deterministic or contingent forces. Greater influence of determinism would mean that outcomes are more repeatable and less subject to variations of history. Greater influence of contingency, on the other hand, would mean that outcomes depend on specific historical events, making them less repeatable. Blount et al. review the numerous studies, both experimental and observational, that have been done since Gould put forward this question and find that many patterns of adaptation are convergent. Nevertheless, there is still much variation with regard to the mechanisms and forms that converge.

Science, this issue p. eaam5979

Structured Abstract


Evolution is a strongly historical process, and evolutionary biology is a field that combines history and science. How the historical nature of evolution affects the predictability of evolutionary outcomes has long been a major question in the field. The power of natural selection to find the limited set of high-fitness solutions to the challenges imposed by environments could, in principle, make those outcomes deterministic. However, the outcomes also may depend on idiosyncratic events that an evolving lineage experiences—such as the order of appearance of random mutations or rare environmental perturbations—making evolutionary outcomes unrepeatable. This sensitivity of outcomes to the details of history is called “historical contingency,” which Stephen Jay Gould argued was an essential feature of evolution. Gould illustrated this view by proposing the thought experiment of replaying life’s tape to see if the living world that we know would re-evolve. But, Gould wrote, “The bad news is that we can’t possibly perform the experiment.”

Gould’s pessimistic assessment notwithstanding, experimental evolutionary biologists have now performed many replay experiments, albeit on a small scale, while comparative biologists are analyzing evolutionary outcomes in nature as though they were natural replay experiments. These studies provide new examples and insights into the interplay of historical contingency and natural selection that sits at the heart of evolution.


Biologists have devised a variety of approaches to study the effects of history on the repeatability of evolutionary outcomes. On the experimental side, several designs have been employed, mostly using microbes, including “parallel replay experiments,” in which initially identical populations are followed as they evolve in identical environments, and “historical difference experiments,” in which previously diverged populations evolve under identical conditions (see the figure). Our review of many such experiments indicates that responses across replicate populations are often repeatable to some degree, although divergence increases as analyses move from overall fitness to underlying phenotypes and genetic changes. It is common for replicates with similar fitness under the conditions in which they evolved to vary more in their performance in other environments. Idiosyncratic outcomes also occur. For example, aerobic growth on citrate has evolved only once among 12 populations in an experiment with Escherichia coli, even after more than 65,000 generations. In that case, additional replays showed that the trait’s evolution was dependent on the prior occurrence of particular mutations.

Meanwhile, comparative biologists have cataloged many notable examples of convergent evolution among species living in similar environments, illustrating the power of natural selection to produce similar phenotypic outcomes despite different evolutionary histories. Nonetheless, convergence is not inevitable—in many cases, lineages adapt phenotypically in different ways to the same environmental conditions. For example, the aye-aye (a lemur) and woodpeckers have evolved different morphological adaptations to similar ecological niches (see the figure). An emerging theme from comparative studies, tentatively supported by replay experiments, is that repeatability is common when the founding populations are closely related, perhaps resulting from shared genetics and developmental pathways, whereas different outcomes become more likely as historical divergences become greater.


Gould would be pleased that his thought experiment of replaying life’s tape has been transformed into an empirical research program that explores the roles of historical contingency and natural selection at multiple levels. However, his view of historical influences as the central feature of evolution remains debatable. Laboratory replay experiments show that repeatable outcomes are common, at least when defined broadly (e.g., at the level of genes, not mutations). Moreover, convergence in nature is more common than many biologists would have wagered not long ago. On the other hand, as evolving lineages accumulate more differences, both experimental and comparative approaches suggest that the power of selection to drive convergence is reduced, and the contingent effects of history are amplified. Recognizing the joint contributions of contingency and natural selection raises interesting questions for further study, such as how the extent of prior genetic divergence affects the propensity for later convergence. Theory and experiments indicate that the “adaptive landscape”—that is, how specific phenotypes, and ultimately fitness, map onto the high dimensionality of genotypic space—plays a key role in these outcomes. Thus, a better understanding of these mappings will be important for a deeper appreciation of how fate and chance intertwine in the evolutionary pageant.

Replaying the tape of life.

The tape of life is replayed on a small scale in evolution experiments of different designs. (A) In a parallel replay experiment, initially identical replicate populations evolve under the same conditions to see whether evolution is parallel or divergent. (B) A historical difference experiment explores the influence of earlier history in phase 1 on later evolution during phase 2. In nature, diverged lineages exposed to similar environmental conditions are similar to a historical difference experiment, in that the potential for convergence on the same adaptive response may depend on their earlier evolutionary histories. In the case of (C) the woodpecker and (D) the aye-aye, they have adapted to the same ecological niche (locating grubs, excavating through dead wood, and extracting them), but they evolved different anatomical traits to do so, reflecting the legacy of their evolutionary histories (e.g., primates lack beaks, birds lack fingers).



Historical processes display some degree of “contingency,” meaning their outcomes are sensitive to seemingly inconsequential events that can fundamentally change the future. Contingency is what makes historical outcomes unpredictable. Unlike many other natural phenomena, evolution is a historical process. Evolutionary change is often driven by the deterministic force of natural selection, but natural selection works upon variation that arises unpredictably through time by random mutation, and even beneficial mutations can be lost by chance through genetic drift. Moreover, evolution has taken place within a planetary environment with a particular history of its own. This tension between determinism and contingency makes evolutionary biology a kind of hybrid between science and history. While philosophers of science examine the nuances of contingency, biologists have performed many empirical studies of evolutionary repeatability and contingency. Here, we review the experimental and comparative evidence from these studies. Replicate populations in evolutionary “replay” experiments often show parallel changes, especially in overall performance, although idiosyncratic outcomes show that the particulars of a lineage’s history can affect which of several evolutionary paths is taken. Comparative biologists have found many notable examples of convergent adaptation to similar conditions, but quantification of how frequently such convergence occurs is difficult. On balance, the evidence indicates that evolution tends to be surprisingly repeatable among closely related lineages, but disparate outcomes become more likely as the footprint of history grows deeper. Ongoing research on the structure of adaptive landscapes is providing additional insight into the interplay of fate and chance in the evolutionary process.

The world in which we live, with all its splendor, tragedy, and strangeness, is the product of a vast, tangled web of events that form what we call history. Had history taken another route, the world of today would be different. Indeed, the historical record is filled with accidents and coincidences that shaped the course of events, critical twists of fate in which wrong turns and stalled cars helped start wars, dropped cigars changed military outcomes, and mutations contributed to toppling empires (1–3). These instances illustrate a property of history called “contingency,” which makes outcomes sensitive to the details of the interacting events that led up to them. Contingency is why, even though some trends may be predictable over the long term and the past may be explicable, the future is unknowable.

Unlike many natural phenomena, evolution is a historical process, and evolutionary biology is a field in which science and history necessarily come together. Just as historians debate the extent to which certain historical events were inevitable, so too similar debates have raged in evolutionary biology. One person was especially influential in forcing biologists to grapple with the role of history in evolution: Stephen Jay Gould. In many of his writings, and most forcefully in his 1989 book Wonderful Life (4), Gould argued that historical contingency is central to evolution. He asserted that the living world is the product of a particular history, and had that history gone differently, the world of today would be utterly unlike the one we know.

In Wonderful Life, Gould illustrated his view with the now-famous gedankenexperiment of replaying life’s tape and seeing whether the outcome would be at all like the original. Gould’s conclusion was “Replay the tape a million times…and I doubt that anything like Homo sapiens would ever evolve again.” But, Gould lamented, “The bad news is that we can’t possibly perform the experiment.” In recent years, however, evolutionary biologists have shown that Gould’s experiment can, in fact, be conducted, at least on smaller scales. A thriving subfield of experimental evolution has performed many replay experiments in both the lab and the field. Moreover, many paleontologists and comparative biologists contend that evolution in nature has conducted natural experiments that can be interpreted as replay experiments. These empirical studies are providing new insights into the interplay of contingency and determinism at the heart of evolution.

“Replaying life’s tape” and the meaning of “contingency”

Any attempt to review the body of empirical research on contingency’s role in evolution must first grapple with two sources of confusion that Gould himself introduced. The first comes from inconsistencies in how Gould described the replay metaphor. As pointed out by the philosopher John Beatty (5), in Wonderful Life, Gould first describes his gedankenexperiment as a strict replaying of the tape of life from identical earlier conditions (6), but later on Gould asks how slight variations at the outset would have altered the outcome (7). One can quibble about which idea Gould really favored, but a number of quotes from Wonderful Life suggest he was thinking more about the latter scenario (8). In any case, different researchers have designed tests of the replay hypothesis based on Gould’s alternative versions, which both complicates and enriches the synthesis of their findings.

Gould also introduced confusion about the concept of contingency itself. Despite its centrality to his thinking, Gould never formally defined “contingency.” He gave various informal descriptions, but these tended to be unfulfilling and circular. Moreover, he often conflated the two common meanings of the word “contingency”: “dependence on something else” and “an accidental or chance event.” Other writers have attempted to define contingency based on their interpretations of Gould’s works, and different researchers have, again, designed work based on different notions of contingency (9–13). The definitions largely boil down to two alternatives that correspond to the different versions of the replay metaphor (5): unpredictability in outcomes from identical starting conditions, and causal dependence on the history leading to an outcome.

Philosophers of science have worked to clarify and formalize the concept of contingency. Beatty (14, 15) points out that contingency ultimately means that an outcome depends on a history that did not necessarily have to happen. Desjardins (16–18) has further identified this property as intrinsic to path-dependent systems in which there are multiple possible paths from an initial state, multiple possible outcomes, and “probabilistic causal dependence” that links the two. These characteristics make path-dependent systems sensitive to differences over their entire history, including initial conditions, as well as later events that may cause paths to diverge even when starting from identical conditions (16, 17). Thus, Gould’s two alternative notions of contingency are just facets of the same thing. These characteristics also mean that a system’s historical sensitivity will vary. In extreme cases, certain events along a historical path might completely preclude a given outcome, or render another outcome inevitable.

Desjardins’ identification of contingency as a property of path-dependent systems is important because evolution inevitably has characteristics of path dependency. In particular, the stochastic processes of mutation and genetic drift virtually guarantee that different histories will occur even when populations start from the same state and evolve under identical conditions (Box 1) (19). Such differences, in turn, constitute the sort of unpredictable antecedent events that might preclude populations from evolving the same solutions when confronting the same selective circumstances or, at least, change the relative likelihoods of different outcomes (5). These effects arise from how mutations and the order in which they occur affect later evolution. Indeed, the particular mutations that occur, their effects, and their fates can alter the rates of occurrence, phenotypic and developmental effects, and fates of later mutations, thereby shifting the probabilities of alternative evolutionary paths (20). These differences may be further amplified or dampened by environmental perturbations that may themselves be stochastic. In short, past genetic changes that originate stochastically through mutations can become the contingencies that shape subsequent evolution. Therefore, just like human history, evolution permits different historical paths, the instantiation of which is governed by probabilistic causal dependence. The central question that remains is whether, and under what conditions, those different paths lead to meaningfully different outcomes. Evolution involves the strongly deterministic force of natural selection, which has no clear analog in human history. Is evolution still meaningfully contingent, despite this deterministic element?

Box 1

Contingency, determinism, and related words in an evolutionary context.

The vocabulary of evolution includes many words used both in ordinary language and to convey specific scientific ideas. Some of them also have different technical definitions in different scholarly contexts. Here we clarify what we mean by some of these words. To do so, we will build up from the basic processes that govern evolution to the conceptual issues that are the focus of this review.

At its core, evolution occurs by four fundamental processes: mutation, recombination, natural selection, and genetic drift. The first two produce genetic variation, whereas the last two govern the fate of variants. (Gene flow, interspecific hybridization, and horizontal gene transfer are special forms of recombination. The first describes the movement of genes across a spatial landscape; the second and third involve genes moving between species and microbial lineages, respectively.) Three of the processes—all except natural selection—are stochastic, in the sense that the specific variants produced or lost in a given generation are (or appear to be) a matter of chance. Chance is a tricky concept, however. There may well be some underlying cause for a chance event, such as a UVB (ultraviolet B) photon hitting DNA to produce a particular mutation or an asteroid striking Earth at a particular moment, but whether any specific event happens is unknowable or, at the least, impossible to incorporate into a mathematically efficient and useful theory of evolution. By contrast, natural selection is a deterministic process that reflects systematic differences in the propensity of alternative genotypes to survive and reproduce, depending on their fit to the environment. Thus, the “determinism” in our paper’s title makes reference to the systematic effects of natural selection that promote repeatable outcomes in evolution. Of course, natural selection can act only on variation that exists within the realm of physical and biological constraints, which might thus be viewed as also contributing to that determinism.

Determinism implies inevitability in some philosophical contexts, but it does not in an evolutionary context because of the interplay between natural selection and the various stochastic processes. For example, a deleterious mutation might reach fixation in a small population by genetic drift, and a beneficial mutation may go extinct by drift, even in a large population, because the number of individuals initially carrying the mutation is small. Thus, our paper attempts to review studies that provide evidence about the repeatability of evolution, rather than to resolve conflicting philosophical positions.
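The chance loss of a beneficial mutation can be illustrated with a minimal simulation sketch. It is not taken from the studies reviewed here. It assumes a branching-process approximation for a very large population, in which each mutant copy leaves a Poisson-distributed number of offspring with mean 1 + s. The selective advantage s, the establishment threshold of 200 copies, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def mutant_is_lost(rng, s, established=200):
    """Follow one new mutation until it is lost or becomes established.

    Assumption: every copy leaves Poisson(1 + s) offspring per generation,
    so the total for `copies` parents is Poisson(copies * (1 + s)).
    """
    copies = 1  # the mutation starts as a single copy
    while 0 < copies < established:
        copies = poisson(rng, copies * (1.0 + s))
    return copies == 0


def loss_fraction(s, trials=2000, seed=1):
    """Fraction of independent new mutations that drift to extinction."""
    rng = random.Random(seed)
    lost = sum(mutant_is_lost(rng, s) for _ in range(trials))
    return lost / trials


if __name__ == "__main__":
    for s in (0.1, 0.5):
        print(f"s = {s}: fraction lost = {loss_fraction(s):.3f}")
```

Under these assumptions, most new mutations with a 10% advantage are lost while still rare, in line with the classical approximation that a new beneficial mutation escapes loss by drift with a probability of roughly 2s when s is small.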

To be sure, evolutionary theory involves higher-level processes, such as speciation and extinction, but they emerge from these four fundamental processes playing out in time and space. This situation is comparable to that in physics, in which a few fundamental forces—gravity, electromagnetism, and the weak and strong nuclear forces (the second and third of these are now unified as the electroweak force)—together gave rise to chemical elements and galaxies.

The words “parallel” and “convergent” are widely used to describe repeatable evolutionary outcomes. If two lineages are ancestrally similar or identical, and if they evolve similar adaptations, then that is often called parallel evolution (although several other definitions of parallel evolution are sometimes used as well). By contrast, if they diverged substantially in the past, but subsequently evolve similar structures or functions, then that is called convergence. However, the distinction is often unclear, especially for organisms in nature and even sometimes in long-running experiments. For this reason, we follow Arendt and Reznick (134) in referring to all cases of independently derived similarity as convergent evolution.

One reason that evolution might be meaningfully contingent, even with the deterministic force of natural selection, is the extraordinarily complex relationship of genotype to fitness. This relation is often described using the metaphor of an “adaptive landscape” (21). The metaphor is often drawn as a vista or topographical map, in which genotypes are arranged according to their mutational distance, while the elevation represents each genotype’s fitness in a given environment. As a population evolves, new genotypes arise and their relative abundances shift, and the population thereby moves through the landscape. Absent any changes in conditions, natural selection tends to push the population uphill to higher average fitness, whereas the stochastic processes of mutation and drift tend to increase dispersion. If a landscape is smooth, with a single peak, then selection will eventually drive a population to that peak. If the landscape is rugged, with multiple peaks, then not all possible paths will lead to the highest peak, and evolutionary outcomes will be more sensitive to the population’s initial state. Moreover, environmental changes may alter the shape of the adaptive landscape, potentially moving peaks or even turning hills into valleys and vice versa. Of course, this analogy of the adaptive landscape to a physical landscape is flawed, in part because the extreme high-dimensionality of potentially relevant genotypic states makes it impossible to identify and represent the possible paths that an evolving population might take. Moreover, the adaptive landscape metaphor as usually put forth implicitly ignores the role of developmental processes in translating genotypes into phenotypes. Nonetheless, while imperfect, the adaptive landscape metaphor remains widely used and is helpful when discussing the role of history in evolution.
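The contrast between smooth and rugged landscapes can be sketched with a toy model. The example below is an illustration under strong simplifying assumptions, not a method from the reviewed studies: genotypes are strings of eight biallelic sites, the population is reduced to a single genotype that always moves to its fittest one-mutation neighbor, and the "rugged" landscape simply assigns an independent random fitness to every genotype. All names and parameter values are hypothetical.

```python
import itertools
import random

N_SITES = 8  # hypothetical number of biallelic sites


def neighbors(genotype):
    """All genotypes that differ from `genotype` at exactly one site."""
    return [
        genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
        for i in range(len(genotype))
    ]


def adaptive_walk(start, fitness):
    """Climb to the fittest neighbor until no neighbor is fitter."""
    current = start
    while True:
        best = max(neighbors(current), key=fitness)
        if fitness(best) <= fitness(current):
            return current  # local peak reached
        current = best


def reachable_peaks(fitness, n_sites=N_SITES):
    """Set of end points of walks started from every genotype."""
    starts = itertools.product((0, 1), repeat=n_sites)
    return {adaptive_walk(start, fitness) for start in starts}


def smooth_fitness(genotype):
    """Additive landscape: each '1' allele adds equally, giving one peak."""
    return sum(genotype)


# Rugged landscape: an independent random fitness for every genotype.
_rng = random.Random(0)
_table = {g: _rng.random() for g in itertools.product((0, 1), repeat=N_SITES)}


def rugged_fitness(genotype):
    return _table[genotype]


if __name__ == "__main__":
    print("peaks on smooth landscape:", len(reachable_peaks(smooth_fitness)))
    print("peaks on rugged landscape:", len(reachable_peaks(rugged_fitness)))
```

On the additive landscape every starting genotype ends at the same peak, whereas on the random landscape the end point depends on where the walk began. The sketch deliberately omits drift, mutation order, and development, all of which the text identifies as further sources of contingency.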

Approaches to “replaying the tape” in evolutionary biology

Gould’s writings have inspired many studies of evolutionary contingency using a variety of approaches. Some comparative and paleontological analyses have used “macroevolutionary” data to examine contingency and convergence in key innovations and other phenotypic features (22–26). Others have reconstructed ancestral genes to examine contingency in the historical transitions in protein function (27–29). However, the main approach has been to perform Gould’s replay experiment, albeit on a smaller scale. In some studies, this approach has been used to evolve replicate populations of digital organisms (programs that replicate, mutate, compete, and evolve) in which all parameters can be controlled and histories reconstructed perfectly (30–32). More often, however, replay studies have employed three other approaches: (i) experiments in the laboratory with fast-evolving organisms; (ii) experiments in nature; and (iii) comparative studies of lineages that have experienced similar environments.

A note on the issue of development

The field of evolutionary developmental biology, or “evo-devo,” has shown that development is a key aspect of the evolution of multicellular life, affecting the relationship between genotype, phenotype, and fitness (33–35). Indeed, the evolution of developmental systems can introduce the various constraints and biases that preclude or predispose subsequent evolutionary outcomes, making development an important factor in evolutionary contingency (36, 37). In this review, we couch our discussion in terms of genetic changes and gloss over the details of how development affects the contingency of evolution. However, this approach is not intended to discount the role of development. Rather, development is generally encoded by genes (including developmental responses to environmental perturbations), so although our presentation emphasizes genetic changes, we recognize that genes produce phenotypes in multicellular organisms via the developmental process. Moreover, our review places substantial emphasis on experiments with unicellular microbes, for which development is less relevant. Although we discuss studies with multicellular plants and animals with complex developmental programs, we aim to present a view that integrates them with the microbial work, and thus have focused on genetics. For these reasons, we do not dwell on the manner in which the evolution of developmental systems can produce the historical contingencies that are the subject of this essay. Such a topic provides excellent material for dissecting the role of evolutionary contingency, but is beyond the scope of this review.

Laboratory evolution experiments

In these experiments, replicate populations of a given species (or sometimes a community of two or more species) are propagated under controlled conditions, and their evolution monitored (38). History can play out repeatedly in these experiments, with initial and ongoing conditions that are either kept as identical as feasible or subtly changed, depending on the experiment, providing a valuable tool with which researchers can probe and even quantify the effects of contingency. Candidate events upon which particular outcomes are putatively contingent can then be identified, and their effects tested in further experiments. Although these experiments take place in laboratories, their results illuminate the potential role of contingency in the natural world.

The experiments have been performed with a variety of organisms. Microbes have been particularly useful because they are easy to handle and manipulate, they have fast generation times and large populations, and their (typically) asexual reproduction allows researchers to found replicate populations from the same clonal genotype. Moreover, some microbes can be frozen and later revived, allowing the preservation of living “frozen fossil records” of evolving populations (39). These fossil records provide direct access to population histories, making them particularly useful in contingency studies (40).

Alternative experimental designs

Three basic designs have been used to examine contingency and repeatability in laboratory evolution experiments (40) (Fig. 1). The simplest and most common is the “parallel replay experiment” in which initially identical replicate populations evolve under identical conditions, thus effectively playing the same tape several times simultaneously (Fig. 1A). In parallel replay experiments with frozen fossil records, the contingency of a particular outcome can later be tested with “analytic replay experiments,” which are often called simply “replay” or “re-evolution” experiments (Fig. 1B). These experiments highlight the probabilistic nature of evolution and contingency. In an analytic replay experiment, archived samples are used to restart a population from multiple time points in its history. The resurrected populations are then allowed to evolve, and the patterns of recurrence of the outcome of interest examined (41, 42). Researchers use this design to probe for critical historical points at which the probability of a particular eventual outcome shifted to become more or less likely to occur than beforehand. These points can then be examined to identify the critical mutations or other events upon which the outcome’s occurrence or nonoccurrence was contingent. Analytic replay experiments come closest to representing Gould’s thought experiment, as they involve rerunning evolution from a previous point in history and seeing whether (and when and how often) the outcome is the same as the original.
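The logic of an analytic replay experiment can be sketched with a toy model. This is an illustrative assumption-laden example, not the protocol of any study cited here: a focal trait can arise only after an enabling ("potentiating") mutation has occurred, and archived samples are represented simply as stored lineage states. The probabilities, the 200-generation replay window, and all names are hypothetical.

```python
import random

P_POTENTIATE = 0.002  # hypothetical per-generation chance of the enabling mutation
P_TRAIT = 0.01        # hypothetical per-generation chance of the trait once enabled


def evolve(state, generations, rng):
    """Advance one lineage; `state` is (potentiated, has_trait)."""
    potentiated, has_trait = state
    for _ in range(generations):
        if not potentiated:
            # The enabling mutation may arise by chance.
            if rng.random() < P_POTENTIATE:
                potentiated = True
        elif not has_trait:
            # The focal trait is accessible only in a potentiated background.
            if rng.random() < P_TRAIT:
                has_trait = True
    return potentiated, has_trait


def replay_success(archived_state, generations=200, replays=500, seed=0):
    """Fraction of replays from an archived state that evolve the trait."""
    rng = random.Random(seed)
    hits = sum(evolve(archived_state, generations, rng)[1] for _ in range(replays))
    return hits / replays


if __name__ == "__main__":
    early_sample = (False, False)  # archived before the enabling mutation
    late_sample = (True, False)    # archived after the enabling mutation
    print("replays from early sample:", replay_success(early_sample))
    print("replays from late sample: ", replay_success(late_sample))
```

In this sketch the probability of the outcome rises sharply for replays started after the enabling mutation, which is the kind of shift that analytic replay experiments are designed to detect and then trace to specific historical events.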

Fig. 1 Designs of microbial evolution experiments to explore historical contingency.

In parallel replay experiments (A), initially identical replicate populations are evolved under the same conditions to see whether evolution is parallel or divergent. Analytic replay experiments (B) are used to assess the contingency of a given outcome observed in a parallel replay experiment by replaying the population’s evolution from various points in its history to see whether the likelihood of that outcome changes over time. Historical difference experiments explore the influence of differences caused by earlier history in phase 1 on later evolution during phase 2. In the simplest historical difference experiment design (C), initially identical populations evolve under one condition for a period of time. They are then shifted to a second condition, in which they evolve for another period, typically to see whether they evolve convergently despite differences accumulated in the first period. In one variant historical difference experiment design (D), the first phase of evolution is carried out under multiple conditions before the populations are shifted to a single, common condition. In another (E), wild isolates are used to found populations that evolve under a common, laboratory environmental condition. In this case, prior evolution in the wild constitutes phase 1.

Finally, “historical difference experiments” use a two-phase design to examine the effect of divergent evolutionary histories on subsequent evolution (40). In the simplest design, initially identical populations evolve in a single condition, just as in a parallel replay experiment. During this phase, each replicate acquires a unique history. In the second phase, the replicates are moved to a new environment where they evolve for another period (43) (Fig. 1C). Typically, the purpose of the second phase is to see whether the replicates adapt in the same way despite the differences accumulated during the first phase. There are several variations on this design; in all cases, the object of the first phase is for replicate populations to accumulate different histories, whereas the effect of those different histories on subsequent evolution is assessed in the second phase. In one variation, the populations evolve under multiple conditions in the first phase, before being shifted to a single condition in the second phase (Fig. 1D). In another variation, populations are founded from natural isolates and then evolved in a common laboratory environment; in this case, their prior evolution in the wild constitutes the first phase (Fig. 1E).

Survey of findings

In recent years, the number of laboratory evolution experiments relevant to historical contingency has increased greatly. Both the parallel replay and historical difference experimental designs have often been used to address various questions other than contingency. Indeed, the parallel replay experiment is effectively the default design for replicated evolution experiments. Consequently, many studies can be evaluated for what they say about evolutionary contingency, even when they were not explicitly designed for that purpose. A formal meta-analysis of the full body of experiments would be difficult because of their heterogeneity, and it is beyond the scope of this review. Instead, we surveyed 51 studies chosen for their variety of designs and organisms. These studies include 35 that used a parallel replay experiment design, 5 that involved some type of analytic replay experiment, and 14 with variations of the historical difference experiment design (these sum to more than 51 because some studies used multiple designs). Altogether, they involved 17 different species, including bacteria, viruses, and unicellular and multicellular eukaryotes (table S1). For each study, we noted the experimental design, organisms used, specific questions asked, and the sources of any historical differences either among the founding populations or that arose during the experiments. We then evaluated whether and how history affected the measured outcomes. Collectively, the studies present a complex, and sometimes contradictory, picture that suggests a more nuanced role for contingency in evolution than Gould envisioned.

The Long-Term Evolution Experiment with Escherichia coli (LTEE) is the most extensively studied example of a parallel replay experiment. The LTEE has followed 12 populations for over 65,000 generations since they were founded from a single clone in 1988 (44) (Fig. 2). The populations have been serially propagated in a glucose-limited medium that is considerably different from their natural environment, providing substantial opportunities for adaptation. Evolution in the LTEE occurs by de novo mutations, drift, and natural selection, making it a good model for investigating the contributions of these core processes to contingency. The populations have evolved in parallel (i.e., repeatedly) in several ways (45, 46). All have evolved much higher fitness, faster growth, and larger cells than the ancestor. Also, beneficial mutations have accumulated in many of the same genes across some or all of the populations, although the mutations are rarely the same at the nucleotide level. The populations have also diverged in various ways (45, 46). Each has accumulated a unique suite of mutations. Half evolved much higher mutation rates, causing the number of mutations accrued in each population to vary greatly. Most populations have evolved very similar fitness levels under the conditions of the experiment, but even so there are persistent differences in fitness between them, suggesting that they are ascending different peaks on the adaptive landscape. Moreover, the evolved populations vary considerably in their fitness under other conditions, including on different resources (47). Finally, many of the populations have evolved simple ecosystems in which two or more lineages stably coexist (48–50), although it remains to be seen whether coexistence typically involves the same ecological and genetic mechanisms. Overall, the LTEE populations seem to be following subtly different evolutionary paths, albeit in the same general direction, with one major exception that we will address later.

Fig. 2 The Long-Term Evolution Experiment with E. coli (LTEE).

The LTEE is a paradigmatic parallel replay experiment that has studied 12 initially identical populations of E. coli for more than 65,000 generations of laboratory evolution under conditions of serial batch culture with daily 100-fold dilution into fresh medium. Samples of each population are frozen every 500 generations to provide a fossil record of viable bacteria.
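The arithmetic behind the caption can be made explicit. The sketch below assumes that each population regrows by binary fission to its pre-dilution density before the next transfer, so a 100-fold dilution corresponds to log2(100), or about 6.6, generations per day. The function names are illustrative and do not come from the LTEE protocol.

```python
import math

def generations_per_transfer(dilution_factor: float) -> float:
    """Doublings needed to regrow to the pre-dilution density (assumes binary fission)."""
    return math.log2(dilution_factor)

def days_to_reach(generations: float, dilution_factor: float = 100.0) -> float:
    """Days of daily transfers needed to accumulate the given number of generations."""
    return generations / generations_per_transfer(dilution_factor)

if __name__ == "__main__":
    print(f"{generations_per_transfer(100):.2f} generations per daily transfer")
    print(f"{days_to_reach(500):.0f} days between frozen samples")
```

Under this assumption, the 500-generation sampling interval corresponds to roughly 75 days of daily transfers.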

Broadly speaking, other parallel replay experiments, although much shorter in duration, show a similar pattern of generally consistent evolutionary responses across replicate populations under a variety of conditions. In some instances, these responses have been markedly parallel (51–57). However, heterogeneity in evolutionary responses across replicates is not uncommon (58–60). Such divergence is often more evident as analyses move from fitness per se to underlying phenotypic and genotypic responses (61). For instance, phenotypic parallelism often involves more variable genotypic changes, although instances of phenotypic variability with genotypic parallelism, at least at the level of genes mutated, have also been reported (54, 62–64). Similarly, as in the LTEE, it is not unusual for replicates with similar fitness under the conditions in which they evolved to have genetic differences that cause significant variation in fitness and phenotype under other conditions (65). This cross-condition variability makes it difficult to compare levels of divergence among experiments. This difficulty is exacerbated by logistical differences in obtaining genetic and phenotypic information. Modern genome sequencing and bioinformatics make the detection and comparison of evolved genetic changes easy and cost-effective. By contrast, measuring phenotypes is difficult, costly, and time-consuming, so most studies have examined relatively few phenotypic changes under a restricted set of conditions.

Divergence among replicates, when it occurs, is not always subtle (66–71). Collins and Bell (66), for example, observed two starkly different adaptive responses among five replicate algal populations that evolved under an elevated CO2 level. Another notable example comes from the LTEE. After more than 31,000 generations, one population evolved the capacity to grow aerobically on citrate (Cit+), which was included in the culture medium as a chelating agent. Although many bacteria are Cit+, E. coli has been historically defined as a species in part by its inability to grow aerobically on citrate (Cit−). Occasional environmental isolates of E. coli have been found to be Cit+, but as the result of the acquisition of foreign plasmids, not chromosomal mutations. The Cit+ mutant that arose in the LTEE was only the second case ever reported (72), despite decades of study of this organism in hundreds of laboratories. A recent study found additional spontaneous Cit+ mutants, but their isolation required prolonged, intense, and focused selection (73). When this new ability arose in the LTEE, it changed the population’s ecological circumstances and evolutionary direction in several important ways: it allowed cell numbers to increase several-fold, caused metabolic by-products to accumulate, changed the bacteria’s stoichiometric evolution, and perhaps even set the Cit+ lineage on a path toward incipient speciation (41, 74–76).

The ability to grow on citrate is highly beneficial in the LTEE environment, yet the Cit+ trait has evolved in only 1 of 12 populations, even after more than 65,000 generations. There are two plausible explanations for this seeming paradox. The trait might have been caused by a single extremely rare mutation that could have occurred at any time in any of the populations. Alternatively, the ability to grow on citrate might have required multiple mutations. If so, selection for the Cit+ trait per se would not have facilitated spread of the earlier mutations that, nonetheless, were required for the evolution of the Cit+ trait under the experimental conditions. According to that second hypothesis, the evolution of the Cit+ trait was therefore contingent on a particular history during which one or more required mutations happened to accumulate, “potentiating” the trait’s appearance. To test these ideas, Blount et al. (41) devised the analytic replay experiment design, recognizing that a contingent outcome should be more likely after the potentiating event (or events). In several experiments, they restarted the population with clones isolated at 16 time points in its frozen fossil record, replayed evolution thousands of times, and examined the outcomes. The Cit+ trait re-evolved only in populations founded by clones from 20,000 generations onward, implying that some potentiating mutation had arisen by then.
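The logic of an analytic replay can be illustrated with a toy simulation. The sketch below assumes a two-step model in which a "potentiating" mutation must arise before an "actualizing" mutation can produce the trait. The per-generation rates, replay counts, and function names are invented for illustration and are not estimates from the LTEE. Replays started from a potentiated clone should then yield the trait far more often than replays started from an unpotentiated one.

```python
import random

def replay(potentiated: bool, generations: int, rng: random.Random,
           mu_potentiate: float = 1e-4, mu_actualize: float = 1e-3) -> bool:
    """Run one replay; return True if the trait evolved (rates are hypothetical)."""
    for _ in range(generations):
        if not potentiated:
            # The background must first acquire the potentiating mutation.
            if rng.random() < mu_potentiate:
                potentiated = True
        elif rng.random() < mu_actualize:
            # On a potentiated background, the trait itself can now arise.
            return True
    return False

def replay_success_rate(potentiated: bool, n_replays: int = 2000,
                        generations: int = 500, seed: int = 0) -> float:
    """Fraction of independent replays in which the trait evolved."""
    rng = random.Random(seed)
    hits = sum(replay(potentiated, generations, rng) for _ in range(n_replays))
    return hits / n_replays

if __name__ == "__main__":
    print("founder sampled before potentiation:", replay_success_rate(False))
    print("founder sampled after potentiation: ", replay_success_rate(True))
```

A rare single-mutation explanation would instead predict similar success rates regardless of when the founding clone was sampled, which is the contrast the replay design exploits.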

Subsequent work has revealed the complex evolutionary history that led to the Cit+ trait. Leon et al. (77) found that the trait was slightly beneficial in the ancestral genetic background. However, early evolution in the population was dominated by high-fitness, glucose-adapted mutations against which any rare Cit+ mutants could not effectively compete. This adaptation led to a genetic background in which the Cit+ trait had become detrimental. Further mutations, some of which seem to have been involved in adaptation to growth on acetate (a by-product of glucose metabolism), accumulated between 10,000 and 29,000 generations. The Cit+ trait was slightly beneficial again on this new background (78). At this point, high-fitness mutations were no longer sweeping through the population, and the weakly beneficial Cit+ cells were able to persist long enough to accumulate refining mutations that made the trait highly beneficial (74, 75, 79).

The analytic replay experiment design has since been used to test the contingency of other outcomes seen in parallel replay experiments. Using four closely related clones isolated very early from another LTEE population, Woods et al. (80) performed an analytic replay experiment to investigate why one lineage had eventually prevailed over another, even though the clones representing the eventual winner had demonstrably lower fitness than clones from the lineage that later went extinct. Replays showed that the eventual winners prevailed because they were more evolvable; that is, they were more likely to generate beneficial mutations of large effect. Genome sequencing and genetic manipulations showed that this difference reflected a strong epistatic interaction between mutations at two specific loci. Meyer et al. (81) performed a multispecies analytic replay experiment, which showed that the evolution of a phage λ variant able to infect E. coli via an alternative receptor was contingent on mutations in the coevolving host population. This work highlights how evolutionary contingency can play a key role in community dynamics that are more typically addressed in purely ecological terms.

The analytic replay design is relatively new, and few such experiments have been performed to date. However, those that exist show that particular outcomes can hinge on small historical differences between populations that can then lead to substantial divergence even under identical conditions. They also indicate that genetic and ecological interactions can play critical roles in generating the events that drive such divergence. Altogether, analytic replay experiments provide compelling examples of how evolutionary outcomes can hinge on the particulars of history.

Parallel replay experiments show that differences can arise among initially identical populations evolving under identical conditions, and analytic replay experiments show that those differences can alter evolutionary potentials in important ways, even in the absence of environmental change. By contrast, historical difference experiments examine how different histories can affect subsequent evolution when the environment is changed. Forerunners to this design included experiments in which bacteria were challenged to grow in different environments to see whether the sequence of challenges affected the propensity to acquire an altered metabolic or resistance phenotype (82, 83). In the first historical difference experiment to explicitly quantify the effect of history, Travisano et al. (43) isolated clones from each LTEE population after 2000 generations of adaptation to the glucose-limited medium. They then founded three replicate populations from each clone, which evolved for 1000 generations in the same medium except with glucose replaced by maltose. Owing to their different histories, the clones varied greatly in their initial fitness in the maltose environment. However, they rapidly converged in their fitness on maltose during evolution in that new environment (Fig. 3). Several later historical difference experiments have also shown that adaptation to new conditions can drive convergence at the level of fitness, despite initial differences, although the mark of history often lingers at the genetic level (84–89).

Fig. 3 Rapid convergence in a historical difference experiment.

Single clones of E. coli were isolated from each of the 12 LTEE populations after 2000 generations of evolution in glucose-limited medium. Three replicate populations of each were founded and then evolved for 1000 generations in a maltose medium. Despite substantial initial variation due to their independent histories of adaptation to glucose, the replicate populations rapidly converged in their fitness on maltose. [Redrawn from Travisano et al. (43)]
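One simple way to summarize a historical difference experiment is to ask how much fitness changed on average (adaptation), how much founder means differ (history), and how much replicates of the same founder differ (chance). The sketch below is a simplified descriptive summary under those assumptions, not the statistical analysis of the original study. The fitness values in the example are invented placeholders, not data from Travisano et al. (43).

```python
from statistics import mean, pvariance

def adaptation(before: dict, after: dict) -> float:
    """Change in grand mean fitness across all replicate populations."""
    grand_before = mean(v for reps in before.values() for v in reps)
    grand_after = mean(v for reps in after.values() for v in reps)
    return grand_after - grand_before

def history_variance(pops: dict) -> float:
    """Variance among founder means: differences attributable to prior history."""
    return pvariance([mean(reps) for reps in pops.values()])

def chance_variance(pops: dict) -> float:
    """Mean within-founder variance: differences among replicates of one founder."""
    return mean(pvariance(reps) for reps in pops.values())

if __name__ == "__main__":
    # Hypothetical relative fitness values: three founders, three replicates each.
    before = {"A": [0.8, 0.8, 0.8], "B": [1.0, 1.0, 1.0], "C": [1.2, 1.2, 1.2]}
    after = {"A": [1.28, 1.30, 1.32], "B": [1.29, 1.30, 1.31], "C": [1.30, 1.31, 1.32]}
    print("adaptation:", adaptation(before, after))
    print("history before/after:", history_variance(before), history_variance(after))
    print("chance after:", chance_variance(after))
```

In this invented example, the history term shrinks sharply after evolution in the new environment, which is the pattern of convergence described in the caption.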

Some historical difference experiments, however, have shown stronger historical effects that preclude complete convergence, although those effects can vary with the environment used for the second phase of adaptation (71). Burch and Chao (89) found that two closely related phage ϕ6 genotypes had distinctly different capacities for further adaptation after prior evolution left them in different regions of the adaptive landscape, and Flores-Moya et al. (90) found that history strongly affected the evolution of two dinoflagellate strains. Moore and Woods (91) also found that E. coli strains isolated from different hosts varied significantly in the rate at which they adapted to a glucose-limited medium; this variation did not simply reflect differences in their initial adaptation to the laboratory environment, but instead indicated more idiosyncratic effects of prior history. Similarly, a study of 230 Saccharomyces cerevisiae strains (produced by crossing two highly diverged parental strains) showed a strong tendency for later, convergent adaptation to erase prior history, although the degree of erasure was subtly affected by specific genotypes (92). Taken together, historical difference experiments indicate that the capacity of selection to overcome historical differences has limits. Specifically, the historical difference experiments suggest that adaptation’s ability to drive convergence declines as populations have spent more time diverging from one another, and when that divergence occurred in more distinct environments.

Synopsis of laboratory studies

These replay experiments present a rich and complex picture of the repeatability and contingency of evolutionary outcomes. The direction of evolutionary change typically seems to be broadly consistent in a given condition, regardless of history, and phenotypic and genetic parallelisms are often striking (45). Even so, there remains scope for history to drive substantially divergent outcomes. These divergences are often subtle, such as differences in genotype that nonetheless lead to parallel evolution in phenotypes, including especially fitness itself. But subtlety of immediate effects does not necessarily negate the importance of long-term effects, as differences can build on one another. The evolution of the Cit+ trait in the LTEE is a case in point, illustrating how seemingly minor changes can shift the potential for further evolution in ways that lead to marked divergence (41). Moreover, subtle divergences that matter little in the environment where they emerge can have major effects when conditions change, as a consequence of mutations that have not been tested under the new conditions (45, 88). On the other hand, historical difference experiments show that selection in the new environment can sometimes overcome those previously evolved differences. The deeper the imprint of history, however, the less likely it becomes that evolution can reverse the prior divergence.

One interpretation of the results of the laboratory replay experiments is that the potential for contingency to matter is determined, in part, by the structure of the adaptive landscape encountered by the replicate evolving populations. As might be expected, a rugged landscape that presents multiple adaptive peaks makes distinct outcomes possible, and starting conditions, as well as the form and strength of interactions between mutations (epistasis), will affect the probabilities of those outcomes. Alternatively, a smooth landscape will tend to yield more repeatability if the time scale examined allows replicates to find the peak (67). However, these inferences are potentially circular, because our knowledge of adaptive landscapes typically comes from such experimental outcomes. This issue highlights the need for further investigation into landscape parameters. One factor that may affect ruggedness is environmental complexity; an environment with spatial structure or multiple resources, for instance, may often provide more opportunities for divergent adaptive responses (61, 68). Exogenous events and how organisms modify their environments complicate things further by changing the structure of the landscape in ways that can affect opportunities for subsequent divergence (69, 70). However, a genotype may have multiple distinct paths to higher fitness even in a homogeneous, single-resource environment (71).
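The contrast between smooth and rugged landscapes can be sketched with a toy model. The code below assumes genotypes are 8-bit strings, that a "smooth" landscape is purely additive, and that a "rugged" landscape assigns an uncorrelated random fitness to every genotype. Each replay is an adaptive walk that moves to a randomly chosen fitter one-mutation neighbor until none remains. These modeling choices and names are illustrative assumptions, not a description of any experiment reviewed here.

```python
import random

N_LOCI = 8  # genotypes are integers 0..255, one bit per locus

def smooth_landscape() -> list:
    """Additive fitness: each '1' allele adds the same benefit, giving a single peak."""
    return [bin(g).count("1") for g in range(2 ** N_LOCI)]

def rugged_landscape(seed: int) -> list:
    """Uncorrelated random fitness for every genotype, typically with many local peaks."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(2 ** N_LOCI)]

def fitter_neighbors(fitness: list, genotype: int) -> list:
    """One-mutation neighbors with strictly higher fitness."""
    neighbors = (genotype ^ (1 << i) for i in range(N_LOCI))
    return [g for g in neighbors if fitness[g] > fitness[genotype]]

def adaptive_walk(fitness: list, start: int, rng: random.Random) -> int:
    """Step to a random fitter neighbor until a local peak is reached."""
    genotype = start
    while True:
        better = fitter_neighbors(fitness, genotype)
        if not better:
            return genotype
        genotype = rng.choice(better)

def replay_endpoints(fitness: list, n_replays: int = 200,
                     start: int = 0, seed: int = 1) -> set:
    """Distinct endpoints reached by replicate walks from the same ancestor."""
    rng = random.Random(seed)
    return {adaptive_walk(fitness, start, rng) for _ in range(n_replays)}

if __name__ == "__main__":
    print("smooth endpoints:", len(replay_endpoints(smooth_landscape())))
    print("rugged endpoints:", len(replay_endpoints(rugged_landscape(seed=42))))
```

On the additive landscape every replay ends at the same genotype regardless of the order of mutations, whereas on the random landscape the endpoint can depend on which fitter neighbor happened to be chosen first.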

Altogether, laboratory experiments on contingency support a nuanced view. Evolution is more likely to be historically insensitive and repeatable if the adaptive landscape offers few alternative paths or many that lead to similar outcomes. If, however, the landscape is rugged, with multiple avenues available that lead to dissimilar adaptations, then outcomes are likely to be more variable and more sensitive to historical contingencies. Evolutionary repeatability varies because the degree to which outcomes are contingent varies.

Experimental evolution in nature

Although most replay experiments have been conducted in the laboratory, an ambitious new direction involves replicated evolution experiments in natural settings. The realization that natural selection can produce rapid evolutionary change (93–96) opened the door to evolution experiments in nature. To date, results are available from only a few such experiments, but many more are now under way (97). Some of these studies take advantage of long-running ecological experiments, including the Park Grass Experiment, which was started in 1856 (98, 99).

These studies have focused on hypotheses about adaptation in the wild. However, they often also constitute de facto replay experiments, as replicate populations can be compared to examine variation in evolutionary responses. Several differences should be kept in mind when comparing these studies to laboratory experiments. In particular, the experiments in nature often involve vertebrate animals, rather than the microorganisms and invertebrates typically used in laboratory experiments; therefore, populations are smaller, generations are longer, and founding populations are genetically heterogeneous. These factors make it more likely that evolutionary responses in field experiments rely on standing genetic variation present at the outset, rather than on de novo variation generated during the experiment. They therefore increase the opportunity for parallel responses based on shared variation, on the one hand, and the scope for differences in initial conditions between replicate populations to produce contingent evolutionary responses, on the other hand. Furthermore, in some experiments, such as those on color and life histories in guppies, different populations were used to establish the experimental populations, making these studies more akin to historical difference experiments than to parallel replay experiments (97).

It is perhaps too early to generalize from the field evolution experiments reported to date. Nonetheless, the results so far—including guppies evolving slower life histories in the absence of predators (100) and lizards evolving shorter limbs when forced to use narrow substrates (101)—tend to indicate a high degree of repeatability in evolutionary responses (97).

Comparative studies: Evolutionary replays across space and time

The ideal experiment for characterizing repeatability and contingency in evolution would be to expose initially identical populations to the same conditions in nature and allow them to evolve not for a few years or tens of years, but for thousands and even millions of years. Even if funding were available for such studies, we would have to wait a long time to get the results. But fortuitously, nature has already conducted such experiments for us, albeit not quite as precisely as those performed in the laboratory.

Convergent evolution is broadly defined (Box 1) as the independent evolution of similar features in multiple species or clades (102). Convergent evolution can occur for many reasons. For instance, shared developmental programs may predispose species to evolve in the same way for reasons unrelated to natural selection (103, 104). However, convergence occurring in distinct lineages living in similar environments has long been considered strong evidence of the operation of natural selection (102, 105, 106). For example, both the C4 and CAM (crassulacean acid metabolism) photosynthetic pathways have evolved independently many times in plants, almost always in lineages that now occur in arid or semiarid regions; this evolutionary correlation suggests that the lower rates of water loss and other physiological features of these pathways are advantageous under these conditions (107, 108). Similarly, strikingly convergent carnivorous pitcher plants have evolved in several unrelated genera as an adaptation to waterlogged soils with low nutrient availability and high light (109, 110). Until fairly recently, such cases of convergence were considered relatively rare exceptions. In recent years, however, myriad examples of adaptive convergence have been reported (23, 111, 112). Particularly impressive are cases in which convergence involves not just two (or more) lineages adapting to the same niche, but entire multispecies assemblages evolving similarly, such as evolutionary radiations of Caribbean lizards and Pacific Ocean snails on multiple islands, and frog and bird faunas on different continents (113).

The extent of convergence has led some to argue that the repeated evolution of the same feature under similar circumstances means that evolution is predictable and that contingencies of history hold little sway in directing evolution. More specifically, they argue that the ubiquity of convergence indicates that optimal solutions exist to problems posed by the environment and that lineages have repeatedly, almost deterministically, found these solutions (23, 111, 112).

This argument assumes that the same selective conditions occur repeatedly, that there are a limited number of high-fitness phenotypic solutions (“adaptive peaks”) to these challenges, and that populations inevitably evolve these phenotypes. According to Conway Morris (23), “the evolutionary routes are many, but the destinations are limited.” McGhee (114) put it this way: “Convergent evolution is the result of the fact that there are limited numbers of ways to solve a functional problem within the constraints imposed by the laws of physics and geometry.”

One prerequisite for adaptive convergence is that species respond to similar selective pressures by adopting the same ecological role [i.e., the same niche in the original Grinnellian sense (115)]. This need not be the case, however, because communities of species do not necessarily partition resources in similar ways. Moreover, even when species converge upon the same ecological role, they may evolve distinct nonconvergent phenotypic adaptations. For example, considering the aye-aye (a lemur) and the woodpecker to be convergent misses the point that they evolved very different phenotypic means to accomplish the same task of locating and extracting grubs from inside wood. They occupy the same niche but adapted in divergent, rather than convergent, ways.

Assuming that multiple lineages independently adopt the same ecological niche, how might contingency lead them to adapt in different ways to the same environmental challenge? We see three main possibilities. First, populations might evolve different solutions to the same challenge. For example, some plants may adapt to the presence of a herbivore by evolving physical defenses such as thorns, others by acquiring chemical defenses, and yet others by becoming cryptic. Second, populations may evolve the same function, but by means of different phenotypic changes. For example, the hammering beak and long bristly tongue of the woodpecker accomplish the same ends as the chiseling teeth and long, flexible finger of the aye-aye [more generally, the “many-to-one” phenomenon in biomechanics (116)]. Third, some populations may get stuck on a lower adaptive peak (local optimum) and be unable to evolve the best possible phenotype (global optimum). In all three cases, historical contingencies may predispose a lineage to adapt one way or another (birds lack teeth and hands, and primates lack beaks, explaining the different routes taken by the aye-aye and woodpecker). Their different histories thus may explain why two lineages fail to converge despite experiencing the same selective conditions for millions of years.

In evaluating the extent to which convergence is evidence of evolutionary determinism, several points must be considered. Most generally, we need to ask what constitutes convergence. Birds, bats, and insects all fly, but their wings are constructed differently and their aerodynamics also differ. Are these convergent adaptations, or divergent adaptations accomplishing the same task? At some level, drawing a line becomes arbitrary. Another difficulty is that convergence is identified after the fact. The saber-toothed condition evolved at least three times in the Carnivora, as well as once each in creodonts and South American marsupials, presumably as an adaptation to a particular predatory strategy (117). But how many other taxa, faced with the same selective conditions, failed to evolve this adaptation? Knowing the denominator is key to determining how repeatable a convergent trend is (45), but rarely does one know how many other lineages experienced similar circumstances, yet failed to evolve the trait in question. Moreover, although recent compilations of convergence (23, 111, 112) are impressive, one could just as easily compile lists of adaptive types lacking a convergent doppelgänger: the two-leaved Welwitschia mirabilis, the platypus, chameleons, kiwis, elephants, octopuses, and hominins—all adaptive types that have evolved just once—to name a few (Fig. 4). Finally, the occurrence of convergent evolution is not necessarily inconsistent with the evolutionary importance of contingency. Genetic changes can become the contingencies that shape subsequent evolution. To the extent that shared genetic and developmental systems predispose species to evolve in similar ways (103, 104, 106, 118), then adaptive convergence may often be shaped by the particular history that sculpted the genetics and development of their shared ancestors (119). 
In such cases, evolution may be deterministic within a clade but contingent at deeper phylogenetic levels when comparing species across clades (104, 112, 119). Moreover, the shared regulatory mechanisms and sometimes cryptic genetic similarities that underlie deep homologies indicate that contingent historical events can shape convergence even among distant relatives (36). The evolutionary reactivation of previously silenced, but still functional, developmental programs is another example of how distant relatives can exhibit evolutionarily derived phenotypic similarity as a result of contingent genetic events (120–122).

Fig. 4 Evolutionary one-offs.

Evolutionary one-offs are species or clades that evolved unique adaptations to their ecological circumstances that have not been convergently evolved by other lineages. Clockwise from top left: African elephant, Welwitschia, Moyer’s pygmy chameleon, red octopus. (Note that similarity among, for example, species of elephants or chameleons is not convergent; rather, their shared features are the result of inheritance from a common ancestral species that evolved their trademark features a single time.) [Photo credits: African elephant: Jonathan Losos. Welwitschia: Thomas Schoch, CC BY-SA 3.0 license. Moyer’s pygmy chameleon: Martin Neilsen, CC BY-SA 4.0 license. Red octopus: Jerry Kirkhart, CC BY-SA 2.0 license.]

Some convergence proponents go so far as to say that if life has evolved on Earth-like exoplanets, it will look much like what we see here (23). But we need not look to the stars to test that hypothesis: All we need to do is go to New Zealand, an island lacking any native terrestrial mammals. In their absence, New Zealand’s flora and fauna evolved to bear little resemblance to any other ecosystem in the world. In addition to kiwis, there are both carnivorous and flightless parrots, adzebills, moas, giant eagles, and flightless wrens, as well as a semi-terrestrial bat [“the bat family’s attempt to make a mouse” (123)], giant snails and orthopterans, and divaricating shrubs with leaves that grow in the interior of the bush. And going back in time, one would be hard-pressed to find many similarities between the Mesozoic world of the dinosaurs and today’s faunas.

In short, lineages adapting to similar environmental conditions in nature can be thought of as evolutionary replays, even if these “natural experiments” are not as precise as carefully designed and controlled laboratory experiments. Because the lineages will have different genetic constitutions and will have experienced different histories, these cases are analogous to the historical difference experiments in laboratory studies. Unfortunately, however, the evidence boils down to one list of cases in which convergence occurred and another where it did not, rendering quantitative conclusions unsatisfactory. Nonetheless, the many impressive cases of convergence show that repeated outcomes can arise from similar environmental challenges. Conversely, the many cases in which convergence did not occur suggest that contingent effects can play a strong role in shaping divergent adaptive responses.

Against that murky conclusion, one trend stands out (despite some exceptions): Conspecific populations and closely related species seem to evolve in similar ways more often than distantly related taxa (124). Such a trend is expected in part because closely related species tend to interact with the environment in similar ways. Moreover, they share more of their history, and thus share more of the past changes in their genetic and developmental systems that can shape later evolution. Closely related lineages are thus predisposed to evolve in the same way. Indeed, some cases of parallel evolution have occurred by selection on shared variation that was present in a common ancestral population (125, 126). By contrast, convergence between distantly related lineages is less likely to result from selection on shared variation. A related finding is that when convergence occurs, the extent to which the response involves the same gene is greater when the taxa are closely related (127). This pattern accords with the tentative conclusion from laboratory studies that parallel replay experiments (with replicate populations founded by the same ancestor) tend to produce parallel outcomes more often than historical difference experiments (with populations founded by different ancestral strains or species).

Conclusions and future prospects

Gould’s gedankenexperiment that “we can’t possibly perform” has been transformed into a real experimental program, one in which increasingly sophisticated and audacious studies are exploring the roles of contingency and determinism at ever deeper levels. Although Gould’s ideas on contingency have stimulated a great deal of productive work, his view that contingent effects were pervasive throughout evolution remains debatable. The laboratory replays performed to date typically show that repeatable outcomes are common, at least when the founding populations are similar, when repeatability is defined broadly (e.g., at the level of affected genes and pathways, as opposed to precise mutational changes), and over the time scales accessible to experiments. Moreover, evolutionary convergence across lineages that share similar natural environments has proven more common than most biologists would have wagered even two decades ago—its prevalence attests to the power of natural selection to repeatedly sculpt the same adaptive solutions. That it does so more often among closely related taxa, which share similar genetics and developmental programs, illustrates the yin and yang of contingency and determinism.

Where to now? Clearly, evolution can be both contingent and deterministic, and often in complicated and fascinating ways. Recognizing this mixed nature will allow future research to investigate how contingency and determinism interact. Many questions remain to be addressed; for example, what circumstances promote contingent and deterministic outcomes, how does the extent of prior genetic divergence affect the propensity for future parallelism versus contingency, what types of divergence (say, a few mutations of large effect versus the accumulation of minor variants over long periods) lead to which outcomes, and what circumstances allow convergence even in distantly related taxa? Theory and experiments show that the structure of the adaptive landscape plays a critical role in determining the potential for contingent outcomes. Therefore, a deeper understanding of adaptive landscapes will be important for understanding evolutionary contingency (89, 128–133). In short, there’s no shortage of work to do, and interesting outcomes to be discovered and quantified. Gould would be pleased that the field he inspired has such bright prospects, as the tape of life plays on.

Supplementary Materials

References and Notes

  1. “I call this experiment ‘replaying life's tape.’ You press the rewind button and, making sure you thoroughly erase everything that actually happened, go back to any time and place in the past…Then let the tape run again and see if the repetition looks at all like the original” [p. 48 in (4)].
  2. “[A]ny replay, altered by an apparently insignificant jot or tittle at the outset, would have yielded an…outcome of entirely different form…” [p. 248 in (4)].
  3. “Alter any early event, ever so slightly and without apparent importance at the time, and evolution cascades into a radically different channel” [p. 51 in (4)]; “A historical explanation…[rests]…on an unpredictable sequence of antecedent states, where any major change in any step of the sequence would have altered the final result. This final result is therefore dependent, or contingent, upon everything that came before—the unerasable and determining signature of history” [p. 283 in (4)]; “Historical explanations take the form of narrative: E, the phenomenon to be explained, arose because D came before, preceded by C, B and A. If any of these earlier stages had not occurred, or had transpired in a different way, then E would not exist” [p. 283 in (4)].
Acknowledgments: We thank E. Desjardins for helpful discussion and comments on an earlier draft of this paper. Funding: This work was supported in part by a grant from the National Science Foundation (DEB-1451740), the BEACON Center for the Study of Evolution in Action (DBI-0939454), Michigan State University, and Harvard University. Competing interests: We declare no competing interests.
