Continuity in Evolution: On the Nature of Transitions


Science  29 May 1998:
Vol. 280, Issue 5368, pp. 1451-1455
DOI: 10.1126/science.280.5368.1451


To distinguish continuous from discontinuous evolutionary change, a relation of nearness between phenotypes is needed. Such a relation is based on the probability of one phenotype being accessible from another through changes in the genotype. This nearness relation is exemplified by calculating the shape neighborhood of a transfer RNA secondary structure and provides a characterization of discontinuous shape transformations in RNA. The simulation of replicating and mutating RNA populations under selection shows that sudden adaptive progress coincides mostly, but not always, with discontinuous shape transformations. The nature of these transformations illuminates the key role of neutral genetic drift in their realization.

A much-debated issue in evolutionary biology concerns the extent to which the history of life has proceeded gradually or has been punctuated by discontinuous transitions at the level of phenotypes (1). Our goal is to make the notion of a discontinuous transition more precise and to understand how it arises in a model of evolutionary adaptation.

We focus on the narrow domain of RNA secondary structure, which is currently the simplest computationally tractable, yet realistic phenotype (2). This choice enables the definition and exploration of concepts that may prove useful in a wider context. RNA secondary structures represent a coarse level of analysis compared with the three-dimensional structure at atomic resolution. Yet, secondary structures are empirically well defined and obtain their biophysical and biochemical importance from being a scaffold for the tertiary structure. For the sake of brevity, we shall refer to secondary structures as “shapes.” RNA combines in a single molecule both genotype (replicatable sequence) and phenotype (selectable shape), making it ideally suited for in vitro evolution experiments (3, 4).

To generate evolutionary histories, we used a stochastic continuous time model of an RNA population replicating and mutating in a capacity-constrained flow reactor under selection (5, 6). In the laboratory, a goal might be to find an RNA aptamer binding specifically to a molecule (4). Although the evolutionary end product of such an experiment is unknown in advance, we think of its shape as being specified implicitly by the imposed selection criterion. Because our intent is to study evolutionary histories rather than end products, we defined a target shape in advance and assumed the replication rate of a sequence to be a function of the similarity between its shape and the target. An actual situation may involve more than one best shape, but this does not affect our conclusions.

An instance representing in its qualitative features all the simulations we performed is shown in Fig. 1A. Starting with identical sequences folding into a random shape, the simulation was stopped when the population became dominated by the target, here a canonical tRNA shape. The black curve traces the average distance to the target (inversely related to fitness) in the population against time. Aside from a short initial phase, the entire history is dominated by steps, that is, flat periods of no apparent adaptive progress, interrupted by sudden approaches toward the target structure (7). However, the dominant shapes in the population not only change at these marked events but undergo several fitness-neutral transformations during the periods of no apparent progress. Although discontinuities in the fitness trace are evident, it is entirely unclear when and on the basis of what the series of successive phenotypes itself can be called continuous or discontinuous.

Figure 1

(A) Simulation of an RNA population evolving toward a tRNA target shape (inset of Fig. 2A) in a flow reactor logistically constrained to a capacity of 1000 sequences on average. The replication accuracy per position is 0.999. The replication rate (= fitness) of a sequence whose shape is α is given by [0.01 + d(α, tRNA)/l]⁻¹, where l = 76 is the sequence length and d is the distance between α and the target. Linear or exponential functions did not affect the character of the dynamics. The initial population consisted of 1000 identical sequences folding into a random shape. The target was reached after about 11 × 10⁶ replications. The black trace shows the average structure distance of the shapes in the population to the target. The chain of shape innovations linking the initial shape to the target (evolutionary path) comprises 43 shapes (17). To each of these corresponds one horizontal level placed above the black curve. The topmost level belongs to the initial shape and the bottom level to the target shape. For these levels, only the time axis has a meaning. At each level, a series of red intervals represents the time periods during which the corresponding shape was present in the population. The green step curve indicates the transitions between shapes and hence the time spent by each shape on the evolutionary path. Each transition was caused by a single point mutation in the underlying sequences. The vertical dotted lines and the labels mark transitions referred to in the text. (B) Enlargement of the evolutionary path around event e of (A). The transition indicated on the left (α to β) is continuous. This is shown by the fact that β is present (intermittently) in the population well before becoming a link in the evolutionary path (green trace). In other words, β's presence is stochastically correlated with that of α, because it is near α in shape space. The intersecting neighborhood disks (see Fig. 3) illustrate schematically that the continuous transition from α to β stays within the neighborhood of α. In contrast, the transition from β to γ is discontinuous, as shown by the fact that γ's presence does not correlate with β's (mutants of sequences folding into β do not typically fold into γ). Here, γ has a fitness advantage and almost immediately becomes the next link in the evolutionary path. β remains intermittently present after γ's takeover. This is because β is near γ, despite the fact that γ is not near β. The topological relation of nearness need not be symmetric.
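The replication rate given in the caption (the reciprocal of 0.01 plus the length-normalized distance to the target) can be sketched in a few lines of Python. The caption does not say which structure distance d is used, so the sketch assumes, purely for illustration, a Hamming distance between dot-bracket strings; the function names are hypothetical.

```python
def hamming_distance(shape_a: str, shape_b: str) -> int:
    """Count positions at which two equal-length dot-bracket strings differ.

    Assumption: this stands in for the structure distance d, which the
    caption does not specify.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("shapes must have equal length")
    return sum(a != b for a, b in zip(shape_a, shape_b))


def replication_rate(shape: str, target: str, offset: float = 0.01) -> float:
    """Replication rate 1 / (offset + d(shape, target) / l), with l = len(target)."""
    length = len(target)
    return 1.0 / (offset + hamming_distance(shape, target) / length)
```

Under this form the rate is bounded: a shape identical to the target replicates at rate 1/0.01 = 100, and the offset keeps the rate finite at zero distance.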

A set of entities is organized into a (topological) space by assigning to each entity a system of neighborhoods. In the present case, there are two kinds of entities: sequences and shapes, which are related by a thermodynamic folding procedure. The set of possible sequences (of fixed length) is naturally organized into a space because point mutations induce a canonical neighborhood. The neighborhood of a sequence consists of all its one-error mutants. The problem is how to organize the set of possible shapes into a space. The issue arises because, in contrast to sequences, there are no physical processes that directly (and inheritably) modify shapes. Rather, transformations of a shape are a complicated consequence of changes in its underlying sequence. To properly frame continuity in the spirit of topology, we must understand how one shape can be considered to be “near” some other. We may then call a temporal succession of sequence-shape pairs (an evolutionary path) continuous, if successive sequences are neighbors in sequence space and their corresponding shapes are neighbors in shape space. A topology is weaker than a metric because the relation of nearness does not quantify distance (or similarity). We note this to emphasize that continuity does not hinge on the similarity of successive shapes in time. We next define and explore an appropriate relation of nearness for RNA shapes and then return to the discussion of Fig. 1.
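The canonical neighborhood in sequence space is simple to enumerate. The following sketch, with a hypothetical function name and assuming the four-letter RNA alphabet, lists all one-error mutants of a sequence; for a sequence of length 76 this yields 3 × 76 = 228 neighbors.

```python
ALPHABET = "ACGU"  # assumption: standard four-letter RNA alphabet


def one_error_mutants(sequence: str, alphabet: str = ALPHABET) -> list:
    """Return every sequence that differs from `sequence` at exactly one position."""
    mutants = []
    for i, base in enumerate(sequence):
        for other in alphabet:
            if other != base:
                mutants.append(sequence[:i] + other + sequence[i + 1:])
    return mutants
```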

For a shape β to succeed a shape α, β must obviously be accessible from α. Accessibility means that a sequence whose shape is β arises by mutation from a sequence whose shape is α. The issue of accessibility logically precedes any reasoning about the fitness of β, although fitness will strongly affect the fate of the mutant in a population under selection. We shall call a shape β “near” a shape α if β is very likely to be accessible from α. The issue then becomes one of estimating the statistical frequency with which a mutation in α's sequence yields the mutant shape β. It is here that neutrality comes crucially into play (8). When a shape α is realized by a large class of sequences, “nearness” of β to α comes to mean that β must arise from α with a high probability when averaged over all sequences folding into α. Only then is the neighborhood of α a robust property of α itself, independent of a particular sequence.

This notion of neighborhood is illustrated by considering a tRNA-like shape of length 76 (9) (inset in Fig. 2A). A sample of the many sequences folding into this shape was obtained by an inverse folding procedure (10, 11). For every sequence in the sample, we computed all shapes realized by its 228 one-error mutants (the sequence neighborhood). From these data, we determined the fraction of sequence neighborhoods in which a mutant shape appeared at least once. The totality of these mutant shapes, irrespective of how often they occurred, is termed the (shape space) boundary of the tRNA.
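The counting procedure just described can be sketched as follows. The folding routine is passed in as a parameter `fold`, a placeholder for whatever secondary structure predictor is used; the function names are hypothetical, and excluding the reference shape itself from the boundary is our reading of the text rather than something it states explicitly.

```python
from collections import Counter


def one_error_mutants(sequence, alphabet="ACGU"):
    """All sequences differing from `sequence` at exactly one position."""
    return [
        sequence[:i] + other + sequence[i + 1:]
        for i, base in enumerate(sequence)
        for other in alphabet
        if other != base
    ]


def boundary_frequencies(sample, fold, alphabet="ACGU"):
    """Map each mutant shape to the fraction of sequence neighborhoods
    in which it appears at least once.

    `sample` holds sequences folding into the reference shape; `fold` is
    any callable mapping a sequence to a hashable shape (an assumption).
    """
    sample = list(sample)
    counts = Counter()
    for seq in sample:
        shapes_here = {fold(m) for m in one_error_mutants(seq, alphabet)}
        shapes_here.discard(fold(seq))  # neutral mutants are not counted
        counts.update(shapes_here)      # each shape counts once per neighborhood
    return {shape: n / len(sample) for shape, n in counts.items()}
```

The keys of the returned dictionary form the boundary; ranking them by value gives the kind of frequency distribution from which a high-frequency neighborhood can be read off.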

Figure 2

(A) Rank-ordered frequency distribution of shapes in the tRNA boundary. A sample of 2199 sequences whose minimum free energy secondary structure is a tRNA cloverleaf (inset) was generated. All their one-error mutants (501,372 sequences) were folded. Twenty-eight percent of the mutants retained the original structure (that is, were neutral). The remaining 358,525 sequences realized 141,907 distinct shapes. The frequency f(α) is the number of one-error neighborhoods in which α appeared at least once, divided by the number of sequences in the sample. The logarithm-logarithm plot shows the rank of α versus f(α). Rank n means the nth most frequent shape. The dotted line indicates a change in the slope that we take to naturally delimit the high-frequency domain (to the left) whose shapes form the characteristic set of the tRNA. (B) The 12 highest ranked shapes (left to right, top to bottom) in the characteristic set.
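The counts quoted in this caption are mutually consistent, as a short arithmetic check shows. All numbers are taken from the caption; only the variable names are ours.

```python
sample_size = 2199           # sequences folding into the tRNA cloverleaf
sequence_length = 76
mutants_per_sequence = 3 * sequence_length            # 228 one-error mutants

total_mutants = sample_size * mutants_per_sequence    # 501,372 folded mutants
non_neutral = 358_525                                 # mutants with a changed shape
neutral = total_mutants - non_neutral                 # mutants keeping the tRNA shape
neutral_fraction = neutral / total_mutants            # about 0.28

print(total_mutants, neutral, round(neutral_fraction, 2))
```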

When rank-ordering the boundary shapes with decreasing frequency, we obtained Fig. 2A. The most salient feature is a marked change in the scaling exponent, suggesting a natural cutoff point for the definition of neighborhood. In the present case, the high-frequency range comprises some 20 shapes, which we define to be near the tRNA shape (12). These shapes constitute the characteristic set of the tRNA, that is, its most specific neighborhood. The topmost 12 shapes are listed in Fig. 2B and exhibit two properties we found to hold for all shapes whose neighborhoods we studied. First, most shapes in the characteristic set of a shape α are highly similar to α, typically differing in a stack size by single base pairs (13). Second, some shapes, such as tRNA8 (the shape ranked eighth in Fig. 2B), differ by the loss of an entire stack. The latter finding illustrates that nearness does not imply similarity. More importantly, it illustrates that nearness is not a symmetric relation. In fact, the tRNA shape was not found in the characteristic set of the tRNA8, and it did not even occur in its boundary sample. Not surprisingly, the destruction of a structural element through a single point mutation is easier than its creation. Although the high frequency of the event is surprising, it is ultimately a consequence of the average base pair composition of stacks and the markedly different stacking energies of AU and GC base pairs (12).

The tRNA boundary has an intriguing property. Intersections with large samples of coarse-grained random shapes of the same length support the conjecture that all common coarse-grained shapes occur in the boundary of any common shape (9, 14). This conjecture was verified in the case of the exhaustively folded binary (GC-only) sequence space of length 25.

We may visualize the neighborhood structure (the topology) on the set of all shapes as a directed graph. Each shape is represented by a node. Directed edges fan out from a node α to the nodes in its characteristic set. We can think of a continuous transformation of shape α into shape β as a connected path in the graph that follows the direction of the edges. Discontinuous transformations are transitions between disconnected components of the graph.
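A minimal sketch of this picture represents the topology as a dictionary mapping each shape to its characteristic set and tests for a directed path with a breadth-first search. The function name and the example edges are hypothetical; only the asymmetry between the tRNA shape and tRNA8 is taken from the text.

```python
from collections import deque


def continuously_reachable(graph, source, target):
    """Return True if a directed path leads from `source` to `target`.

    `graph` maps each shape to the set of shapes in its characteristic set.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Illustrative edges only: tRNA8 is near tRNA, but tRNA is not near tRNA8.
example_graph = {"tRNA": {"tRNA8"}, "tRNA8": set()}
```

Because edges are directed, `continuously_reachable(example_graph, "tRNA", "tRNA8")` holds while the reverse query does not, which mirrors the asymmetry of nearness.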

The preceding data enable us to characterize continuous transformations as those structural rearrangements that fine-tune a shape architecture in a sequential fashion by lengthening or shortening stacks or that destroy a stack element and the loop implied by it (Fig. 3). Discontinuous transformations are characterized by the two remaining possible structural changes: (i) the creation of a long stack in a single step and (ii) generalized shifts (Fig. 3). For example, one strand of a stacked region slides past the other by a few positions (simple shift). Notice here that structural similarity does not imply nearness. Both types of discontinuous transformations require the synchronous participation of several bases (or base pairs) in a fashion that cannot be sequentialized on thermodynamic grounds (15).

Figure 3

The strings illustrate transformations between RNA secondary structure parts. Solid arrows indicate continuous transformations and dashed arrows indicate discontinuous transformations in our topology. Three groups of transformations are shown. (A) The loss and formation of a base pair adjacent to a stack are both continuous. (B) The opening of a constrained stack (for example, closing a multiloop) is continuous, whereas its creation is discontinuous. This result reflects the fact that the formation of a long helix between two unpaired random segments upon mutation of a single position is a highly improbable event, whereas the unzipping of a random helix is likely to occur as soon as a mutation blocks one of its base pairs. (C) Generalized shifts are discontinuous transformations in which one strand of a helix slides past the other. After the shift, the two strand segments may or may not overlap. Accordingly, we partition generalized shifts into the four classes shown. The intersecting disks are a schematic representation of continuous and discontinuous transitions between two shapes α and β. The disk with center α stands for the set of shapes that are near α, and the disk with center β stands for the set of shapes that are near β. If β is a member of α's disk (neighborhood), a transition from α to β is continuous (solid arrow). A discontinuous transition leaves the neighborhood of α (dashed arrow). Even if α and β are highly dissimilar, α might nonetheless be transformed continuously into β through intermediate shapes whose neighborhoods have sufficient overlap.

A pertinent issue is whether the folding map from sequences to shapes is continuous in our topology, that is, whether the shapes realized in the sequence neighborhood of a particular sequence folding into α are in the neighborhood of α. It turns out that the folding map is almost nowhere continuous. Many of the frequent shapes assumed by the one-error mutants of a sequence folding into α are not members of the characteristic set of α, and those that are do not always occur with high frequency. Each sequence folding into α has, therefore, its own specific set of accessible shapes. Yet, the local peculiarities disappear and a shape-specific neighborhood is obtained when averaging over a sufficiently large sample of sequences folding into α.

Equipped with this fitness-independent notion of (dis)continuous shape transformations, we resume the discussion of Fig. 1. To obtain an evolutionary path in shape space, we recorded during a simulation all mutation events that produced a new shape. “New” means here that the shape is not present in the population at the time it is produced, although it may have been present in the past. For each shape ever seen, we obtained a series of presence intervals delimited by the shape's entrance and exit times in the population. We define an evolutionary path, αₙ αₙ₋₁ αₙ₋₂ … αᵢ₊₁ αᵢ … α₁ α₀, retrospectively by searching in the history log for the shape αₙ₋₁, which first gave rise to the target shape αₙ, and next obtaining the shape αₙ₋₂, which started that presence interval of αₙ₋₁ during which αₙ was produced, and so on until an initial shape α₀ is reached. This backtracking reconstructs the unique uninterrupted chain of shape innovations that led from an initial shape to the evolutionary end product. This chain is defined without regard to fitness or to the frequency of a shape in the population (16). The path is continuous at the ith succession if the sequences underlying the ith shape innovation differ by a single point mutation (which they typically do at high replication accuracy) and if shape αᵢ₊₁ is near αᵢ in the sense defined above.
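The backtracking procedure can be sketched as follows, assuming a hypothetical log format in which each innovation event is a tuple (time, parent shape, new shape). Because an event is logged only when the new shape is absent from the population, the latest event producing a shape before a given time marks the start of the presence interval that contains that time.

```python
def backtrack_path(log, target):
    """Reconstruct the chain of shape innovations leading to `target`.

    `log` is a list of (time, parent_shape, new_shape) events (assumed format).
    Returns the path from the initial shape to the target, or [] if the
    target was never produced.
    """
    events = sorted(log)  # chronological order
    first_hit = next((e for e in events if e[2] == target), None)
    if first_hit is None:
        return []
    path = [target]
    time, shape = first_hit[0], first_hit[1]
    while True:
        path.append(shape)
        # Events that produced `shape` before it was needed as a parent.
        earlier = [e for e in events if e[2] == shape and e[0] < time]
        if not earlier:
            break  # no producing event: `shape` is the initial shape
        # The latest such event started the relevant presence interval.
        time, shape = earlier[-1][0], earlier[-1][1]
    path.reverse()
    return path
```

The event times strictly decrease at every step, so the loop terminates. A shape may appear more than once on the returned path if it was lost and later recreated.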

The evolutionary path (green trace) of Fig. 1A comprises 43 shapes (17). Their presence intervals during the entire history are shown in red, with one horizontal level for each shape. The patterns of presence intervals confirm and nicely visualize the nearness relation just developed. When a shape α is succeeded by a shape β that is near α, β is present intermittently in the population well before becoming part of the path (Fig. 1B). That is, once α is present, β is unavoidable, and a transition to β is continuous. Conversely, at a discontinuous transition, when α is succeeded by a shape β that is not near α, β almost always has its first ever appearance just before that transition (Fig. 1B). Seen together, the presence intervals of successive shapes on the path form blocks of continuous (within-neighborhood) transitions, separated by discontinuous transitions (neighborhood escapes).
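The block structure of a path can be made explicit with a short sketch. It takes a path and a table of characteristic sets (both hypothetical inputs here), assumes that every step on the path corresponds to a single point mutation, and starts a new block whenever the successor shape is not near its predecessor.

```python
def split_into_blocks(path, characteristic_sets):
    """Split an evolutionary path into maximal runs of continuous transitions.

    `characteristic_sets` maps a shape to the set of shapes near it.
    A new block begins at every discontinuous transition.
    """
    if not path:
        return []
    blocks = [[path[0]]]
    for prev, nxt in zip(path, path[1:]):
        if nxt in characteristic_sets.get(prev, set()):
            blocks[-1].append(nxt)   # continuous: stays within the neighborhood
        else:
            blocks.append([nxt])     # discontinuous: neighborhood escape
    return blocks
```

The number of discontinuous transitions on the path is then one less than the number of blocks.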

In all computer simulations, we observed a few basic patterns of events that combine to form particular histories. When starting with a random shape, there is a short initial phase of a few discontinuous transitions rapidly decreasing the distance to the target. This is understood by noting that many modifications of a random shape increase its similarity to a (random) target and by recalling that such modifications are accessible in the local neighborhood of any random sequence (discontinuity of the folding map). Both properties effectively establish a funnel in shape space enabling fast relaxation to a level of similarity beyond which adaptation becomes harder. Then the character of evolutionary dynamics changes.

In the second phase, the population level (as monitored by distance to target) is entirely dominated by punctuation events. The point is that these events do mostly, but not always, line up with discontinuous transitions on the evolutionary path. In Fig. 1A, events a and b are rapid (18) successions of continuous transitions shortening and elongating stacks by single base pairs. This shows that sudden changes in fitness do not imply discontinuous phenotypic transformations. The reverse is not true either, as shown by the discontinuous shift event c, which is silent in terms of fitness. All remaining fitness changes do, however, coincide with discontinuous transitions in shape space. These discontinuous transitions are the simple shift events e, g, h, i, the double flip (d), and the flip (f) (19). An ancestral shape that has been on the path in the distant past is reoccurring (but not on the path) several discontinuous transitions thereafter (event j in Fig. 1A), arising by a single point mutation from shapes currently on the path. This is a molecular version of atavism.

Given its nature, a discontinuous transformation can be triggered by a single point mutation only if the rest of the sequence provides the appropriate context. Such sequences are severely constrained and hence rare. When a phenotype is under strong selection, neutral drift is the only means for producing the required genotypic context (6, 20). This is why discontinuous transitions are preceded by extended periods of neutral drift in Fig. 1A.

The concept of evolutionary continuity cannot be separated from an understanding of the relation between genotype and phenotype. It is indeed defined by it. A necessary step toward formalizing the concept of punctuated equilibrium is the study of the fitness-independent topological structure of phenotype space induced by the genotype-phenotype map. In a final analysis, punctuation may turn out to be a phenomenon intrinsic to an evolving entity and less dependent on external contingencies than hitherto assumed.

