## Abstract

To distinguish continuous from discontinuous evolutionary change, a relation of nearness between phenotypes is needed. Such a relation is based on the probability of one phenotype being accessible from another through changes in the genotype. This nearness relation is exemplified by calculating the shape neighborhood of a transfer RNA secondary structure and provides a characterization of discontinuous shape transformations in RNA. The simulation of replicating and mutating RNA populations under selection shows that sudden adaptive progress coincides mostly, but not always, with discontinuous shape transformations. The nature of these transformations illuminates the key role of neutral genetic drift in their realization.

A much-debated issue in evolutionary biology concerns the extent to which the history of life has proceeded gradually or has been punctuated by discontinuous transitions at the level of phenotypes (1). Our goal is to make the notion of a discontinuous transition more precise and to understand how it arises in a model of evolutionary adaptation.

We focus on the narrow domain of RNA secondary structure, which is currently the simplest computationally tractable, yet realistic phenotype (2). This choice enables the definition and exploration of concepts that may prove useful in a wider context. RNA secondary structures represent a coarse level of analysis compared with the three-dimensional structure at atomic resolution. Yet, secondary structures are empirically well defined and obtain their biophysical and biochemical importance from being a scaffold for the tertiary structure. For the sake of brevity, we shall refer to secondary structures as “shapes.” RNA combines in a single molecule both genotype (replicatable sequence) and phenotype (selectable shape), making it ideally suited for in vitro evolution experiments (3, 4).

To generate evolutionary histories, we used a stochastic continuous time model of an RNA population replicating and mutating in a capacity-constrained flow reactor under selection (5, 6). In the laboratory, a goal might be to find an RNA aptamer binding specifically to a molecule (4). Although in the experiment the evolutionary end product was unknown, we thought of its shape as being specified implicitly by the imposed selection criterion. Because our intent is to study evolutionary histories rather than end products, we defined a target shape in advance and assumed the replication rate of a sequence to be a function of the similarity between its shape and the target. An actual situation may involve more than one best shape, but this does not affect our conclusions.

An instance representing in its qualitative features all the simulations we performed is shown in Fig. 1A. Starting with identical sequences folding into a random shape, the simulation was stopped when the population became dominated by the target, here a canonical tRNA shape. The black curve traces the average distance to the target (inversely related to fitness) in the population against time. Aside from a short initial phase, the entire history is dominated by steps, that is, flat periods of no apparent adaptive progress, interrupted by sudden approaches toward the target structure (7). However, the dominant shapes in the population not only change at these marked events but undergo several fitness-neutral transformations during the periods of no apparent progress. Although discontinuities in the fitness trace are evident, it is entirely unclear when and on the basis of what the series of successive phenotypes itself can be called continuous or discontinuous.

A set of entities is organized into a (topological) space by assigning to each entity a system of neighborhoods. In the present case, there are two kinds of entities: sequences and shapes, which are related by a thermodynamic folding procedure. The set of possible sequences (of fixed length) is naturally organized into a space because point mutations induce a canonical neighborhood. The neighborhood of a sequence consists of all its one-error mutants. The problem is how to organize the set of possible shapes into a space. The issue arises because, in contrast to sequences, there are no physical processes that directly (and inheritably) modify shapes. Rather, transformations of a shape are a complicated consequence of changes in its underlying sequence. To properly frame continuity in the spirit of topology, we must understand how one shape can be considered to be “near” some other. We may then call a temporal succession of sequence-shape pairs (an evolutionary path) continuous, if successive sequences are neighbors in sequence space and their corresponding shapes are neighbors in shape space. A topology is weaker than a metric because the relation of nearness does not quantify distance (or similarity). We note this to emphasize that continuity does not hinge on the similarity of successive shapes in time. We next define and explore an appropriate relation of nearness for RNA shapes and then return to the discussion of Fig. 1.
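The canonical sequence neighborhood described above can be made concrete with a minimal sketch. The function names `one_error_mutants` and `are_neighbors` and the four-letter alphabet constant are illustrative assumptions, not part of the original model code. For a sequence of length n over four bases the neighborhood has 3n members.

```python
from typing import Iterator

ALPHABET = "ACGU"  # assumed RNA alphabet for this sketch


def one_error_mutants(seq: str, alphabet: str = ALPHABET) -> Iterator[str]:
    """Yield every sequence that differs from `seq` at exactly one position."""
    for i, base in enumerate(seq):
        for substitute in alphabet:
            if substitute != base:
                yield seq[:i] + substitute + seq[i + 1:]


def are_neighbors(a: str, b: str) -> bool:
    """True if `a` and `b` have equal length and differ by one point mutation."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


if __name__ == "__main__":
    mutants = list(one_error_mutants("GCAU"))
    print(len(mutants))                  # 3 substitutions at each of 4 positions
    print(are_neighbors("GCAU", "GCAA"))
```

Note that this relation is symmetric on sequences, whereas nothing in the sketch says anything about shapes; nearness of shapes has to be derived separately from the folding map.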

For a shape β to succeed a shape α, β must obviously be accessible from α. Accessibility means that a sequence whose shape is β arises by mutation from a sequence whose shape is α. The issue of accessibility logically precedes any reasoning about the fitness of β, although fitness will strongly affect the fate of the mutant in a population under selection. We shall call a shape β “near” a shape α if β is very likely to be accessible from α. The issue then becomes one of estimating the statistical frequency with which a mutation in α's sequence yields the mutant shape β. It is here that neutrality comes crucially into play (8). When a shape α is realized by a large class of sequences, “nearness” of β to α comes to mean that β must arise from α with a high probability when averaged over all sequences folding into α. Only then is the neighborhood of α a robust property of α itself, independent of a particular sequence.

This notion of neighborhood is illustrated by considering a tRNA-like shape of length 76 (9) (inset in Fig. 2A). A sample of the many sequences folding into this shape was obtained by an inverse folding procedure (10, 11). For every sequence in the sample, we computed all shapes realized by its 228 one-error mutants (the sequence neighborhood). From these data, we determined the fraction of sequence neighborhoods in which a mutant shape appeared at least once. The totality of these mutant shapes, irrespective of how often they occurred, is termed the (shape space) boundary of the tRNA.
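The boundary statistics just described can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: `fold` is a hypothetical user-supplied callable mapping a sequence to a shape label (for instance a wrapper around a thermodynamic folding routine), `boundary_frequencies` is a name introduced here, and excluding the reference shape itself (neutral mutants) from the boundary is an assumption of the sketch.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List, Tuple


def one_error_mutants(seq: str, alphabet: str = "ACGU") -> Iterator[str]:
    """Yield every sequence that differs from `seq` at exactly one position."""
    for i, base in enumerate(seq):
        for substitute in alphabet:
            if substitute != base:
                yield seq[:i] + substitute + seq[i + 1:]


def boundary_frequencies(
    sample: Iterable[str], fold: Callable[[str], str]
) -> List[Tuple[str, float]]:
    """Rank mutant shapes by the fraction of sequence neighborhoods
    in which they appear at least once.

    `sample` should contain sequences that all fold into the same shape.
    """
    sequences = list(sample)
    counts: Counter = Counter()
    for seq in sequences:
        # A set, so each shape is counted once per sequence neighborhood.
        shapes = {fold(mutant) for mutant in one_error_mutants(seq)}
        shapes.discard(fold(seq))  # assumption: neutral mutants are excluded
        counts.update(shapes)
    total = len(sequences)
    ranked = [(shape, n / total) for shape, n in counts.items()]
    # Decreasing frequency; ties broken by label for a reproducible order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Every shape in the returned list belongs to the boundary; a characteristic set would be obtained by keeping only the high-frequency head of the ranking, with the cutoff chosen from the data.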

Rank-ordering the boundary shapes by decreasing frequency yields Fig. 2A. The most salient feature is a marked change in the scaling exponent, suggesting a natural cutoff point for the definition of neighborhood. In the present case, the high-frequency range comprises some 20 shapes, which we define to be near the tRNA shape (12). These shapes constitute the characteristic set of the tRNA, that is, its most specific neighborhood. The topmost 12 shapes are listed in Fig. 2B and exhibit two properties we found to hold for all shapes whose neighborhoods we studied. First, most shapes in the characteristic set of a shape α are highly similar to α, typically differing in a stack size by single base pairs (13). Second, some shapes, such as tRNA_{8} (the shape ranked eighth in Fig. 2B), differ by the loss of an entire stack. The latter finding illustrates that nearness does not imply similarity. More importantly, it illustrates that nearness is not a symmetric relation. In fact, the tRNA shape was not found in the characteristic set of tRNA_{8}, and it did not even occur in its boundary sample. Not surprisingly, the destruction of a structural element through a single point mutation is easier than its creation. Although the high frequency of the event is surprising, it is ultimately a consequence of the average base pair composition of stacks and the markedly different stacking energies of AU and GC base pairs (12).

The tRNA boundary has an intriguing property. Intersections with large samples of coarse-grained random shapes of the same length support the conjecture that all common coarse-grained shapes occur in the boundary of any common shape (9, 14). This conjecture was verified in the case of the exhaustively folded binary (GC-only) sequence space of length 25.

We may visualize the neighborhood structure (the topology) on the set of all shapes as a directed graph. Each shape is represented by a node. Directed edges fan out from a node α to the nodes in its characteristic set. We can think of a continuous transformation of shape α into shape β as a connected path in the graph that follows the direction of the edges. Discontinuous transformations are transitions between disconnected components of the graph.
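A minimal sketch of this graph picture follows, assuming the characteristic sets are already available as a dictionary from each shape label to the set of shapes near it. The function name `continuously_reachable` and the toy labels are hypothetical. Breadth-first search follows edges only in their given direction, so reachability is not symmetric.

```python
from collections import deque
from typing import Dict, Set


def continuously_reachable(
    near: Dict[str, Set[str]], start: str, goal: str
) -> bool:
    """True if `goal` can be reached from `start` along directed edges,
    that is, through a chain of steps each staying within a neighborhood."""
    seen = {start}
    queue = deque([start])
    while queue:
        shape = queue.popleft()
        if shape == goal:
            return True
        for successor in near.get(shape, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


if __name__ == "__main__":
    # Toy data: losing a stack is near, regaining it is not.
    near = {
        "tRNA": {"tRNA_short_stack", "tRNA_8"},
        "tRNA_short_stack": {"tRNA"},
        "tRNA_8": set(),
    }
    print(continuously_reachable(near, "tRNA", "tRNA_8"))
    print(continuously_reachable(near, "tRNA_8", "tRNA"))
```

In this toy graph the first query succeeds and the second fails, which is the asymmetry of nearness expressed as a property of directed paths.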

The preceding data enable us to characterize continuous transformations as those structural rearrangements that fine-tune a shape architecture in a sequential fashion by lengthening or shortening stacks or that destroy a stack element and the loop implied by it (Fig. 3). Discontinuous transformations are characterized by the two remaining possible structural changes: (i) the creation of a long stack in a single step and (ii) generalized shifts (Fig. 3). For example, one strand of a stacked region slides past the other by a few positions (simple shift). Notice here that structural similarity does not imply nearness. Both types of discontinuous transformations require the synchronous participation of several bases (or base pairs) in a fashion that cannot be sequentialized on thermodynamic grounds (15).

A pertinent issue is whether the folding map from sequences to shapes is continuous in our topology, that is, whether the shapes realized in the sequence neighborhood of a particular sequence folding into α are in the neighborhood of α. It turns out that the folding map is almost nowhere continuous. Many of the frequent shapes assumed by the one-error mutants of a sequence folding into α are not members of the characteristic set of α, and those that are do not always occur with high frequency. Each sequence folding into α has, therefore, its own specific set of accessible shapes. Yet, the local peculiarities disappear and a shape-specific neighborhood is obtained when averaging over a sufficiently large sample of sequences folding into α.

Equipped with this fitness-independent notion of (dis)continuous shape transformations, we resume the discussion of Fig. 1. To obtain an evolutionary path in shape space, we recorded during a simulation all mutation events that produced a new shape. “New” means here that the shape is not present in the population at the time it is produced, although it may have been present in the past. For each shape ever seen, we obtained a series of presence intervals delimited by the shape's entrance and exit times in the population. We define an evolutionary path, α_{n}α_{n−1}α_{n−2}… α_{i+1}α_{i}… α_{1}α_{0}, retrospectively by searching in the history log for the shape α_{n−1}, which first gave rise to the target shape α_{n}, and next obtaining the shape α_{n−2}, which started that presence interval of α_{n−1} during which α_{n} was produced, and so on until an initial shape α_{0} is reached. This back track reconstructs the unique uninterrupted chain of shape innovations that led from an initial shape to the evolutionary end product. This chain is defined without regard to fitness or to the frequency of a shape in the population (16). The path is continuous at the *i*th succession, if the sequences underlying the *i*th shape innovation differ by a single point mutation (which they typically do at high replication accuracy) and if shape α_{i+1} is near α_{i} in the sense defined above.
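The retrospective reconstruction of the path can be sketched as below. The record layout is an assumption made for illustration: each presence interval stores its entrance time, its exit time, and the shape whose mutation started it (`None` for the initial shape). The names `Presence` and `backtrack_path` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Presence(NamedTuple):
    start: float            # entrance time of the shape into the population
    end: float              # exit time of the shape from the population
    origin: Optional[str]   # shape that produced it; None for the initial shape


def backtrack_path(
    intervals: Dict[str, List[Presence]], target: str
) -> List[str]:
    """Return the chain of shape innovations from the initial shape to `target`."""
    # Begin with the event that first produced the target shape.
    current = min(intervals[target], key=lambda p: p.start)
    path = [target]
    while current.origin is not None:
        parent, produced_at = current.origin, current.start
        # Find the presence interval of the parent during which the
        # innovation happened; its own origin is the next step back.
        # Raises StopIteration if the log is inconsistent.
        current = next(
            p for p in intervals[parent] if p.start <= produced_at <= p.end
        )
        path.append(parent)
    path.reverse()
    return path


if __name__ == "__main__":
    log = {
        "s0": [Presence(0.0, 10.0, None)],
        "s2": [Presence(5.0, 7.0, "s0")],
        "s1": [Presence(2.0, 4.0, "s0"), Presence(6.0, 12.0, "s2")],
        "T": [Presence(8.0, 20.0, "s1")],
    }
    print(backtrack_path(log, "T"))
```

In the toy log, the shape `s1` has two presence intervals; the backtrack uses the one that contains the moment the target was produced, so the path passes through `s2` rather than jumping directly to `s0`.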

The evolutionary path (green trace) of Fig. 1A comprises 43 shapes (17). Their presence intervals during the entire history are shown in red, with one horizontal level for each shape. The patterns of presence intervals confirm and nicely visualize the nearness relation just developed. When a shape α is succeeded by a shape β that is near α, β is present intermittently in the population well before becoming part of the path (Fig. 1B). That is, once α is present, β is unavoidable, and a transition to β is continuous. Conversely, at a discontinuous transition, when α is succeeded by a shape β that is not near α, β almost always has its first ever appearance just before that transition (Fig. 1B). Seen together, the presence intervals of successive shapes on the path form blocks of continuous (within-neighborhood) transitions, separated by discontinuous transitions (neighborhood escapes).

In all computer simulations, we observed a few basic patterns of events that combine to form particular histories. When starting with a random shape, there is a short initial phase of a few discontinuous transitions rapidly decreasing the distance to the target. This is understood by noting that many modifications of a random shape increase its similarity to a (random) target and by recalling that such modifications are accessible in the local neighborhood of any random sequence (discontinuity of the folding map). Both properties effectively establish a funnel in shape space enabling fast relaxation to a level of similarity beyond which adaptation becomes harder. Then the character of evolutionary dynamics changes.

In the second phase, the population level (as monitored by distance to target) is entirely dominated by punctuation events. The point is that these events mostly, but not always, line up with discontinuous transitions on the evolutionary path. In Fig. 1A, events a and b are rapid (18) successions of continuous transitions shortening and elongating stacks by single base pairs. This shows that sudden changes in fitness do not imply discontinuous phenotypic transformations. The reverse is not true either, as shown by the discontinuous shift event c, which is silent in terms of fitness. All remaining fitness changes do, however, coincide with discontinuous transitions in shape space. These discontinuous transitions are the simple shift events e, g, h, i, the double flip (d), and the flip (f) (19). An ancestral shape that was on the path in the distant past reoccurs (but not on the path) several discontinuous transitions later (event j in Fig. 1A), arising by a single point mutation from shapes currently on the path. This is a molecular version of atavism.

Given its nature, a discontinuous transformation can be triggered by a single point mutation only if the rest of the sequence provides the appropriate context. Such sequences are severely constrained and hence rare. When a phenotype is under strong selection, neutral drift is the only means for producing the required genotypic context (6, 20). This is why discontinuous transitions are preceded by extended periods of neutral drift in Fig. 1A.

The concept of evolutionary continuity cannot be separated from an understanding of the relation between genotype and phenotype. It is indeed defined by it. A necessary step toward formalizing the concept of punctuated equilibrium is the study of the fitness-independent topological structure of phenotype space induced by the genotype-phenotype map. In a final analysis, punctuation may turn out to be a phenomenon intrinsic to an evolving entity and less dependent on external contingencies than hitherto assumed.