Network of epistatic interactions within a yeast snoRNA

See allHide authors and affiliations

Science  13 May 2016:
Vol. 352, Issue 6287, pp. 840-844
DOI: 10.1126/science.aaf0965

This article has a correction. Please see:

Epistasis and mutational fitness landscape

A fitness landscape of a gene defines the molecular potential of evolution. This can help us understand the current state of evolution as well as predict unrealized potential. Using deep sequencing to examine mutations in nonessential genes that affect the growth of yeast strains, two studies have generated fitness landscapes and measured the effect of epistatic interactions (see the Perspective by He and Liu). Li et al. generated a library of mutants in a transfer RNA gene, including all single and many double and multiple mutants. The RNA secondary structure was generally predictive of bases under selection. Similarly, Puchta et al. assessed a small nucleolar RNA gene for the fitness effects of individual mutations, which correlated with evolutionary conservation and structural stability. Both studies suggest that epistasis—the combined functional effect—for double substitutions is more often negative than positive.

Science, this issue pp. 837 and 840; see also p. 769


Epistatic interactions play a fundamental role in molecular evolution, but little is known about the spatial distribution of these interactions within genes. To systematically survey a model landscape of intragenic epistasis, we quantified the fitness of ~60,000 Saccharomyces cerevisiae strains expressing randomly mutated variants of the 333-nucleotide-long U3 small nucleolar RNA (snoRNA). The fitness effects of individual mutations were correlated with evolutionary conservation and structural stability. Many mutations had small individual effects but had large effects in the context of additional mutations, which indicated negative epistasis. Clusters of negative interactions were explained by local thermodynamic threshold effects, whereas positive interactions were enriched among large-effect sites and between base-paired nucleotides. We conclude that high-throughput mapping of intragenic epistasis can identify key structural and functional features of macromolecules.

The effect of a mutation on phenotype may depend on the presence of additional mutations. This phenomenon, known as epistasis, explains synthetic lethal interactions, in which a combination of two individually viable mutations causes death, and compensatory interactions, in which the fitness cost of a mutation is reduced by a second mutation (1, 2). Epistasis plays a major role in evolution; it determines the accessibility of mutational pathways (3) and thereby influences the rate of adaptation and the diversity and robustness of genetic variants (4, 5). Although genome-wide studies have revealed a network of intergenic epistasis (6), it has been suggested that interactions within genes may be even more common (711). However, previous studies focused on relatively small networks of interactions, and the comprehensive pattern of epistasis has not yet been determined for any gene.

We used “doped” oligonucleotides to synthesize ~130,000 randomly mutated variants of the 333-nucleotide Saccharomyces cerevisiae gene SNR17A, which encodes the U3 small nucleolar RNA. U3 forms base pairs with the primary ribosomal RNA (rRNA) transcript (pre-rRNA), and this interaction is required for pre-rRNA cleavage and 18S rRNA biogenesis. Our mutagenesis approach ensures uniform coverage of mutations among positions 7 to 333, which encompass 98% of the gene, and prevents bias toward specific types of mutations (Fig. 1A and fig. S1). We generated two independent mutant libraries, which contained, on average, 3 and 10 single-nucleotide polymorphisms (SNPs) per allele, respectively. In addition to the SNPs, 43.6% of variants also contained short deletions [median length, 1 nucleotide (nt)] or insertions (median length, 1 nt). All 981 (3 × 327) possible point mutations were represented in the library, and 99.4% of the 53,301 (327 × 326/2) possible pairs of sites were jointly mutated, most of them in alleles that contained additional mutations. To facilitate unambiguous identification of variants by high-throughput sequencing, we tagged each variant with a unique 20-nt barcode (Fig. 1A) placed in a nontranscribed region downstream of the U3 gene to minimize interference with function.

Fig. 1 Experimental mapping of the U3 fitness landscape.

(A) To perform saturation mutagenesis of U3, we used the polymerase chain reaction (PCR) to assemble overlapping “97:1:1:1-doped” and nondoped oligonucleotides covering the whole length of the gene (13) and attached a unique 20-nt barcode to each variant. (B) We cloned the U3 mutant library into centromeric plasmids and transformed the plasmids into the D343 yeast strain. PGAL, galactose-inducible promoter; –Ade, without adenine. (C) Normalized read-counts from barcode sequencing for seven randomly chosen wild-type U3 variants (red) and seven variants carrying a single mutation in box D (blue), under control (galactose) and competitive (glucose) conditions. Fitness was approximated by fitting exponential decay curves to the barcode count data. (D) Rows indicate positions along U3, and columns indicate substitution to one of four bases: A, C, G, and T. Log fitness effects are shown in blue for deleterious effects, red for positive effects, and white for no effect on galactose at 30°C (Gal) and glucose (Glu) at 30°C and 37°C. Genetic variants with one or two mutations were included in the analysis (13). Positions for which no data were obtained are shown in gray.

To measure fitness, we used the D343 yeast strain, which contains a single copy of the wild-type U3 gene under the control of a galactose-inducible promoter (12). D343 cells can grow in galactose-containing medium, but shifting to glucose results in down-regulation of U3 and growth arrest. Transformation of wild-type U3 on a plasmid allows the cells to survive on glucose, but nonfunctional U3 mutants do not support growth (fig. S2). We transformed D343 cells with centromeric plasmids carrying the U3 mutant libraries and measured the frequency of each mutant during competitive growth on glucose (Fig. 1B). As expected, nonfunctional variants decreased in frequency during the competition, whereas the wild-type gene increased (Fig. 1C). Growth patterns were reproducible between four replicate experiments and across replicate U3 variants within an experiment (Fig. 1D and fig. S3).

We measured the logarithm of relative fitness (log fitness) of ~60,000 variants that passed quality filters by fitting exponential decay curves to the barcode count data (13). Log fitness of wild-type U3 was set to 0. We first focused on the effects of single mutations in an otherwise wild-type gene (13). In most positions, mutations were tolerated with minimal effect on fitness (Fig. 2A). The exceptions were the conserved protein binding sites known as boxes B, C, C′, and D, in which mutations are lethal or highly deleterious. In addition, a moderate fitness decrease was observed for mutations within stems I, II, III, and V, particularly in G-C base pairs located at the base of stems, which suggests a role in structural stability. Folding predictions confirmed that destabilizing mutations in individual stems reduced fitness (fig. S4). The 5-nt 3′-terminal stem of U3 confers protection from degradation by 3′-5′ exonucleases (12). Mutations in this stem reduced fitness proportionally to their predicted effect on RNA folding strength (figs. S4 and S5). U→C mutations in positions 178 and 191 were highly deleterious (fig. S5), possibly because they created consensus binding motifs (UCUUG) for the RNA degradation factor Nab3 (14). The fitness effects of mutations were slightly larger at 37°C than at 30°C, consistent with the destabilizing effect of temperature on U3 structure (fig. S6). We found no mutations that consistently increased fitness, which suggested that wild-type U3 is optimally adapted for function. In conditions where the genomic copy of U3 was coexpressed with the mutant library, the mutations had no effect, which indicated a lack of dominant-negative or gain-of-function effects (Fig. 1D). Overall, these results show the expected pattern of single-site fitness effects and support the reliability of the measurements.

Fig. 2 Distribution of fitness effects mapped to secondary structure.

(A) The log fitness of single mutants at each position of U3 (fi) is represented according to the color scale, from blue for no effect to red for effects –1 and stronger; nonmutagenized positions are white. (B) The population-average log fitness effect for each position in the background of multiple mutations (pi) (13). Evolutionarily conserved motifs are indicated on the secondary structure. (C) Nonlinear relation between the fitness effects of single mutations in wild-type background (fi) and in mutated backgrounds (pi). (D) Cumulative distributions of log fitness for mutants grouped by the number of mutations per variant. (E) Mean measured log fitness (white boxes) is always lower than expected in the absence of epistasis (gray boxes). Inclusion of epistasis improves the fit between the model and the data (dark gray boxes). The boxes show the median and interquartile ranges.

We then calculated pi, the aggregate log fitness effect of position i across all genetic backgrounds represented in the library (13). In contrast to the single mutants (Fig. 2A), most positions showed substantial effects on fitness in combination with other mutations (Fig. 2B). The variation in pi across the gene was reproducible between replicate experiments, was observed in both mutant libraries, was robust to the exclusion of outlier variants, and did not reflect the co-occurrence of mutations with large-effect mutations in other sites (figs. S6 and S7). Most positions with very negative pi map to the conserved core of the U3 gene (stems I to III, 5′ hinge, 3′ hinge, see Fig. 2B). In contrast, sites with near-zero pi were located in the fungal- or yeast-specific regions of the molecule (stems IV to VI, see Fig. 2B). The pi values were correlated with fitness effects in wild-type background (Spearman rho = 0.57, P < 2 × 10−16) (Fig. 2C). However, the relation was markedly nonlinear, which indicated that many mutations had small effects in an otherwise wild-type U3 but large effects in the context of additional mutations.

This pattern suggests a high prevalence of negative epistatic interactions within U3. To confirm this, we analyzed the distribution of fitness effects as a function of the number of mutations in each allele (Fig. 2D). As expected, the average fitness of variants decreased with increasing numbers of mutations. Notably, measured fitness was consistently lower than we expected under an additive model (Fig. 2E). This indicates overall enrichment of negative epistatic interactions relative to positive interactions.

We estimated the strength of all pairwise interactions from measurements of single, double, and multiple mutants (13) with a regression model (1517) that explained ~86% of variance in measured fitness and produced similar patterns of interactions when applied to our replicate experiments (fig. S9). We obtained a consensus set of epistatic interactions by averaging the interaction estimates from all four glucose competition experiments. Plotting these interactions resulted in a characteristic tartan pattern (Fig. 3A), in which several positions in the gene showed strong positive interactions with most other sites. These hubs of positive epistasis correspond to the C′, C, and D boxes, which are highly conserved in evolution and show the largest individual effects on fitness. This observation indicates a saturation effect, whereby large-effect mutations inactivate the gene to such an extent that additional mutations become irrelevant, which results in positive epistasis. Consistently, sites with large individual effects showed a strong bias toward positive epistasis (Wilcoxon test, P < 2 × 10−16) (Fig. 3, B and C), and the fitness of C′, C, and D box mutants did not depend strongly on the presence of additional mutations in the gene. The saturation effect is the within-gene equivalent of positive epistasis between pairs of mutations that independently inactivate the same metabolic pathway (18, 19).

Fig. 3 Localization of negative and positive epistasis.

(A) Estimated pairwise epistatic interactions (wij) between positions in U3 (13), averaged between all experiments in glucose media. Negative interactions are green, and positive are red; evolutionarily conserved motifs are indicated on the right and positions in U3 on the bottom. Data for the first six positions are skipped because of lower coverage of mutations (see fig. S1). (B to D) Distribution of wij showing negative epistasis (green bars) and positive epistasis (red bars) for all interactions (B), for sites with large individual effects [effect size <–1 for at least one site in pair (C)], and among pairs of base-paired positions (D). (E) Distributions of wij for individual positions in the terminal stem (74 to 77, 330 to 333). Red asterisks indicate interactions between known base pairs.

We also expected positive interactions between base-paired positions, because of the presence of compensatory mutations, which are common in RNA evolution (5). Indeed, base-paired positions showed an enrichment of positive epistasis relative to all pairs of positions (Wilcoxon test, P = 2 × 10−7) (Fig. 3D). In particular, all positions within the essential terminal stem formed strong positive interactions with their corresponding base-paired residues (Fig. 3E). To test whether positive epistasis can be used to predict RNA folding, we intersected the set of positive interactions with a list of all potentially interacting triplets of nucleotides (13). In this analysis, five out of six of the strongest positive interactions corresponded to known base pairs. When we used these interactions as constraints in RNA folding prediction, the accuracy of predicted secondary structure was improved (fig. S10).

Despite the enrichment of positive interactions among large-effect or base-paired sites, negative interactions were more common overall (Fig. 3, A and B). Whereas the strongest positive interactions typically involved at least one large-effect position (Fig. 4A), negative interactions were common among low- and intermediate-effect sites and were distributed throughout the molecule, with enrichment in the conserved core (Fig. 4B). The strength of negative epistasis was inversely correlated with the distance along the primary sequence: 8 out of the 10 strongest negative interactions were between pairs of adjacent nucleotides, and the median distance between the 100 most strongly interacting pairs was 18 nt. We thus focused on a hotspot of interactions encompassing the 3′ hinge (fig. S11A) that mediates base-pairing between U3 and the pre-rRNA and is necessary for the pre-rRNA cleavage step of ribosome biogenesis (20). Our results suggest that the 3′ hinge can tolerate a single SNP but that multiple mutations within this region reduce fitness, probably because they disrupt U3-rRNA binding. A similar, but less pronounced, pattern was found in the 5′ hinge area. Our analysis shows that the thermodynamic threshold model, wherein fitness decreases abruptly when molecule stability falls below a certain level (9), also operates at the level of interactions between distinct molecules (fig. S11B).

Fig. 4 Network of epistatic interactions.

Circos plots showing patterns of the 200 strongest positive (A) and negative (B) interactions within U3. Evolutionarily conserved motifs are indicated.

In genome-wide studies, epistatic interactions between genes correlate with physical contacts, coevolution, and co-occurrence within biochemical pathways (6). Mapping genetic interaction networks therefore provides information about cellular organization. We postulate that intragenic interaction maps will similarly illuminate patterns of molecular organization. This and other studies (8, 17, 21, 22) suggest that within-gene epistatic interactions are enriched among residues in physical proximity. Were this correlation sufficiently strong, intragene epistasis would identify secondary and tertiary structures of macromolecules. Notably, recent studies have successfully predicted the 3D structures of proteins and complexes by measuring coevolution between residues within protein alignments, a phenomenon intimately linked to epistasis (23, 24). Improved methods to extract structurally relevant interactions from the dense network of intramolecular epistasis should allow macromolecular structures to be derived from maps of within-gene epistasis.

Supplementary Materials

Materials and Methods

Figs. S1 to S11

Tables S1 and S2

References (2541)

References and Notes

  1. Materials and methods are available as supporting material on Science Online.
  2. Acknowledgments: We thank B. Charlesworth, A. DeLuna, J. Plotkin, C. Schneider, and J. Zhang for discussions; E. Wacker and A. Gallacher for technical assistance; M. Bartosovic for computational help; and Edinburgh Genomics, Edinburgh Clinical Research Facility, and Institute of Genetics and Molecular Medicine technical support for next-generation sequencing. The sequencing and barcode count data were deposited in GEO (accession number GSE77709), and the data modeling pipeline can be found on G.K. and O.P. designed the project; O.P. performed experiments; O.P., B.C., H.C., D.T., G.S., and G.K. analyzed the data; B.C. and G.S. designed and B.C. implemented the data modeling pipeline; and G.K. wrote the manuscript with input from other coauthors. O.P. and G.K. were supported by Wellcome Trust grant 097383 and by the U.K. Medical Research Council, D.T. was supported by Wellcome Trust grant 077248, G.S. was supported by the European Research Council grant MLCS306999.
View Abstract

Navigate This Article