Report

The fitness landscape of a tRNA gene

Science  13 May 2016:
Vol. 352, Issue 6287, pp. 837-840
DOI: 10.1126/science.aae0568

Epistasis and mutational fitness landscape

A fitness landscape of a gene defines the molecular potential of evolution. This can help us understand the current state of evolution as well as predict unrealized potential. Using deep sequencing to examine mutations in nonessential genes that affect the growth of yeast strains, two studies have generated fitness landscapes and measured the effect of epistatic interactions (see the Perspective by He and Liu). Li et al. generated a library of mutants in a transfer RNA gene, including all single and many double and multiple mutants. The RNA secondary structure was generally predictive of bases under selection. Similarly, Puchta et al. assessed a small nucleolar RNA gene for the fitness effects of individual mutations, which correlated with evolutionary conservation and structural stability. Both studies suggest that epistasis—the combined functional effect—for double substitutions is more often negative than positive.

Science, this issue pp. 837 and 840; see also p. 769

Abstract

Fitness landscapes describe the genotype-fitness relationship and represent major determinants of evolutionary trajectories. However, the vast genotype space, coupled with the difficulty of measuring fitness, has hindered the empirical determination of fitness landscapes. Combining precise gene replacement and next-generation sequencing, we quantified Darwinian fitness under a high-temperature challenge for more than 65,000 yeast strains, each carrying a unique variant of the single-copy tRNA-Arg(CCU) gene at its native genomic location. Approximately 1% of single point mutations in the gene were beneficial and 42% were deleterious. Almost half of all mutation pairs exhibited statistically significant epistasis, which had a strong negative bias, except when the mutations occurred at Watson-Crick paired sites. Fitness was broadly correlated with the predicted fraction of correctly folded transfer RNA (tRNA) molecules, thereby revealing a biophysical basis of the fitness landscape.

Fitness landscapes can provide information about the direction and magnitude of natural selection and can elucidate evolutionary trajectories (1), but their empirical determination requires quantifying the fitness of an astronomically large number of possible genotypes. Past studies were limited to relatively few genotypes (2, 3). Next-generation DNA sequencing (NGS) has permitted the analysis of many more genotypes (4–11), but research has focused on biochemical functions (4, 6, 8–12) rather than fitness. In the few fitness landscapes reported, only a small fraction of sites or combinations of mutations per gene were examined (5–7, 9).

Here, we combined gene replacement in Saccharomyces cerevisiae with an NGS-based fitness assay to determine the fitness landscape of a tRNA gene. tRNAs carry amino acids to ribosomes for protein synthesis, and mutations can cause diseases such as cardiomyopathy and deafness (13). tRNA genes are typically shorter than 90 nucleotides, allowing coverage by a single Illumina sequencing read. We focused on tRNA-Arg(CCU), which recognizes the arginine codon AGG via its anticodon 5′-CCU-3′. Because AGG is also recognizable by tRNA-Arg(UCU) via wobble pairing, tRNA-Arg(CCU) is known to be encoded by a single-copy nonessential gene in S. cerevisiae (14). Deleting the tRNA-Arg(CCU) gene (fig. S1 and table S1) reduces growth rates in both fermentable (YPD) and nonfermentable (YPG) yeast growth media, a problem exacerbated by high temperature (fig. S2).

We chemically synthesized the 72-nucleotide tRNA-Arg(CCU) gene with a mutation rate of 3% per site (1% per alternate nucleotide) at 69 sites; for technical reasons, we kept the remaining three sites invariant (15). Using these variants, we constructed a pool of >10^5 strains, each carrying a tRNA-Arg(CCU) gene variant at its native genomic location (Fig. 1 and fig. S1). Six parallel competitions of this strain pool were performed in YPD medium at 37°C for 24 hours. The tRNA-Arg(CCU) gene amplicons from the common starting population (T0) and those from six replicate competitions (T24) were sequenced with 100-nucleotide paired-end NGS (Fig. 1 and table S2). Genotype frequencies were highly correlated between two T0 technical repeats (Pearson’s correlation r = 0.99997; fig. S3A) and among six T24 biological replicates (average r = 0.9987; fig. S3B) (15). Changes in genotype frequencies between T0 and T24 were used to determine the Darwinian fitness of each genotype relative to the wild type (15). For our fitness estimation, we considered 65,537 genotypes with read counts of ≥100 at T0. In theory, a cell that does not divide has a fitness of 0.5 (16). Because tRNA-Arg(CCU) mutations are unlikely to be fatal, we set genotype fitness at 0.5 when the estimated fitness is <0.5 (due to stochasticity) (15). Fitness values from these en masse competitions agreed with those obtained from growth curve and pairwise competition (fig. S4), as reported previously (16). We observed strong fitness correlations across diverse environments for a subset of genotypes examined (fig. S5), which suggests that our fitness landscape is broadly relevant (15).
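The exact fitness estimator is given in the supplementary materials (15) and is not reproduced here. As a minimal sketch, assume a per-generation relative fitness computed from the change in a genotype's read count relative to the wild type's, scaled by the number of wild-type generations during the competition. The function name and the `wt_generations` parameter are hypothetical; the 100-read filter at T0 and the floor at 0.5 come from the text. Under this assumed definition, a genotype that does not divide while the wild type doubles every generation scores exactly 0.5, which matches the theoretical value quoted above.

```python
MIN_T0_READS = 100    # read-count filter at T0 stated in the text
FITNESS_FLOOR = 0.5   # fitness of a cell that does not divide


def relative_fitness(mut_t0, mut_t24, wt_t0, wt_t24, wt_generations):
    """Per-generation fitness of a genotype relative to the wild type.

    Assumed estimator (not taken from the paper): the ratio of the mutant's
    fold change to the wild type's fold change, raised to 1 / wt_generations.
    wt_generations is a hypothetical input: the number of wild-type doublings
    during the competition.
    """
    if mut_t0 < MIN_T0_READS:
        raise ValueError("genotype has fewer than 100 reads at T0")
    mutant_change = mut_t24 / mut_t0
    wild_type_change = wt_t24 / wt_t0
    fitness = (mutant_change / wild_type_change) ** (1.0 / wt_generations)
    # Estimates below 0.5 are attributed to sampling noise and set to 0.5.
    return max(FITNESS_FLOOR, fitness)


# Toy counts (made up): the mutant holds steady while the wild type
# doubles ten times, so the estimate is close to 0.5.
print(relative_fitness(1000, 1000, 1000, 1024000, wt_generations=10))
```

Because the estimator uses a ratio of fold changes, it does not depend on the total sequencing depth at either time point.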

Fig. 1 Determining the fitness landscape of the yeast tRNA-Arg(CCU) gene.

Chemically synthesized tRNA-Arg(CCU) gene variants are fused with the marker gene URA3 before placement at the native tRNA-Arg(CCU) locus. The tRNA variant–carrying cells are competed with one another. The fitness of each tRNA-Arg(CCU) genotype relative to the wild type is calculated from the relative frequency change of paired-end sequencing reads covering the tRNA gene variant during competition (fig. S1) (15).

We estimated the fitness (f) of all 207 possible mutants that differ from the wild type by one point mutation (N1 mutants) and calculated the average mutant fitness at each site (Fig. 2A). Average fitness decreased to <0.75 by mutation at nine key sites, including all three anticodon positions (table S3), three TΨC loop sites, one D stem site, and two paired TΨC stem sites (Fig. 2A). The TΨC loop and stem sites are components of the B-box region of the internal promoter, with C55 essential for both TFIIIC transcription factor binding and polymerase III (Pol III) transcription (17). In addition, some sites such as T54 are ubiquitously posttranscriptionally modified (18). By contrast, the average mutant fitness is ≥0.95 at 30 sites (Fig. 2A). Overall, mutations in loops are more deleterious than in stems (P = 0.01, Mann-Whitney U test), although this difference becomes insignificant after excluding the anticodon (P = 0.09). Unsurprisingly, different mutations at a site have different fitness effects (fig. S6). For example, mutation C11T in the D stem is tolerated (fC11T ± SE = 1.006 ± 0.036), but C11A and C11G are not (fC11A = 0.676 ± 0.030 and fC11G = 0.661 ± 0.035), likely because of G:U pairing in RNA.

Fig. 2 Yeast tRNA-Arg(CCU) gene fitness landscape.

(A) Average fitness upon a mutation at each site. White circles indicate invariant sites. (B to D) Fitness distributions of N1 (B), N2 (C), and N3 (D) mutants. (E) Mean observed fitness (black circles) decreases with mutation number. Red circles show mean expected fitness without epistasis (right-shifted for viewing). Error bars denote SD. (F) Fraction of the 200 eukaryotic tRNA-Arg(CCU) genes with the same nucleotide as yeast at a given site decreases with the average fitness upon mutation at the site in yeast. Each dot represents one of the 69 examined tRNA sites. (G) Fraction of times that a mutant nucleotide appears in the 200 sequences increases with the fitness of the mutant in yeast. Each dot represents an N1 mutant. In (F) and (G), ρ is the rank correlation coefficient; P values are from t tests.

The fitness distribution of N1 mutants shows a mean of 0.89 and a peak at 1 (Fig. 2B). Only 1% of mutations are significantly beneficial (nominal P < 0.05; t test based on the six replicates), whereas 42% are significantly deleterious. We estimated the fitness of 61% of all possible genotypes carrying two mutations (N2 mutants) and observed a left-shifted distribution peaking at 0.50 and 0.67 (Fig. 2C). We also estimated the fitness of 1.6% of genotypes with three mutations (N3 mutants); they exhibited a distribution with only one dominant peak at 0.5, indicating that many triple mutations completely suppress yeast growth in the en masse competition (Fig. 2D). The fitness distribution narrows and shifts further toward 0.5 in strains carrying more than three mutations (Fig. 2E).
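The coverage figures above can be checked by counting the genotype space. The sketch below assumes only what the text states: 69 variable sites with three alternative nucleotides each. The function name is illustrative. It reproduces the 207 N1 mutants and shows that the 12,985 N2 mutants used in the epistasis analysis correspond to about 61% of all possible double mutants.

```python
from math import comb

VARIABLE_SITES = 69          # mutagenized sites, from the text
ALTERNATIVES_PER_SITE = 3    # non-wild-type nucleotides per site


def possible_genotypes(n_mutations, sites=VARIABLE_SITES,
                       alternatives=ALTERNATIVES_PER_SITE):
    """Number of distinct genotypes carrying exactly n_mutations point mutations."""
    # Choose which sites are mutated, then one of three alternatives at each.
    return comb(sites, n_mutations) * alternatives ** n_mutations


n1 = possible_genotypes(1)   # single mutants
n2 = possible_genotypes(2)   # double mutants
n3 = possible_genotypes(3)   # triple mutants
print(n1, n2, n3)
print(round(12985 / n2, 2))  # fraction of N2 genotypes with a fitness estimate
```

The rapid growth of the count with mutation number is why only 1.6% of triple mutants could be sampled in a pool of roughly 10^5 strains.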

Fitness landscapes allow prediction of evolution, because sites where mutations are on average more harmful should be evolutionarily more conserved. We aligned 200 nonredundant tRNA-Arg(CCU) gene sequences across the eukaryotic phylogeny (15). The percentage of sequences having the same nucleotide as yeast at a given site is negatively correlated with the average fitness upon mutation at the site (Spearman’s ρ = –0.61, P = 2 × 10^−8; Fig. 2F). Among N1 mutants, the number of times that a mutant nucleotide appears in the 200 sequences is positively correlated with the fitness of the mutant (ρ = 0.51, P = 2 × 10^−15; Fig. 2G). Furthermore, mutations observed in other eukaryotes have smaller fitness costs in yeast than those unobserved in other eukaryotes (P = 9 × 10^−6, Mann-Whitney U test).
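The conservation analysis rests on Spearman's rank correlation, which is the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a plain-Python illustration of that statistic, not the authors' analysis code; the function names and the toy numbers are made up for the example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (made up): per-site conservation versus mean mutant fitness.
conservation = [0.99, 0.95, 0.80, 0.60, 0.55]
mean_fitness = [0.55, 0.70, 0.90, 0.97, 1.00]
print(spearman_rho(conservation, mean_fitness))
```

A rank-based statistic suits this comparison because conservation and fitness need only vary monotonically with each other, not linearly.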

Two mutations may interact with each other, creating epistasis (ε) with functional and evolutionary implications (19). We estimated ε within the tRNA gene from the fitness of 12,985 N2 mutants and 207 N1 mutants (Fig. 3A) (15). ε is negatively biased, with only 34% positive values (P < 10^−300, binomial test; Fig. 3B and figs. S7A and S8). Among the 45% of ε values that differ significantly from 0 (nominal P < 0.05, t test based on the six replicates), 86% are negative (P < 10^−300, binomial test; Fig. 3B and figs. S7A and S8). Consistent with the overall negative ε, the mean fitness of N2 mutants (0.75) is lower than that predicted from N1 mutants under the assumption of no epistasis (0.81) (Fig. 2E). Note that as the first mutation becomes more deleterious, the mean epistasis between this mutation and the next mutation becomes less negative and, in some cases, even positive (Fig. 3C and fig. S9), similar to between-gene epistasis involving an essential gene (20). Consequently, the larger the fitness cost of the first mutation, the smaller the mean fitness cost of the second mutation (Fig. 3D and fig. S10). Pairwise epistasis involving three or four mutations is also negatively biased (fig. S11). Consistently, N3 to N8 mutants all show lower average fitness than expected under the assumption of no epistasis (Fig. 2E).
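The formal definition of ε is given in the supplementary materials (15). As a hedged illustration, the sketch below assumes the common multiplicative null model, in which two non-interacting mutations yield a double-mutant fitness equal to the product of the single-mutant fitness values, so that ε is the observed double-mutant fitness minus that product. The function names and the example fitness values are made up.

```python
def epistasis(f_a, f_b, f_ab):
    """Epistasis under an assumed multiplicative null model.

    f_a, f_b: fitness of the two single mutants relative to the wild type.
    f_ab:     fitness of the double mutant relative to the wild type.
    Negative values mean the double mutant is less fit than expected.
    """
    return f_ab - f_a * f_b


def fraction_positive(eps_values):
    """Share of epistasis values that are strictly positive."""
    return sum(1 for e in eps_values if e > 0) / len(eps_values)


# Toy example (made up): two mildly deleterious mutations that are
# worse together than their product predicts.
eps = epistasis(f_a=0.9, f_b=0.8, f_ab=0.6)
print(eps)                                         # about -0.12
print(fraction_positive([-0.1, 0.05, -0.2, -0.01]))
```

Because fitness is floored at 0.5, a very deleterious first mutation leaves little room for a second mutation to lower fitness further, which is one way to read the trend toward less negative ε described above.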

Fig. 3 Epistasis (ε) in fitness between point mutations in the tRNA-Arg(CCU) gene is negatively biased.

(A) Epistasis between point mutations. Lower right triangle shows all pairwise epistasis (white = not estimated); upper left triangle shows statistically significant epistasis (white = no estimation or insignificant). tRNA-Arg(CCU) secondary structure is plotted linearly. Parentheses and crosses show stem and loop sites, respectively. Same color indicates sites in the same loop or stem. Each site has three mutations. (B) Distributions of pairwise epistasis (gray) and statistically significant pairwise epistasis (blue) among 12,985 mutation pairs. (C) Mean epistasis between first and second mutations increases with the fitness cost of the first mutation. (D) Mean fitness cost of the second mutation decreases with the fitness cost of the first mutation. In (C) and (D), the Pearson’s correlation (r), associated P value, and linear regression (red) are shown. (E and F) Distributions of epistasis (gray) and statistically significant epistasis (blue) between pairs of mutations that convert a Watson-Crick (WC) base pair to another WC pair (E) or break a WC pair in stems (F). In (B), (E), and (F), the vertical red line shows zero epistasis.

The distribution of epistasis between mutations at paired sites is expected to differ from the above general pattern, because different Watson-Crick (WC) pairs may be functionally similar (21). We estimated the fitness of 71% of all possible N2 mutants at WC paired sites. Among the 41 cases that switched from one WC pair to another, 23 (56%) have positive ε (Fig. 3E). Among the 80 N2 mutants that destroyed WC pairing, 39 (49%) showed positive ε (Fig. 3F). The ε values are more positive for each of these two groups than for N2 mutants where the two mutations do not occur at paired sites (P = 7 × 10^−6 and 2.6 × 10^−3, respectively; Mann-Whitney U test). Furthermore, ε is significantly more positive in the 41 cases with restored WC pairing than in the 80 cases with destroyed pairing (P = 0.04). These trends also apply to cases with significant epistasis (corresponding to P = 3 × 10^−5, 0.01, and 0.01, respectively; Fig. 3, E and F, and fig. S7, B and C). However, epistasis is not always positive between paired sites, likely because base pairing is not the sole function of the nucleotides at paired sites. We observed 160 cases of significant sign epistasis (15), which is of special interest because it may block potential paths for adaptation (2). We also detected ε with opposite signs in different genetic backgrounds, indicating high-order epistasis (table S4).
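The two groups of double mutants at paired sites can be separated with a simple rule. The sketch below is an illustration under stated assumptions, not the authors' procedure: both sites of a pair are mutated, sequences use the DNA alphabet of the gene, and only A:T and G:C count as WC pairs, so a G:T combination (G:U wobble in the RNA) is treated as destroyed pairing. The function and label names are hypothetical.

```python
WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
BASES = "ACGT"


def classify_double_mutant(new_5prime, new_3prime):
    """Label a double mutant at a paired site by the pair it creates.

    Assumption: only canonical Watson-Crick pairs count as 'switched';
    everything else, including G:T (G:U wobble in RNA), is 'destroyed'.
    """
    return "switched" if (new_5prime, new_3prime) in WC_PAIRS else "destroyed"


def enumerate_double_mutants(wt_5prime, wt_3prime):
    """All nine double mutants of one wild-type pair, with their labels."""
    labels = {}
    for a in BASES:
        for b in BASES:
            if a != wt_5prime and b != wt_3prime:  # both sites must change
                labels[(a, b)] = classify_double_mutant(a, b)
    return labels


labels = enumerate_double_mutants("G", "C")
print(sum(1 for v in labels.values() if v == "switched"))   # 3
print(sum(1 for v in labels.values() if v == "destroyed"))  # 6
```

Under these assumptions each wild-type WC pair has nine possible double mutants, three that switch to another WC pair and six that break it, so a roughly 1:2 ratio between the two groups is expected before any sampling effects.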

A tRNA can fold into multiple secondary structures. We computationally predicted the proportion of tRNA-Arg(CCU) molecules that are potentially functional (i.e., correctly folded with no anticodon mutation) for each genotype (Pfunc). Raising Pfunc increases fitness (ρ = 0.40, P < 10^−300), albeit with diminishing returns (Fig. 4A), and this correlation holds after controlling for mutation number (ρ = 0.26, 0.37, and 0.24 for N1, N2, and N3 mutants, respectively). Because computational prediction of RNA secondary structures is only moderately accurate, the Pfunc-fitness correlation demonstrates an important role of Pfunc in shaping the tRNA fitness landscape. Nonetheless, after controlling for Pfunc, mutant fitness still correlates with mutation number [ρ = –0.51, P < 10^−300; see also locally weighted polynomial regressions (LOESS) for N1, N2, and N3 mutants in Fig. 4B], which suggests that other factors also have an impact on fitness.

Fig. 4 tRNA-Arg(CCU) folding offers a mechanistic explanation of the fitness landscape.

(A) Relationship between the predicted proportion of tRNA molecules that are functional (Pfunc) for a genotype and its fitness. Genotypes (with Pfunc ≥ 10^−4) are ranked by Pfunc and grouped into 20 equal-size bins; mean Pfunc and mean fitness ± SE of each bin are shown. The red dot represents all variants with Pfunc < 10^−4. (B) LOESS regression curves between Pfunc and fitness for N1, N2, and N3 mutants, respectively, with dashed lines indicating 95% confidence intervals. (C) Quantile-quantile plot between epistasis predicted from Pfunc values using N1 and N2 LOESS curves and observed epistasis. The ith dot from the left shows the ith smallest predicted epistasis value (y axis) and ith smallest observed epistasis value (x axis). Red diagonal line shows the ideal situation of y = x. Above and left of the plot are frequency distributions of observed and predicted epistasis, respectively. Red horizontal and vertical lines indicate zero epistasis.
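The binning described for panel (A) can be sketched as follows. This is an illustration only: the function name, the record layout of (Pfunc, fitness) tuples, and the toy data are assumptions, while the threshold of 10^−4, the 20 equal-size bins, and the reported summaries (mean Pfunc, mean fitness, SE) come from the caption.

```python
from math import sqrt
from statistics import mean, stdev


def bin_by_pfunc(records, n_bins=20, threshold=1e-4):
    """Rank (pfunc, fitness) records by Pfunc and summarize equal-size bins.

    Returns (bins, low), where each bin is a tuple of
    (mean Pfunc, mean fitness, standard error of fitness) and low holds the
    records below the threshold, which the caption pools into one point.
    """
    kept = sorted((r for r in records if r[0] >= threshold), key=lambda r: r[0])
    low = [r for r in records if r[0] < threshold]
    bins = []
    for b in range(n_bins):
        # Integer arithmetic keeps bin sizes within one record of each other.
        start = b * len(kept) // n_bins
        stop = (b + 1) * len(kept) // n_bins
        chunk = kept[start:stop]
        if not chunk:
            continue
        fitness = [f for _, f in chunk]
        se = stdev(fitness) / sqrt(len(fitness)) if len(fitness) > 1 else 0.0
        bins.append((mean(p for p, _ in chunk), mean(fitness), se))
    return bins, low


# Toy data (made up): 100 genotypes whose fitness rises with Pfunc,
# plus one genotype below the threshold.
toy = [(i / 100, 0.5 + i / 200) for i in range(1, 101)] + [(1e-5, 0.5)]
bins, low = bin_by_pfunc(toy)
print(len(bins), len(low))
```

Equal-size bins give every plotted point a comparable standard error, which makes a trend such as diminishing returns easier to judge than with equal-width bins.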

To investigate whether Pfunc explains epistasis, we computed epistasis using the fitness of N1 and N2 mutants predicted from their respective Pfunc-fitness regression curves (Fig. 4B) and observed a significant correlation between the predicted and observed epistasis (ρ = 0.04, P = 2.7 × 10^−5). The weakness of this correlation is at least partly because epistasis is computed from three fitness measurements (or predictions) and is therefore associated with a considerable error. There is a similar bias in predicted epistasis toward negative values (Fig. 4C), but further analyses suggest that it probably arises from factors other than tRNA folding (15). These results regarding Pfunc and epistasis are not unexpected, given that a tRNA site can be involved in multiple molecular functions (17, 18).

Our results clarify the in vivo fitness landscape of a yeast tRNA gene under a high-temperature challenge. Broadly consistent with the neutral theory, beneficial mutations are rare (1%), relative to deleterious (42%) and (nearly) neutral mutations (57%). We found widespread intragenic epistasis between mutations, consistent with studies at smaller scales (1). Intriguingly, 86% of significant epistasis is negative, indicating that the fitness cost of the second mutation is on average greater than that of the first. A bias toward negative epistasis was also observed in protein genes (7, 10, 11, 22); hence, this may be a general trend. Variation in fitness is partially explained by the predicted fraction of correctly folded tRNA molecules; this implies the existence of general principles underlying complex fitness landscapes. Our tRNA variant library provides a resource in which various mechanisms contributing to the tRNA’s fitness landscape can be evaluated, and the methodology developed here is applicable to the study of fitness landscapes of longer genomic segments, including protein genes.

Supplementary Materials

www.sciencemag.org/content/352/6287/837/suppl/DC1

Materials and Methods

Figs. S1 to S11

Tables S1 to S4

References (23–27)

References and Notes

  1. See supplementary materials on Science Online.
Acknowledgments: We thank S. Cho, W.-C. Ho, G. Kudla, and J.-R. Yang for valuable comments. Supported by NSF DDIG grant DEB-1501788 (J.Z. and C.L.) and NIH grant R01GM103232 (J.Z.). The NCBI accession number for the sequencing data is PRJNA311172.