Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design


Science  29 Nov 2019:
Vol. 366, Issue 6469, pp. 1139-1143
DOI: 10.1126/science.aaw2900

The fitness landscape of AAV

Adeno-associated virus (AAV) is an important gene therapy vector. Using tools from synthetic biology, Ogden et al. provide a comprehensive view of how sequence changes in capsid proteins affect AAV properties. After saturation mutagenesis of the AAV2 capsid gene, the resulting library was subjected to multiplexed phenotypic analyses, including virus production, immunity, thermostability, and biodistribution. The mutant distribution to major organs in mice revealed dominant trends affecting in vivo delivery. Moreover, the findings uncovered a viral accessory protein with a role in viral production. Finally, a model built from the capsid fitness landscape enabled machine-guided design of useful variants with much higher efficiency than random mutagenesis.

Science, this issue p. 1139


Adeno-associated virus (AAV) capsids can deliver transformative gene therapies, but our understanding of AAV biology remains incomplete. We generated the complete first-order AAV2 capsid fitness landscape, characterizing all single-codon substitutions, insertions, and deletions across multiple functions relevant for in vivo delivery. We discovered a frameshifted gene in the VP1 region that expresses a membrane-associated accessory protein that limits AAV production through competitive exclusion. Mutant biodistribution revealed the importance of both surface-exposed and buried residues, with a few phenotypic profiles characterizing most variants. Finally, we algorithmically designed and experimentally verified a diverse in vivo targeted capsid library with viability far exceeding random mutagenesis approaches. These results demonstrate the power of systematic mutagenesis for deciphering complex genomes and the potential of empirical machine-guided protein engineering.

Since the discovery of the adeno-associated virus (AAV) in 1965 (1), AAV capsids have become a powerful tool for therapeutic in vivo gene delivery (2, 3). However, the transduction efficiency of natural capsids is still limiting for therapeutic purposes (4). Furthermore, engineering of enhanced capsids has proven challenging because of the complexity of genotype–phenotype relationships and the many functional properties that must be simultaneously optimized (5).

To better understand AAV function and inform capsid engineering, we generated all single-codon mutants of the AAV2 cap gene. AAV2 is the most well-characterized AAV serotype and is a component of the first U.S. Food and Drug Administration–approved gene therapy (2). Additionally, the AAV2 rep gene and inverted terminal repeat (ITR) sequences are commonly used for recombinant AAV production. Whereas recent high-throughput AAV mutagenesis studies have focused on limited capsid regions (6, 7), we examined the effects of mutations systematically across all 735 positions. Moreover, we included all synonymous codons for each amino acid to enable detection of noncoding elements. Wild-type (WT) AAV2 sequences and stop codon substitutions were included as positive and negative controls, respectively. In addition to codon substitutions, we generated all single-codon insertions and deletions. The full library was generated through mutant synthesis and plasmid assembly: final constructs contained ITRs flanking the full-length capsid gene with an upstream promoter and a downstream barcode, enabling pooled measurements of mutant frequencies by high-throughput sequencing (Fig. 1A and fig. S1).

Fig. 1 Measurement of all single AAV2 capsid mutations in a multiplexed viral production assay.

(A) Assay and calculation of production fitness (s′). (B) Barcode frequencies: plasmid (fp) versus virus (fv). (C) Fitness for WT replicates and stop codons in VP1, VP2, and VP3. (D) Fitness for all single–amino-acid insertions, deletions (Δ), stop codons (*), and substitutions. Radius is from capsid center. VR, variable regions. (E) Average fitness for insertions at each position colored on the 3D structure. The triangle is the 3-fold axis and the pentagon is the 5-fold axis. (F) Fitness distributions split by conservation and location within or outside of variable regions. In all panels, *p < 10^−20 (Mann–Whitney U test).

To understand how mutations affect virus production (e.g., capsid assembly and genome packaging), we transfected the plasmid library into human embryonic kidney (HEK) 293T cells to produce recombinant AAV and purified the resulting virus. We calculated the fitness of each variant as normalized enrichment relative to WT (Fig. 1B), summing counts for all synonymous codons of the same amino acid. VP1, VP2, and VP3 are cap isoforms that assemble into capsids with a 1:1:10 ratio (8). We observed that nonsense mutations in the VP3 region were more strongly depleted than those in VP1 and VP2 (Fig. 1C), consistent with only VP3 being essential for capsid assembly (9).
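The fitness calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes fitness is the log2 ratio of virus to plasmid frequency, normalized to the same ratio for WT, and the record layout and function name are hypothetical.

```python
import math
from collections import defaultdict

def production_fitness(records, wt_plasmid, wt_virus):
    """Log2 enrichment of each (position, amino acid) relative to WT.

    records: iterable of (position, amino_acid, plasmid_count, virus_count),
             one row per codon variant (hypothetical layout).
    Counts for synonymous codons of the same amino acid are summed first.
    """
    plasmid = defaultdict(int)
    virus = defaultdict(int)
    for pos, aa, p_count, v_count in records:
        plasmid[(pos, aa)] += p_count
        virus[(pos, aa)] += v_count

    # Convert counts to frequencies within each pool (fp and fv).
    total_p = sum(plasmid.values()) + wt_plasmid
    total_v = sum(virus.values()) + wt_virus
    wt_enrichment = (wt_virus / total_v) / (wt_plasmid / total_p)

    fitness = {}
    for key in plasmid:
        if plasmid[key] == 0 or virus[key] == 0:
            continue  # log ratio undefined; a pseudocount is another option
        enrichment = (virus[key] / total_v) / (plasmid[key] / total_p)
        fitness[key] = math.log2(enrichment / wt_enrichment)
    return fitness

# Toy counts (invented): two synonymous Ala codons and one stop codon.
toy = [(10, "A", 100, 50), (10, "A", 100, 150), (10, "*", 200, 2)]
scores = production_fitness(toy, wt_plasmid=100, wt_virus=100)
```

Under this definition a variant that packages as well as WT scores 0, and a depleted variant such as a stop codon scores strongly negative.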

We found that mutations at buried positions and those near the 5-fold axis of symmetry were more deleterious, whereas exposed residues and those at the 3-fold axis were better tolerated (Fig. 1, D and E). Additionally, mutations in variable regions identified from evolutionary capsid alignments (10) had greater average fitness than nonvariable regions (Fig. 1F). Outside of the variable regions, substitutions to amino acids found within other serotypes were better tolerated than substitutions to amino acids never observed across a set of commonly studied AAVs (Fig. 1F). In contrast to alanine scanning, comprehensive mutagenesis revealed the importance of amino acid biochemical properties: positive charge was more deleterious across all positions, whereas negative charge was beneficial mainly at external projections, especially near the 3-fold axis (Fig. 1D). Mutations to methionine (ATG) were deleterious throughout the VP1 region, likely because the early initiation of translation there reduced the production of VP2 and VP3 monomers or because truncated VP1 products inhibit capsid formation.

We also developed assays for measuring the evasion of neutralizing antibodies and thermostability. We identified mutations that escape neutralization by the A20 monoclonal antibody, whose binding epitope on the capsid surface has been identified through cryo–electron microscopy (11). Many mutants escaped neutralization, with escaping mutations being more likely to arise from known A20 epitope positions (fig. S4). We further measured capsid thermostability by incubating the library at varying temperatures and then digesting any genomes released from capsids. Most mutations that decreased thermostability occurred at the 3-fold axis, suggesting that capsid disassembly initiates at these positions (fig. S5). These in vitro assays showed the utility and versatility of such libraries for studying complex AAV functions.

Our comprehensive codon-scanning approach enabled the detection of hidden gene products and genetic elements. In particular, functions independent of coding for the capsid could manifest as fitness differences among synonymous codons. We devised a Frameshift Score (FS) to detect the presence of frameshifted open reading frames (ORFs) by comparing the differences in fitness observed among synonymous cap mutants when stop codons in alternative reading frames were present or absent. We evaluated this metric using AAV production data across Assembly Activating Protein (AAP), a known frameshifted ORF within cap (9). We observed significant FS in the +1 frame (Fig. 2A), as expected given AAP’s essential role in AAV2 capsid assembly (9). Other frameshifted ORFs, such as the X gene, have been proposed (12); however, we detected no highly significant FS within this region.
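The core bookkeeping behind such a score is deciding whether a synonymous cap substitution introduces a stop codon in an alternative frame. The sketch below shows only that step and the resulting split of fitness values into two groups; the function names and toy sequence are hypothetical, and the statistical comparison itself (a Mann–Whitney U test over a 10-position moving window, per the Fig. 2A legend) is left out.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def stops_in_frame(seq, frame):
    """Start indices of stop codons when seq is read at offset `frame`."""
    return {i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS}

def creates_frame_stop(seq, codon_index, new_codon, frame=1):
    """True if replacing cap codon `codon_index` (frame 0) with `new_codon`
    adds a stop codon in the alternative reading frame."""
    start = 3 * codon_index
    mutated = seq[:start] + new_codon + seq[start + 3:]
    return bool(stops_in_frame(mutated, frame) - stops_in_frame(seq, frame))

def split_synonymous_fitness(seq, fitness_by_codon, frame=1):
    """Partition synonymous-mutant fitness values by whether the mutation
    creates an alternative-frame stop.

    fitness_by_codon: {(codon_index, codon): fitness} (hypothetical layout).
    """
    with_stop, without_stop = [], []
    for (idx, codon), score in fitness_by_codon.items():
        if creates_frame_stop(seq, idx, codon, frame):
            with_stop.append(score)
        else:
            without_stop.append(score)
    return with_stop, without_stop

# Toy sequence (invented): Leu codon CTC -> TTA is synonymous in frame 0
# but yields TAG in the +1 frame; CTC -> CTT does not.
toy_seq = "CTCGATGGG"
groups = split_synonymous_fitness(
    toy_seq, {(0, "TTA"): -2.0, (0, "CTT"): 0.1})
```

A consistently lower fitness in the first group than in the second, within a window of positions, is the kind of signal that would point to a functional ORF in that frame.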

Fig. 2 A frameshifted protein expressed from the VP1 region functions through competitive exclusion.

(A) Discovery of MAAP, a frameshifted ORF in the VP1 region. Top: Production fitness for mutations with stop codons in the +1 frame (red) and for cap codons synonymous to the red points but without creating stops in the +1 frame (gray). Solid lines indicate the 10-position moving average. Bottom: p-value for the observed difference in +1 frame stops and non-stops for the moving window of 10 positions. (B) Western blot of MAAP-3xFLAG with M2 anti-FLAG HRP antibody and an anti-GAPDH loading control. (C) Membrane association: confocal imaging of MAAP-GFP localization for AAV serotypes 2, 5, 8, and 9. Blue is membrane stain and green is green fluorescent protein. (D) Deleterious effects on production for MAAP stop codons relative to other synonymous codons in the cap gene, supplying in trans: pRep, pRep+MAAP, or pRep+MAAP negative controls. *p < 10^−5 (Mann–Whitney U test). (E) MAAP mutants produce at levels similar to WT when expressed individually (top) but are outcompeted by WT in a head-to-head format unless rescued by pRep+MAAP in trans (bottom). **p < 0.05, ***p < 0.01 (one-way t-test).

Instead, we detected a +1 frameshifted ORF in the VP1 region (Fig. 2A). Guided by differences in fitness of synonymous codons, we identified cap positions 27 to 147 as the most likely ORF location. We hypothesized that the ORF starts with CTG, a noncanonical start codon. Supporting this hypothesis, all mutations to P27(CCT) were deleterious, except those that preserved the CTG start codon (fig. S6A). In this frame, translation continues until reaching a TAG stop codon, creating a protein 119 amino acids in length (fig. S7). A pBLAST search for this sequence on the National Center for Biotechnology Information (NCBI) website against the nonredundant protein database returned no proteins with significant homology.
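The logic of locating such an ORF (find a candidate start codon in the +1 frame, then read codons until the first in-frame stop) can be illustrated as follows. This is a sketch on an invented toy sequence, not the AAV2 cap sequence, and the helper names are hypothetical. The toy mirrors the arrangement described above, where a CCT codon followed by G places a CTG in the +1 frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_starts(seq, start_codon="CTG", frame=1):
    """Nucleotide indices where `start_codon` occurs in the given frame."""
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] == start_codon]

def orf_length(seq, start):
    """Number of codons from `start` up to, but not including, the first
    in-frame stop codon. Returns None if no stop is reached."""
    for n, i in enumerate(range(start, len(seq) - 2, 3)):
        if seq[i:i + 3] in STOP_CODONS:
            return n
    return None

# Toy sequence (invented). Frame 0 reads CCT GGC AAA ATA ...;
# the +1 frame reads CTG GCA AAA TAG.
toy_seq = "CCTGGCAAAATAGC"
starts = find_starts(toy_seq)             # [1]
length = orf_length(toy_seq, starts[0])   # 3 codons before the TAG stop
```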

To validate the ORF’s translation in the native context, we added a FLAG tag at the C terminus and transfected the plasmid construct into HEK293T cells. Using Western blotting, we confirmed the presence of a protein migrating near the expected size of 16 kDa (Fig. 2B and fig. S6B). Synonymous changes to cap that mutated the hypothesized CTG start codon ablated the primary protein product, as did a cap mutation creating an early stop codon (Fig. 2B). Light bands at lower molecular weight indicated the potential presence of additional downstream start codons.

Intriguingly, although AAV2 assembles in the nucleolus (13), anti-FLAG immunofluorescence imaging revealed the ORF protein to be membrane associated (fig. S6C). We confirmed this by replacing the FLAG tag with a C-terminal GFP fusion and also by testing sequences derived from AAV5, AAV8, and AAV9. In all cases, the protein was associated with the cell membrane (Fig. 2C and figs. S6, C and D, and S8). On the basis of these observations, we proposed the name “membrane-associated accessory protein” (MAAP).

To understand the functional role of MAAP, we repeated the production experiment while supplementing the library with MAAP expressed in trans. This rescued the packaging abilities of VP mutants carrying MAAP-null mutations (Fig. 2D). Although MAAP-null mutants produced individually did not have reduced titers relative to WT, when we assayed individual MAAP mutants in head-to-head competition with WT, MAAP mutants were outcompeted unless complemented in trans with functional MAAP (Fig. 2E). MAAP’s function therefore manifests through competitive exclusion, possibly explaining the high genome–capsid coupling observed for libraries of engineered AAV capsids (14).

We studied the effects of mutations on in vivo delivery by administering the virus library to mice through retroorbital injection. We collected blood 1 hour later; spleen, liver, kidney, heart, and lung 18 days later; and then sequenced barcodes from purified virus genomes (Fig. 3A and fig. S9). We chose five mutants with divergent biodistribution profiles and verified that biodistribution from individual variants matched our library-based measurements (Fig. 3B).

Fig. 3 Multiplexed measurement of in vivo biodistribution reveals phenotypic clustering and structural design principles.

(A) Biodistribution assay. (B) In vivo selection values for validation mutants in library format versus individual assays. (C) Projection of individual mutants onto PC1 and PC2 derived from PCA, colored by tissue enrichment. (D) Top: Highlighting k-means clusters of mutants with enhanced tissue targeting. Bottom: Mutant biodistribution values within each cluster. (E) Position of cluster mutants in capsid structure, rendering first all residues and then only residues from each cluster to show the importance of buried residues for cluster 3.

Principal component analysis (PCA) revealed distinct relationships between tropism profiles and capsid structure. The first two principal components explained 80% of the variance in tropism profiles. We identified three mutant clusters that increased biodistribution to at least one tissue (Fig. 3, C and D, and fig. S10). Cluster 1 included the well-studied R585 and R588 mutants (15), which were depleted from the liver and enriched in the blood, as expected, and enriched in the heart and kidney. Cluster 2 mutants were similar, but selectively depleted from the heart. Cluster 3 mutants were depleted from the blood and spleen but enriched across the other tissues. The structural locations of these clusters were distinct: Mutants from cluster 1 and cluster 2 occurred in tight patches on the capsid surface near the 3-fold axis, whereas cluster 3 mutations were dispersed in buried positions throughout the capsid (Fig. 3E).

With most single–amino-acid changes being deleterious, introducing multiple mutations without breaking capsid function has been a limitation for engineering through random mutagenesis (16). We hypothesized that an additive model built from our data would approximate the fitness of nearby variants with multiple mutations, enabling the design of functional variants with greater throughput than rational design and higher efficiency than random mutagenesis.
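An additive model of this kind can be written very compactly. The sketch below assumes single-mutant effects are stored on a log scale relative to WT (so WT scores 0) and that effects of individual mutations simply sum; the data layout, function names, and numbers are hypothetical.

```python
def additive_fitness(single_effects, mutations):
    """Predicted log-scale fitness of a multimutant as the sum of its
    single-mutant effects. An empty mutation list (WT) scores 0.

    single_effects: {(position, amino_acid): effect relative to WT}
    mutations: iterable of (position, amino_acid)
    """
    return sum(single_effects[m] for m in mutations)

def rank_candidates(single_effects, candidates):
    """Order candidate multimutants from highest to lowest predicted fitness."""
    return sorted(candidates,
                  key=lambda muts: additive_fitness(single_effects, muts),
                  reverse=True)

# Invented single-mutant effects for illustration only.
effects = {(561, "A"): 0.5, (570, "K"): -1.0, (588, "T"): 0.25}
predicted = additive_fitness(effects, [(561, "A"), (588, "T")])
```

Such a model ignores interactions between mutations, which is why it is expected to approximate fitness only for variants near WT.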

To validate this hypothesis, we focused on capsid positions 561 to 588, a region containing both surface-exposed and buried positions. Informed by liver-biodistribution data from an additional in vivo mouse experiment focused on single mutations only at these positions (Fig. 4A), we designed multimutant variants by sampling mutations at each position proportional to their measured effect on liver delivery. Within the library, each designed target variant was tested either alone or with a complete path of stepwise edits from WT to the target itself (Fig. 4A).
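The sampling step can be sketched as follows. The exact mapping from measured effects to mutation probabilities is not given in the text, so this example assumes a softmax over log-scale selection values with a hypothetical `temperature` parameter; the choice to build the stepwise path in position order is likewise an assumption, and all names and values are illustrative.

```python
import math
import random

def position_distribution(effects, temperature=1.0):
    """Turn per-residue selection values at one position into probabilities.

    effects: {amino_acid: log-scale value}; include the WT residue at 0.0.
    Assumes a softmax weighting, which is not specified in the source.
    """
    weights = {aa: math.exp(s / temperature) for aa, s in effects.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

def sample_variant(wt, effects_by_position, rng, temperature=1.0):
    """Sample one residue per position; return only the non-WT choices."""
    variant = {}
    for pos, effects in effects_by_position.items():
        dist = position_distribution(effects, temperature)
        residues = list(dist)
        choice = rng.choices(residues,
                             weights=[dist[a] for a in residues], k=1)[0]
        if choice != wt[pos]:
            variant[pos] = choice
    return variant

def stepwise_path(variant):
    """Cumulative intermediates from WT to the target, one added mutation
    per step, in position order (an arbitrary ordering chosen here)."""
    path, current = [], {}
    for pos in sorted(variant):
        current = {**current, pos: variant[pos]}
        path.append(current)
    return path

# Invented values for two positions in the 561 to 588 window.
wt = {561: "D", 588: "R"}
effects = {561: {"D": 0.0, "A": 0.4}, 588: {"R": 0.0, "T": 0.8, "E": -1.5}}
target = sample_variant(wt, effects, random.Random(0))
intermediates = stepwise_path(target)
```

Raising `temperature` flattens the distributions and yields more mutations per variant; lowering it concentrates sampling on the best-measured residues.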

Fig. 4 Machine-guided design of AAV capsids outperforms random mutagenesis.

(A) Left: Generation of multimutants from individual amino-acid liver-biodistribution measurements (black dots: WT). Mutation probability distributions for each position are calculated from single–amino-acid mutant fitness (top). Right: Sampled mutations are combined to generate a multimutant target variant and then all mutants from WT to target are synthesized and experimentally measured (top shows ordering by increasing fitness). (B) Top: Fraction of designed mutants with liver biodistribution values greater than WT for random (gray) and machine-guided design mutants (orange). Bottom: Distribution of biodistribution values for random and designed mutants separated by number of mutations. Even at distances farther than four mutations, the designed approach outperforms random mutagenesis (pink box).

Using this strategy, we designed 1271 variants in addition to 10,047 randomly generated mutants with 1 to 10 mutations from the WT reference and measured liver biodistribution in mice. The designed set contained a much higher fraction of mutants targeted to the liver. This trend was most pronounced when the number of mutations was four or more: 147 designed mutants (25.6% of those tested) were functional, whereas nearly all of the 4477 randomly generated mutants were not viable or had weaker liver tropism than WT (99.8%; Fig. 4B).

Our comprehensive, machine-guided design strategy generated viable mutants in a principled and high-throughput manner and is generalizable to other proteins and engineering challenges. Applied to AAV, such methods now enable the systematic optimization of natural capsids into synthetic variants with enhanced properties for emerging gene therapies.

Supplementary Materials

Materials and Methods

Figs. S1 to S10

References (17–20)

References and Notes

Acknowledgments: We thank S. Biswas, N. Davidsohn, D. Goodman, N. Jain, G. Kuznetsov, M. Schubert, D. Thompson, and members of the Church lab for helpful discussions and J. Aach, R. Kishony, S. Slomovic, and H. Wang for feedback on early versions of the manuscript. Funding: This work was supported by grant nos. NIH-P50-HG005550 and NIH-RM1-HG008525, and by internal funding from the Harvard Wyss Institute. Computational resources for this work were provided by the AWS Cloud Credits for Research Program. Author contributions: Conceptualization, methodology, writing: P.J.O., E.D.K., S.S., and G.M.C.; Investigation: P.J.O. and E.D.K.; Formal analysis, Software: P.J.O., E.D.K., and S.S.; Supervision: E.D.K. and G.M.C. Competing interests: P.J.O., E.D.K., S.S., and G.M.C. are inventors on patent applications filed by Harvard University related to this work. E.D.K., S.S., and G.M.C. hold equity in Dyno Therapeutics, Inc. G.M.C.’s tech transfer and advisory roles are listed at: Data and materials availability: These data are available at the GEO website under accession number GSE139657. Barcode scripts are available at Commit hash: e42505099b6d5f1f64770e4fa90f17c198915662.
