Accurate design of megadalton-scale two-component icosahedral protein complexes


Science 22 Jul 2016:
Vol. 353, Issue 6297, pp. 389-394
DOI: 10.1126/science.aaf8818

Designed to assemble

Symmetric macromolecular structures that form cages, such as viral capsids, have inspired protein engineering. Bale et al. used pairwise combinations of dimeric, trimeric, or pentameric building blocks to design two-component, 120-subunit protein complexes with three distinct icosahedral architectures. The capsid-like nanostructures are large enough to hold nucleic acids or other proteins, and because they have two components, assembly can be triggered in a controlled way, for example to package cargoes such as drugs and vaccines.

Science, this issue p. 389


Nature provides many examples of self- and co-assembling protein-based molecular machines, including icosahedral protein cages that serve as scaffolds, enzymes, and compartments for essential biochemical reactions and icosahedral virus capsids, which encapsidate and protect viral genomes and mediate entry into host cells. Inspired by these natural materials, we report the computational design and experimental characterization of co-assembling, two-component, 120-subunit icosahedral protein nanostructures with molecular weights (1.8 to 2.8 megadaltons) and dimensions (24 to 40 nanometers in diameter) comparable to those of small viral capsids. Electron microscopy, small-angle x-ray scattering, and x-ray crystallography show that 10 designs spanning three distinct icosahedral architectures form materials closely matching the design models. In vitro assembly of icosahedral complexes from independently purified components occurs rapidly, at rates comparable to those of viral capsids, and enables controlled packaging of molecular cargo through charge complementarity. The ability to design megadalton-scale materials with atomic-level accuracy and controllable assembly opens the door to a new generation of genetically programmable protein-based molecular machines.

The forms and functions of natural protein assemblies have inspired many efforts to engineer self- and co-assembling protein complexes (1–24). A common feature of these approaches, as well as the structures that inspire them, is symmetry. By repeating a small number of interactions in geometric arrangements that are consistent with the formation of regular structures, symmetry reduces the number of distinct interactions and subunits required to form higher-order assemblies (2, 3, 25). Symmetric complexes can be designed to form through self-assembly of a single type of protein subunit or co-assembly of two or more distinct types of protein subunits. Multicomponent materials possess several important advantages, including the potential to control the initiation of assembly by mixing independently prepared components. This property could allow, for example, assembly to be performed in the presence of cargo molecules in order to package the cargo inside the designed nanomaterial. Thus far, only relatively small (24-subunit) two-component tetrahedra have been designed with high accuracy (20, 26). Packaging substantial amounts of cargo requires larger assemblies; icosahedral symmetry is the highest of the point group symmetries and therefore generally results in the maximum enclosed volume for a symmetric assembly formed from a protein subunit of a given size (27, 28).

We set out to design two-component icosahedral protein complexes capable of packaging macromolecular cargo through controlled in vitro assembly. The twofold, threefold, and fivefold rotational axes present within icosahedral symmetry provide three possible ways to construct such complexes from pairwise combinations of oligomeric building blocks; we refer to these architectural types as I53, I52, and I32 (fig. S1). The I53 architecture is formed from a combination of 12 pentameric building blocks and 20 trimeric building blocks aligned along the fivefold and threefold icosahedral symmetry axes, respectively (Fig. 1, A to E; I53 stands for icosahedral assembly constructed from pentamers and trimers). Similarly, the I52 architecture is formed from 12 pentamers and 30 dimers (Fig. 1F), and the I32 architecture is formed from 20 trimers and 30 dimers, each aligned along their corresponding icosahedral symmetry axes (Fig. 1G). To generate novel icosahedral assemblies, 14,400 pairs of pentamers and trimers, 50,400 pairs of pentamers and dimers, and 276,150 pairs of trimers and dimers derived from x-ray crystal structures (tables S1 to S3) were arranged as described above, with each building block allowed to rotate around and translate along its fivefold, threefold, or twofold symmetry axis. These degrees of freedom were systematically sampled to identify configurations that would be suitable for interface design, as assessed by several parameters, including the size and secondary structure content of the newly formed interface, and the relative orientation of backbone elements on the two sides of the interface. Protein-protein interface design calculations were then carried out on the resulting 66,115 designs of type I53, 35,468 designs of type I52, and 161,007 designs of type I32. The designs were filtered based on a variety of metrics, including interface area, predicted binding energy, and shape complementarity (29). 
Seventy-one designs of type I53, 44 of type I52, and 68 of type I32—derived from 23 distinct pentameric, 57 distinct trimeric, and 91 distinct dimeric protein scaffolds—were selected for experimental characterization (figs. S2 to S5 and table S4).
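The subunit bookkeeping behind the three architectures can be checked with a few lines of Python. This is an illustrative sketch only: the names AXIS_POSITIONS, ARCHITECTURES and subunit_count are hypothetical, and the code simply encodes the fact that an icosahedron has 12 vertices (fivefold positions), 20 faces (threefold positions) and 30 edges (twofold positions).

```python
# Number of symmetry-axis positions of each order on an icosahedron:
# 12 vertices (fivefold), 20 face centers (threefold), 30 edge midpoints (twofold).
AXIS_POSITIONS = {5: 12, 3: 20, 2: 30}

# Hypothetical lookup: architecture name -> (order of component A, order of component B).
ARCHITECTURES = {"I53": (5, 3), "I52": (5, 2), "I32": (3, 2)}


def subunit_count(order_a: int, order_b: int) -> int:
    """Total subunits when each oligomer occupies every axis position of its order."""
    return sum(order * AXIS_POSITIONS[order] for order in (order_a, order_b))


for name, (a, b) in ARCHITECTURES.items():
    total = subunit_count(a, b)
    # Each component contributes 60 subunits, so the asymmetric unit is one A-B pair.
    print(f"{name}: {AXIS_POSITIONS[a]} x {a}-mer + {AXIS_POSITIONS[b]} x {b}-mer "
          f"= {total} subunits ({total // 2} A-B heterodimers)")
```

Every architecture comes out at 120 subunits because each oligomer type contributes exactly 60 (12 × 5, 20 × 3 or 30 × 2), matching the order of the icosahedral rotation group.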

Fig. 1 Overview of the design method and target architectures.

In (A) to (E), the design process is illustrated with the I53 architecture. (A) An icosahedron is outlined with dashed lines, with fivefold symmetry axes (gray) going through each vertex and threefold symmetry axes (blue) going through each face of the icosahedron. (B) Twelve pentamers (gray) and 20 trimers (blue) are aligned along the fivefold and threefold symmetry axes, respectively. Each oligomer possesses two rigid-body degrees of freedom, one translational (r) and one rotational (ω), that are systematically sampled to identify (C) configurations with (D) a large interface between the pentamer and trimer that makes them suitable for protein-protein interface design; only the backbone structure and beta carbons of the oligomers are taken into account during this procedure. (E) Amino acid sequences are designed at the new interface to stabilize the modeled configuration. (F) The I52 architecture comprises 12 pentamers (gray) and 30 dimers (orange) aligned along the fivefold and twofold icosahedral symmetry axes. (G) The I32 architecture comprises 20 trimers (blue) and 30 dimers (orange) aligned along the threefold and twofold icosahedral symmetry axes.
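The geometry in panels (A) and (B) can be made concrete with a short sketch. Assuming a standard icosahedron with vertices at cyclic permutations of (0, ±1, ±φ), the code below computes the angles between neighboring fivefold, threefold and twofold axes, and defines a hypothetical helper, place, that applies the two rigid-body degrees of freedom of an oligomer: a rotation ω about its symmetry axis followed by a translation r along it. It is a toy illustration of these docking coordinates, not the design software used in this work.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def angle_deg(u, v):
    """Angle between two direction vectors, in degrees."""
    u, v = normalize(u), normalize(v)
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(dot))


# One vertex (fivefold axis), the center of a face containing it (threefold axis)
# and the midpoint of an edge containing it (twofold axis).
v_a, v_b, v_c = (0, 1, PHI), (0, -1, PHI), (PHI, 0, 1)
five = v_a
three = tuple(sum(c) / 3 for c in zip(v_a, v_b, v_c))
two = tuple(sum(c) / 2 for c in zip(v_a, v_b))


def place(point, axis, r, omega_deg):
    """Rotate `point` by omega about `axis` (Rodrigues' formula), then shift by r along it."""
    k = normalize(axis)
    w = math.radians(omega_deg)
    cos_w, sin_w = math.cos(w), math.sin(w)
    k_dot_p = sum(a * b for a, b in zip(k, point))
    k_cross_p = (k[1] * point[2] - k[2] * point[1],
                 k[2] * point[0] - k[0] * point[2],
                 k[0] * point[1] - k[1] * point[0])
    rotated = tuple(p * cos_w + kc * sin_w + kk * k_dot_p * (1 - cos_w)
                    for p, kc, kk in zip(point, k_cross_p, k))
    return tuple(x + r * kk for x, kk in zip(rotated, k))


print(f"5-fold/3-fold: {angle_deg(five, three):.2f} deg")  # 37.38
print(f"5-fold/2-fold: {angle_deg(five, two):.2f} deg")    # 31.72
print(f"3-fold/2-fold: {angle_deg(three, two):.2f} deg")   # 20.91
print(place((1.0, 0.0, 0.0), (0, 0, 1), 10.0, 90.0))       # approximately (0, 1, 10)
```

Because the axes themselves are fixed by icosahedral symmetry, only r and ω of each component remain free, which is why a two-component search reduces to four degrees of freedom.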

Codon-optimized genes encoding each pair of designed sequences were cloned into a vector for inducible coexpression in Escherichia coli, with a hexahistidine tag appended to the N or C terminus of one subunit in each pair. The proteins were expressed at a small scale and purified by immobilized metal-affinity chromatography (IMAC); clarified lysates and purification products were subjected to gel electrophoresis under denaturing conditions to screen for soluble expression and copurification of the hexahistidine-tagged and nontagged subunits (fig. S6A). Designs that appeared to copurify were subsequently analyzed by nondenaturing gel electrophoresis to screen for slowly migrating species as an additional indication of assembly into higher-order materials (fig. S6B). Those found to both copurify and assemble were expressed at a larger scale and purified by IMAC, which was followed by size exclusion chromatography (SEC; fig. S7). Ten designs—four I53 (I53-34, I53-40, I53-47, and I53-50), three I52 (I52-03, I52-32, and I52-33), and three I32 (I32-06, I32-19, and I32-28)—yielded major SEC peaks near the elution volumes expected based on the diameters of the design models (Fig. 2 and table S4). Two other designs, I53-51 and I32-10, also appeared to form large, discrete assemblies, but their structures could not be verified by subsequent experiments (supplementary text and figs. S8 and S9).

Fig. 2 Experimental characterization by SEC and SAXS.

Computational design models (left), SEC chromatograms (middle), and SAXS profiles (right) are shown for (A) I53-34, (B) I53-40, (C) I53-47, (D) I53-50, (E) I52-03, (F) I52-32, (G) I52-33, (H) I32-06, (I) I32-19, and (J) I32-28. Design models (shown to scale relative to the 30-nm scale bar) are viewed down one of the fivefold symmetry axes, with ribbon-style renderings of the protein backbone (pentamers are shown in gray, trimers in blue, and dimers in orange). Coexpressed and purified designs yield dominant SEC peaks near the expected elution volumes for the target 120-subunit complexes and x-ray scattering intensities (gray dots) that match well with profiles calculated from the design models (green). Alternative configurations of the designs, generated by translating the oligomeric building blocks in the design models by ±10 Å and/or rotating them about their aligned symmetry axes by ±20°, generally fit worse with the SAXS data than the original design models do (the range of values obtained from fitting the alternative configurations is shown with light blue shading).
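One way to quantify the comparison described in this caption is sketched below under stated assumptions. The goodness of fit is taken to be a reduced χ² in which the model profile is multiplied by a least-squares scale factor, the kind of metric reported by profile-fitting programs such as FoXS and CRYSOL; the metric actually used for Fig. 2 may differ. The perturbation grid assumes three values per degree of freedom (−10, 0, +10 Å and −20, 0, +20°) for each of the two building blocks, which yields 80 alternative configurations. The names chi2_with_scale and perturbation_grid are hypothetical.

```python
import itertools


def chi2_with_scale(i_exp, sigma, i_calc):
    """Reduced chi-square between measured and model intensities.

    The model is multiplied by the weighted least-squares scale factor c
    before comparison. Returns (chi2, c).
    """
    w = [1.0 / s ** 2 for s in sigma]
    c = (sum(wi * e * m for wi, e, m in zip(w, i_exp, i_calc))
         / sum(wi * m * m for wi, m in zip(w, i_calc)))
    resid = sum(wi * (e - c * m) ** 2 for wi, e, m in zip(w, i_exp, i_calc))
    return resid / (len(i_exp) - 1), c  # one fitted parameter (c)


def perturbation_grid(dr=10.0, dw=20.0):
    """Offsets (dr_A, dw_A, dr_B, dw_B) for both building blocks, excluding the design."""
    steps_r, steps_w = (-dr, 0.0, dr), (-dw, 0.0, dw)
    grid = itertools.product(steps_r, steps_w, steps_r, steps_w)
    return [g for g in grid if any(g)]


# Toy check with synthetic numbers: a model that matches up to scale fits perfectly.
model = [10.0, 5.0, 2.0, 1.0]
data = [3.0 * m for m in model]
print(chi2_with_scale(data, [1.0] * 4, model))  # -> (0.0, 3.0)
print(len(perturbation_grid()))                 # -> 80
```

Each offset tuple would be applied to the design model, a profile calculated for the perturbed structure, and its χ² compared with that of the unperturbed design.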

Small-angle x-ray scattering (SAXS) performed on the SEC-purified samples indicated that all 10 designs form assemblies similar to the intended three-dimensional configurations in solution. The experimentally measured SAXS profiles are feature-rich and distinct, with multiple large dips in scattering intensity in the region between 0.015 and 0.15 Å⁻¹, each of which is closely recapitulated in profiles calculated from the design models (Fig. 2) (30). To further evaluate how accurately and uniquely the design models match the experimental data, each was compared with a set of alternative models generated by systematically perturbing the radial displacements and/or the rotations of the building blocks in each design by ±10 Å and ±20°, respectively. The vast majority of alternative configurations were found to produce worse fits to the experimental data than the original design models (Fig. 2), suggesting that the materials assemble precisely in solution.
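Why the profiles contain dips at all, and why their positions depend on the size of the assembly, can be seen from the crudest possible model: an infinitely thin, uniform spherical shell of radius R, whose scattered intensity is proportional to [sin(qR)/(qR)]² and therefore vanishes at q = nπ/R. The sketch below is only that idealization; real protein shells have finite thickness, openings and non-uniform density, which is why profiles were calculated from the atomic design models. The function names are hypothetical, and the 24 and 40 nm diameters are the size range quoted in the abstract.

```python
import math


def thin_shell_intensity(q, radius):
    """Normalized intensity of an infinitely thin, uniform spherical shell."""
    x = q * radius
    return 1.0 if x == 0 else (math.sin(x) / x) ** 2


def shell_minima(radius, q_min=0.015, q_max=0.15):
    """Zeros q = n*pi/R (in 1/Angstrom) that fall inside [q_min, q_max]."""
    n_lo = max(1, math.ceil(q_min * radius / math.pi))
    n_hi = math.floor(q_max * radius / math.pi)
    return [n * math.pi / radius for n in range(n_lo, n_hi + 1)]


for diameter_nm in (24, 40):
    radius_A = diameter_nm * 10 / 2  # nm diameter -> Angstrom radius
    minima = shell_minima(radius_A)
    print(f"{diameter_nm} nm shell: first zero at q = {minima[0]:.4f} 1/A, "
          f"{len(minima)} zeros between 0.015 and 0.15 1/A")
```

In this idealization the first zero lies near 0.026 Å⁻¹ for a 24 nm shell and near 0.016 Å⁻¹ for a 40 nm shell, both inside the measured window. Because the zeros scale as 1/R, shifting building blocks radially moves the features, which is one reason the profiles can discriminate between alternative configurations.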

The information provided by SAXS about the overall ensemble of the structures observed in solution for each design was complemented and corroborated by visualization of individual particles by negative-stain electron microscopy (EM). Micrographs of I53-34, I53-40, I53-47, I53-50, I52-03, I52-33, I32-06, and I32-28 show fields of particles with the size and shape of the design models, and particle averaging yielded distinct structures clearly matching the models (Fig. 3). The large trimeric and pentameric voids observed in the I52 and I32 averages, for instance, closely resemble the cavities in projections generated from the corresponding design models when viewed down the threefold and fivefold symmetry axes, respectively. The turreted morphology of the I53-50 and I52-33 design models and projections, resulting from pentameric and dimeric components that protrude away from the rest of the icosahedral shell, are also readily apparent in the corresponding class averages. Although the results from SEC and SAXS strongly indicate that I52-32 and I32-19 form assemblies closely matching the design models in solution, both appear to be unstable under the conditions encountered during grid preparation, yielding broken particles that were not suitable for further EM analysis (fig. S10).

Fig. 3 Characterization of the designed materials by EM.

Raw negative-stain electron micrographs of coexpressed and purified (A) I53-34, (B) I53-40, (C) I53-47, (D) I53-50, (E) I52-03, (F) I52-33, (G) I32-06, and (H) I32-28. All raw micrographs are shown to scale relative to the 100-nm scale bar in (H). The insets show experimentally computed class averages (roughly corresponding to the fivefold, threefold, and twofold icosahedral symmetry axes; left column in each inset), along with back projections calculated from the design models (right column in each inset). The width of each inset box is 55 nm.

To further evaluate the accuracy of our designs, x-ray crystal structures were determined for one material from each of the three architectural types: I53-40, I52-32, and I32-28 (Fig. 4 and table S5). Although the resolution of the structures (3.5 to 5.6 Å) is insufficient to permit detailed analysis of the side chains at the designed interfaces, backbone-level comparisons show that the building block interfaces were designed with high accuracy, giving rise to 120-subunit complexes that match the computational design models very well. Comparing pairs of interface subunits from each structure with the design models yields backbone root mean square deviations (RMSDs) between 0.2 and 1.1 Å, whereas the RMSD over all 120 subunits in each material ranges from 0.8 to 2.7 Å (Fig. 4, A to C, and table S6). With diameters between 26 and 31 nm, over 130,000 heavy atoms, and molecular weights greater than 1.9 MDa, these structures are comparable in size to small viral capsids and, to our knowledge, are the largest designed biomolecular nanostructures to date to be verified by x-ray crystallography (fig. S11).
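Backbone RMSDs such as those quoted above are normally computed after optimally superimposing one set of coordinates onto the other. A minimal sketch of that calculation with NumPy, using the Kabsch algorithm, is shown below. It assumes the two arrays list corresponding backbone atoms in the same order; the text does not state which software or superposition protocol was actually used, and the function name kabsch_rmsd is hypothetical.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, target: np.ndarray) -> float:
    """RMSD after optimal rigid-body superposition (Kabsch algorithm).

    Both arrays have shape (n_atoms, 3) with atoms in corresponding order.
    """
    p = mobile - mobile.mean(axis=0)          # remove translation
    q = target - target.mean(axis=0)
    h = p.T @ q                               # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # avoid an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = (rot @ p.T).T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Synthetic check: a rotated and translated copy superimposes with ~0 RMSD.
rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(50, 3))
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(moved, coords))
```

The same function applies whether the atom set is a single pair of interface subunits or the backbone of all 120 subunits; only the arrays passed in change.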

Fig. 4 Crystal structures, assembly dynamics, and packaging.

Design models (top) and x-ray crystal structures (bottom) of (A) I53-40, (B) I52-32, and (C) I32-28. Views are shown to scale along the threefold, twofold, and fivefold icosahedral symmetry axes. Pentamers are shown in gray, trimers in blue, and dimers in orange. RMSDs are between crystal structures and design models over all backbone atoms in all 120 subunits. (D) In vitro assembly dynamics of I53-50. A schematic is shown in the upper panel. Below, normalized static light-scattering intensity is plotted over time after mixing independently expressed and purified variants of the I53-50 trimer and pentamer in a 1:1 molar ratio at final concentrations of 8, 16, 32, or 64 μM (blue, orange, gray, and black solid lines, respectively, representing detector voltage). Intensities measured from SEC-purified assembly at concentrations of 8, 16, 32, or 64 μM are indicated with dashed horizontal lines and used as the expected end point of each assembly reaction. The midpoint of each reaction is marked with a dashed vertical line. (E) Encapsulation of supercharged GFP in a positively charged I53-50 variant. A schematic is shown in the upper panel (bright green, GFP). SEC chromatograms and SDS-PAGE analyses of packaging and assembly reactions are shown below. The reactions were performed in either 65 mM NaCl (top and bottom) or 1 M NaCl (middle). I53-50A.1PT1 and I53-50B.4PT1 are variants of the trimeric and pentameric components of I53-50 bearing several mutations to positively charged residues; I53-50A.1 is a control variant of the trimeric component that lacks these mutations (supplementary text). In each case, the same buffer used in the packaging and assembly reaction was also used during SEC. Absorbance measurements at 280 nm (black) and 488 nm (green) are shown. Each SEC chromatogram was normalized relative to the 280-nm peak near 12 ml elution volume. 
Locations of 37-, 25-, 20-, and 15-kDa–molecular weight markers on SDS-PAGE gels are indicated by horizontal lines to the left of the gels.

The multicomponent composition of the materials presents the possibility of controlling their assembly through in vitro mixing of independently produced building blocks (20). Taking advantage of this feature, the assembly kinetics of an I53-50 variant (fig. S12A) with improved individual subunit stability were investigated by light scattering (supplementary materials). SEC-purified components were mixed at concentrations of 64, 32, 16, or 8 μM, and the change in light scattering was monitored over time (Fig. 4D). Assembly was roughly halfway complete within 1 min at 64 and 32 μM, within 3 min at 16 μM, and within 10 min at 8 μM. Similar assembly time scales have been observed for several viral capsids (31, 32). Because our design process focused exclusively on structure without any consideration of kinetics, these results raise the interesting possibility that the rate of assembly of these viral capsids has not been highly optimized during evolution.
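Half-completion times like these can be read from a light-scattering trace once it is normalized, as in Fig. 4D, so that the signal runs from 0 at mixing to 1 at the intensity of SEC-purified assembly at the same concentration. The sketch below assumes that normalization, takes the baseline as a parameter (for example, the intensity at the moment of mixing), and locates the 0.5 crossing by linear interpolation. The trace is synthetic (a single exponential with a hypothetical time constant of 1.5 min), not data from this study, and the function names are hypothetical.

```python
import math


def normalized_signal(raw, baseline, end_point):
    """Map raw intensities so that `baseline` -> 0 and the expected end point -> 1."""
    return [(x - baseline) / (end_point - baseline) for x in raw]


def half_time(times, signal, level=0.5):
    """First time the normalized signal reaches `level`, by linear interpolation.

    Returns None if the trace never reaches `level`.
    """
    if signal[0] >= level:
        return times[0]
    for i in range(1, len(signal)):
        s0, s1 = signal[i - 1], signal[i]
        if s0 < level <= s1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (level - s0) * (t1 - t0) / (s1 - s0)
    return None


# Synthetic trace (hypothetical numbers): exponential rise, tau = 1.5 min, sampled every 3 s.
tau = 1.5
times = [0.05 * i for i in range(201)]  # minutes
raw = [100.0 + 400.0 * (1.0 - math.exp(-t / tau)) for t in times]
signal = normalized_signal(raw, baseline=100.0, end_point=500.0)
print(f"t_half = {half_time(times, signal):.2f} min "
      f"(analytic: {tau * math.log(2):.2f} min)")
```

Using the purified assembly as the end point, rather than the last point of each trace, keeps slower reactions at low concentration from being normalized to an unfinished plateau.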

The ability to assemble the materials in vitro potentially enables the controlled packaging of macromolecular cargoes. To investigate this possibility, the trimeric and pentameric components of an I53-50 variant with several mutations to positively charged residues on its interior surface (supplementary materials) were successively mixed with a supercharged green fluorescent protein (GFP) with a net charge of –30 (33), and encapsulation was evaluated using SEC followed by SDS–polyacrylamide gel electrophoresis (PAGE) of relevant fractions (Fig. 4E and supplementary materials). When both the packaging reaction and SEC were performed in a buffer containing low (65 mM) NaCl, GFP(–30) and both I53-50 components coeluted from the column at the same elution volume previously observed for unmodified I53-50 (Fig. 2D). Mixtures of GFP(–30) with only one of the two components eluted at later volumes, indicating that the observed coelution requires assembly of I53-50 (fig. S12, B to D). When the packaging reaction was carried out in a buffer containing high (1 M) NaCl or using a variant of the trimeric component that lacked mutations to positively charged residues on the interior surface, little to no coelution was observed (Fig. 4E), suggesting that packaging is driven by the engineered electrostatic interactions between the I53-50 interior and GFP(–30). High-salt incubation resulted in dissociation of packaged GFP (fig. S12E), as has also been observed for an evolved variant of a naturally occurring protein container that packages cargo by means of electrostatic complementarity (34, 35). Based on measurements of fluorescence intensity and ultraviolet–visible light absorbance, we estimate that about 7 to 11 GFPs are packaged per icosahedral assembly in 65 mM NaCl, occupying roughly 11 to 17% of the interior volume (supplementary materials).
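The two ends of this packaging estimate are mutually consistent, which a line of arithmetic on the reported numbers confirms: dividing the occupied fraction by the number of GFPs gives the share of the lumen taken by one GFP(–30), and the inverse expresses the interior volume in GFP volumes. This purely volumetric check ignores packing efficiency and is not the procedure used to derive the estimates; interior_capacity is a hypothetical helper.

```python
def interior_capacity(n_cargo: float, occupied_fraction: float) -> float:
    """Interior volume expressed in cargo-volume units (ignores packing efficiency)."""
    return n_cargo / occupied_fraction


# The two ends of the reported range for GFP(-30) in I53-50 at 65 mM NaCl.
for n_gfp, fraction in [(7, 0.11), (11, 0.17)]:
    print(f"{n_gfp} GFPs in {fraction:.0%} of the lumen: {fraction / n_gfp:.2%} per GFP, "
          f"interior ~ {interior_capacity(n_gfp, fraction):.0f} GFP volumes")
```

Both ends imply that a single GFP fills roughly 1.6% of the lumen, so the interior corresponds to about 64 GFP volumes.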

How do the architectures of our designs compare to those of virus capsids and other icosahedral protein complexes found in nature? Our designs obey strict icosahedral symmetry, with the asymmetric unit in each case containing a heterodimer that comprises one subunit from each of the two components. The most similar naturally occurring structures of which we are aware are cowpea mosaic virus (CPMV) and related 120-subunit capsids with pseudo T = 3 symmetry [T refers to the triangulation number (27)]. Like our I53 designs, CPMV is composed of 60 copies each of two distinct protein subunits, with one type of subunit arranged around the icosahedral fivefolds and a second type of subunit arranged around the threefolds (fig. S13). However, the two subunits of CPMV are composed of three similar domains occupying spatially equivalent positions to those found in T = 3 assemblies formed from 180 copies of a single type of protein subunit (36, 37). Our I53 designs display no such underlying pseudosymmetry and therefore cannot be considered to be pseudo T = 3. Furthermore, we are not aware of any natural protein complexes characterized to date that exhibit I52 or I32 architectures. Our designs thus appear to occupy new regions of the protein assembly universe, which either have not yet been explored by natural evolution or are undiscovered at present in natural systems.
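For reference, the triangulation number of Caspar-Klug quasi-equivalence takes the values T = h² + hk + k² for non-negative integers h and k (not both zero), and a capsid of triangulation number T contains 60T subunits. The short sketch below (triangulation_numbers is a hypothetical helper) lists the allowed values, showing that T = 3 corresponds to the 180-subunit assemblies mentioned above and that 120 = 60 × 2 is not of the form 60T, since T = 2 is not allowed.

```python
def triangulation_numbers(limit):
    """Sorted T = h^2 + h*k + k^2 up to `limit` (h, k >= 0, not both zero)."""
    values = set()
    for h in range(limit + 1):
        for k in range(limit + 1):
            t = h * h + h * k + k * k
            if 0 < t <= limit:
                values.add(t)
    return sorted(values)


for t in triangulation_numbers(13):
    print(f"T = {t}: {60 * t} subunits")
```

A strictly icosahedral 120-subunit shell therefore needs two subunits per asymmetric unit, as in CPMV and in the designs described here.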

The size and complexity of the materials presented here, together with the accuracy with which they assemble, push the boundaries of biomolecular engineering into new territory. The large lumens of our designed materials, combined with their multicomponent nature and the ability to control assembly through mixing of purified components, make them well suited for encapsulation of a broad range of materials including small molecules, nucleic acids, polymers, and other proteins. These features, along with the precision and modularity with which they can be engineered, make our designed nanomaterials attractive starting points for new approaches to targeted drug delivery, vaccine design, and bioenergy.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S13

Tables S1 to S6

References (38–60)

Databases S1 to S3

References and Notes

Acknowledgments: We thank M. Sawaya and M. Collazo for their assistance with crystallography, conducted at the UCLA-DOE X-ray Crystallization and Crystallography Core Facilities, which are supported by DOE grant DE-FC02-02ER63421. We thank M. Capel, K. Rajashankar, N. Sukumar, J. Schuermann, I. Kourinov, and F. Murphy at Northeastern Collaborative Access Team beamlines 24-ID-E and 24-ID-C at the Advanced Photon Source (APS), which are supported by grants from the National Center for Research Resources (5P41RR015301-10) and the National Institute of General Medical Sciences (8 P41 GM103403-10) of the National Institutes of Health. Use of the APS is supported by DOE under contract no. DE-AC02-06CH11357. We thank the staff at the Advanced Light Source SIBYLS beamline at Lawrence Berkeley National Laboratory, including K. Burnett, G. Hura, M. Hammel, J. Tanamachi, and J. Tainer for the services provided through the mail-in SAXS program, which is supported by the DOE Office of Biological and Environmental Research Integrated Diffraction Analysis program and the NIH project MINOS (Macromolecular Insights on Nucleic Acids Optimized by Scattering; grant no. R01GM105404). We also thank U. Nattermann for help with EM, Y. Hsia for assistance with light-scattering experiments, C. Stafford for mass spectrometry assistance, B. Nickerson for assistance with in vitro assembly experiments, and G. Rocklin for providing scripts used in data analysis. This work was supported by the Howard Hughes Medical Institute (S.G., D.C., T.G., and D.B.) and its Janelia Research Campus visitor program (S.G.), the Bill and Melinda Gates Foundation (D.B. and N.P.K.), Takeda Pharmaceutical Company (N.P.K.), NSF (grant no. CHE-1332907 to D.B. and T.O.Y.), the Air Force Office of Scientific Research (grant no. FA950-12-10112 to D.B.), and the Defense Advanced Research Projects Agency (grant no. W911NF-14-1-0162 to D.B. and N.P.K.). Y.L. was supported by a Whitcome Fellowship through the UCLA Molecular Biology Institute, and J.B.B. was supported by an NSF graduate research fellowship (grant no. DGE-0718124). Coordinates and structure factors were deposited in the Protein Data Bank with accession codes 5IM5 (I53-40), 5IM4 (I52-32), and 5IM6 (I32-28). J.B.B., W.S., N.P.K., D.E., and D.B. have filed a nonprovisional U.S. patent application, no. 14/930,792, related to the work presented herein.
