Design of ordered two-dimensional arrays mediated by noncovalent protein-protein interfaces


Science  19 Jun 2015:
Vol. 348, Issue 6241, pp. 1365-1368
DOI: 10.1126/science.aaa9897

Designing proteins to self-assemble

DNA has been used as a nano building material since the 1980s. Protein nanostructures have the potential to give greater geometric control and shape variability. Gonen et al. describe the computational design of proteins that self-assemble into two-dimensional arrays. These programmable protein lattices should enable new approaches in biomolecular structure determination and molecular sensing.

Science, this issue p. 1365


We describe a general approach to designing two-dimensional (2D) protein arrays mediated by noncovalent protein-protein interfaces. Protein homo-oligomers are placed into one of the seventeen 2D layer groups, the degrees of freedom of the lattice are sampled to identify configurations with shape-complementary interacting surfaces, and the interaction energy is minimized using sequence design calculations. We used the method to design proteins that self-assemble into layer groups P 3 2 1, P 4 21 2, and P 6. Projection maps of micrometer-scale arrays, assembled both in vitro and in vivo, are consistent with the design models and display the target layer group symmetry. Such programmable 2D protein lattices should enable new approaches to structure determination, sensing, and nanomaterial engineering.

Programmed self-assembly provides a route to patterning matter at the atomic scale. DNA origami methods (1, 2) have been used to generate a wide variety of ordered structures, but progress in designing protein assemblies has been slower owing to the greater complexity of protein-protein interactions. Biology provides a number of examples of ordered two-dimensional (2D) protein arrays: Bacterial S-layer proteins assemble into oblique, square, or hexagonal planar symmetry (3); gap-junction plaques, abundant in muscle and heart tissue, display hexagonal planar symmetry (4); and water channels display square planar symmetry (5). Although proteins that form ordered 3D crystals have been designed (6) and 2D lattices have been generated by genetically fusing or chemically cross-linking oligomers with appropriate point symmetric groups (7–10), there has been little success in designing self-assembling 2D lattices with order sufficient to diffract electrons or x-rays below 15 Å resolution (7). Naturally occurring 2D arrays and assemblies are stabilized by extensive noncovalent interactions between protein subunits (10, 11), and this principle has been used to design self-assembling tetrahedral and octahedral cages (12, 13).

We sought to design ordered 2D arrays mediated by designed protein-protein interfaces stabilized by extensive noncovalent interactions. We focused on symmetric arrays, as symmetry reduces the number of distinct protein interfaces required to stabilize the lattice (14, 15). There are 17 distinct ways (layer groups) in which 3D objects can come together to form periodic 2D layers (16). In some layer groups, there are only two unique interfaces between identical subunits; in others, three or four (17). To simplify the design challenge, we focused on the layer groups that involve only two unique interfaces and on building blocks with internal point symmetry (which already contain one of the two required interfaces), which leaves only one unique interface to be designed to form the 2D array. Of the 17 layer groups, 11 have two unique interfaces; we focused here on 6 of these 11 groups involving cyclic rather than dihedral point groups because there are considerably more cyclic oligomers than dihedral oligomers in the Protein Data Bank that can serve as building blocks. The six layer groups with two unique interfaces that can be built from cyclic oligomers are P 2 21 21 (from C2 building blocks), P 3 and P 3 2 1 (from C3 building blocks), P 4 and P 4 21 2 (from C4 building blocks), and P 6 (from C6 building blocks). The different groups have different numbers of degrees of freedom describing the placement of an object with cyclic symmetry in the lattice: for P 3 2 1 (Fig. 1A) and P 4 21 2 (Fig. 1F), there are three degrees of freedom, whereas for P 6 (Fig. 1K) there are only two.
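The pairing of layer groups with building-block symmetries described above can be summarized in a small lookup table. The following Python sketch is only an illustration: the names `LAYER_GROUPS` and `groups_for` are hypothetical, and the number of degrees of freedom is recorded only for the three groups for which the text states it (the others are left as `None` rather than guessed).

```python
# Six layer groups with two unique interfaces that can be built from
# cyclic oligomers, as listed in the text. "dof" is filled in only
# where the text gives the number of lattice degrees of freedom.
LAYER_GROUPS = {
    "P 2 21 21": {"building_block": "C2", "dof": None},
    "P 3":       {"building_block": "C3", "dof": None},
    "P 3 2 1":   {"building_block": "C3", "dof": 3},
    "P 4":       {"building_block": "C4", "dof": None},
    "P 4 21 2":  {"building_block": "C4", "dof": 3},
    "P 6":       {"building_block": "C6", "dof": 2},
}


def groups_for(symmetry):
    """Return the layer groups that accept a building block of the given cyclic symmetry."""
    return [name for name, info in LAYER_GROUPS.items()
            if info["building_block"] == symmetry]


if __name__ == "__main__":
    for sym in ("C2", "C3", "C4", "C6"):
        print(sym, "->", groups_for(sym))
```

A table of this kind makes it straightforward to route each candidate oligomer to the layer groups in which it could be docked.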

Fig. 1 Computational design strategy and experimental analysis of designed arrays.

(A) The P 3 2 1 unit cell with threefold axes represented by triangles. Yellow (–) and purple (+) C3 objects have opposite orientations along the z axis. (Inset) The three degrees of freedom of the lattice. (B) p3Z_42 2D array. (C) p3Z_42 designed interface with “zipper-like” hydrophobic packing and peripheral hydrogen bonds. (D) Large (>1 μm) E. coli–grown array (middle), higher magnification view with lattice spacing as in (B) (right), and Fourier transform (amplitudes) of the large array (left). (E) (left) Projection map at 15 Å calculated from a large array. (Right) overlay of the p3Z_42 design model on the projection map. (F) The P 4 21 2 lattice. Ovals represent twofold axes and squares, fourfold axes. (G) p4Z_9 array. (H) p4Z_9 designed interface. (I) Negatively stained E. coli–grown array (main panel), an in vitro refolded lattice at higher magnification (inset), and Fourier transform of the main panel (left). (J) Projection map at 14 Å calculated from an E. coli array as in (I) without (left) and with (right) p4Z_9 design model. (K) The P 6 lattice has two degrees of freedom (A,θ) (inset) available for sampling. Sixfolds are represented by hexagons. (L) p6_9H array. (M) p6_9H designed interface. (N) p6_9H lattice grown in vivo with Fourier transform at left and higher magnification view at right. (O) Projection map at 14 Å of p6_9H from E. coli–grown arrays as in (N) and cartoon overlay (right). All scale bars: black, 5 nm; white, 50 nm.

We used symmetric docking in Rosetta (14, 18, 19) to search for placements of cyclic oligomers into each of the six layer groups with shape-complementary (20) interfaces between different oligomer copies. The docking scoring function consisted of a soft sphere model of steric interactions and a simple measure of the designable interface area: the number of interface Cβs within 7 Å. For each cyclic oligomer in each layer group, ~20 independent Monte Carlo docking trajectories were carried out that started from placements of six to nine copies of the oligomer with its symmetry axis aligned with the corresponding symmetry axes of the layer group (for example, trimers were placed on the threefold symmetry axes indicated by the triangles in Fig. 1A, tetramers on the fourfold symmetry axes indicated by squares in Fig. 1F, and hexamers on the sixfold symmetry axes indicated by hexagons in Fig. 1K). In the Monte Carlo docking simulations, the degrees of freedom sampled were those compatible with the layer group [Fig. 1, A, F, and K (right)], and hence, the layer group symmetry was preserved throughout the calculations.
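The two ingredients of the docking step, a Cβ-based measure of interface size and a Monte Carlo search over the lattice degrees of freedom, can be sketched in plain Python. This is not the Rosetta implementation: the function names are hypothetical, the contact count reflects one possible reading of "number of interface Cβs within 7 Å" (Cβ atoms of one copy lying within the cutoff of any Cβ of a neighboring copy), the soft-sphere steric term is omitted, and the search is a generic Metropolis scheme that minimizes a user-supplied score.

```python
import math
import random


def count_interface_cbetas(cb_a, cb_b, cutoff=7.0):
    """Count C-beta atoms in cb_a within `cutoff` angstroms of any C-beta in cb_b.

    cb_a and cb_b are sequences of (x, y, z) coordinates for two oligomer copies.
    """
    return sum(1 for p in cb_a
               if any(math.dist(p, q) <= cutoff for q in cb_b))


def metropolis_search(score, start, step_sizes, n_steps=1000, kT=1.0, seed=0):
    """Generic Metropolis Monte Carlo minimization of `score`.

    `start` holds the initial values of the degrees of freedom (for example
    rotation, lattice spacing, z offset); `step_sizes` gives the standard
    deviation of the Gaussian perturbation applied to each of them.
    Returns the lowest-scoring parameters seen and their score.
    """
    rng = random.Random(seed)
    current = list(start)
    current_score = score(current)
    best, best_score = list(current), current_score
    for _ in range(n_steps):
        trial = [x + rng.gauss(0.0, s) for x, s in zip(current, step_sizes)]
        trial_score = score(trial)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


if __name__ == "__main__":
    # Toy score with a minimum at rotation = 30, spacing = 85 (illustrative only).
    toy = lambda p: (p[0] - 30.0) ** 2 + (p[1] - 85.0) ** 2
    print(metropolis_search(toy, [0.0, 100.0], [2.0, 2.0], n_steps=5000))
```

Because only the symmetric degrees of freedom are perturbed, every trial configuration in such a search remains a valid member of the target layer group; in a real docking run the score would combine a clash penalty with the negative of the contact count.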

We then selected the most shape-complementary (largest number of contacting residues with fewest clashes) solutions from the trajectories and carried out Rosetta sequence design calculations to generate well-packed low-energy interfaces between oligomers. Monte Carlo searches were carried out over all amino acid identities and side-chain rotamer states for residues near the newly formed interface between oligomers, while optimizing the Rosetta all-atom energy of the entire complex (12, 13, 21). After this sequence design step, the energy was further minimized with respect to the side-chain torsion angles of residues near the interface and the symmetric degrees of freedom of the layer group. Finally, the resulting lattice models were filtered on the basis of the shape-complementarity of the designed interface (>0.5), surface area of the designed interface (>400 Å² per monomer), buried unsatisfied hydrogen bonds introduced at the new interface (<4 using a 1.4 Å solvent accessibility probe) (22), and predicted relative free energy (23) of complex formation (≤ 10 Rosetta energy units per subunit) (sample Rosetta script files accompany the supplementary material). After further sequence optimization (13, 24), models passing the filters were manually inspected, and 62 designs were selected for experimental characterization; 16 for P 2 21 21, 2 for P 3, 10 for P 3 2 1, 16 for P 4, 3 for P 4 21 2, and 15 for P 6.
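The filtering step amounts to applying four thresholds to each lattice model. The sketch below shows one way to express this; the `LatticeModel` record, its field names, and the example values are hypothetical, and the metrics themselves would come from Rosetta rather than from this code. The energy cutoff is taken exactly as printed in the text, so its sign convention should be checked against the supplementary scripts before reuse.

```python
from dataclasses import dataclass


@dataclass
class LatticeModel:
    """Hypothetical record of the interface metrics computed for one design."""
    name: str
    shape_complementarity: float       # unitless, 0 to 1
    interface_area_per_monomer: float  # square angstroms
    buried_unsat_hbonds: int           # counted with a 1.4 angstrom probe
    ddg_per_subunit: float             # Rosetta energy units


def passes_filters(m, sc_min=0.5, area_min=400.0, unsat_max=4, ddg_max=10.0):
    """Apply the four thresholds quoted in the text (defaults as printed there)."""
    return (m.shape_complementarity > sc_min
            and m.interface_area_per_monomer > area_min
            and m.buried_unsat_hbonds < unsat_max
            and m.ddg_per_subunit <= ddg_max)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    candidates = [
        LatticeModel("model_a", 0.62, 520.0, 1, -14.0),
        LatticeModel("model_b", 0.45, 610.0, 0, -18.0),  # fails shape complementarity
        LatticeModel("model_c", 0.70, 350.0, 2, -12.0),  # fails interface area
    ]
    print([m.name for m in candidates if passes_filters(m)])
```

Keeping the thresholds as keyword arguments makes it easy to tighten or relax individual filters when too few or too many models survive.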

Synthetic genes were obtained for the 62 designs, and the proteins were expressed in the Escherichia coli cytoplasm by using a standard T7-based expression vector. Of the 62 designs, 43 expressed; of these, 18 had protein in the supernatant after clearing the lysate at 12,000g for 30 min, whereas all 43 had protein in the pellet. To investigate the degree of order in the pelleted material, we examined negatively stained samples by electron microscopy (EM). Regular lattices were observed for four of the designs: One formed only stacked 2D layers (fig. S1), whereas three formed planar arrays. The latter are described in the following sections.
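The counts reported above can be checked and converted to success rates with a few lines of Python. All numbers are taken from the text; the variable names are illustrative only.

```python
# Designs selected per layer group, as listed in the text.
designs_per_group = {
    "P 2 21 21": 16, "P 3": 2, "P 3 2 1": 10,
    "P 4": 16, "P 4 21 2": 3, "P 6": 15,
}

total = sum(designs_per_group.values())  # should equal the 62 designs ordered
expressed = 43          # designs with detectable expression
soluble = 18            # expressed designs with protein in the supernatant
regular_lattices = 4    # designs showing regular lattices by negative-stain EM
planar_arrays = 3       # of those, designs forming planar (unstacked) arrays

rates = {
    "expressed / ordered": expressed / total,
    "soluble / expressed": soluble / expressed,
    "regular lattice / expressed": regular_lattices / expressed,
    "planar array / ordered": planar_arrays / total,
}

if __name__ == "__main__":
    print("total designs:", total)
    for label, value in rates.items():
        print(f"{label}: {value:.1%}")
```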

Design p3Z_42 is in layer group P 3 2 1. The rigid body arrangement of the constituent β-helix trimers in the lattice was identified by Monte Carlo search over the three degrees of freedom of the lattice: the rotation of the trimer around its axis θ, the lattice spacing A, and the z offset of the trimer from the lattice plane (Fig. 1A). In the lattice identified in the Monte Carlo docking calculations, the oligomeric building blocks pack into a dense array (Fig. 1B; the yellow and purple copies are inverted with respect to each other) stabilized by a large contact surface between adjacent copies with close complementary side-chain packing (Fig. 1C) generated in the sequence design calculations.

p3Z_42 formed large and very well ordered 2D crystals (Fig. 1D). Most of the protein expressed in E. coli appeared to assemble in these 2D crystals, as there was very little present in the soluble fraction (fig. S3). At low (16°C) expression temperatures, 2D sheets were obtained (Fig. 1D), whereas at 37°C, where larger amounts of proteins are produced, large 2D sheets stacked mainly into thick 3D crystals. Higher magnification (Fig. 1D, inset) showed a trigonal lattice similar to that of the design model [compare Fig. 1D (right) with Fig. 1B]. Fourier transformation of the lattice [Fig. 1D (left)] yielded peaks out to 15 Å resolution; the order in the unstained lattice is probably markedly higher, as the negative stain likely limits the observed resolution. A 15 Å projection map (Fig. 1E) back-computed from the Fourier components followed the contour of the designed lattice [Fig. 1E (right)] (unit cell dimensions a = b = 85 Å, γ = 120°). It is notable that planar crystals of such large size can grow without support within the confines (and with the many cellular obstacles) of an E. coli cell. Cell-free expression of this design yielded large, ordered 2D crystals similar to those formed in E. coli (fig. S4A).

Design p4Z_9 is in layer group P 4 21 2. Search over the three degrees of freedom of the layer group [the rotation around the internal C4 axis, the lattice spacing, and the z offset between adjacent inverted tetramers (Fig. 1F)] yielded the close-packed arrangement shown in Fig. 1G (side view in fig. S2B). The designed interface is composed of hydrophobic residues nestled between two α helices surrounded by polar residues (Fig. 1H).

p4Z_9 formed crystals up to 1 μm in width (Fig. 1I) with little of the protein present in the soluble fraction (fig. S3). Incubation of the pellet material with 6 M guanidine and subsequent purification and refolding (by dialysis or fast dilution) yielded crystalline 2D arrays and fibers with the same square packing (fig. S4, B and C). Fourier transformation of the negatively stained large 2D lattices generated in vivo yielded peaks out to 14 Å resolution [Fig. 1I (left)]. The 14 Å projection map produced by back-transformation had distinctive rectangular voids in alternating directions, which closely matched the design model [Fig. 1J and 1J (right)] (unit cell dimensions a = b = 56 Å, γ = 90°).

Design p6_9 is built from α-helical hexamers in layer group P 6. In this case, all oligomers are in the same orientation along the z axis (perpendicular to the plane in Fig. 1K), and hence, there are only two degrees of freedom—the rotation around the sixfold axis and the lattice spacing [Fig. 1K (right)]. The shape-complementary docking solution (Fig. 1L and side view fig. S2C) is composed of four closely associating α helices along the twofold axis of the lattice (Fig. 1M) with two interacting phenylalanines. We also tested a variant, p6_9H, which introduces a hydrogen bond network across the interface (Fig. 1M).
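Because a P 6 array is fully described by the lattice spacing and the rotation about the sixfold axis, its rigid-body arrangement is easy to enumerate. The following sketch, with the hypothetical function name `p6_lattice`, places hexamer centers on the points of a hexagonal lattice (cell angle 120°) and gives every hexamer the same in-plane rotation. It assumes one hexamer per unit cell sitting on the sixfold axis and ignores all atomic detail.

```python
import math


def p6_lattice(spacing, theta_deg, n=1):
    """Return (x, y, rotation_deg) placements for a (2n+1) x (2n+1) patch.

    spacing   : lattice constant A in angstroms (a = b, gamma = 120 degrees)
    theta_deg : rotation of each hexamer about its sixfold axis; only its
                value modulo 60 degrees matters for a C6-symmetric object
    """
    gamma = math.radians(120.0)
    a1 = (spacing, 0.0)
    a2 = (spacing * math.cos(gamma), spacing * math.sin(gamma))
    rotation = theta_deg % 60.0
    placements = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x = i * a1[0] + j * a2[0]
            y = i * a1[1] + j * a2[1]
            placements.append((x, y, rotation))
    return placements


if __name__ == "__main__":
    # Spacing and rotation chosen only for illustration.
    for x, y, rot in p6_lattice(120.0, 75.0, n=1):
        print(f"{x:8.2f} {y:8.2f} {rot:5.1f}")
```

In this arrangement each hexamer has six nearest neighbors at distance A, and all copies share the same orientation along z, which is why no z offset appears among the degrees of freedom.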

Design p6_9 expressed in E. coli was found in both the supernatant and pellet (fig. S3). EM investigation revealed that the pellet contained highly ordered single-layer 2D hexagonal arrays, whereas the supernatant did not. p6_9H formed even larger arrays (Fig. 1N, fig. S5, and table S1). The 2D layers in the pellet were highly ordered with clearly evident hexagonal packing [Fig. 1N and 1N (inset)]. Fourier transformation of the negatively stained arrays [Fig. 1N (left)] yielded peaks out to 14 Å resolution; and the back-computed 14 Å map was again closely consistent with the design model of the array [Fig. 1O and 1O (right)] (unit cell dimensions: a = b = 120 Å, γ = 120°). Large arrays were also formed in vitro after concentration of soluble p6_9H purified from the supernatant after lysis of E. coli (fig. S4, D and E).
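The reported unit cells can be related to the resolution of the Fourier peaks through the standard expression for the spacing of the (h, k) lattice lines of a 2D lattice, 1/d² = [h²/a² + k²/b² − 2hk cos γ/(ab)] / sin² γ. The sketch below evaluates this formula for the three unit cells quoted in the text; the function name `d_spacing` and the dictionary layout are illustrative, not part of the original analysis (which used dedicated image-processing software).

```python
import math


def d_spacing(h, k, a, b, gamma_deg):
    """Spacing in angstroms of the (h, k) reflection of a 2D lattice.

    a, b are the cell edges in angstroms and gamma_deg the angle between them.
    """
    g = math.radians(gamma_deg)
    inv_d2 = (h * h / (a * a) + k * k / (b * b)
              - 2.0 * h * k * math.cos(g) / (a * b)) / math.sin(g) ** 2
    return 1.0 / math.sqrt(inv_d2)


# Unit cells (a, b, gamma) quoted in the text for the three designs.
UNIT_CELLS = {
    "p3Z_42": (85.0, 85.0, 120.0),
    "p4Z_9": (56.0, 56.0, 90.0),
    "p6_9H": (120.0, 120.0, 120.0),
}

if __name__ == "__main__":
    for name, (a, b, gamma) in UNIT_CELLS.items():
        print(name, "d(1,0) =", round(d_spacing(1, 0, a, b, gamma), 1), "angstroms")
```

For the hexagonal cells the expression reduces to the familiar 1/d² = 4(h² + hk + k²)/(3a²), and for the square cell to d = a/√(h² + k²), so peaks near the stated resolution limits correspond to fairly high-order reflections in all three lattices.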

To achieve higher resolution than possible with negatively stained samples, we analyzed designs without stain by electron cryomicroscopy (cryo-EM). Analysis of p3Z_42 crystals by cryo-EM (Fig. 2, A and B) and electron diffraction yielded data to 3.5 Å resolution (Fig. 2C). The vast majority of crystals diffracted to this resolution in the cryo preparations, indicating high long-range order. Movie micrographs of the resulting crystals were also collected, motion corrected, and processed in 2dx (25) to yield a projection map at 4 Å resolution in agreement with the design model (Fig. 2, compare D and E). To our knowledge, this is the highest order observed to date for a designed macromolecular 2D lattice.

Fig. 2 Cryo-EM analysis of design p3Z_42.

(A) Cryo-EM micrograph of E. coli–grown p3Z_42 recorded from nonpurified, resuspended insoluble material. (B) Fourier transform calculated from motion-corrected movies taken from samples like those in (A). (C) Electron diffraction of a crystal as in (A). (D) Projection map at 4 Å calculated from motion-corrected movies from material as in (A) showing a linked repeat-protein arrangement similar to the p3Z_42 design model. The unit cell is shown in blue and contains two alternating trimeric units. Triangular density at the corners of the unit cell is likely an averaging artifact. (E) p3Z_42 design model in a similar view as in (D). Scale bar, 50 nm.

Our designed planar protein arrays form large planar 2D crystals both in vivo and in vitro that are closely consistent with the design models. Two of the three successes were with layer groups with adjacent building blocks in opposite orientations along the z axis; these have the advantages that (i) there is an additional degree of freedom (the z offset) providing more possible packing arrangements for a given oligomeric building block; (ii) the interfaces are antiparallel rather than parallel so that, in the design calculations, opposing residues can have different identities; and (iii) inaccuracies in the design calculations that result in deviation from planarity effectively cancel out. On the other hand, designed “polar” arrays with all subunits oriented in the same direction—such as p6_9—have advantages for functionalization, as the two sides are distinct and can be addressed separately.

It is notable that, for all three designs, extensive crystalline arrays form unsupported in E. coli and from purified protein in vitro. The coherent arrays can extend up to 1 μm in length but are only 3 to 8 nm thick by design (fig. S2). We anticipate that even larger and perhaps more highly ordered crystals would form on a solid support, which will be useful for future nanotechnology applications. The ability to precisely design 2D arrays at the near atomic level should enable new approaches in structural biology [fusing proteins of unknown structure to array components for electron crystallography or using these to nucleate 3D crystal growth for x-ray and MicroED (26) applications], new sensing modalities with the coupling of analyte binding domains to the arrays, and the organization of enzyme networks and light-harvesting chromophores in two dimensions.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S5

Tables S1 and S2

References (27–39)

References and Notes

  1. Acknowledgments: We thank D. Shi and J. de la Cruz for help with EM, S. Sanchez-Martinez for help with protein expression, J. Bale and N. King for support and helpful discussions, and W. Sheffler for Rosetta code. We would also like to thank members of both the Baker and Gonen labs for scripts and useful discussions. We thank HHMI’s Janelia Research Campus visitor program, the University of Washington’s Biochemistry Department and Biological Physics, Structure, and Design program, U.S. Defense Threat Reduction Agency, U.S. Air Force Office of Scientific Research, and the Howard Hughes Medical Institute for funding and support. S.G., F.D., T.G., and D.B. are inventors on a provisional patent application that covers the method for the design of self-assembling 2D protein material. Supporting materials, methods and the design models and sample design files are available in the supplementary materials. Author contributions: S.G. and F.D. worked on the docking and design. S.G. worked on design optimization. F.D. wrote Rosetta code. S.G., F.D., and D.B. computationally analyzed the designs. S.G. worked on the biochemistry, electron microscopy, and data analysis. S.G. and T.G. analyzed the EM data. All authors designed the research and wrote, edited, and contributed to the manuscript.