Research Article

High-Resolution Protein Design with Backbone Freedom


Science  20 Nov 1998:
Vol. 282, Issue 5393, pp. 1462-1467
DOI: 10.1126/science.282.5393.1462


Recent advances in computational techniques have allowed the design of precise side-chain packing in proteins with predetermined, naturally occurring backbone structures. Because these methods do not model protein main-chain flexibility, they lack the breadth to explore novel backbone conformations. Here the de novo design of a family of α-helical bundle proteins with a right-handed superhelical twist is described. In the design, the overall protein fold was specified by hydrophobic-polar residue patterning, whereas the bundle oligomerization state, detailed main-chain conformation, and interior side-chain rotamers were engineered by computational enumerations of packing in alternate backbone structures. Main-chain flexibility was incorporated through an algebraic parameterization of the backbone. The designed peptides form α-helical dimers, trimers, and tetramers in accord with the design goals. The crystal structure of the tetramer matches the designed structure in atomic detail.

Proteins exhibit precise geometric packing of atoms in their interiors. Nevertheless, empirical protein design methods have achieved a measure of generality and simplicity by ignoring detailed interactions between amino acid residues. These design approaches rely instead on imitation of statistical sequence patterns in naturally occurring proteins, such as hydrophobic-polar residue patterns, amino acid secondary structure propensities, and characteristic local interaction motifs (1). Perhaps as a direct consequence, many designed proteins exhibit fluctuating or “molten” interiors (2), and some assume unintended tertiary conformations (3).

Packing in proteins has been studied computationally by holding the protein main chain in the wild-type conformation (the “fixed-backbone” approximation) and asking which sets of amino acid side chains can efficiently fill the interior space (4). This technique has been used successfully to repack wild-type side chains into predetermined backbone structures (5) and has more recently been extended to the design of amino acid sequences. Fixed-backbone design based on naturally occurring backbone templates has produced proteins that fold to the target structures with high thermal stabilities (6).

However, fixed-backbone methods are unable to model protein main-chain flexibility. Thus, main-chain adjustments known to occur in response to core mutations in proteins (7) are not allowed. Moreover, the fixed-backbone approach has a severe limitation when applied to backbone structures for which a naturally occurring example does not exist. Although naturally occurring backbone coordinates represent the ground-state conformation for at least one sequence (the naturally occurring sequence), this assumption is not necessarily valid for arbitrary backbone coordinates. When the backbone structure is designed de novo, a complete set of plausible backbone conformations must be sampled to identify structures that lie at a free-energy minimum in the sequence and conformational spaces.

These limitations may be overcome by treating the backbone as a parametric family of structures rather than as a static entity (8). A small but well-defined subset of main-chain conformations can then be exhaustively sampled in a finite time. For example, coupled searches of side-chain packing and main-chain conformation under a parametric coiled-coil backbone model have been used to reproduce detailed, crystallographically observed conformations for coiled-coil proteins (8).

A true test of the utility of parametric-backbone models in protein design would be to engineer a protein fold for which no structural example is known. We report here the computational design of a family of dimeric, trimeric, and tetrameric α-helical bundles with a right-handed superhelical structure. Although a nuclear magnetic resonance (NMR) structure of a right-handed dimer of helices in detergent micelles has been reported (9), no structures of trimeric or tetrameric right-handed coiled coils exist. The x-ray crystal structure of the tetrameric bundle designed here matches the predicted structure in atomic detail, adopting an unprecedented, but deliberately engineered, right-handed superhelical fold.

Design principles. The first step of our design is based on an analysis of the hydrophobic-polar residue pattern in left-handed coiled coils. The superhelical twist of left-handed coiled coils arises from a small difference between the integral frequency of the heptad repeat and the characteristic frequency of α helices (10). Each amino acid in a straight helix rotates about 100° radially around the helix axis (360° for 3.6 residues). Seven amino acids rotate 700°, lagging two full turns (720°) by 20°. A sevenfold repeat, therefore, forms a left-handed stripe in a straight α helix (Fig. 1). In a left-handed supercoiled conformation that evolves 20° every seven residues, this stripe can always face toward the axis of superhelical rotation (Fig. 1).

Figure 1

(Left) Sevenfold hydrophobic repeats give rise to left-handed coiled coils, and 11-fold repeats to right-handed coiled coils. (A) A heptad repeat in a regular α helix produces a left-handed stripe and a left-handed supercoil. This arrangement is schematically illustrated alongside the standard sevenfold helical wheel projection for coiled coils. (B) An undecatad repeat in a regular α helix produces a right-handed stripe and a right-handed supercoil. The 11-fold helical wheel projection is illustrated. H, hydrophobic residue; P, polar residue; +/−, charged residue.

Application of this principle to an 11-fold (undecatad) amino acid repeat suggests that a right-handed supercoil should form. Eleven amino acids rotate about 1100°, which leads three full turns (1080°) by 20°. Thus, an undecatad repeat produces a right-handed stripe in a straight α helix, which should give rise to a right-handed supercoil (Fig. 1). Examination of an 11-residue helical-wheel projection indicates that amino acids in the first, fourth, and eighth positions (positions a, d, and h) fall on the same surface of the helix. These considerations suggest that a 3-4-4 hydrophobic repeat might specify a right-handed coiled coil.
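The stripe arithmetic above can be checked with a few lines of code. The sketch below assumes an ideal α helix with 3.6 residues per turn, as in the text; the function names are illustrative and are not taken from the article.

```python
def stripe_offset_deg(repeat_length, residues_per_turn=3.6):
    """Angle (degrees) by which one sequence repeat leads (+) or lags (-)
    the nearest whole number of helical turns in a straight alpha helix."""
    total_rotation = repeat_length * 360.0 / residues_per_turn
    nearest_turns = round(total_rotation / 360.0)
    return total_rotation - 360.0 * nearest_turns


def stripe_handedness(repeat_length, residues_per_turn=3.6):
    """Handedness of the stripe traced by a repeating position on the helix surface."""
    offset = stripe_offset_deg(repeat_length, residues_per_turn)
    if abs(offset) < 1e-9:
        return "straight"
    return "right-handed" if offset > 0 else "left-handed"


for n in (7, 11):
    print(n, round(stripe_offset_deg(n), 6), stripe_handedness(n))
```

Under this idealization the heptad lags two turns by 20° and the undecatad leads three turns by 20°, which is the sign change that motivates the 3-4-4 repeat.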

The second step of our design consisted of determining which amino acids can pack the core of a right-handed bundle with a 3-4-4 hydrophobic repeat, and by their shapes direct dimer, trimer, or tetramer formation. Detailed dimer, trimer, and tetramer right-handed coiled coils were modeled for all possible core sequences made up of the small aliphatic amino acids alanine, valine, norvaline, leucine, isoleucine, and alloisoleucine. Alloisoleucine, the stereoisomer of isoleucine with inverted chirality at the Cβ carbon, was included in the design calculation because preliminary models suggested the need for a residue that would orient side-chain volume into a trans χ1 dihedral angle in its most commonly occurring rotamer. Norvaline (an n-propyl side chain) was used as a general straight-chain analog for methionine and lysine (11). These six amino acids could be placed at the core positions a, d, and h in 216 (6³) possible sequences. However, because most of the amino acids have multiple side-chain conformations, about 25 possible side-chain rotamers exist at each position. For simplicity, the design calculations were limited to the 11 lowest energy rotamers at each core position in each oligomeric state, as determined by single-level packing calculations (12). Thus, 3993 (3 × 11³) structures (ignoring variations in backbone conformation) were generated for the design analysis.
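The two counts quoted above follow from simple enumeration. The sketch below reproduces them; the three-letter residue labels and the integer rotamer indices are placeholders, and it assumes that each of the 11 retained rotamers at a position already implies an amino acid identity.

```python
from itertools import product

# Placeholder labels for the six core amino acids named in the text.
CORE_RESIDUES = ["Ala", "Val", "Nva", "Leu", "Ile", "aIle"]
CORE_POSITIONS = ("a", "d", "h")
OLIGOMERIC_STATES = (2, 3, 4)   # dimer, trimer, tetramer
ROTAMERS_KEPT = 11              # lowest-energy rotamers retained per core position

# Every assignment of one residue type to each of the three core positions.
core_sequences = list(product(CORE_RESIDUES, repeat=len(CORE_POSITIONS)))

# Every combination of retained rotamers across the three core positions.
rotamer_sets = list(product(range(ROTAMERS_KEPT), repeat=len(CORE_POSITIONS)))

n_structures = len(OLIGOMERIC_STATES) * len(rotamer_sets)
print(len(core_sequences), n_structures)
```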

Computation. To model the right-handed structures, we modified a technique previously used to predict crystal structures of natural left-handed coiled coils (8, 12). For each of the 3993 core rotamer conformations, main-chain coordinates were determined by exploring a parametric family of superhelix backbones described originally by Francis Crick (13). The parametric backbone algebra for left-handed coiled coils was altered to reflect an 11-fold amino acid repeat. Periodic boundary conditions were applied, and two-, three-, or fourfold rotational symmetry was imposed around the superhelix axis. For the computational studies, alanine residues were placed at the exterior positions of the undecatad repeat (positions b, c, f, g, and j), and α-amino butyric acid residues (an ethyl side chain) were placed at positions on the boundary of the hydrophobic core (positions e, i, and k). As the backbone coordinate search for each core rotamer conformation required ∼3 min on a MIPS R3000 processor, the entire calculation took about 8 days.
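To make the idea of a parametric backbone concrete, the sketch below places Cα-like points on a supercoiled helix using the commonly quoted form of Crick's equations and replicates the chain by n-fold rotation about the superhelix axis. It is not the authors' modified 11-fold algebra, which is not spelled out in the text. The values of r0 and w0 are the model values quoted later in the article; r1, w1 (three turns per 11 residues in the superhelix frame), and the 1.5 Å rise are assumptions made for illustration.

```python
import math


def crick_point(t, r0, w0, r1, w1, rise, phi0=0.0, phi1=0.0):
    """Return (axis_point, atom_point) for residue index t.

    r0, w0: superhelix radius (A) and frequency (rad/residue); w0 > 0 is right-handed.
    r1, w1: minor-helix radius (A) and frequency (rad/residue) in the superhelix frame.
    rise:   distance along the curved helix axis per residue (A).
    """
    a = w0 * t + phi0
    b = w1 * t + phi1
    dz = math.sqrt(rise ** 2 - (r0 * w0) ** 2)   # advance along z per residue
    alpha = math.atan2(r0 * w0, dz)              # tilt of helix axis from z
    axis = (r0 * math.cos(a), r0 * math.sin(a), dz * t)
    atom = (
        axis[0] + r1 * math.cos(a) * math.cos(b)
        - r1 * math.cos(alpha) * math.sin(a) * math.sin(b),
        axis[1] + r1 * math.sin(a) * math.cos(b)
        + r1 * math.cos(alpha) * math.cos(a) * math.sin(b),
        axis[2] - r1 * math.sin(alpha) * math.sin(b),
    )
    return axis, atom


def bundle(n_chains, n_residues, **params):
    """Impose n-fold rotational symmetry by offsetting the superhelical phase."""
    chains = []
    for k in range(n_chains):
        phi0 = 2.0 * math.pi * k / n_chains
        chains.append([crick_point(t, phi0=phi0, **params)[1]
                       for t in range(n_residues)])
    return chains


# r0 and w0 from the article's model; r1, w1, and rise are illustrative assumptions.
PARAMS = dict(r0=7.29, w0=0.015, r1=2.26, w1=6.0 * math.pi / 11.0, rise=1.5)
tetramer = bundle(4, 33, **PARAMS)
```

Because only a handful of numbers define each backbone, a search over them is tractable. At about 3 min per search, 3993 searches amount to roughly 200 hours, in line with the 8 days quoted above.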

A difficult aspect of protein design is the need to compare unfolding free energies for different candidate amino acid sequences. The effects of sequence changes on unfolding free energies can be calculated as the difference in the free energy of mutation in the folded and unfolded states (14). To avoid explicitly modeling the unfolded state, which consists of a large ensemble of conformations, we modified this basic strategy by calculating an energy of permutation: the energy difference between two different covalent arrangements of the same amino acids (Fig. 2). Whereas energies of mutation are seldom zero, an energy of permutation will be zero if the amino acid side chains do not interact with each other, which we assume to be the case in the unfolded state. Importantly, conformation-independent energetic biases for one amino acid over another in the underlying potential energy function are canceled in a permutation difference. The permutation used here interchanges an amino acid at position f of a left-handed coiled coil with an amino acid in the hydrophobic core of a right-handed coiled coil. Using experimentally measured unfolding free energies of f-substituted left-handed coiled coils (15), and assuming that the free energy of permutation in the unfolded state is zero, we derived calculated unfolding energy differences for different undecatad sequences in the right-handed model conformation (16).

Figure 2

(Right) Calculating the effects of sequence changes on unfolding free energies, illustrated for an isoleucine-to-valine substitution at position a of a right-handed coiled-coil dimer. The top panels show axial projections of one right-handed and one left-handed coiled coil. The top left panel differs from the top right panel by interchange of the amino acid at position a of the right-handed coiled coil with the amino acid at position f of the left-handed coiled coil. The lower panels show the unfolded polypeptides. Each leg of the thermodynamic cycle is labeled with a letter. The legs labeled A and B correspond to unfolding of the wild-type and mutant coiled-coil sequences, respectively. The legs labeled C and D correspond to residue permutation in the folded and unfolded states, respectively. Because the cycle is closed, the difference in the two unfolding free energies is equal to the difference in the two permutation free energies: A − B = C − D. The A and B legs are each a sum of two terms, the unfolding free energy for the right-handed coiled coil, ΔG_unfold^RH, and the unfolding free energy for the left-handed coiled coil, ΔG_unfold^LH. Expanding A and B and rearranging terms gives [ΔG_unfold^RH (I at a) − ΔG_unfold^RH (V at a)] = C − D + [ΔG_unfold^LH (I at f) − ΔG_unfold^LH (V at f)]. For the computational studies reported here (12, 16), D was assumed to be 0 kcal/mol, C was computed from the bonded, van der Waals, and hydrogen bonding terms of the CHARMM19 potential (26) and a solvent-accessible surface hydration potential (25), and [ΔG_unfold^LH (I at f) − ΔG_unfold^LH (V at f)] was taken from experimentally measured free energies of unfolding (15). Differences in calculated stability are dominated by the CHARMM19 potential, which accounts for 80% of the variance in the calculated stabilities of right-handed coiled-coil sequences. The surface hydration potential and left-handed coiled-coil unfolding free energies each account for ∼10% of the variance in the calculated stabilities.
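The bookkeeping of the cycle can be written out as a short function. The sketch assumes the reading in which leg A pairs isoleucine at a (right-handed) with valine at f (left-handed) and leg B pairs valine at a with isoleucine at f; all numerical values are invented solely to check that the cycle closes and are not data from the article.

```python
def ddg_unfold_rh(c_folded, ddg_lh_i_minus_v, d_unfolded=0.0):
    """Return dG_RH(I at a) - dG_RH(V at a), in kcal/mol.

    c_folded:          permutation energy in the folded state (leg C)
    d_unfolded:        permutation energy in the unfolded state (leg D, assumed 0)
    ddg_lh_i_minus_v:  dG_LH(I at f) - dG_LH(V at f), from experiment
    """
    return c_folded - d_unfolded + ddg_lh_i_minus_v


# Invented unfolding free energies (kcal/mol), used only to verify the algebra.
rh_i, rh_v = 9.0, 7.5     # right-handed coil with I or V at position a
lh_i, lh_v = 12.0, 11.7   # left-handed coil with I or V at position f

leg_a = rh_i + lh_v       # unfold the "wild-type" pair
leg_b = rh_v + lh_i       # unfold the permuted pair
leg_d = 0.0               # non-interacting side chains in the unfolded state
leg_c = leg_a - leg_b + leg_d   # closure of the cycle: A - B = C - D

print(ddg_unfold_rh(leg_c, lh_i - lh_v, leg_d))
```

The function recovers rh_i − rh_v from quantities that never require an explicit model of the unfolded ensemble, which is the point of using a permutation rather than a mutation.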

Experimentally observed structures. Two criteria were used to choose optimal core sequences for each oligomeric state. First, the stability of the sequence in the target conformation was required to be high. Second, the specificity of the sequence for forming the target conformation, instead of the two alternative oligomeric states, was maximized. We took the following approach: (i) Within each oligomeric state, the mean stability averaged over the entire family of 216 possible sequences was tabulated; stabilities of individual sequences were expressed as standard deviations from this mean. (ii) Because equilibria between different oligomerization states depend on monomer concentration and on residues outside of the hydrophobic core, we calculated specificities using an arbitrary standard state that factors out these considerations. For each equilibrium, the free energy of interconversion was assigned a value of zero when averaged over all 216 core sequences. Specificities were thus evaluated by taking the stability difference (in kilocalories per mole) for each sequence between the target conformation and the alternative oligomeric conformations, and expressing the differences in units of standard deviation.
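The two selection criteria can be sketched as standardized scores. The example below assumes that a larger stability value means a more stable sequence and uses the population standard deviation; the article specifies neither convention, and the three sequence labels and their energies are toy values rather than results from the calculation.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Express each value as standard deviations from the mean over all sequences."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    return {seq: (v - mu) / sigma for seq, v in values.items()}


def specificity(stability, target, alternative):
    """Standardized stability difference between target and alternative states.

    Subtracting the mean difference corresponds to the arbitrary standard state
    in which the average free energy of interconversion is zero.
    """
    diff = {seq: stability[target][seq] - stability[alternative][seq]
            for seq in stability[target]}
    return z_scores(diff)


# Toy stabilities (kcal/mol) indexed by oligomeric state, then by core sequence.
TOY = {
    2: {"LLL": 1.0, "III": 3.0, "VVV": 2.0},
    4: {"LLL": 4.0, "III": 1.0, "VVV": 1.0},
}

tetramer_stability = z_scores(TOY[4])
tetramer_vs_dimer = specificity(TOY, target=4, alternative=2)
print(tetramer_stability, tetramer_vs_dimer)
```

A sequence is attractive for a given state when both scores are high, as "LLL" is for the tetramer in this toy set.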

Dimer, trimer, and tetramer core sequences thus identified (Table 1) were inserted into a common 33-residue template to generate the peptides RH2, RH3, and RH4 (Fig. 3). The template contains positively charged lysine residues on one side of the hydrophobic core and negatively charged glutamate residues on the other side to favor parallel helix arrangements over antiparallel arrangements (17). Two norvaline residues at positions h of the dimer sequence pack against glutamate residues at positions k in the neighboring helix and were therefore substituted with the isosteric amino acid lysine. Charged and polar residues were placed at the exterior positions of the template to ensure high solubility, and a single tyrosine residue was included to facilitate concentration determination. The RH peptides were prepared by solid-phase synthesis and purified by reversed-phase high-performance liquid chromatography (18).

Figure 3

Helical wheel projection of residues Ala1 to Ala33 of the RH coiled-coil template sequence (28). View is from the NH2-terminus, and residues in the first three helical turns are boxed or circled. Undecatad positions are labeled a through k. The full template sequence consists of three undecatads. The peptides RH2, RH3, and RH4 differ only by the amino acids present at positions a, d, and h (18). The sequences (with positions a, d, and h in italics; aI, alloisoleucine; and nV, norvaline) are the following.



The template sequence was chosen to contain positively charged lysine residues at position e on one side of the helix and negatively charged glutamate residues at positions g and k on the opposite side to favor a parallel association of helices in a helical bundle.

Table 1

Top sequence solutions for right-handed dimer, trimer, and tetramer coiled coils according to the packing calculation. The table consists of three lists, the first sorted according to dimer stability, the second according to trimer stability, and the third according to tetramer stability.


On the basis of circular dichroism (CD) measurements at 10 μM peptide concentration in neutral pH phosphate-buffered saline (PBS) at 4°C, RH2 appears to be ∼80% helical, whereas RH3 and RH4 appear to be >95% helical (19) (Fig. 4A). Under these conditions, RH2 displays a broad thermal unfolding transition, RH3 exhibits a cooperative melt with an apparent melting temperature (Tm) of 95°C, and RH4 has a thermal stability that exceeds 100°C (Fig. 4B). In the presence of the denaturant guanidinium hydrochloride (GdmCl) at 3 M concentration, RH4 melts cooperatively with an apparent Tm of 90°C. Thus, whereas RH2 appears to be incompletely folded under physiologic conditions, RH3 and RH4 form well-structured and extremely stable helical structures.

Figure 4

The RH2 (open circles), RH3 (open triangles), and RH4 (open squares) peptides form two-, three-, and four-stranded helical bundles. (A) CD spectra at 4°C in PBS (pH 7.0) and 10 μM peptide concentration (19). The mean residue ellipticity, [θ], is reported in units of 10³ degrees cm² dmol⁻¹. (B) Thermal melts monitored by CD at 222 nm (19). The filled squares show data for RH4 collected in the presence of 3 M GdmCl, a denaturant. (C) Analytical ultracentrifugation data (32 krpm) collected at 4°C in PBS (pH 7.0) at ∼100 μM peptide concentration (22). The natural logarithm of the absorbance at 235 nm is plotted against the square of the radial position. Dashed lines with increasing slopes indicate, respectively, the predicted data for dimer, trimer, and tetramer bundles. (D) Aromatic and amide-proton NMR spectra of the RH4 peptide at different times after transfer into D2O (21). The inset shows the volume of one amide resonance (the peak labeled with an arrow) plotted against exchange time. The data closely fit a single exponential decay with a half-life of ∼10 days.

Sedimentation equilibrium measurements (20) were used to determine the oligomerization state of each peptide under native conditions (Fig. 4C). As intended in the design, the RH2 peptide sediments approximately as a dimer, the RH3 peptide sediments approximately as a trimer, and the RH4 peptide sediments approximately as a tetramer. The RH3 and RH4 peptides exhibit no systematic dependence of molecular weight on concentration between 20 μM and 200 μM. The molecular weight of the RH2 peptide systematically decreases at concentrations below 100 μM (presumably because of dissociation in the low micromolar concentration range) but exhibits no systematic deviation from dimer molecular weight between 200 μM and 2 mM. Thus, each of the RH peptides assumes the oligomerization state for which it was designed.
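For a single ideal species at sedimentation equilibrium, ln(absorbance) is linear in the square of the radius, with slope M(1 − v̄ρ)ω²/(2RT), so predicted lines for dimer, trimer, and tetramer differ only through the molecular mass. The sketch below evaluates that standard relation at the rotor speed and temperature given in the Figure 4 caption. The monomer mass, partial specific volume, and solvent density are assumed placeholder values, not numbers from the article.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def ln_absorbance_slope(mass_g_per_mol, rpm, temp_k, vbar_ml_per_g, rho_g_per_ml):
    """Slope of ln(absorbance) versus r^2, in cm^-2, for one ideal species."""
    omega = rpm * 2.0 * math.pi / 60.0                      # rad/s
    buoyant_mass = mass_g_per_mol * 1e-3 * (1.0 - vbar_ml_per_g * rho_g_per_ml)  # kg/mol
    slope_per_m2 = buoyant_mass * omega ** 2 / (2.0 * R_GAS * temp_k)
    return slope_per_m2 * 1e-4                              # m^-2 -> cm^-2


# Assumed values for illustration only.
MONOMER_MASS = 3800.0   # g/mol, hypothetical mass of one 33-residue peptide
VBAR = 0.74             # mL/g, hypothetical partial specific volume
RHO = 1.005             # g/mL, hypothetical buffer density

slopes = {n: ln_absorbance_slope(n * MONOMER_MASS, 32000, 277.15, VBAR, RHO)
          for n in (2, 3, 4)}
print(slopes)
```

Whatever the assumed constants, the predicted slopes stand in the ratio 2:3:4, which is why the oligomerization state can be read from the slope of the observed line.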

To assess whether the designed oligomers associate in a “molten” fashion or with fixed tertiary structures, we studied the dynamic properties of RH3 and RH4 by hydrogen-deuterium amide-proton exchange (21) (Fig. 4D). Relative to poly-d,l-alanine, the most slowly exchanging protons in RH3 were protected by a factor of >10⁵, whereas the most slowly exchanging protons in RH4 were protected by a factor of >10⁷. These protection factors are comparable to protection factors observed for the native states of naturally occurring proteins. The trimer and tetramer structures thus appear to assemble with native-like rigidity.
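A protection factor is the ratio of the exchange rate expected for an unstructured reference peptide to the rate observed in the folded molecule. The sketch below converts the ∼10-day half-life reported for one RH4 amide resonance in the Figure 4 caption into a first-order rate constant and shows what reference rate a protection factor of 10⁷ would imply for that proton. Whether that resonance is among the most protected protons is not stated, so the final number is illustrative only.

```python
import math

SECONDS_PER_DAY = 86400.0


def rate_from_half_life(t_half_s):
    """First-order rate constant (s^-1) for a single-exponential decay."""
    return math.log(2.0) / t_half_s


def protection_factor(k_reference, k_observed):
    """Ratio of reference (unprotected) to observed exchange rates."""
    return k_reference / k_observed


def implied_reference_rate(pf, k_observed):
    """Reference rate (s^-1) that would yield protection factor pf."""
    return pf * k_observed


k_obs = rate_from_half_life(10.0 * SECONDS_PER_DAY)   # ~10-day half-life
k_ref = implied_reference_rate(1e7, k_obs)            # illustrative, see lead-in
print(k_obs, k_ref)
```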

To evaluate the high-resolution features of the design, we determined the x-ray crystal structure of the RH4 peptide at 1.9 Å resolution (22). The RH4 structure was refined to a conventional R-factor of 20.4% with a free R-factor of 24.8% and root mean square deviations from ideal bond lengths and bond angles of 0.009 Å and 1.97°, respectively. The designed peptide forms the intended parallel, right-handed superhelix structure (Fig. 5A). Moreover, the side-chain packing observed in the RH4 crystal structure closely matches that predicted by the design calculation (Fig. 5B). The engineered and observed rotamers for the core side chains are identical. The crystal structure differs from the designed conformation primarily at the COOH-terminus where crystal contacts cause the superhelix to be locally underwound. Such end effects were deliberately ignored by the design method in the interest of simplicity (8). Core side-chain and main-chain atom positions in the central undecatad of the crystal structure differ from atom positions in the calculated model by a root mean square deviation of 0.20 Å. The superhelical parameters (8, 13) (radius R0, frequency ω0, and phase angle φ) for the designed structure and the NH2-terminal two undecatads of the crystal structure are as follows: R0 = 7.29 Å, ω0 = 1.5 centiradians per residue, and φ = 11° for the model, and R0 = 7.49 Å, ω0 = 1.6 centiradians per residue, and φ = 10° for the crystal structure. Thus, the experimentally observed structure for the RH4 peptide matches the designed structure in atomic detail.
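The root mean square deviation quoted above is a simple function of paired atom positions. The sketch below computes it for two coordinate lists that are assumed to be already superimposed and listed in the same atom order; it performs no fitting, and the coordinates are made-up values chosen only to exercise the function.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as input) between paired atoms.

    Assumes both structures are already superimposed and in the same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates (A): the "crystal" set is the "model" shifted 0.2 A along x.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
crystal = [(x + 0.2, y, z) for x, y, z in model]
print(rmsd(model, crystal))
```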

Figure 5

Crystal structure of RH4. (A) Axial view of the left-handed GCN4-pLI tetramer (27) next to the right-handed RH4 tetramer (22). The view is from the NH2-terminus looking toward the COOH-terminus. Purple van der Waals surfaces identify residues at the a positions, green surfaces identify residues at the d positions, and yellow surfaces identify residues at the h positions. (B) Superposition of the central undecatad of the calculated (red) and crystallographically observed (blue) structures of RH4. Three cross-sections of the superhelix, centered at positions a, d, and h, are shown. In each case the calculated side-chain packing conformation matches the conformation observed in the crystal structure (11). (C) Superposition of the calculated (red) and crystallographically observed (blue) backbone conformations of the right-handed tetramer.

Practical implications. Our results demonstrate that empirical protein design principles combined with computational protein engineering methods can be used to predict and design novel backbone structures, with root mean square coordinate errors that approach 0.2 Å. This level of precision is encouraging with regard to the feasibility of designing protein catalysts, which may require the accurate positioning of reactive groups. Methods to engineer buried electrostatic interactions and to calculate accurately their energetic effects (23) should provide a second high-resolution design tool that is effectively orthogonal to engineered van der Waals packing. Efforts to parameterize the backbones of more complex protein folds have begun (24), and it will be interesting to see the extent to which the parametric-backbone approach to protein design presented here can be generalized.

  • * Present address: Department of Biochemistry, Stanford University, Stanford, CA 94305, USA.

  • To whom correspondence should be addressed. E-mail: dvorak{at}

