"New View" of Protein Folding Reconciled with the Old Through Multiple Unfolding Simulations

See allHide authors and affiliations

Science  12 Dec 1997:
Vol. 278, Issue 5345, pp. 1928-1931
DOI: 10.1126/science.278.5345.1928


Twenty-four molecular dynamics trajectories of chymotrypsin inhibitor 2 provide a direct demonstration of the diversity of unfolding pathways. Comparison with experiments suggests that the transition state region for folding and unfolding occurs early with only 25 percent of the native contacts and that the root-mean-square deviations between contributing structures can be as large as 15 angstroms. Nevertheless, a statistically preferred unfolding pathway emerges from the simulations; disruption of tertiary interactions between the helix and a two-stranded portion of the β sheet is the primary unfolding event. The results suggest a synthesis of the “new” and the classical view of protein folding with a preferred pathway on a funnel-like average energy surface.

A “new view” of protein folding (1) that postulates many ways of reaching the native state is replacing the specific pathway concept attributed to Levinthal (2). In the new view, a general bias of the energy surface toward the native state reduces the search problem so that folding can occur on the experimental time scale without invoking specific pathways. This view is based on statistical mechanical models (3-5) and lattice simulations (5-8), but there is little direct experimental evidence for its validity. In this report we present results obtained from 24 unfolding trajectories of chymotrypsin inhibitor 2 (CI2) (Fig. 1). CI2 is of particular interest for such simulations because it is a small fast-folding protein, it has been studied experimentally by protein engineering methods, and it exhibits two-state behavior with the same transition state for folding and unfolding (9, 10). Molecular dynamics simulations were performed with the CHARMM program (11), a polar hydrogen model for the protein (11,12), and an implicit model for the solvent (13-15); the latter reduces the required computer time by more than an order of magnitude relative to that for simulations with explicit water molecules (16, 17). Although the trajectories show very diverse behavior, they have common structural features that define an average unfolding pathway.

Figure 1

Representation of the structure of CI2; the drawing was made with the Molscript program (35). The structural elements are defined as follows: strand β1, residues 1 to 11; helix, residues 12 to 26; strand β3, residues 27 to 33; loop, residues 34 to 44; strand β4, residues 45 to 53; strand β6, residues 54 to 64; for simplicity, strands β2 and β5, which consist of only two and four residues, respectively, are not shown [see (10, 14)]. One contact from each of the seven groups used to analyze the secondary and tertiary structure formed along the trajectories is indicated. The seven groups correspond to main-chain contacts in the helix (A: Val19-N, Glu15-O); β1 to helix (B: Trp5-CE2, Ile20-CG1); β1 to β4, β6 (C: Val63-N, Thr3-O); β3 to β4 (D: Val51-N, Leu32-O); β4 to β6 (E: Ala58-N, Phe50-O); loop and minicore contacts (F: Leu32-CD1, Val38-CG1); helix to β3, β4, β6 (G: Ile20-CD, Leu49-CD2).

The radius of gyration (Rg) and the root-mean-square deviation (rms) from the crystal structure vary widely as a function of time (Fig. 2A). For example, trajectory 16 (Tr16) exhibits a single concerted increase in rms and Rg to reach a denatured state, whereas Tr11 shows more complex behavior with the oscillations of both rms and Rg increasing as a function of time. The rms deviation is plotted against the radius of gyration Rg for the same set of trajectories to remove the effect of the distribution in unfolding times expected for any unimolecular reaction (that is, corresponding events could occur at different times in different trajectories) (Fig. 2B). The diversity in the trajectories is still present. Some trajectories (such as Tr1) show a simple approximately simultaneous increase in Rg and rms near the native state and much more complexity after partial denaturation, whereas others (such as Tr11 and Tr16) have complex behavior near the native state.

Figure 2

(A) Backbone rms from the crystal structure of CI2 and radius of gyration R g for four unfolding trajectories as a function of time; rms values are shown as a solid line with a scale on the vertical axis on the left;R g values are shown as a dotted line with a scale on the vertical axis on the right. (B) Root mean square versus R g for the same trajectories.

The time dependence of 52 native contacts provides a measure of the change in structure during unfolding (Fig.3); the types of contacts that are included are indicated (Fig. 1). Their use is analogous to the choice of the fraction of the native contacts, Q , as the progress variable in lattice simulations, and we use the designation Q here (3, 7, 18). In the late stages of unfolding (early stages of folding; Q = 0.25 to 0.3) most contacts are absent or are present only a small fraction of the time and only a few exist for more than 80% of the time. For Q ≅ 0.5, most contacts are present part of the time and there is a broad, relatively uniform distribution of native contact probabilities. Clearly, the system is becoming more compact but very diverse sets of structures are still being sampled. For Q ≅ 0.75, most of the contacts exist for more than 60% of the time, with a small number having a probability of less than 40%; they are contacts involving the β1 strand or the long loop (Fig. 1).

Figure 3

Number of contacts with a given probability of being formed as a function of Q, the fraction of native contacts. Contacts are defined in Fig. 1. Results correspond to an average over the 24 trajectories. Two structures with Q = 0.25 can have rms values as large as 15 Å (average, 9 Å); at Q = 0.5, the maximum rms is 12 Å (average, 5.5 Å); and at Q = 0.75, the maximum rms is 3.2 Å (average, 1.8 Å). At low Q(early in folding), there are no native contacts with high probability. As Q increases, there are many contacts with low probability and a few with high probability. Only at largeQ (late in folding) does the system fairly suddenly form most contacts with a high probability.

The effective energy (intraprotein energy plus solvation free energy from the implicit solvent model) as a function of the number of contacts, averaged over all the trajectories, has a general funnel-like form (Fig. 4A) (19) as the polypeptide chain approaches the native structure. Similar behavior has been observed in folding simulations with lattice models for fast-folding sequences (7, 18, 19) and in an all-atom plus explicit solvent simulation of the free energy surface of a protein (20). The complex highly nonmonotonic variation in the effective energy as a function of Q (Fig.4B) or of the time (Fig. 4C) in the individual trajectories contrasts sharply with the average funnel-like behavior and suggests that care is required in interpreting the latter. This is in accord with the fact that it has not been possible to refold a protein with an all-atom model either by low temperature dynamics (21) or by simulated annealing (22).

Figure 4

(A) Average effective energy, 〈W〉, as a function of the number of native contacts, N. To obtain these data, 50 frames from each of the 24 unfolding trajectories were subjected to 10 ps of molecular dynamics at 300 K and then to 300 steps of energy minimization. There are very few structures with more than 49 contacts because some contacts that are present in the crystal structure are broken in the simulation at room temperature. (B) W as a function of the number of native contacts for trajectory Tr1. (C) Time dependence of W for Tr1.

The evolution of the contacts (Fig. 3) can be related directly to specific structural changes (Fig. 5). In spite of the diversity of the unfolding trajectories (Fig. 2), there is a certain commonality in the disappearance of different structural features (Fig. 5B). After early displacement of the β1 strand, a key event is disruption of the hydrophobic core formed primarily by side chains of the residues of the α helix and of strands β3 and β4. The disappearance of the core occurred simultaneously with the destruction of the helix or the β3–β4 sheet or both in some trajectories; in other trajectories, the secondary structural elements persisted after the core was disrupted (Fig. 5A) (23). The contacts between the helix and these strands (h, β3–β4), the other major β-strand contacts (β4–β6), and the loop contacts show intermediate stability.

Figure 5

(A) Average time each group of contacts last appears in three trajectories: ——, a typical trajectory; – – –, β3–β4 contacts disappear early; - - -, α helix contacts disappear early. (B) Evolution of native contacts as a function of time. The average time each contact last appeared in the 24 trajectories is shown; bars = 1 SD.

The most detailed experimental data on the structural changes in CI2 during folding and unfolding are provided by proteinengineering experiments (10, 24). In such experiments, measurements are made of the ratio Φ of the effect of a mutation on the folding rate to that on the stability of the native state. If Φ = 1 for a given residue, the transition state is presumed to have a structure that is very similar to the native state in the neighborhood of that residue; if Φ = 0, it is presumed that the transition state is like the denatured state in that region. Intermediate values of Φ are more complicated to interpret (10,25). Itzhaki et al. (10) found that most residues (with a few important exceptions; see below) have Φ values between 0.2 and 0.3. From our simulations [see also discussion in (25)], this suggests that the transition state consists of an ensemble of configurations with the number of native contacts equal to about 25% (Fig. 3). Thus, the transition region is made up of a wide range of structures with an average rms deviation of 9 Å (see above). Another measure of the position of the transition region is provided by the dependence of the equilibrium constant and the rate constant for folding on the concentration of a denaturant, such as guanidinium hydrochloride (Gdn-HCl). The standard interpretation of such data (26) is that the coefficient m describing the Gdn-HCl concentration dependence of a rate or equilibrium constant can be simply related to the difference in exposed surface area in the two states involved ( m for the unfolded state versus the native state and mkf for the unfolded state versus the transition state). The experimental value of m[m = 1.80 kcal/mol-M, where M refers to the concentration of the denaturant (26)] and the exposed surface area of the crystal structure leads to about 8500 Å2 for the exposed surface area of the unfolded state (27); this value is equal to that obtained for the fully extended structure. The same calculation for the folding rate (mkf = 1.14 kcal/mol-M) yields about 5850 Å2 for the transition state. This value corresponds to that from the simulations (∼6000 Å2) when the fraction of native contacts Q is about 0.25, in agreement with the ensemble average interpretation of the Φ values given above. Values of Q = 0.2 and 0.3 have been assumed in other approaches to CI2 folding that made use of a statistical free energy functional (28) and lattice models (29), respectively.

Given the funnel-like form of the effective energy surface (Fig.4), the transition state barrier must arise from an entropic bottleneck; that is, a decrease in entropy as a function of Q that is greater than the effective energy decrease, so that a free energy barrier is generated near Q ≅ 0.25 when the early contacts are formed. Such a balance between the effective energy and the entropy is found in the high-temperature lattice simulations (7, 18) and also in an all-atom simulation of a three-helix bundle protein (20).

To obtain more insight into the structure of the transition state, it is of interest to consider certain hydrophobic residues that have experimental Φ values significantly larger than the average of Φ ∼ 0.2 to 0.3 (10). Most striking is residue Ala16 (Φ ≅ 1.1 for Ala16 → Gly), which is part of the α helix. In these simulations, as in those of Li and Daggett (16), the number of contacts made by the Ala side chain was found to depend primarily on the presence of the helix and not on interactions with β strands. The value of ΦMD(t), which is defined (16) as the ratio of the number of contacts at time t during the trajectory to that in the native state, remains between 0.8 and 1 as long as the α helix is present (30). The mutation of Val19 to Ala is characterized by a negative value of Φ (Φ = −0.3); the negative value arises because both the folding and unfolding rates are greater in the mutant, whereas the native state is destabilized by the mutation. This result can be explained by the fact that this substitution destabilizes the protein because of loss of packing interactions, but it stabilizes the isolated helix because of the higher helix propensity of an Ala residue (31). The mutations Ala16 to Gly and Val19 to Ala are both consistent with a transition state that contains a partially formed helix. Leu49 (experimental Φ ≅ 0.5) is involved in interactions between strand β3 and the helix in the native structure. Analysis of ΦMD(t) in our simulations and in those of Li and Daggett shows that ΦMD(t) remains between 0.5 and 0.8 throughout most of the unfolding trajectories; many of the interactions are with other residues in the β strand, in accord with the conclusion from the simulations that the β3–β4 sheet is present early in folding.

The protein engineering experiments for CI2 have been interpreted in terms of a nucleation-condensation mechanism (32). Both the simulations and the experiments point to the helix, which has weak native-like interactions in the denatured state (25, 33), as an essential nucleation element. The rate-limiting step in the predominant pathway appears to be a coalescence of the β1 strand and the helix with a few distant residues in the protein that leads to partial burial of the hydrophobic core. The simulations suggest that formation of the β3–β4 sheet may also be involved in the nucleation; the most probable contacts for Q = 0.25, the putative transition region, are in the helix and in the β3–β4 sheet (34). Once the nucleus is formed, the rest of the protein is expected to rapidly form the native structure (32).

The analysis of 24 CI2 trajectories has demonstrated that there exists a wide diversity in unfolding trajectories, in accord with the new view of protein folding. However, when the trajectories are analyzed in terms of the evolution of native contacts, a statistically predominant pathway emerges for this fast-folding protein. Thus, a strong preference for a certain order of events in the folding process, determined by the amino acid sequence, is compatible with a funnel-like (single global basin of attraction) average energy surface.


View Abstract

Navigate This Article