## Abstract

The transition path is the tiny fraction of an equilibrium molecular trajectory when a transition occurs as the free-energy barrier between two states is crossed. It is a single-molecule property that contains all the mechanistic information on how a process occurs. As a step toward observing transition paths in protein folding, we determined the average transition-path time for a fast- and a slow-folding protein from a photon-by-photon analysis of fluorescence trajectories in single-molecule Förster resonance energy transfer experiments. Whereas the folding rate coefficients differ by a factor of 10,000, the transition-path times differ by a factor of less than 5, which shows that a fast- and a slow-folding protein take almost the same time to fold when folding actually happens. A very simple model based on energy landscape theory can explain this result.

Theory predicts that folding mechanisms are heterogeneous, so that an individual unfolded molecule can self-assemble to form its biologically active, folded structure by means of many different sequences of conformational changes (*1*). The distribution of these folding pathways can now be calculated from atomistic molecular dynamics simulations (*2*–*6*). Information on pathway distributions from experiments must come from measurements on single molecules, because bulk experiments on large ensembles of molecules yield only average properties. Figure 1 illustrates a single-molecule, equilibrium protein folding-unfolding trajectory, as monitored by Förster resonance energy transfer (FRET) spectroscopy, and shows its relation to the free-energy barrier that the molecule crosses between the folded and unfolded states. The most interesting part of the trajectory is contained in what appears to be an instantaneous jump between the two states, called the transition path, which contains all of the information on the mechanism of folding and unfolding. The first step toward observing transition paths in protein folding, which we report here, is the determination of their average duration (transition-path time) for a fast-folding, all-β protein [39-residue formin-binding protein (FBP) WW domain] shown to be two-state in ensemble studies (*7*, *8*), as well as a markedly reduced upper bound, compared with our previous study, for the 56-residue, α/β protein GB1 (the B1 immunoglobulin-binding domain of protein G from *Streptococcus*) (*9*). In contrast to a rate coefficient, which measures the frequency of a transition, the transition-path time is the duration of a successful barrier-crossing event (Fig. 1).

The strategy used in this study is to illuminate dye-labeled protein molecules at very high intensities to increase the number of detected photons per transition path, to discard the majority of photons from the less-interesting segments of the trajectories between transitions, and to analyze the transition region with a maximum likelihood method by using simple models for the transition path.

Photon trajectories were measured for immobilized WW domain and protein GB1 molecules with donor and acceptor fluorophores attached to cysteines incorporated into the proteins (Fig. 2). In these trajectories, two properties of each photon were recorded—the color, either donor green or acceptor red, and the absolute time of arrival to within ~0.5 ns. As shown in Fig. 3, A and B, transitions between states are clearly resolved in the binned fluorescence and photon trajectories, and the FRET efficiency distributions (Fig. 3, C and D) are bimodal, which indicates the presence of two states. The photon trajectories were extracted from the region near the transitions and analyzed using the Gopich-Szabo maximum likelihood method (*10*).

For a given model, the Gopich-Szabo method calculates the parameters of the model that can most accurately reproduce the photon trajectories (Fig. 3). We adopt a one-step model for the transition path, which may be viewed as the simplest discrete representation of how the FRET efficiency changes along the path. This picture can be represented in a kinetic model for a two-state system with a finite transition path by introducing a third virtual state, S, for which the FRET efficiency is midway between those of the folded and unfolded states [*E*_{S} = (*E*_{F} + *E*_{U})/2]. In this model, the lifetime of S (τ_{S}) corresponds to the average transition-path time, 〈*t*_{TP}〉 (Fig. 4A). S has the property of a transition state, because the rate coefficients from S to F and from S to U (*k*_{S}) are the same, and therefore *p*_{fold} = ½.

The likelihood function for the *j*th photon trajectory is (*10*):

$$L_j = \mathbf{v}_{fin}^{T} \prod_{i=2}^{N} \left[ \mathbf{F}(c_i)\, \exp\left( (\mathbf{K} - \mathbf{n})\, \tau_i \right) \right] \mathbf{F}(c_1)\, \mathbf{v}_{ini} \qquad (1)$$

where **K** is the rate matrix [equation S6 (*11*)] containing the three rate coefficients (*k*_{F′}, *k*_{U′}, and *k*_{S}), *N* is the number of photons in the *j*th trajectory, *c*_{*i*} is the color of the *i*th photon (donor or acceptor), and τ_{*i*} is the time interval between the *i*th and (*i* – 1)th photons as shown in fig. S4B (*11*). The photon color matrix **F** depends on the color of a photon as **F**(*acceptor*) = ***E*** and **F**(*donor*) = **I** – ***E***, where ***E*** is a diagonal matrix with elements that are FRET efficiencies of the three states (F, S, and U), and **I** is the unit matrix. **n** is a diagonal matrix with elements that are photon count rates of the three states. **v**_{*ini*} and **v**_{*fin*} are vectors that describe the state (folded or unfolded) at the beginning and the end of the trajectory. Practically, log-likelihood functions were calculated, and the total log-likelihood function of all trajectories was calculated by summing the log-likelihood functions (ln *L* = Σ_{*j*} ln *L*_{*j*}). In *L*, τ_{S} is the only variable parameter (*11*).
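As a rough illustration of how a likelihood of this form can be evaluated, the Python sketch below propagates a state vector photon by photon for a three-state F ⇌ S ⇌ U scheme. It is not the authors' implementation. The layout of the rate matrix (equation S6 is not reproduced here), the relation *k*_{S} = 1/(2τ_{S}), the simplification that the photon count rate is the same in every state (so that **n** contributes only a constant factor and is dropped), and all function names and numerical values are assumptions made for this example.

```python
import math

def matmul(A, B):
    """Product of two small dense matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(A, t):
    """exp(A*t) by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A) * t
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    M = [[x * t / 2 ** s for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, 20):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = matmul(result, result)
    return result

def rate_matrix(k_f, k_u, tau_s):
    """Assumed rate matrix for F <-> S <-> U (state order F, S, U), dp/dt = K p."""
    k_s = 1.0 / (2.0 * tau_s)  # S leaves to F and to U with equal rates
    return [[-k_f, k_s, 0.0],
            [k_f, -2.0 * k_s, k_u],
            [0.0, k_s, -k_u]]

def log_likelihood(colors, gaps, efficiencies, K, v_ini, v_fin):
    """ln L for one photon trajectory, assuming state-independent count rates.

    colors: 'A' (acceptor) or 'D' (donor) for each photon.
    gaps: time between photon i-1 and photon i (len(colors) - 1 entries).
    """
    def apply_color(color, p):
        # F(acceptor) = E and F(donor) = I - E are both diagonal
        return [(e if color == 'A' else 1.0 - e) * x
                for e, x in zip(efficiencies, p)]

    p = apply_color(colors[0], v_ini)
    log_l = 0.0
    for color, tau in zip(colors[1:], gaps):
        U = expm(K, tau)
        p = [sum(U[i][j] * p[j] for j in range(3)) for i in range(3)]
        p = apply_color(color, p)
        scale = sum(p)  # rescale to avoid underflow in long trajectories
        log_l += math.log(scale)
        p = [x / scale for x in p]
    return log_l + math.log(sum(v * x for v, x in zip(v_fin, p)))

# Hypothetical 6-photon trajectory from folded to unfolded (times in microseconds).
colors = ['A', 'A', 'A', 'D', 'D', 'D']
gaps = [3.0, 2.0, 4.0, 3.0, 2.0]
E = [0.9, 0.65, 0.4]  # E_S midway between E_F and E_U
for tau_s in (0.5, 2.0, 8.0):
    K = rate_matrix(1e-3, 1e-3, tau_s)
    print(tau_s, log_likelihood(colors, gaps, E, K, [1, 0, 0], [0, 0, 1]))
```

Summing such log-likelihoods over all trajectories and scanning τ_{S} with the other parameters held fixed gives a curve of the kind analyzed in the text. The running rescaling of the state vector is a numerical convenience and does not change the value of ln *L*.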

The difference of the log-likelihood functions, Δln *L* = ln *L*(τ_{S}) – ln *L*(0), as a function of τ_{S}, is plotted in Fig. 4B for the WW domain. This function was calculated from 527 transitions between the folded and unfolded states. In this plot, *L*(0), the likelihood at τ_{S} = 0, is the likelihood of a two-state model with an instantaneous transition path. When Δln *L* at the peak is higher than a certain confidence level, the value of τ_{S} at the peak corresponds to the assumed τ_{S} and does not arise from statistical fluctuations (fig. S6) (*11*). We used a confidence level that satisfies the condition *L*(τ_{S})/[*L*(τ_{S}) + *L*(0)] = 0.95, which assures 95% confidence in the significance of the maximum and corresponds to Δln *L* ≈ 3 (the dashed horizontal lines in Fig. 4). The peak value of 16 μs, at Δln *L* = 7.8, is therefore a well-determined quantity and corresponds, in our model (Fig. 4A), to the average transition-path time 〈*t*_{TP}〉. That 〈*t*_{TP}〉 is the same for folding and unfolding transitions is shown in fig. S5 (*11*), which is consistent with the requirement of microscopic reversibility that 〈*t*_{TP}〉 for a barrier crossing be the same in both directions (*12*).
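The correspondence between the 95% condition and Δln *L* ≈ 3 follows from rearranging *L*(τ_{S})/[*L*(τ_{S}) + *L*(0)] = *c* into Δln *L* = ln[*c*/(1 – *c*)]. The short Python sketch below checks this arithmetic; the function names are ours and are introduced only for illustration.

```python
import math

def delta_lnl_threshold(confidence):
    """Delta ln L at which L(tau_S) / [L(tau_S) + L(0)] equals `confidence`."""
    return math.log(confidence / (1.0 - confidence))

def confidence_from_delta_lnl(delta_lnl):
    """Inverse relation: L(tau_S) / [L(tau_S) + L(0)] for a given Delta ln L."""
    return 1.0 / (1.0 + math.exp(-delta_lnl))

print(round(delta_lnl_threshold(0.95), 2))        # 2.94, i.e. about 3
print(round(confidence_from_delta_lnl(7.8), 4))   # 0.9996
```

By the same relation, the peak height of Δln *L* = 7.8 lies well above the 95% threshold.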

To extrapolate the value of 〈*t*_{TP}〉 to the viscosity in the absence of glycerol, we determined the rate coefficients at different viscosities (table S1) (*11*). Using a linear free-energy relation to account for the change in stability resulting from the addition of glycerol and guanidinium chloride (GdmCl), we find that the rate coefficients for folding and unfolding depend inversely on the first power of the viscosity (*11*), so 〈*t*_{TP}〉 should scale the same way (see Eqs. 2 and 3 below). Because the viscosity of 3 M GdmCl in 50% glycerol solution is found to be 10 times that of 2 M GdmCl (*11*), our best estimate of 〈*t*_{TP}〉 in the absence of a viscogen at 293 K is ~2 μs.
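The extrapolation amounts to a single division, sketched below in Python. The sketch assumes that the 16-μs value refers to the glycerol-containing solution and that 〈*t*_{TP}〉 is strictly proportional to viscosity; the function name is ours.

```python
def rescale_to_viscosity(t_measured_us, viscosity_ratio):
    """Scale a time that is proportional to viscosity down to the reference solvent."""
    return t_measured_us / viscosity_ratio

t_tp_glycerol_us = 16.0   # transition-path time found in the viscous solution
viscosity_ratio = 10.0    # (3 M GdmCl, 50% glycerol) relative to 2 M GdmCl
print(rescale_to_viscosity(t_tp_glycerol_us, viscosity_ratio))  # 1.6, quoted as ~2 microseconds
```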

We have used the simplest possible model for determining 〈*t*_{TP}〉. However, more realistic models that depict a more gradual change in the FRET efficiency along a transition path—with two and three steps in the FRET efficiency in the transition path between states instead of just one (Fig. 4A)—yield very similar values for 〈*t*_{TP}〉 (fig. S9) (*11*). We also found that the value of 〈*t*_{TP}〉 is not sensitive to the choice of the FRET efficiency for S, as long as the value is between the two FRET efficiencies of the folded and unfolded states (0.6 ≤ *E*_{S} ≤ 0.7) (fig. S7) (*11*).

For proteins with very low free-energy barriers, it may be possible to estimate 〈*t*_{TP}〉 from ensemble measurements. Gruebele and co-workers have studied the kinetics of the ultrafast-folding, 33-residue FiP35 WW domain, which has a very similar fold to that of our WW domain (FBP28) and ~30% sequence identity (*13*). Prior to the ~10-μs folding-unfolding relaxation at the melting temperature of ~350 K, a ~1.5-μs relaxation was observed, which was called a “molecular phase” and attributed to a change in the small population of molecules at the top of a low free-energy barrier in response to the temperature jump. No molecular phase was observed for the FBP WW domain (*7*), presumably because it is a slower folder owing to a higher barrier, and there is therefore no detectable amplitude from the change in the barrier top population. In this interpretation, Gruebele’s ~1.5-μs relaxation corresponds to the lifetime, τ_{S}, of our kinetic model for the transition path (Fig. 4).

Shaw and co-workers have simulated equilibrium trajectories of the FiP35 WW domain using all-atom molecular dynamics calculations (*4*). They found 〈*t*_{TP}〉 to be 0.5 (±0.1) μs at 360 K using the TIP3P explicit water model (*6*). After rescaling for the difference in viscosity compared with real water, the simulated 〈*t*_{TP}〉 becomes ~1.5 μs (*14*). Although the sequences for the two WW domains are different, the finding of similar values for 〈*t*_{TP}〉 from the simulations and both ensemble and single-molecule experiments provides support for the accuracy of the simulations, for Gruebele’s interpretation of the molecular phase, and for our interpretation of the single-molecule photon trajectories.

The folding time of protein GB1 in 4 M urea is ~1 s. This time is far too long to observe folding transitions in trajectories simulated by atomistic equilibrium molecular dynamics, which makes even an upper bound for the transition-path time an interesting quantity. In previous work (*9*), we were able to determine an upper bound of ~200 μs, based on an analysis of individual trajectories. The photon count rate in those experiments was only 50 ms^{–1}, and the average time before photobleaching was ~100 ms. In the present experiments, the much higher count rate of 350 ms^{–1} from the increased illumination intensity, together with the collective analysis using the maximum likelihood method, has allowed us to determine a much more accurate upper bound. The penalty for the higher photon count rate is that the lifetime of the trajectories is shortened to ~10 ms by the more intense illumination, and transitions, albeit clearly resolved (Fig. 3B), are only observed in a very small fraction of the trajectories. Measurement at 4 M urea (with no added glycerol) of trajectories for ~47,000 molecules yielded just 114 transitions.
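The trade-off described above can be made concrete with a little arithmetic, sketched here in Python. The 10-μs window is an illustrative choice rather than a measured quantity, and the variable names are ours.

```python
def photons(count_rate_per_ms, duration_ms):
    """Expected number of detected photons at a constant count rate."""
    return count_rate_per_ms * duration_ms

per_trajectory_previous = photons(50, 100)   # 5000 photons before photobleaching
per_trajectory_present = photons(350, 10)    # 3500 photons before photobleaching

window_ms = 0.01                             # illustrative 10-microsecond window
in_window_previous = photons(50, window_ms)  # about 0.5 photon
in_window_present = photons(350, window_ms)  # about 3.5 photons

transitions_per_molecule = 114 / 47000       # about 0.0024

print(per_trajectory_previous, per_trajectory_present)
print(in_window_previous, in_window_present, transitions_per_molecule)
```

The higher intensity yields sevenfold more photons within any short window, while fewer photons are collected per molecule before photobleaching.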

These 114 transitions were analyzed with the same model as for the WW domain. No peak is observed in the Δln *L* versus τ_{S} plot (Fig. 4C), so 〈*t*_{TP}〉 is too short to measure. Nevertheless, the analysis permits a determination of an upper bound for 〈*t*_{TP}〉. By analogy to the significance of the peak for the WW domain, we can set a confidence level for the answer to the question: How long can 〈*t*_{TP}〉 be before it becomes inconsistent with the data? The value of τ_{S} at which a two-state model with a finite transition path is, at the 95% confidence level, less consistent with the photon trajectories than a two-state model with an instantaneous transition path is the value at Δln *L* ≈ –3. In other words, 〈*t*_{TP}〉 cannot be longer than the value of τ_{S} at Δln *L* = –3, which is therefore an upper bound on 〈*t*_{TP}〉. As shown in Fig. 4C, this upper bound is ~10 μs.
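Reading an upper bound off a sampled Δln *L* curve can be sketched as a threshold crossing with linear interpolation. The Python example below uses a made-up curve purely to show the procedure. The numbers are not the measured values, and the function name is ours.

```python
def crossing(taus, delta_lnl, threshold=-3.0):
    """First tau_S at which a sampled Delta ln L curve falls to `threshold`.

    Uses linear interpolation between neighbouring samples.
    """
    points = list(zip(taus, delta_lnl))
    for (t0, d0), (t1, d1) in zip(points, points[1:]):
        if d0 > threshold >= d1:
            return t0 + (threshold - d0) * (t1 - t0) / (d1 - d0)
    return None  # the curve never reaches the threshold on the sampled range

# Made-up curve for illustration only; these are not the measured values.
taus = [0.0, 2.0, 4.0, 6.0]
delta_lnl = [0.0, -1.0, -2.0, -4.0]
print(crossing(taus, delta_lnl))  # 5.0
```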

The major result of our experiments is that, whereas the folding rate coefficients for the WW domain and protein GB1 differ by four orders of magnitude, 10^{4} s^{–1} and 1 s^{–1}, the transition-path times differ by less than fivefold (~2 μs and <10 μs), which shows that a fast- and a slow-folding protein take almost the same time to fold when folding actually happens.

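It is interesting that a simple model by A. Szabo, based on describing the kinetics of folding for a two-state system as diffusion over a barrier on a one-dimensional free-energy surface as in the energy landscape theory of Wolynes, Onuchic, and co-workers (*1*, *15*), can explain this result. According to Kramers’ theory for such a barrier crossing (Fig. 1A), the folding time (τ_{F} = 1/*k*_{F}) is given by:

$$\tau_F = \frac{2\pi}{\beta D^{*} \omega\, \omega^{*}} \exp\left(\beta \Delta G_F^{*}\right) \equiv \tau_0 \exp\left(\beta \Delta G_F^{*}\right) \qquad (2)$$

where *D** is the diffusion coefficient at the barrier top, ω^{2} is the curvature of the unfolded well (near *x*_{0} in Fig. 1A), –(ω*)^{2} is the curvature at the barrier top, β = 1/*k*_{B}*T* (where *k*_{B} is Boltzmann’s constant and *T* is temperature), and Δ*G*_{F}* is the height of the folding free-energy barrier (*16*–*20*). For ω = ω*, 〈*t*_{TP}〉 is approximately given by (*9*, *12*):

$$\langle t_{TP} \rangle \approx \frac{\ln\left(2 e^{\gamma} \beta \Delta G_F^{*}\right)}{\beta D^{*} (\omega^{*})^{2}} = \frac{\tau_0}{2\pi} \ln\left(2 e^{\gamma} \beta \Delta G_F^{*}\right) \qquad (3)$$

where γ ≈ 0.577 is Euler’s constant.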

The model predicts that 〈*t*_{TP}〉 is insensitive to the barrier height and that fast- and slow-folding proteins will have similar transition-path times as long as there are only small differences in the curvatures and the diffusion coefficients (i.e., small difference in τ_{0}). The diffusion coefficient depends on the roughness of the underlying energy landscape and could therefore differ substantially among proteins (*21*–*23*). The best current estimate for τ_{0} of fast-folding proteins is ~1 μs (*24*), which predicts a ratio of 〈*t*_{TP}〉 for protein GB1 and the WW domain of 1.4, compared with the experimental ratio of <5, if we assume the same τ_{0} for the two proteins. This ratio varies from 1.3 to 1.8 for τ_{0} between 0.1 and 10 μs.
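The quoted ratios can be reproduced with a few lines of Python. The sketch assumes that τ_{0} is the pre-exponential factor of the folding time, τ_{F} = τ_{0} exp(βΔ*G*_{F}*), and that 〈*t*_{TP}〉 is proportional to ln(2e^{γ}βΔ*G*_{F}*), with γ being Euler’s constant and the same proportionality constant for both proteins. Under these assumptions the prefactor cancels in the ratio. The function names are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329

def beta_barrier(tau_f, tau_0):
    """Barrier height in units of k_B*T, assuming tau_F = tau_0 * exp(beta*dG)."""
    return math.log(tau_f / tau_0)

def tp_time_ratio(tau_f_slow, tau_f_fast, tau_0):
    """Ratio of <t_TP> for two proteins that share the same tau_0."""
    def log_term(tau_f):
        return math.log(2.0 * math.exp(EULER_GAMMA) * beta_barrier(tau_f, tau_0))
    return log_term(tau_f_slow) / log_term(tau_f_fast)

tau_f_gb1 = 1.0   # s, folding time of protein GB1
tau_f_ww = 1e-4   # s, folding time of the WW domain (rate coefficient 10^4 s^-1)
for tau_0 in (1e-7, 1e-6, 1e-5):  # 0.1, 1, and 10 microseconds
    print(tau_0, round(tp_time_ratio(tau_f_gb1, tau_f_ww, tau_0), 1))
# the printed ratios are 1.3, 1.4, and 1.8
```

Because the barrier height enters only through a logarithm, a 10^{4}-fold difference in folding time changes the predicted 〈*t*_{TP}〉 by less than a factor of 2 over this range of τ_{0}.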

Our determination of an average transition-path time is a first step toward the goal of obtaining information on the distribution of folding pathways from measurements of interdye distance versus time trajectories during transition paths. However, the result of this first step by itself has turned out to be extremely interesting. Folding involves a complex and intricate rearrangement of a polypeptide chain to form a unique structure, yet the time for this nontrivial self-assembly process is almost the same for two proteins with different topologies and vastly different folding rates.

## Supporting Online Material

www.sciencemag.org/cgi/content/full/335/6071/981/DC1

Materials and Methods

Figs. S1 to S9

Table S1

## References and Notes

- *11*. Materials and methods are available as supporting online material on *Science* Online.
- *16*. In the case of protein GB1, there is the possibility of a sparsely populated intermediate between the folded and unfolded states (*17*–*20*). In this study, we have implicitly defined the transition-path time for both the WW domain and protein GB1 in terms of just the two deep minima of the folded and unfolded states.
- *21*. Clarke and co-workers (*22*) have found, for example, domains with similar structures and stability that have folding rates that differ by ~3000-fold. The slower-folding domains show very little dependence on solvent viscosity, which suggests a large internal friction and, therefore, a much smaller *D** (*22*, *23*).

**Acknowledgments:** We thank I. Gopich, A. Szabo, and G. Hummer for numerous helpful discussions and A. Aniana for technical assistance in the expression and purification of proteins. This work was supported by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases, NIH.