
Originally posted 27 October 2011; corrected 11 December 2015

www.sciencemag.org/cgi/content/full/334/6055/517/DC1

Supporting Online Material for

How Fast-Folding Proteins Fold


Kresten Lindorff-Larsen,* Stefano Piana,* Ron O. Dror, David E. Shaw*

*To whom correspondence should be addressed. E-mail: david.shaw@DEShawResearch.com (D.E.S.); kresten.lindorff-larsen@DEShawResearch.com (K.L.-L.); stefano.piana-agostinetti@DEShawResearch.com (S.P.)

Published 28 October 2011, Science 334, 517 (2011)


DOI: 10.1126/science.1208351

This PDF file includes

Materials and Methods


Figs. S1 to S8
Tables S1 to S3
Full References

Correction (11 December 2015): The authors recently discovered and corrected a bug in
a piece of software used to estimate the melting temperatures (Tm) reported in the
supplementary materials of this Report. This bug does not affect any of the simulation
data or any of the conclusions reported in the text. However, as a result of the bug, some
of the melting temperatures mentioned in the SM—specifically, the values reported in
Table S1, and several mentions of those values later in the SM text—needed to be
corrected in the revision. The originally posted version can be seen here.
How fast-folding proteins fold

Kresten Lindorff-Larsen, Stefano Piana, Ron O. Dror, and David E. Shaw

Supporting Online Material

Protein      Nres  Ns  Time (µs)  Nf  Nu  T (K)  ΔGf (kcal mol−1)  ΔHf (kcal mol−1)  ΔCvf (kcal mol−1 K−1)  Tm (K) (68% CI)      τf (µs)  τu (µs)  τtp (µs)

Chignolin    10    1   106        39  38  340    −0.9              −6.1(1)           −0.1(2)                381.0 (361.0–393.0)  0.6(1)   2.2(4)   0.04(1)
Trp-cage     20    1   208        12  12  290    0.8               −2.1(2)           0.0(1)                 206.0 (0.0–233.0)    14(4)    3(1)     0.22(5)
BBA          28    2   325        14  14  325    0.7               −2.9(2)           −0.2(2)                undefined            18(5)    5(1)     0.7(1)
Villin       35    1   125        34  34  360    0.8               −15(1)            0.2(3)                 343.0 (339.0–346.0)  2.8(5)   0.9(2)   0.27(3)
WW domain    35    2   1137       12  11  360    −0.9              −26(3)            −0.1(2)                373.0 (365.0–376.0)  21(6)    80(24)   0.5(1)
NTL9         39    4   2936       17  14  355    −1.3              −26.3(3)          −0.5(1)                370.0 (367.0–371.0)  29(7)    175(47)  0.9(1)
BBL          47    2   429        12  11  298    0.8               −2(1)             0.1(5)                 251.0 (0.0–267.0)    29(8)    7(2)     3.1(6)
Protein B    47    1   104        19  19  340    0.6               −8(1)             0.0(2)                 317.0 (311.0–328.0)  3.9(9)   1.6(4)   0.7(1)
Homeodomain  52    2   327        27  28  360    −0.7              −7.2(8)           0.2(3)                 > 360                3.1(6)   9(1)     1.1(1)
Protein G    56    4   1155       12  13  350    0.3               −22(1)            −0.5(1)                345.0 (339.0–348.0)  56(16)   37(10)   1.8(3)
α3D          73    2   707        12  12  370    −0.1              −38(1)            −0.6(2)                370.0 (367.0–372.0)  27(8)    31(9)    0.9(2)
λ-repressor  80    4   643        10  12  350    0.9               −11(1)            −0.7(2)                undefined            49(15)   13(4)    3.1(7)

Table S1. Summary of the thermodynamic and kinetic parameters of 12 proteins, calculated
from simulations of reversible folding. Nres is the number of amino acid residues; Ns is the
number of independent simulations performed; Time is the total simulation time; Nf and Nu are
the total number of folding and unfolding events observed in the simulations; T is the simulation
temperature; ΔGf is the free energy for the folding reaction at the simulation temperature,
calculated from the folding and unfolding rates; ΔHf is the enthalpy of the folding reaction at the
simulation temperature, calculated as the difference in potential energy between the unfolded and
the folded state; ΔCvf is the heat capacity change for the folding reaction, calculated from the
difference in the potential energy fluctuations in the folded and unfolded states; Tm is the melting
temperature (and 68% confidence interval) when it can be extrapolated from the calculated ΔGf,
ΔHf, and ΔCvf values; τf and τu are the mean folding and unfolding times; τtp is the mean transition
path time. Times are in µs, temperatures are in K and energies are in kcal mol−1. Standard errors
are reported in parentheses. A comparison of some of these quantities with the experimental data
is presented below in the sections describing the individual proteins.
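The two derived columns of Table S1 can be illustrated with a short sketch. Under a two-state assumption the equilibrium constant is K = τu/τf, so ΔGf = −RT ln(τu/τf); a melting temperature can then be extrapolated by integrating the Gibbs-Helmholtz relation from the simulation temperature, assuming that ΔCvf can stand in for ΔCp and is temperature independent. The function names and the bisection bracket are illustrative assumptions. This is not the software used for Table S1 (which, per the correction notice, contained a bug), and because the tabulated inputs are rounded, the sketch reproduces the reported values only approximately.

```python
import math

R = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def folding_free_energy(tau_f, tau_u, T):
    """dG_f = -RT ln K with K = k_f / k_u = tau_u / tau_f (two-state assumption)."""
    return -R * T * math.log(tau_u / tau_f)


def delta_g(T, dG0, dH0, dCp, T0):
    """Folding free energy at T, extrapolated from dG0 and dH0 at T0 with a
    temperature-independent heat capacity change dCp."""
    dS0 = (dH0 - dG0) / T0
    return dH0 - T * dS0 + dCp * ((T - T0) - T * math.log(T / T0))


def melting_temperature(dG0, dH0, dCp, T0, lo, hi, tol=1e-6):
    """Temperature where the extrapolated dG crosses zero, by bisection on [lo, hi]."""
    f_lo = delta_g(lo, dG0, dH0, dCp, T0)
    f_hi = delta_g(hi, dG0, dH0, dCp, T0)
    if f_lo * f_hi > 0:
        raise ValueError("dG does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = delta_g(mid, dG0, dH0, dCp, T0)
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)


# Chignolin entries from Table S1 (rounded, so agreement is approximate)
print(round(folding_free_energy(0.6, 2.2, 340.0), 2))                     # -0.88
print(round(melting_temperature(-0.9, -6.1, -0.1, 340.0, 300.0, 450.0)))  # 382
```

If ΔG does not change sign inside the chosen bracket, the sketch raises an error rather than guessing a value.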

Simulation details

We ran all production simulations on a specialized machine for molecular dynamics simulations
called Anton (2, 3) using the CHARMM22* force field (4) and the modified TIP3P water model
compatible with the CHARMM force field (27, 28). Lys, Arg, Asp and Glu residues as well as
the N- and C-termini were simulated in their charged states, and all His residues were neutral
unless otherwise indicated. Simulations were initiated in either the folded or unfolded state, or in
some cases from a fully extended chain. Reversible folding was observed in all simulations, and
we observed no statistically significant differences between simulations started from different
configurations. We equilibrated each system in the NPT ensemble for 1 ns using Desmond (29)
running on a PC cluster, and we selected the frame with the volume closest to the average value
as the starting structure for the reversible equilibrium folding simulations in the NVT ensemble
performed on Anton. On Anton we coupled the system to a Nose-Hoover thermostat with a
relaxation time of 1 ps and integrated the equations of motion with a 2.5 fs time step. Trajectory
frames were recorded every 200 ps of
simulation. Additional simulation details for each system are reported in the individual protein
sections of the SOM.

Analysis methods

Definition of folded and unfolded states

We performed a clustering analysis (26) on 50,000 structures taken at regular intervals from the
simulations of each protein using a Cα-RMSD cutoff of 2 Å. We defined the central structure of
the most populated cluster as the folded structure in the simulations and compared it to the
experimentally observed structure (Fig. 1 of main manuscript). We then set out to partition the
trajectory into folded, unfolded, and “transition path” segments. For each protein we monitored
a set of long-range native Cα–Cα contacts between residues that are separated by at least seven
residues in primary sequence and whose Cα atoms are closer than 10 Å for more than 80% of the
time in the folded state. We then defined a reaction coordinate, Q(t), that monitors the fraction
of these contacts that are formed at each frame:

$$Q(t) = \frac{1}{\sum_{i=1}^{N_{res}} N_i} \sum_{i=1}^{N_{res}} \sum_{j=1}^{N_i} \frac{1}{1 + e^{10\left[d_{ij}(t) - d_{ij}^{0} - 1\right]}}$$

In this equation Ni is the number of native contacts of residue i, dij(t) is the distance (measured
in units of Å) between the Cα atoms of residues i and j at time t, and dij0 is the same distance in
the native state.
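A minimal sketch of Q for a single frame follows. Because the double sum divided by ΣNi is simply the mean over all native contacts, the function takes two flat lists of Cα–Cα distances (in Å) for the same contacts, one from the frame and one from the native structure; the function name and this input layout are assumptions made for illustration.

```python
import math


def _switch(x):
    """Numerically stable evaluation of 1 / (1 + exp(x))."""
    if x >= 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


def fraction_native_contacts(frame_dists, native_dists, beta=10.0, shift=1.0):
    """Q(t): mean over native contacts of 1 / (1 + exp(beta * (d_ij(t) - d_ij^0 - shift))).

    frame_dists and native_dists are Calpha-Calpha distances in Å for the same
    ordered list of native contacts. A contact counts as half formed when it is
    `shift` Å longer than in the native state.
    """
    if len(frame_dists) != len(native_dists) or not native_dists:
        raise ValueError("need matching, non-empty distance lists")
    terms = (_switch(beta * (d - d0 - shift)) for d, d0 in zip(frame_dists, native_dists))
    return sum(terms) / len(native_dists)
```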
We computed folding rates by applying the stable-states picture (30) with a dual-cutoff approach.
We defined a subset of the folded state to be those structures that had a value of Q larger than
0.90, after smoothing by applying a running average filter with a time window of 10 ns, and we
defined a subset of the unfolded state to be those structures that had a Q smaller than 0.10. We
recorded folding and unfolding transitions only when the system transitioned fully between these
two substates, thereby removing a number of spurious transitions that would otherwise have been
observed using only a single cutoff. In order to define transition paths, we applied the above
analysis in both the forward and reverse time directions. Frames that were assigned to one state
(e.g., unfolded) in the forward direction and a different state (e.g., folded) in the reverse direction
were defined to belong to the transition path connecting the two states. We calculated the
folding and unfolding times as the average lifetime in the unfolded and folded states,
respectively.
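The stable-states bookkeeping described above can be sketched in a few lines: label every frame by the last core region it visited, count only complete U→F and F→U switches, and mark as transition path those frames whose forward and time-reversed labels disagree. The trailing running-average window, the function names and the 'U'/'F' labels are assumptions; the text does not state whether the 10 ns filter was trailing or centered.

```python
def running_average(q, window):
    """Trailing running mean over up to `window` frames.

    With frames every 200 ps, a 10 ns window corresponds to window=50.
    """
    out, acc = [], 0.0
    for k, x in enumerate(q):
        acc += x
        if k >= window:
            acc -= q[k - window]
        out.append(acc / min(k + 1, window))
    return out


def committed_states(q, lo=0.10, hi=0.90):
    """Label each frame with the last core region visited: 'F' (Q > hi), 'U' (Q < lo), or None."""
    state, labels = None, []
    for x in q:
        if x > hi:
            state = "F"
        elif x < lo:
            state = "U"
        labels.append(state)
    return labels


def transition_path_mask(q, lo=0.10, hi=0.90):
    """True for frames whose forward label differs from their time-reversed label."""
    fwd = committed_states(q, lo, hi)
    bwd = committed_states(q[::-1], lo, hi)[::-1]
    return [f is not None and b is not None and f != b for f, b in zip(fwd, bwd)]


def count_transitions(q, lo=0.10, hi=0.90):
    """Return (n_folding, n_unfolding): complete U->F and F->U switches between the cores."""
    labels = [s for s in committed_states(q, lo, hi) if s is not None]
    pairs = list(zip(labels, labels[1:]))
    return pairs.count(("U", "F")), pairs.count(("F", "U"))


q = [0.05, 0.05, 0.5, 0.95, 0.95, 0.5, 0.95, 0.05]
print(transition_path_mask(q))  # only frame 2 lies on a transition path
print(count_transitions(q))     # (1, 1): the dip to 0.5 in frame 5 is not counted
```

With a single cutoff at Q = 0.5, the dip in frame 5 would have been recorded as a spurious unfolding and refolding; the dual cutoff ignores it.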

We tested the extent to which the calculated folding and unfolding times depend on the approach
used to define (the subset of) the folded state by comparing the results of the analysis described
above with an analysis in which we applied the dual-cutoff approach to RMSD traces (other
global quantities, including the radius of gyration, the solvent accessible surface area, and the
amount of secondary structure, exhibit some relaxation on the folding/unfolding timescale but do
not cleanly separate the folded from the unfolded states and are thus not useful for this analysis). In
particular, we manually specified RMSD-based cutoffs for each protein—both for the full
protein and for shorter stretches of sequences (often corresponding to one or two individual
secondary structural elements)—and recorded folding or unfolding events when the protein
simultaneously satisfied all of the cutoff criteria. This latter approach requires protein-specific
cutoffs to be determined for each individual RMSD time series, and while it is rather robust with
respect to variations in the selected cutoffs, it is not as general as the Q-based approach.
Comparison of the two approaches shows, however, that they produce very similar results (Fig.
S1A), suggesting that the results presented here do not depend strongly on the criteria used to
define the folded and unfolded states.

As a second test for the robustness of the partitioning of conformations into the folded and
unfolded states, we used a recently described procedure in which we compare the relaxation time
calculated from the folding and unfolding rates with the value obtained by fitting to the
autocorrelation function of a reaction coordinate (4). The results show that the relaxation times
implied by the rates calculated with the dual cutoff approach are fully consistent with the
observed timescale for the decay of the autocorrelation function of Q (Fig. S1B), lending further
support to the metric used to define folding transitions.
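For a two-state system the relaxation time implied by the rates is 1/τrelax = 1/τf + 1/τu; with the villin entries of Table S1 (τf = 2.8 µs, τu = 0.9 µs) this gives τrelax ≈ 0.68 µs. The sketch below computes that value and, for comparison, fits a single exponential to a normalized autocorrelation function with a log-linear least-squares fit. The fitting procedure and the function names are assumptions; the procedure of ref. (4) may differ in detail.

```python
import math


def relaxation_time_from_times(tau_f, tau_u):
    """Two-state relaxation time: 1/tau_relax = 1/tau_f + 1/tau_u."""
    return 1.0 / (1.0 / tau_f + 1.0 / tau_u)


def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(k) of a time series for lags k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]


def fit_single_exponential(acf, dt):
    """Fit C(t) = exp(-t / tau) by least squares on ln C(t) through the origin,
    using the leading run of positive values; returns tau in the units of dt."""
    num = den = 0.0
    for k, c in enumerate(acf):
        if c <= 0:
            break
        t = k * dt
        num += t * math.log(c)
        den += t * t
    if den == 0.0 or num >= 0.0:
        raise ValueError("autocorrelation does not decay over the fitted range")
    return -den / num


print(round(relaxation_time_from_times(2.8, 0.9), 2))  # 0.68 (µs, villin)
```

In practice the autocorrelation of the Q time series would be computed with `autocorrelation`, fitted with `fit_single_exponential` using the 200 ps frame spacing as `dt`, and the result compared with the rate-based value.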

Figure S1. Robustness of folding time estimates. (A) Comparison of folding (black) and
unfolding (red) times calculated with a dual cutoff approach applied either to a set of RMSD
time series (RMSD) or to the fraction of native contacts (Q); (B) Comparison of relaxation times
obtained from folding and unfolding times calculated with the dual cutoff approach and
relaxation times estimated from an exponential fit of the autocorrelation function of Q.

Residual structure in the unfolded state

We analyzed the amount of residual secondary structure in the unfolded state by determining the
percentage of residues that are found in either helical or sheet structure in the unfolded state.
Table S2 below shows these percentages for the 12 proteins. There is very little formation of
sheet structure in the unfolded state except in NuG2, where the designed and very stable
hairpin 1 is highly formed even in the unfolded state. In most of the proteins that have native
state helices, ≈10–25% of residues form helices in the unfolded state. The major outlier appears
to be the λ-repressor, for which there is a very large amount of residual helix in the unfolded
state.

Protein %Helix %Sheet %Non-native

Chignolin 0 0 0
Trp-cage 2 1 3
BBA 6 4 8
Villin 24 0 7
WW domain 6 5 9
NTL9 16 6 18
BBL 17 2 9
Protein B 22 2 10
Homeodomain 27 1 16
Protein G 11 28 8
α3D 20 4 10
λ-repressor 42 1 10

Table S2. Average fraction of residues forming secondary structure in the unfolded state. The
percentage of helix and sheet is reported together with the amount of secondary structure that is
non-native. While most of the helical residual structure appears to be native (i.e., within regions
that form helices in the native state), we do observe some level of non-native residual helical
structure in the unfolded state. Much of this is found in residues adjacent to the native-state
helices, but in both the WW domain and NTL9 we find residual helicity in residues that are not
near native-state helices. In the WW domain we find that residues 6–11 are ≈20% helical in the
unfolded state, and in NTL9 we find that residues 7–15 are ≈20% helical in the unfolded state.
Further examination of the non-native helices in these two proteins reveals that they either are
absent when folding is initiated or disappear early during the folding process.
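The percentages in Table S2 amount to counting, over all unfolded-state frames, how many residues are assigned helix or sheet, and how many of those lie where the native structure has a different (or no) secondary structure. The sketch below assumes per-frame secondary structure strings (for example from STRIDE) are already available; the grouping of assignment codes into "helix" and "sheet", the definition of non-native used here, and the function name are assumptions.

```python
HELIX = set("HGI")  # assumption: alpha, 3-10 and pi helix codes count as helix
SHEET = set("EB")   # assumption: strand and isolated bridge codes count as sheet


def secondary_structure_percentages(frames, native):
    """Percent of residues in helix, in sheet, and in non-native secondary structure.

    frames: per-frame secondary structure strings (one code per residue),
            taken from unfolded-state frames only
    native: secondary structure string of the native state
    A residue counts as non-native when it is helical (sheet) in a frame but
    not helical (sheet) in the native state.
    """
    helix = sheet = non_native = total = 0
    for ss in frames:
        if len(ss) != len(native):
            raise ValueError("each frame must have one code per residue")
        for code, nat in zip(ss, native):
            total += 1
            if code in HELIX:
                helix += 1
                non_native += nat not in HELIX
            elif code in SHEET:
                sheet += 1
                non_native += nat not in SHEET
    if total == 0:
        raise ValueError("no frames given")
    return tuple(100.0 * n / total for n in (helix, sheet, non_native))
```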

Analyzing transition paths

Here we provide additional detail on the method used to analyze the transition paths in Fig. 2.
The method proceeds as follows: (i) using the dual-cutoff method described above, we
partition the simulations into three sets: the unfolded state, the folded state and the transition
paths containing the folding and unfolding events that we observe, (ii) we calculate a set of
molecular properties (e.g., number of contacts, topological similarity to the native state,
secondary structure) over the full trajectory, (iii) we shift and normalize each quantity (e.g.,
number of contacts) such that the average values in the unfolded and folded states are zero and
one, respectively, (iv) we then examine each of these quantities for each transition path
individually and calculate the integral over the transition path only. This approach allows for the
comparison of the formation of different structural quantities for different proteins, and for
computation of an average over transition paths, whose lengths can vary substantially. By
calculating the integral (or equivalently, the mean value) of a quantity during the transition path,
we both reduce the noise inherent in the simulations and remove effects from the varying lengths
of transition paths, thereby allowing us to average the integrals over many folding and unfolding
transitions. Further, by shifting and normalizing the quantities we focus the metric on the
changes that actually occur during folding, rather than on structural elements that might be
preformed in the unfolded state or highly dynamic in the folded state. It should be noted that this
analysis was designed to determine the average order of formation of different structural features
during the folding process, and that it does not provide direct information about the relative
importance of these different features for the folding process.
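Steps (iii) and (iv) can be sketched for a single quantity as follows. Frames are assumed to be already labeled 'U', 'F' or 'TP' by the dual-cutoff procedure; the labels and function names are hypothetical. Because each path is summarized by its mean (the integral divided by the path length), the result is the same whether a path is traversed in the folding or the unfolding direction, and a value close to 1 indicates a quantity that forms early.

```python
def normalize(series, labels):
    """Shift and scale a quantity so its mean is 0 in the unfolded ('U') frames
    and 1 in the folded ('F') frames."""
    u = [x for x, s in zip(series, labels) if s == "U"]
    f = [x for x, s in zip(series, labels) if s == "F"]
    if not u or not f:
        raise ValueError("need both unfolded and folded frames")
    mu_u, mu_f = sum(u) / len(u), sum(f) / len(f)
    return [(x - mu_u) / (mu_f - mu_u) for x in series]


def transition_path_means(series, labels):
    """Mean of the normalized quantity over each contiguous run of 'TP' frames.

    The mean equals the time-series integral divided by the path length, so
    paths of different duration contribute on the same scale."""
    z = normalize(series, labels)
    means, current = [], []
    for x, s in zip(z, labels):
        if s == "TP":
            current.append(x)
        elif current:
            means.append(sum(current) / len(current))
            current = []
    if current:
        means.append(sum(current) / len(current))
    return means


series = [0, 0, 2, 6, 10, 10, 8, 4, 0]
labels = ["U", "U", "TP", "TP", "F", "F", "TP", "TP", "U"]
per_path = transition_path_means(series, labels)
print(sum(per_path) / len(per_path))  # average over the two transition paths
```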

For the data shown in Fig. 2, we used the total number of contacts (also used to calculate Q
above), the number of amino acid residues that were either in helices or sheets [as defined by
STRIDE (31)] and the topological similarity of the chain to the native state. For the latter
calculation, we used a previously described method in which Gauss integrals are used to describe
the chain topology (10). Before calculating the Gauss integrals we “smoothed” the protein chain
(9, 10) to decrease the dependency of the calculations on the secondary structure content. After
calculating the Gauss integrals on these smoothed chains, we calculated the similarity to the
native state as the Euclidean distance between the chain's Gauss integrals and those of the native state.

In addition to the quantities presented in Fig. 2, we also performed this analysis on the solvent
accessible surface area (as a global measure of folding) and the contact order (as an alternative
measure of formation of a native topology). The results of these analyses are presented in Fig.
S2.

Figure S2. Relative order of formation of the contact order, native contacts, secondary structure
and SASA during folding.

We also performed the analysis described above for each native contact individually. This
analysis thus shows which contacts, on average, are formed early and late during folding. The
results are shown in Fig. S3, in which we color code the native state contact map for the 12
proteins according to the average time series integral for each contact. These plots show that
while most contacts indeed form late during folding (as also seen from Fig. 2), there are a few
contacts that form earlier, helping establish the correct chain topology and stabilizing the native
structural elements.

Figure S3. The order of formation of individual long-range native contacts during folding is
reported for each of the 12 proteins. Contacts are color-coded according to their average time of
appearance during the transition paths, as determined by the average time series integral during
all the observed folding and unfolding events. Integrals were binned in units of 0.1 (see also
legend in first plot) so that black points refer to contacts whose time series integral is between
0.0 and 0.1.

Finally, although the time series integral analysis presented here decreases noise arising from not
knowing perfectly when a folding transition begins and ends (i.e., the reaction coordinate chosen
does not perfectly specify the end points for all transition paths), it is still important to validate
that the results obtained do not depend substantially on the chosen definition of transition paths.
This is of particular importance since one of the quantities (number of contacts) is also used to
define the transition paths. We therefore repeated the analysis shown in Fig. 2 using the RMSD-
based definitions of the folded and unfolded state mentioned above and found that the
conclusions are robust with respect to the metric used to define the transition paths (Fig. S4).

Figure S4. Relative order of formation of the native topology, native contacts and secondary
structure during folding. Transition paths were determined using RMSD-based criteria.

Definition of a transition path similarity metric

Inspired by previous work (14, 15) we defined a similarity score for individual transition paths.
Using this metric, we clustered the individual folding and unfolding events using affinity
propagation (32) by using the negative distance as a similarity score and using a common
“preference” value of −15. The individual transition paths were also projected into two
dimensions using Stochastic Proximity Embedding (33).
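Stochastic Proximity Embedding places points in a low-dimensional space by repeatedly picking a random pair and nudging the two points so that their embedded distance approaches the target distance, with a learning rate that is gradually decreased. The sketch below shows this basic update rule; the cycle counts, the learning-rate schedule and the function name are arbitrary assumptions, not the settings used for the projections shown in this SOM.

```python
import math
import random


def stochastic_proximity_embedding(dist, dim=2, cycles=100, steps=None,
                                   lam_start=1.0, lam_end=0.01, eps=1e-9, seed=0):
    """Embed n items in `dim` dimensions so that embedded distances approximate dist[i][j]."""
    n = len(dist)
    rng = random.Random(seed)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    steps = steps if steps is not None else 10 * n  # pair updates per cycle (assumed)
    lam = lam_start
    lam_step = (lam_start - lam_end) / max(cycles - 1, 1)
    for _ in range(cycles):
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)
            d = math.dist(coords[i], coords[j])
            r = dist[i][j]
            # Push i and j apart if d < r, pull them together if d > r,
            # moving each point half of the (scaled) discrepancy.
            factor = lam * 0.5 * (r - d) / (d + eps)
            for k in range(dim):
                delta = factor * (coords[i][k] - coords[j][k])
                coords[i][k] += delta
                coords[j][k] -= delta
        lam -= lam_step  # anneal the learning rate
    return coords
```

Applied to the matrix of transition-path distances, each returned row gives the 2D position of one transition path.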

To calculate the transition path distance metric, we subdivided each transition path into five
equally spaced bins with incremental values of Q from 0 to 1. For each bin i we calculated the
probability p(i,j) that a native contact j is formed. In these calculations, native contacts are
defined between residues separated by more than 4 residues in sequence and whose Cα-atoms
are closer than 8.5 Å in the native state. After applying this procedure to all transition paths, the
distance between any two paths k and l is then calculated as:

$$d_{kl} = \sum_{i=1}^{N_{bins}} \sum_{j=1}^{N_{contacts}} \left[ p_k(i,j) - p_l(i,j) \right]^2$$

All transition paths are then assigned to a pathway by clustering the transition paths using dkl as the
dissimilarity score for transition paths. We note that while we were preparing this manuscript, a
highly similar approach for comparing folding events was independently developed and
published elsewhere (34).
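The metric can be sketched as follows: bin each transition path into five equal-width bins in Q, estimate the probability that each native contact is formed in each bin, and sum the squared differences between two paths. The function names, the input layout (a per-frame list of booleans for the contacts defined above) and the treatment of empty bins are assumptions.

```python
def contact_probabilities(q_values, contact_frames, n_bins=5):
    """p(i, j) for one transition path: the fraction of frames in Q bin i in which
    native contact j is formed.

    q_values:       Q of each frame on the path
    contact_frames: for each frame, a list of booleans (True if contact j is formed)
    Bins have equal width in Q on [0, 1]; empty bins are left at 0 (an assumption).
    """
    n_contacts = len(contact_frames[0])
    counts = [[0] * n_contacts for _ in range(n_bins)]
    frames_in_bin = [0] * n_bins
    for q, formed in zip(q_values, contact_frames):
        i = min(max(int(q * n_bins), 0), n_bins - 1)
        frames_in_bin[i] += 1
        for j, is_formed in enumerate(formed):
            counts[i][j] += int(is_formed)
    return [[c / frames_in_bin[i] if frames_in_bin[i] else 0.0 for c in row]
            for i, row in enumerate(counts)]


def path_distance(p_k, p_l):
    """d_kl: sum over bins and contacts of (p_k(i, j) - p_l(i, j))**2."""
    return sum((a - b) ** 2
               for row_k, row_l in zip(p_k, p_l)
               for a, b in zip(row_k, row_l))
```

The negated matrix of pairwise dkl values could then be passed, for example, to scikit-learn's `AffinityPropagation(affinity="precomputed", preference=-15)` to assign paths to pathways.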

Subsequent to the assignment of transition paths to individual pathways, we calculated the
similarity between two or more pathways along the progress variable Q as the fraction of native
contacts shared between all the pathways (now averaging over all transition paths assigned to a
given pathway). The result is a plot of the pathway similarity as a function of Q (Fig. S5). As
expected, we find that for high values of Q all pathways converge to the folded state.
Unexpectedly, we also find that for most proteins the pathways share a large fraction of common
contacts (>60%), even at lower values of Q. This observation shows that even when a relatively
small fraction of native state contacts is formed, these contacts are shared between the different
pathways. This result suggests that these pathways are structurally very similar and in fact share
a common folding nucleus.

Figure S5. Fraction of common native contacts formed in different pathways as a function of
the progress coordinate Q. The similarity was calculated only for values of Q where the
minimum number of formed contacts in any path was larger than 25%.

The structural similarity between different pathways can be further highlighted by plotting the
order of formation of local native structure for each pathway as a function of residue number. In
these calculations we first define the “nativeness” of a residue i via (2):

$$d_i = 1 - e^{-0.5\,\mathrm{MSD}_i}$$

where MSDi is the mean square deviation (in units of Å²) from the native structure of a stretch of
five amino acid residues centered on residue i. A value of di close to zero implies that residue i
locally adopts a native-like structure, whereas values close to unity are found when the residue is
not in a native orientation. Residues that adopt a native-like structure early during the folding
process will thus reach di values close to zero early in the transition path. For each residue and
in each pathway, we therefore calculated the average value of di along Q as a progress variable
of the folding transition. A value of this integral close to 0 implies that a residue adopts a native
structure early in this pathway, whereas a value close to 1 implies a later transition towards a
native structure. Fig. 3 in the main text shows the results of these calculations for the 12 proteins
and for each of the observed pathways. With a few exceptions, the order of structure formation
is the same in all pathways. This provides independent evidence of the result in Fig. S5 that
most of the different pathways in fact share a common nucleus. In general, we find a strong
correlation between the order of structure formation and the propensity of a residue to form
native-like conformations in the unfolded state (Fig. S6). These results suggest that the initiation
sites for folding of these proteins correspond to those regions of the sequence that have a higher
propensity to form locally native-like conformations.
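The nativeness score and its average along Q can be sketched as follows, assuming the per-frame MSD values of the five-residue window around residue i (after optimal superposition on the native structure) have already been computed. Averaging within Q bins before averaging over bins, so that each part of the Q range carries equal weight, and the choice of ten bins are assumptions, as are the function names.

```python
import math


def nativeness(msd):
    """d_i = 1 - exp(-0.5 * MSD_i), with MSD_i in Å^2 for the five-residue window around i."""
    return 1.0 - math.exp(-0.5 * msd)


def order_of_formation(q_values, msd_values, n_bins=10):
    """Average d_i of one residue along Q for the transition paths of one pathway.

    Frames are binned by Q, d_i is averaged within each non-empty bin, and the bin
    averages are then averaged. Values near 0 mean the residue becomes native-like
    early in the pathway; values near 1 mean it does so late.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for q, msd in zip(q_values, msd_values):
        b = min(max(int(q * n_bins), 0), n_bins - 1)
        sums[b] += nativeness(msd)
        counts[b] += 1
    means = [s / c for s, c in zip(sums, counts) if c]
    if not means:
        raise ValueError("no frames given")
    return sum(means) / len(means)
```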

Figure S6. Correlation between the average distance from the native structure in the unfolded
state and order of formation along the folding pathway for (A) α-helical proteins and (B) proteins
containing β-sheet structure. Note that values close to 0 imply a high degree of nativeness in the
unfolded state and an early formation along the folding pathway, as defined by using Q as the
progress variable.

Analysis of the transiition state reegion

For each protein and each folding pathway, we used a previously established method (22, 19) to
project the free energy surface onto a one-dimensional reaction coordinate. This method allows
the identification of a subset of structures that have a probability of folding of ~50%. We define
these structures as belonging to the transition state region. The averages of several properties for
the transition state region of the dominant pathway of each protein (normalized either to the
unfolded state or to the range between the folded and unfolded states) are reported in Table S3
below. The properties of the transition state regions of the other pathways are similar,
with the exception of Protein G and NTL9, which show distinct folding routes. For these two
proteins the properties of the transition state region of both pathways (a and b) are reported. The
substantial differences between the two ensembles for Protein G suggest that it might be possible
to distinguish these experimentally (13).

SASA Sec. Str. CO Q


Chignolin 1.12 0.56 0.00 0.00 0.61 0.22 0.31 0.26


Trp-cage 1.05 0.79 0.96 0.96 0.85 0.71 0.81 0.77
BBA 1.05 0.64 0.72 0.67 0.98 0.91 0.90 0.86
Villin 1.02 0.87 0.86 0.77 0.99 0.95 0.88 0.76
WW domain 1.11 0.48 0.85 0.84 0.75 0.62 0.73 0.68
NTL9 (a) 1.16 0.22 0.60 0.51 0.62 0.46 0.49 0.35
NTL9 (b) 1.13 0.37 0.51 0.40 0.80 0.71 0.51 0.37
BBL 1.00 0.99 0.97 0.93 1.00 0.96 0.97 0.93
Protein B 0.98 1.13 0.93 0.86 0.99 0.94 0.96 0.89
Homeodomain 1.02 0.76 0.79 0.59 0.97 0.65 0.86 0.60
Protein G (a) 1.02 0.83 0.87 0.71 0.97 0.95 0.84 0.71
Protein G (b) 1.07 0.48 0.68 0.32 0.62 0.28 0.70 0.44
α3D 1.01 0.93 0.81 0.73 0.77 0.45 0.84 0.72
λ-repressor 1.03 0.67 0.94 0.79 0.69 0.31 0.80 0.52

Table S3. Structural properties of the transition state regions of the 12 proteins. Values are
reported relative to the native state ( ) or to the difference between the folded and unfolded state
( ) for the Solvent Accessible Surface Area (SASA), the amount of native secondary structure
(Sec. Str.) (31), the contact order (CO) (35), and the number of native contacts (Q) (35). For
NTL9 and protein G the properties of both folding pathways (a and b) are reported.

Individual proteins: methods, results, and comparison with experimental data

For each protein we provide below additional information and detail on the methods employed, a
brief discussion of the experimental data available, and a figure showing:

– the structure of the native state obtained in the simulation (i.e., from clustering)
superimposed on the experimental structure (also reported in Fig. 1);
– the time series of the Cα-RMSD and Q-values for each simulation, where Q is the
fraction of native contacts as defined above;
– a 2-dimensional projection (33) of the transition paths, where different colors are used for
transition paths assigned to different clusters (i.e., folding pathways) by affinity
propagation (32), using the transition-path distance metric defined above;
– the free energy profile along an RMSD-based reaction coordinate (2) optimized using the
method described in (19, 36) (note that the zero of this coordinate corresponds to the folded
state; similar results were obtained by optimizing a contact-based reaction coordinate), and 40
structures selected from the ensemble with Pfold ~ 0.5, superimposed using
THESEUS (37);
– the dynamical content as a function of the timescale (2), calculated for the decay of the
autocorrelation function of the number of native contacts.

Chignolin

Simulation details. The polypeptide with the sequence TYR-TYR-ASP-PRO-GLU-THR-GLY-THR-TRP-TYR,
corresponding to the cln025 peptide of (38), was solvated in a cubic box of ~40 Å side length
containing ~1900 water molecules, and the negative charge of the peptide was neutralized with
two sodium ions. One 106-µs long MD simulation was performed in the NVT ensemble (39–41).
A cutoff of 9.5 Å was used for the LJ and short-range electrostatic interactions; long-range
electrostatic interactions were treated with the Gaussian Split Ewald (GSE) method (42) and a
32 × 32 × 32 cubic grid.

Comparison with experiment. The NMR structure of cln025 was reported in the supplementary
information of ref (38), but it has not been deposited in the PDB. The Cα-RMSD of the center of
the most populated cluster in the simulation from this structure is 1.0 Å, which is smaller than
the difference between the NMR and the X-ray structures of the same protein (1.75 Å) (38). The
calculated melting temperature of 381 K can be compared to the experimental value of 343 K.
The folding enthalpy at Tm (381 K) can be extrapolated from the calculated ΔHf and ΔCvf; the
resulting value of −10.3 ± 5 kcal mol−1 can be compared to the experimental folding enthalpy at
Tm of −11.1 kcal mol−1 (38).
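The extrapolation mentioned above is Kirchhoff's law, ΔH(T) = ΔH(T0) + ΔCp (T − T0), here under the assumption that ΔCvf can be used in place of ΔCp and is temperature independent. With the rounded chignolin entries of Table S1, the sketch gives about −10.2 kcal mol−1, consistent within rounding with the value quoted above; the function name is hypothetical.

```python
def extrapolate_enthalpy(dH0, dCp, T0, T):
    """Kirchhoff's law with a constant heat capacity change: dH(T) = dH(T0) + dCp * (T - T0)."""
    return dH0 + dCp * (T - T0)


# Chignolin, Table S1: dHf = -6.1 kcal/mol and dCvf = -0.1 kcal/mol/K at 340 K
print(round(extrapolate_enthalpy(-6.1, -0.1, 340.0, 381.0), 1))  # -10.2
```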

Trp-cage

Simulation details. The polypeptide with the sequence ASP-ALA-TYR-ALA-GLN-TRP-LEU-ALA-ASP-GLY-
GLY-PRO-SER-SER-GLY-ARG-PRO-PRO-PRO-SER, corresponding to the K8A mutant of the
thermostable Trp-cage variant TC10b (43), was solvated in 65 mM NaCl in a cubic box of ~37 Å
side length containing ~1700 water molecules. One 208-µs long MD simulation was performed in
the NVT ensemble. A cutoff of 9.0 Å was used for the LJ and short-range electrostatic
interactions; long-range electrostatic interactions were treated with the Gaussian Split Ewald
(GSE) method and a 32 × 32 × 32 cubic grid.

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure (PDB entry 2JOF) is 1.4 Å. The calculated
melting temperature (206 K) and unfolding enthalpy (2.1 ± 1 kcal mol−1) can be compared to the
experimental values of 335 K and 13 ± 2 kcal mol−1 (43). These results suggest that the
discrepancy between the calculated and experimental melting temperatures is enthalpic in origin,
although the underlying reason for the difference in stability is not clear. We note that Trp-cage
has an unusual sequence containing a large proportion of proline and glycine residues, and we
speculate that small residual errors in the parameterization of these two residues [which were not
modified in the parameterization of CHARMM22* (4)] may have an unusually large effect on
the stability of this fold.

BBA

Simulation details. The polypeptide with the sequence GLU-GLN-TYR-THR-ALA-LYS-TYR-LYS-GLY-
ARG-THR-PHE-ARG-ASN-GLU-LYS-GLU-LEU-ARG-ASP-PHE-ILE-GLU-LYS-PHE-LYS-GLY-ARG,
corresponding to the FSD-EY peptide in (44), was solvated in a cubic box of ~47 Å side length
containing ~3200 water molecules. The positive charge of the peptide was neutralized with four
chloride ions. Two MD simulations of lengths 223 µs and 102 µs were performed in the NVT
ensemble. A cutoff of 9.5 Å was used for the LJ and short-range electrostatic interactions;
long-range electrostatic interactions were treated with the GSE method and a 32 × 32 × 32 cubic
grid.

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure (PDB entry 1FME) is 1.6 Å. The simulations do
not allow us to robustly estimate a melting temperature; as far as we are aware, there is no
experimental value reported in the literature for this peptide, as the native state of this fold is
relatively unstable in solution, which makes it difficult to determine a folded-state baseline (44).

Villin

Simulation details. The polypeptide with the sequence LEU-SER-ASP-GLU-ASP-PHE-LYS-
ALA-VAL-PHE-GLY-MET-THR-ARG-SER-ALA-PHE-ALA-ASN-LEU-PRO-LEU-TRP-
NLE-GLN-GLN-HIS-LEU-NLE-LYS-GLU-LYS-GLY-LEU-PHE, corresponding to the
Nle/Nle double mutant of the C-terminal fragment of the villin headpiece (45), was solvated in
40 mM NaCl in a cubic box of ~54 Å side length containing ~4400 water molecules. The His
residue was protonated. One 120-µs long MD simulation was performed in the NVT ensemble.
A cutoff of 9.5 Å was used for the LJ and short-range electrostatic interactions; long-range
electrostatic interactions were treated with the GSE method and a 32 × 32 × 32 cubic grid.

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure (PDB entry 2F4K) is 1.3 Å. The calculated
melting temperature (343 K), unfolding enthalpy at the melting temperature (19 ± 6 kcal mol−1),
and folding time (2.8 µs) can be compared to the experimental values of 361 K, 25(+2, −5) kcal
mol−1 and ~ 1 µs, respectively (45). The calculated folding free energy barrier of 2 kcal mol−1
appears to be larger than the value of ~ 1 kcal mol−1 estimated from calorimetry experiments (45,
46), which may help explain the small discrepancy between the calculated and observed folding
times.

WW domain

Simulatio
on details. The
T polypepttide with thee sequence G
GLY-SER-L
LYS-LEU-PR
RO-PRO-GL
LY-
TRP-GLU
U-LYS-ARG
G-MET-SER
R-ARG-ASP
P-GLY-ARG
G-VAL-TYR
R-TYR-PHE
E-ASN-HIS-IILE-
THR-GL
LY-THR-TH
HR-GLN-PHE
E-GLU-ARG
G-PRO-SER
R-GLY, corrresponding to the WW
domain variant
v GTT (47), was so
olvated in a cubic
c box off ~52 Å sidee length conttaining ~4300
water mo
olecules. Th
he positive ch
harge of the peptide wass neutralizedd with three cchloride ionss.
Two MD
D simulationss of lengths 651 µs and 486
4 µs were performed iin the NVT ensemble. A
cutoff off 9.0 Å was used
u for the LJ
L and shortt-range electtrostatic interractions; lonng-range
electrostaatic interactiions were treeated with th
he GSE methhod and a 322 × 32 × 32 ccubic grid.

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure of the closest homologue (PDB entry 2F21) is
1.2 Å. The calculated melting temperature (373 K) and folding time (21 ± 6 µs) can be
compared to the experimentally determined values of 371 ± 2 K and 5.7 µs (47).

NTL9

Simulation details. The polypeptide with the sequence MET-LYS-VAL-ILE-PHE-LEU-LYS-
ASP-VAL-LYS-GLY-MET-GLY-LYS-LYS-GLY-GLU-ILE-LYS-ASN-VAL-ALA-ASP-GLY-
TYR-ALA-ASN-ASN-PHE-LEU-PHE-LYS-GLN-GLY-LEU-ALA-ILE-GLU-ALA-CONH2,
corresponding to the K12M mutant of the 39-residue N-terminal fragment of ribosomal
protein L9 (NTL9), was solvated in 100 mM NaCl in a cubic box of ~50 Å side length
containing ~3800 water molecules. Four MD simulations of 1052 µs, 990 µs, 389 µs, and 377
µs were performed in the NVT ensemble. A force-shifted cutoff of 9.5 Å was used for the LJ
and electrostatic interactions (48).
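The SOM does not give the exact functional form of the force-shifted cutoff. The sketch below shows the simplest, undamped shifted-force Coulomb pair interaction of the kind discussed by Fennell and Gezelter (48), in which both the energy and the force go continuously to zero at the 9.5 Å cutoff. Any damping parameter and the treatment of the LJ term are not covered, and the actual implementation may differ; the function name is hypothetical.

```python
COULOMB_KCAL = 332.0637  # Coulomb constant, kcal mol^-1 Å e^-2 (approximate)

def shifted_force_coulomb(r: float, qi: float, qj: float, r_cut: float = 9.5):
    """Energy (kcal/mol) and radial force (kcal/mol/Å, positive = repulsive) of an
    undamped shifted-force Coulomb pair at distance r (Å); both vanish at r_cut."""
    if r >= r_cut:
        return 0.0, 0.0
    pref = COULOMB_KCAL * qi * qj
    energy = pref * (1.0 / r - 1.0 / r_cut + (r - r_cut) / r_cut**2)
    force = pref * (1.0 / r**2 - 1.0 / r_cut**2)  # equals -dE/dr
    return energy, force

# Opposite unit charges: energy and force shrink smoothly to zero at the cutoff.
for r in (3.0, 6.0, 9.0, 9.5):
    e, f = shifted_force_coulomb(r, +1.0, -1.0)
    print(f"r = {r:4.1f} Å  E = {e:8.3f} kcal/mol  F = {f:8.3f} kcal/mol/Å")
```

The shift removes the energy and force discontinuities that a plain truncation would introduce at the cutoff, at the cost of modifying the interaction at all distances inside it.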

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure (PDB entry 2HBA) is 0.5 Å. The calculated
melting temperature of 370 K can be compared to the value of 355 ± 2 K measured
experimentally (49). The calculated folding time of 29 ± 7 µs obtained from the simulations
performed at 355 K can be compared to the folding time of 730 ± 50 µs at 298 K (extrapolated
from urea denaturation experiments) (49); although the folding rate is expected to increase with
temperature, the temperature dependence has not been established, thus preventing a more
accurate comparison.
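The point about the unknown temperature dependence can be made quantitative. Under a purely Arrhenius assumption, a folding time measured at 298 K extrapolates to 355 K as τ(355 K) = τ(298 K)·exp[(Ea/R)(1/355 − 1/298)], so the result depends entirely on an activation energy Ea that has not been measured. The Ea values below are hypothetical, chosen only to show the range. Folding kinetics are also frequently non-Arrhenius, so this is an illustration of sensitivity, not an estimate.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant, kcal mol^-1 K^-1

def extrapolate_folding_time(tau_ref_us: float, t_ref_k: float,
                             t_new_k: float, ea_kcal: float) -> float:
    """Folding time at t_new_k from a value measured at t_ref_k, assuming
    Arrhenius kinetics k(T) = A*exp(-Ea/(R*T)) with a fixed prefactor A."""
    return tau_ref_us * math.exp((ea_kcal / R_KCAL) * (1.0 / t_new_k - 1.0 / t_ref_k))

# 730 us at 298 K is the experimental value quoted above; the Ea values are hypothetical.
for ea in (0.0, 5.0, 10.0, 15.0):
    tau = extrapolate_folding_time(730.0, 298.0, 355.0, ea)
    print(f"Ea = {ea:4.1f} kcal/mol -> tau(355 K) ~ {tau:6.1f} us")
```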

BBL

Simulation details. The polypeptide with the sequence GLY-SER-GLN-ASN-ASN-ASP-ALA-LEU-SER-
PRO-ALA-ILE-ARG-ARG-LEU-LEU-ALA-GLU-TRP-ASN-LEU-ASP-ALA-SER-ALA-ILE-LYS-GLY-THR-GLY-
VAL-GLY-GLY-ARG-LEU-THR-ARG-GLU-ASP-VAL-GLU-LYS-HIS-LEU-ALA-LYS-ALA, corresponding to the
47-residue peripheral subunit-binding domain BBL-H142W described in Neuweiler et al. (50), was
solvated in 200 mM NaCl in a cubic box of ~57 Å side length containing ~6000 water molecules.
Two MD simulations of lengths 272 µs and 157 µs were performed in the NVT ensemble. A cutoff of
9.5 Å was used for the LJ and short-range electrostatic interactions; long-range electrostatic
interactions were treated with the GSE method and a 32 × 32 × 32 cubic grid.
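For orientation, the number of Na+/Cl− pairs that corresponds to a stated salt concentration can be estimated from the number of water molecules as n_salt ≈ c_salt × N_water / 55.5 M, where 55.5 M is the approximate molarity of pure water. This is a back-of-the-envelope sketch: the SOM does not state how many salt ions were actually added, and counterions that neutralize the protein are not included. The concentrations and water counts are those quoted in the simulation details; the function name is hypothetical.

```python
WATER_MOLARITY = 55.5  # mol/L, approximate molarity of pure liquid water

def salt_pairs(salt_molar: float, n_waters: int) -> float:
    """Approximate number of salt ion pairs giving concentration salt_molar
    (mol/L) in a box containing n_waters water molecules."""
    return salt_molar * n_waters / WATER_MOLARITY

# (concentration in M, approximate water count) as quoted in the simulation details
systems = {
    "Villin": (0.040, 4400),
    "NTL9": (0.100, 3800),
    "BBL": (0.200, 6000),
    "Homeodomain": (0.045, 4400),
    "lambda-repressor": (0.050, 11000),
}
for name, (conc, waters) in systems.items():
    print(f"{name:17s} ~{salt_pairs(conc, waters):5.1f} NaCl pairs")
```

Small boxes therefore contain only a handful of ion pairs even at 100 to 200 mM, which is worth keeping in mind when comparing salt effects with experiment.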

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure determined by NMR spectroscopy (PDB entry
2WXC) is 4.8 Å, but drops to 3.5 Å if residues 1–9 and 29–35—which are highly variable in the
NMR ensemble—are ignored. Unlike the 11 other proteins that we folded, the most populated
cluster is not the closest to the experimental structure, as the fourth most populated cluster has a
Cα-RMSD of only 1.5 Å, suggesting either some residual force-field discrepancy or some
plasticity in the folded state ensemble above the melting temperature. In this context, it is worth
noting that experiments have shown that the folded state of BBL can depend subtly on
experimental conditions (8). The calculated melting temperature of 251 K and folding time of
29 ± 8 µs can be compared to the values of 327.3 ± 0.2 K and 14 ± 1 µs (at 283 K) measured
experimentally (50).
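The residue-subset comparison described above can be reproduced in outline with a standard Kabsch superposition restricted to a residue mask. The SOM does not say which superposition software produced these RMSD values, so this is one standard way to do it rather than the authors' procedure. The coordinates below are synthetic (not BBL data): a rotated and translated copy of a random 47-residue reference in which residues 1–9 and 29–35 are scrambled, so the full RMSD is large while the masked RMSD is essentially zero. Function names are hypothetical.

```python
import numpy as np

def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body
    superposition of p onto q (Kabsch algorithm)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid improper rotations (reflections)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def rmsd_excluding(p: np.ndarray, q: np.ndarray, excluded_residues: set) -> float:
    """Kabsch RMSD over residues not in excluded_residues (1-based numbering)."""
    keep = [i for i in range(len(p)) if (i + 1) not in excluded_residues]
    return kabsch_rmsd(p[keep], q[keep])

# Synthetic 47-residue example (hypothetical coordinates, not BBL data)
rng = np.random.default_rng(0)
reference = rng.normal(scale=8.0, size=(47, 3))
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
model = reference @ rz.T + np.array([5.0, -3.0, 2.0])
excluded = set(range(1, 10)) | set(range(29, 36))  # residues 1-9 and 29-35
floppy = [i - 1 for i in sorted(excluded)]
model[floppy] += rng.normal(scale=4.0, size=(len(floppy), 3))

print(f"all residues:         {kabsch_rmsd(model, reference):.2f} Å")
print(f"excluding 1-9, 29-35: {rmsd_excluding(model, reference, excluded):.2f} Å")
```

The superposition is recomputed on the retained residues only, so the excluded flexible segments affect neither the fit nor the RMSD.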

Protein B

Simulation details. The polypeptide with the sequence LEU-LYS-ASN-ALA-ILE-GLU-ASP-ALA-ILE-
ALA-GLU-LEU-LYS-LYS-ALA-GLY-ILE-THR-SER-ASP-PHE-TYR-PHE-ASN-ALA-ILE-ASN-LYS-ALA-LYS-
THR-VAL-GLU-GLU-VAL-ASN-ALA-LEU-VAL-ASN-GLU-ILE-LEU-LYS-ALA-HIE-ALA, corresponding to the
K5I/K39V double mutant (51) of the albumin-binding domain of protein B, was solvated in 50 mM
NaCl in a cubic box of ~56 Å side length containing ~5600 water molecules. One 105-µs long MD
simulation was performed in the NVT ensemble. A cutoff of 9.5 Å was used for the LJ and
short-range electrostatic interactions; long-range electrostatic interactions were treated with the
GSE method and a 32 × 32 × 32 cubic grid.

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure of the closest homologue (PDB entry 1PRB) is
3.3 Å. The calculated melting temperature (317 K) and folding time (3.9 ± 0.9 µs) can be
compared to the values of >363 K and 1.0 ± 0.2 µs measured experimentally (51).

Homeodomain

Simulation details. The polypeptide with the sequence MET-LYS-GLN-TRP-SER-GLU-ASN-VAL-GLU-
GLU-LYS-LEU-LYS-GLU-PHE-VAL-LYS-ARG-HIS-GLN-ARG-ILE-THR-GLN-GLU-GLU-LEU-HIS-GLN-TYR-
ALA-GLN-ARG-LEU-GLY-LEU-ASN-GLU-GLU-ALA-ILE-ARG-GLN-PHE-PHE-GLU-GLU-PHE-GLU-GLN-ARG-
LYS, corresponding to the redesigned sequence of the Drosophila melanogaster homeodomain
described in (7), was solvated in 45 mM NaCl in a cubic box of ~53 Å side length containing
~4400 water molecules.
Two MD simulations of 129 µs and 198 µs were performed in the NVT ensemble. A cutoff of
9.5 Å was used for the LJ and short-range electrostatic interactions; long-range electrostatic
interactions were treated with the GSE method and a 32 × 32 × 32 cubic grid.
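The box and mesh parameters quoted above can be sanity-checked with simple geometry. A ~53 Å cubic box has a volume of ~1.5 × 10^5 Å³, which at an approximate room-temperature number density of liquid water (~0.0334 molecules per Å³, an assumed value) would hold roughly 5000 water molecules if it were empty; the protein and ions displace part of that volume, so this is an upper bound of the same order as the ~4400 quoted, and the exact count also depends on the water model and temperature. The same arithmetic gives the spacing of the 32 × 32 × 32 GSE mesh. The function name is hypothetical.

```python
WATER_PER_A3 = 0.0334  # approximate number density of liquid water near 298 K, molecules per Å^3

def box_estimates(side_a: float, grid_points: int = 32):
    """Volume (Å^3), an upper-bound water count for an empty cubic box, and the
    spacing (Å) of a cubic grid with grid_points points per edge spanning the box."""
    volume = side_a ** 3
    return volume, volume * WATER_PER_A3, side_a / grid_points

volume, max_waters, spacing = box_estimates(53.0)
print(f"volume ~{volume:,.0f} Å^3, at most ~{max_waters:,.0f} waters, grid spacing ~{spacing:.2f} Å")
```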

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure (PDB entry 2P6J) is 3.6 Å; it drops to 1.9 Å if
the flexible tails are omitted from the RMSD calculation. The calculated melting temperature
(>360 K) and folding time (3.1 ± 0.6 µs) can be compared to the values of >372 K and 13 µs (at
308 K and based on a long extrapolation from measurements in ~5 M urea) measured
experimentally (7, 52).

Protein G

Simulation details. The polypeptide with the sequence ASP-THR-TYR-LYS-LEU-VAL-ILE-
VAL-LEU-ASN-GLY-THR-THR-PHE-THR-TYR-THR-THR-GLU-ALA-VAL-ASP-ALA-
ALA-THR-ALA-GLU-LYS-VAL-PHE-LYS-GLN-TYR-ALA-ASN-ASP-ALA-GLY-VAL-
ASP-GLY-GLU-TRP-THR-TYR-ASP-ALA-ALA-THR-LYS-THR-PHE-THR-VAL-THR-
GLU, corresponding to the N37A/A46D/D47A triple mutant of the redesigned protein G variant
NuG2 (53), was solvated in 100 mM NaCl in a cubic box of ~55 Å side length containing ~5100
water molecules. Four MD simulations of lengths 444 µs, 370 µs, 168 µs and 172 µs were
performed in the NVT ensemble. A force-shifted cutoff of 9.5 Å was used for the LJ and
electrostatic interactions.

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure of the closest homologue (PDB entry 1MIO) is
1.2 Å. The calculated folding time of 65 ± 20 µs can be compared to the value of ~60 µs
estimated from the experimental value for NuG2 extrapolated to 0 M GuHCl (53, 54), assuming
that the experimentally determined effects of the point mutations on the folding rate are additive.
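Under the additivity assumption stated above, each point mutation contributes an independent change in activation free energy, so the rate ratios multiply: k_mutant = k_parent × Π_i (k_i / k_parent). The sketch below applies this rule to made-up rates; the numbers are hypothetical and are not the measured NuG2 values, and the function name is also hypothetical.

```python
import math

def combine_additive_mutations(k_parent: float, single_mutant_rates: list[float]) -> float:
    """Folding rate of a multiple mutant, assuming each point mutation shifts
    ln(k) independently (additive changes in activation free energy)."""
    ln_k = math.log(k_parent) + sum(math.log(k / k_parent) for k in single_mutant_rates)
    return math.exp(ln_k)

# Hypothetical rates in s^-1, for illustration only (not the measured NuG2 values).
k_parent = 1.0e4
singles = [1.2e4, 0.8e4, 1.5e4]  # one entry per point mutation
k_triple = combine_additive_mutations(k_parent, singles)
print(f"k ~ {k_triple:.3g} s^-1, folding time ~ {1e6 / k_triple:.1f} us")
```

Working in ln k makes the additivity explicit and mirrors adding ΔΔG‡ values; any coupling between mutations would show up as a deviation from this product.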

α3D

Simulation details. The polypeptide with the sequence MET-GLY-SER-TRP-ALA-GLU-PHE-LYS-GLN-
ARG-LEU-ALA-ALA-ILE-LYS-THR-ARG-LEU-GLN-ALA-LEU-GLY-GLY-SER-GLU-ALA-GLU-LEU-ALA-ALA-
PHE-GLU-LYS-GLU-ILE-ALA-ALA-PHE-GLU-SER-GLU-LEU-GLN-ALA-TYR-LYS-GLY-LYS-GLY-ASN-PRO-
GLU-VAL-GLU-ALA-LEU-ARG-LYS-GLU-ALA-ALA-ALA-ILE-ARG-ASP-GLU-LEU-GLN-ALA-TYR-ARG-HIS-
ASN, corresponding to the de novo–designed three-helix bundle protein α3D (55), was solvated in a
cubic box of ~64 Å side length containing ~8000 water molecules. The negative charge of the
protein was neutralized with a sodium ion. Two MD simulations of 346 µs and 361 µs were
performed in the NVT ensemble. A cutoff of 10 Å was used for the LJ and short-range electrostatic
interactions; long-range electrostatic interactions were treated with the GSE method and a
32 × 32 × 32 cubic grid.

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure (PDB entry 2A3D) is 3.1 Å. The calculated
melting temperature (370 K), unfolding enthalpy (38 ± 1 kcal mol−1), unfolding heat capacity
(0.6 ± 0.2 kcal mol−1 K−1), and folding time (27 ± 8 µs) can be compared to the experimental
values at neutral pH (melting temperature: >363 K) and pH 2 in D2O [melting temperature: 346
K; unfolding enthalpy: 36 kcal mol−1; unfolding heat capacity: 0.65 kcal mol−1 K−1; and folding
time: 3.2 ± 1.2 µs (56)], though we note that the folding rate may be probe-dependent (57).
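Given Tm, the unfolding enthalpy ΔH_m at Tm, and the unfolding heat-capacity change ΔCp, a two-state model fixes the whole stability curve through the Gibbs–Helmholtz relation ΔG_u(T) = ΔH_m(1 − T/Tm) + ΔCp[(T − Tm) − T ln(T/Tm)]. The sketch below evaluates it with the simulation values quoted above for α3D (Tm = 370 K, ΔH = 38 kcal mol−1, treating the quoted 0.6 kcal mol−1 K−1 as ΔCp). It assumes strict two-state behavior and a temperature-independent ΔCp, and is not necessarily how the melting temperatures in this SOM were obtained. Function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant, kcal mol^-1 K^-1

def unfolding_free_energy(t: float, tm: float, dh_m: float, dcp: float) -> float:
    """Two-state Gibbs-Helmholtz stability curve dG_unfold(T) in kcal/mol,
    with a temperature-independent heat-capacity change dcp (kcal/mol/K)."""
    return dh_m * (1.0 - t / tm) + dcp * ((t - tm) - t * math.log(t / tm))

def fraction_folded(t: float, tm: float, dh_m: float, dcp: float) -> float:
    """Equilibrium folded fraction 1 / (1 + K_unfold) for a two-state protein."""
    k_unfold = math.exp(-unfolding_free_energy(t, tm, dh_m, dcp) / (R_KCAL * t))
    return 1.0 / (1.0 + k_unfold)

# Simulation-derived alpha3D values quoted above: Tm = 370 K, dH = 38 kcal/mol, dCp = 0.6 kcal/mol/K
for temp in (300.0, 340.0, 370.0, 400.0):
    print(f"T = {temp:5.1f} K  fraction folded = {fraction_folded(temp, 370.0, 38.0, 0.6):.3f}")
```

By construction ΔG_u vanishes and the folded fraction is exactly one half at Tm; the ΔCp term makes the curve bend, so stability does not grow linearly as the temperature is lowered.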

λ-repressor

Simulation details. The polypeptide with the sequence PRO-LEU-THR-GLN-GLU-GLN-LEU-
GLU-ALA-ALA-ARG-ARG-LEU-LYS-ALA-ILE-TRP-GLU-LYS-LYS-LYS-ASN-GLU-
LEU-GLY-LEU-SER-TYR-GLU-SER-VAL-ALA-ASP-LYS-MET-GLY-MET-GLY-GLN-
SER-ALA-VAL-ALA-ALA-LEU-PHE-ASN-GLY-ILE-ASN-ALA-LEU-ASN-ALA-TYR-
ASN-ALA-ALA-LEU-LEU-ALA-LYS-ILE-LEU-LYS-VAL-SER-VAL-GLU-GLU-PHE-SER-
PRO-SER-ILE-ALA-ARG-GLU-ILE-TYR, corresponding to the “λD14A” quintuple mutant
(D14A/Y22W/Q33Y/G46A/G48A) of the λ-repressor (58), was solvated in 50 mM NaCl in a
cubic box of ~70 Å side length containing ~11000 water molecules. Four MD simulations of
170 µs, 161 µs, 158 µs and 154 µs were performed in the NVT ensemble. A cutoff of 11 Å was
used for the LJ and short-range electrostatic interactions; long-range electrostatic interactions
were treated with the GSE method and a 32 × 32 × 32 cubic grid.

Comparison with experiment. The Cα-RMSD between the center of the most populated cluster
in the simulation and the experimental structure of the closest homologue (PDB entry 1LMB) is
1.8 Å. The calculated folding time (49 ± 15 µs) can be compared to the experimental value of
10 µs (58). The calculated folding free energy barrier of ~1.5 kT at 350 K can be compared to
the value of 1.5 kT at 338 K estimated from the experimental data (58). The large amount of
residual secondary structure that we observe in the unfolded state of the λ-repressor (Table S2) is
consistent with experimental findings in studies of the unfolded state of this protein under non-
denaturing conditions (59). The formation of helical structure during folding of the λ-repressor
has been studied experimentally using kinetic isotope effects, and the results of those studies
demonstrated that 70%–80% of the helical structure that is formed during folding is already
formed in the transition state ensemble (60), which can be compared to the 79% found in our
simulations (Table S3).
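Barriers given in units of kT correspond to different energies at different temperatures. Because the two barrier estimates above refer to 350 K and 338 K, converting both to kcal mol−1 shows how close they are in absolute terms. A minimal conversion (the function name is hypothetical):

```python
K_B_KCAL = 1.987204e-3  # Boltzmann constant on a per-mole basis (= R), kcal mol^-1 K^-1

def kt_to_kcal_per_mol(n_kt: float, temperature_k: float) -> float:
    """Convert an energy expressed in units of k_B*T to kcal/mol at the given temperature."""
    return n_kt * K_B_KCAL * temperature_k

sim = kt_to_kcal_per_mol(1.5, 350.0)   # simulation estimate quoted above
expt = kt_to_kcal_per_mol(1.5, 338.0)  # experiment-based estimate quoted above
print(f"simulation: {sim:.2f} kcal/mol, experiment: {expt:.2f} kcal/mol")
```

Both come out at roughly 1 kcal mol−1, differing by only a few percent because of the temperature difference.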

Representative protein structures

Figure S7. Representative structures observed in simulation. For each protein we show the
folded structure obtained in simulation superimposed on the corresponding experimental
structure. Both the structures obtained from a clustering analysis of our simulations (blue) and
those from experiments (red) are the same as those shown in Fig. 1, but here we also show those
side chains, other than alanine, that are buried in the experimental structure (SASA < 30 Å²).

Figure S8. Representative structures observed in simulation. (Same as in the previous figure,
but with structures displayed in a different orientation.)

References

27. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, M. L. Klein, Comparison
of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926 (1983).

28. A. D. MacKerell, Jr. et al., All-atom empirical potential for molecular modeling and
dynamics studies of proteins. J. Phys. Chem. B 102, 3586 (1998).

29. K. J. Bowers et al., Scalable algorithms for molecular dynamics simulations on
commodity clusters. Proc. ACM/IEEE Conf. Supercomputing (SC06) (ACM, New York, 2006).

30. S. H. Northrup, J. T. Hynes, The stable states picture of chemical reactions. I.
Formulation for rate constants and initial condition effects. J. Chem. Phys. 73, 2700 (1980).

31. D. Frishman, P. Argos, Knowledge-based secondary structure assignment. Proteins 23,
566 (1995).

32. B. J. Frey, D. Dueck, Clustering by passing messages between data points. Science
315, 972 (2007).

33. D. K. Agrafiotis, H. Xu, A self-organizing principle for learning nonlinear
manifolds. Proc. Natl. Acad. Sci. U.S.A. 99, 15869 (2002).

34. T. G. Graham, R. B. Best, Force-induced change in protein unfolding mechanism:
discrete or continuous switch? J. Phys. Chem. B 115, 1546 (2011).

35. K. W. Plaxco, K. T. Simons, D. Baker, Contact order, transition state placement and the
refolding rates of single domain proteins. J. Mol. Biol. 277, 985 (1998).

36. G. Hummer, From transition paths to transition states and rate coefficients. J. Chem.
Phys. 120, 516 (2004).

37. D. L. Theobald, D. S. Wuttke, THESEUS: maximum likelihood superpositioning and
analysis of macromolecular structures. Bioinformatics 22, 2171 (2006).

38. S. Honda et al., Crystal structure of a ten-amino acid protein. J. Am. Chem. Soc. 130,
15327 (2008).

39. S. Nosé, A unified formulation of the constant temperature molecular dynamics methods.
J. Chem. Phys. 81, 511 (1984).

40. W. G. Hoover, Canonical dynamics: equilibrium phase-space distributions. Phys. Rev. A
31, 1695 (1985).

41. G. J. Martyna, M. L. Klein, M. Tuckerman, Nosé-Hoover chains: the canonical ensemble
via continuous dynamics. J. Chem. Phys. 97, 2635 (1992).

42. Y. Shan, J. L. Klepeis, M. P. Eastwood, R. O. Dror, D. E. Shaw, Gaussian split Ewald: A
fast Ewald mesh method for molecular simulation. J. Chem. Phys. 122, 054101 (2005).

43. B. Barua et al., The Trp-cage: optimizing the stability of a globular miniprotein. Protein
Eng. Des. Sel. 21, 171 (2008).

44. C. A. Sarisky, S. L. Mayo, The ββα fold: explorations in sequence space. J. Mol. Biol.
307, 1411 (2001).

45. J. Kubelka, T. K. Chiu, D. R. Davies, W. A. Eaton, J. Hofrichter, Sub-microsecond
protein folding. J. Mol. Biol. 359, 546 (2006).

46. R. Godoy-Ruiz et al., Estimating free-energy barrier heights for an ultrafast folding
protein from calorimetric and kinetic data. J. Phys. Chem. B 112, 5938 (2008).

47. S. Piana et al., Computational design and experimental testing of the fastest-folding
β-sheet protein. J. Mol. Biol. 405, 43 (2011).

48. C. J. Fennell, J. D. Gezelter, Is the Ewald summation still necessary? Pairwise
alternatives to the accepted standard for long-range electrostatics. J. Chem. Phys. 124,
234104 (2006).

49. J. C. Horng, V. Moroz, D. P. Raleigh, Rapid cooperative two-state folding of a miniature
alpha-beta protein and design of a thermostable variant. J. Mol. Biol. 326, 1261 (2003).

50. H. Neuweiler et al., The folding mechanism of BBL: Plasticity of transition-state
structure observed within an ultrafast folding protein family. J. Mol. Biol. 390, 1060 (2009).

51. T. Wang, Y. J. Zhu, F. Gai, Folding of a three-helix bundle at the folding speed limit. J.
Phys. Chem. B 108, 3694 (2004).

52. B. Gillespie et al., NMR and temperature-jump measurements of de novo designed
proteins demonstrate rapid folding in the absence of explicit selection for kinetics. J. Mol.
Biol. 330, 813 (2003).

53. S. Nauli, B. Kuhlman, D. Baker, Computer-based redesign of a protein folding pathway.
Nat. Struct. Biol. 8, 602 (2001).

54. E. L. McCallister, E. Alm, D. Baker, Critical role of beta-hairpin formation in protein G
folding. Nat. Struct. Biol. 7, 669 (2000).

55. S. T. R. Walsh, H. Cheng, J. W. Bryson, H. Roder, W. F. DeGrado, Solution structure
and dynamics of a de novo designed three helix bundle protein. Proc. Natl. Acad. Sci.
U.S.A. 96, 5486 (1999).

56. Y. Zhu et al., Ultrafast folding of alpha3D: a de novo designed three-helix bundle
protein. Proc. Natl. Acad. Sci. U.S.A. 100, 15486 (2003).

57. F. Liu, A one-dimensional free energy surface does not account for two-probe folding
kinetics of protein alpha(3)D. J. Chem. Phys. 130, 061101 (2009).

58. W. Y. Yang, M. Gruebele, Folding at the speed limit. Nature 423, 193 (2003).

59. P. Chugha, H. J. Sage, T. G. Oas, Methionine oxidation of monomeric lambda repressor:
the denatured state ensemble under nondenaturing conditions. Protein Sci. 15, 533 (2006).

60. B. A. Krantz et al., Understanding protein hydrogen bond formation with kinetic H/D
amide isotope effects. Nat. Struct. Biol. 9, 458 (2002).
