Generic CG Folding Aggre Models
Downloaded 07 Oct 2012 to 152.14.136.96. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/about/rights_and_permissions
THE JOURNAL OF CHEMICAL PHYSICS 130, 235106 (2009)
A generic coarse-grained (CG) protein model is presented. The intermediate level of resolution (four
beads per amino acid, implicit solvent) allows for accurate sampling of local conformations. It relies
on simple interactions that emphasize structure, such as hydrogen bonds and hydrophobicity.
Realistic α/β content is achieved by including an effective nearest-neighbor dipolar interaction.
Parameters are tuned to reproduce both local conformations and tertiary structures. The
thermodynamics and kinetics of a three-helix bundle are studied. We check that the CG model is
able to fold proteins with tertiary structures and amino acid sequences different from the one used
for parameter tuning. By studying both helical and extended conformations we make sure the force
field is not biased toward any particular secondary structure. The accuracy achieved in folding not
only the test protein but also other ones provides strong evidence for amino acid cooperativity
embedded in the model. Without any further adjustments or bias a realistic oligopeptide aggregation
scenario is observed. © 2009 American Institute of Physics. [DOI: 10.1063/1.3152842]
235106-2 T. Bereau and M. Deserno J. Chem. Phys. 130, 235106 (2009)
235106-3 Generic coarse-grained protein model J. Chem. Phys. 130, 235106 (2009)
TABLE I. Bonded interaction parameters used in the model. The dihedrals denoted with an asterisk were determined during parameter tuning (see Sec. IV). All parameters are expressed in terms of the intrinsic units of the system. k represents the interaction strength of Fourier mode n (see main text), with equilibrium value φ0. ω_pro refers to the dihedral around the peptide bond for a proline residue. The sign of the improper dihedral angle φ0 is linked to the chirality of the isomer; the L form requires a negative sign. For each angular potential, only a single mode n was used.

Bond lengths
NCα    CαC′    C′N    CαCβ

Bond angles

Dihedrals
             φ*      ψ*      ω       ω_pro   Improper
k (E)        −0.3    −0.3    67.0    3.0     17.0
n            1       1       1       2       1
φ0 (deg)     0       0       180     0       ∓120
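To make the entries of Table I more tangible, the following minimal sketch converts the intrinsic units described in Sec. II C (L = 1 Å, E = k_B × 300 K, M = one quarter of 110 Da) into SI values and evaluates the single-mode dihedral potential of Eq. (2) for the two peptide-bond rows as read from the table (ω: k = 67.0 E, n = 1, φ0 = 180°; ω_pro: k = 3.0 E, n = 2, φ0 = 0°). The function and variable names are illustrative assumptions, not part of the original model code. With these inputs τ = L√(M/E) comes out at a few tenths of a picosecond, the same order of magnitude as the τ ∼ 0.1 ps quoted in the text.

```python
import math

# Physical constants (SI).
K_B = 1.380649e-23        # Boltzmann constant, J/K
N_A = 6.02214076e23       # Avogadro constant, 1/mol
DALTON = 1.66053907e-27   # atomic mass unit, kg
J_PER_KCAL = 4184.0

# Intrinsic units as described in Sec. II C.
L_UNIT = 1.0e-10                            # 1 Angstrom, in m
E_UNIT = K_B * 300.0                        # thermal energy at 300 K, in J
M_UNIT = 110.0 * DALTON / 4.0               # average bead mass, in kg
TAU = L_UNIT * math.sqrt(M_UNIT / E_UNIT)   # natural time unit, in s


def e_to_kcal_per_mol(energy_in_e):
    """Convert an energy given in units of E to kcal/mol."""
    return energy_in_e * E_UNIT * N_A / J_PER_KCAL


def v_dih(phi_deg, k, n, phi0_deg):
    """Single-mode dihedral potential k * [1 - cos(n*phi - phi0)], in units of E."""
    return k * (1.0 - math.cos(math.radians(n * phi_deg - phi0_deg)))


# Peptide-bond rows of Table I: (k in E, Fourier mode n, phase in degrees).
TABLE_I = {
    "omega": (67.0, 1, 180.0),      # one minimum, at trans (180 deg)
    "omega_pro": (3.0, 2, 0.0),     # two minima, at cis (0 deg) and trans (180 deg)
}

if __name__ == "__main__":
    print(f"E   = {E_UNIT:.2e} J = {e_to_kcal_per_mol(1.0):.2f} kcal/mol")
    print(f"M   = {M_UNIT:.2e} kg")
    print(f"tau = {TAU * 1e12:.2f} ps")
    k, n, phi0 = TABLE_I["omega_pro"]
    barrier = v_dih(90.0, k, n, phi0) - v_dih(0.0, k, n, phi0)
    print(f"omega_pro barrier = {barrier:.1f} E "
          f"= {e_to_kcal_per_mol(barrier):.1f} kcal/mol")
```

For a single mode the barrier between minima is 2k, so the low proline value k = 3.0 E gives a barrier of only a few k_BT, consistent with the stated choice of allowing efficient sampling of the cis/trans isomerization.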
[...] van der Waals radii were left as free parameters. Following the abovementioned references, Cβ was set at the location of the first carbon of the side chain (hence our nomenclature), directly connected to the backbone. Its location will generally not coincide with the center of mass of the atomistic side chain (which for larger and flexible side chains has no fixed position with respect to the backbone), but the concomitant substantial reduction in tuning parameters is necessary for our parametrization scheme, as we will see below.

All side chain beads have been given the same van der Waals radius, except for glycine, which is modeled without a side chain. This accounts for the biggest difference in the Ramachandran plot of amino acids, namely, the large flexibility of an achiral glycine residue, as opposed to the substantial chiral sterical clashes between all the others (Ref. 25). On the other hand, it does not represent the size differences between nonglycine residues and will thus likely cause problems if packing issues are important, e.g., inside globular proteins.

Both the location and the size of the side chain are thus modeled in an approximate and highly simplified way. Why not be more sophisticated? Since these degrees of freedom are accounted for, one might as well give them the best possible parameter values. Ideally this is indeed what one would like to do, but the catch is that the necessary tuning is very difficult. Having 20 different amino acids gives, in the worst case, 20^3 = 8000 local Ramachandran plots for the (φ, ψ) angles between three consecutive amino acids. These would first need to be determined atomistically and then, via some suitable matching procedure, translated into CG side chain properties. Clearly, many obvious simplifications would be possible and the task is not nearly as daunting. The number of free parameters would nevertheless be substantially increased and their tuning would require both automated techniques and enormous computing resources. In contrast, in the present model we aim to keep the number of free parameters as low as possible, such that judicious tuning by hand is still a viable option. We will see below that it is also successful. While optimization of side chain parameters will remain a long term goal, this is certainly not the point where to start.

Finally, amino acids that are in the middle of a protein chain form peptide bonds with their neighbors. This is not so at the ends of the chain, and the structure is slightly different. Nonetheless, we model the end beads identically.

C. Units

All lengths are measured in units of L, which we choose to be 1 Å. For the energies we found it convenient to relate them to the thermal energy, since it is this balance which determines the overall protein conformation. We thus define the energy unit E = k_B T_r = 1.38 × 10^-23 J K^-1 × 300 K ≈ 4.1 × 10^-21 J ≈ 0.6 kcal mol^-1 as the thermal energy at room temperature.

Masses will be measured in the unit “M,” which is the mass of a single CG bead. We will assume all beads to have the same mass. An amino acid weighs on average 110 Da. By distributing mass equally among the four beads N, Cα, Cβ, and C′, this gives an average mass of M ≃ 4.6 × 10^-26 kg.

The natural time unit in our simulation is τ = L√(M/E). Using the length, energy, and mass mappings from above, we find τ ∼ 0.1 ps. This unit of time correctly describes the instantaneous dynamics of a fictitious CG bead-spring system (e.g., it leads to a value of the instantaneous velocity and associated kinetic energy that satisfies the equipartition theorem). However, it is crucial to understand that it does not measure the time which the real protein system requires to undergo the same conformational changes as observed in the simulation. The reason is that the reduction in degrees of freedom removes friction (smoothes the free energy landscape) and speeds up the motion through phase space. Translating τ into a reasonable measure for actual dynamics requires the determination of the associated speedup factor, which is typically accomplished by mapping an easily observable dynamic process between the experimental system and the CG simulation (such as diffusion). However, in the case of the conformational dynamics of proteins the identification of a suitable dynamic process is much less obvious. We defer this task to future work. It should be recalled that as far as equilibrium questions are concerned the precise time mapping is, of course, irrelevant.

III. INTERACTIONS

A. Bonded interactions

The local structure is constrained by bonded interactions. Bonds and angle potentials are chosen to be harmonic:

\[ V_{\mathrm{bond}}(r) = \tfrac{1}{2} k_{\mathrm{bond}} (r - r_0)^2, \tag{1a} \]

\[ V_{\mathrm{angle}}(\theta) = \tfrac{1}{2} k_{\mathrm{angle}} (\theta - \theta_0)^2. \tag{1b} \]

The spring constants k_bond and k_angle are set high enough to keep these coordinates close to their minimum (within ∼5%). Table I reports these parameters.

Up to thermal fluctuations bonds and angles are thus fixed. Flexibility of the overall structure enters through the dihedrals, the possibility to rotate around a chemical bond. In the case of proteins, two out of three backbone dihedrals are very flexible and are responsible for the diverse set of local conformations. These dihedrals are the φ and ψ coordinates, defined by the sets of beads C′NCαC′ and NCαC′N, respectively (see Fig. 1). They describe the angle between two planes (e.g., φ is the angle between the planes C′NCα and NCαC′) and obey the following convention: taking any four beads 1, 2, 3, and 4 and looking along the vector from bead 2 to bead 3, the angle “0” will correspond to the conformation in which beads 1 and 4 point into the same direction (i.e., when they visually overlap). The rotation of plane 1, 2, 3 with respect to plane 2, 3, 4 away from this state defines the angle; the counterclockwise sense counts positive. Because the potential of rotation around the bond between sp3- and sp2-hybridized atoms has a rather low barrier compared to thermal energy at room temperature, we let the beads rotate freely. However, we will later include a contribution to the coordinates φ and ψ accounting for an effective nonbonded dipolar interaction (see below).

The third dihedral along the backbone chain, ω, defined by CαC′NCα, is located at the peptide bond (see Fig. 1). This bond corresponds to the rotation around two sp2-hybridized atoms, which involves a symmetric potential with two minima, separated by a rather high barrier. The two conformations, cis and trans, have angles of 0° and 180°, respectively. The cis conformation tends to be sterically unfavored for most amino acids, except for proline where there is no specific preference due to its special side chain linkage.

Generally, dihedrals can be written as a Fourier series in the rotation angle. Here we will restrict to a single mode and describe the interaction as

\[ V_{\mathrm{dih}}(\phi) = k_n \left[ 1 - \cos(n\phi - \phi_{n,0}) \right], \tag{2} \]

with coefficient k_n and phase φ_{n,0}. In our model we represent the peptide bond using only one minimum (n = 1) centered around the trans conformation. In this case φ0 ≡ φ_{1,0} is the equilibrium orientation of the dihedral and k ≡ k_1 is the stiffness describing deviations from the equilibrium angle. For a peptide bond located right before a proline residue, we model the isomerization by a dihedral potential with two minima (n = 2, k ≡ k_2), one at the cis conformation and the other one at trans. This allows for a more natural representation of the different conformations proline can take. Depending on the problem one is interested in, the energy barrier can be tuned to either freeze the isomerization or set to a low value to allow efficient sampling. We chose the latter in this work. This choice will of course affect the kinetics of the system.

The central carbon Cα not only links the backbone to the side chain; its sp3 hybridization imposes a tilted orientation of the CαCβ vector compared to the NCαC′ plane. Its four bonds are located at the vertices of a tetrahedron, linking the backbone atoms N and C′, as well as the Cβ side chain and an extra hydrogen (not modeled by us). This has an important consequence, because a carbon atom with four different substituents is chiral and hence optically active. All amino acids except glycine exist as two different stereoisomers. The L form is realized in native amino acids: looking at the central carbon Cα, with the hydrogen atom pointing away, the isomer has L form if the three other substituents C′, Cβ, and N are arranged in a counterclockwise fashion (“CORN rule”). This amino acid chirality is a central feature in proteins and their secondary structure, and we account for it by including an “improper dihedral” between the beads NCαC′Cβ. This keeps a tilt between the backbone plane, NCαC′, and the plane intersecting the side chain with two backbone beads, CαC′Cβ, such that all angles are correct and the CORN rule is satisfied. The interaction has the same form as other dihedrals, given by Eq. (2). The two stereoisomers only differ in the sign of the dihedral equilibrium angle φ0 and can thus both be modeled.

B. Nonbonded interactions

Probably the biggest challenge in any coarse graining scheme is determining the nonbonded interactions. Unlike bonded interactions, their form is not intrinsically obvious and the system behavior depends very sensitively on them. In the following section every interaction introduced will require at least one free parameter that has to be determined by tuning. The key technical difficulty of this enterprise is that all parameters are typically highly correlated. Optimization is thus an intrinsically multidimensional problem and we therefore intend to limit the number of free parameters as much as possible. While one might envision “hands-off” tuning schemes in which optimization occurs in an automated fashion (Ref. 26), for the present problem we found this difficult to implement for two reasons: First, parameter variations often have a rather inconspicuous impact on target observables, and the determination of the right gradient in parameter space thus can require very substantial computer time. And
second, some optimization aims are hard to quantify in numbers and rather require judgment and choice, e.g., the question how one balances the quality of a local Ramachandran plot against global folding characteristics. While we are aware of several obvious extensions and improvements of our present model that would ultimately benefit from such automated fine-tuning, this is not the point where we wish to start.

1. Backbone

Steric interactions are closely linked to secondary and tertiary structures for two reasons: first, local interactions along the protein chain will shape the Ramachandran plot; second, contact between distant parts of the amino acid chain will determine protein packing on larger scales. In order to model a local excluded volume, we use a purely repulsive Weeks–Chandler–Andersen (WCA) potential

\[ V_{\mathrm{bb}}(r) = \begin{cases} 4\epsilon_{\mathrm{bb}} \left[ \left( \frac{\sigma_{ij}}{r} \right)^{12} - \left( \frac{\sigma_{ij}}{r} \right)^{6} + \frac{1}{4} \right], & r \le r_c, \\ 0, & r > r_c, \end{cases} \tag{3} \]

where r_c = 2^{1/6} σ_ij and σ_ij is the arithmetic mean between the two bead sizes involved, following the Lorentz–Berthelot mixing rule. Just like the bead sizes, the energy ε_bb is a free parameter, though we use only one parameter for all backbone-backbone and backbone–side chain interactions, since for the WCA potential the energy scale is largely immaterial. Following the practice in atomistic simulations, we do not calculate excluded volume interaction between beads that are less than three bonds apart.

2. Side chain interactions

Amino acids differ in their water solubility. This can be quantified experimentally by measuring the partitioning of [...] (e.g., Ref. 27). The ratio of densities of a residue in the two environments can be translated into a free energy of transfer [...]

Miyazawa and Jernigan analyzed residue-residue contacts in crystallized proteins. By modeling interactions via square-well potentials, they obtained interaction strengths ε^MJ_ij for every i-j pair of residues. We reduced the resulting 20 × 20 interaction matrix further by deconvolving it into 20 interaction parameters ε_i (one for each amino acid), which approximately recreate all interactions as the geometric mean of the two amino acids involved, ε^MJ_ij ≈ ε_ij = √(ε_i ε_j), following the Lorentz–Berthelot mixing rule. Each term is then normalized,

\[ \epsilon_i' = \frac{\epsilon_i - \min_k \epsilon_k}{\max_k \epsilon_k - \min_k \epsilon_k}, \tag{4} \]

such that the most hydrophilic residue has a weight of 0 and the most hydrophobic a weight of 1, and the normalized interaction contact is denoted ε′_ij = √(ε′_i ε′_j). Finally, we multiply this term by the overall interaction scale ε_hp. One limitation in varying the interaction strength of a Lennard-Jones potential is that a low ε′_ij will tend to flatten out the repulsive part of the interaction. This will, as a result, fade the excluded volume effect for certain side chain beads, which is likely to exacerbate packing problems in dense regions. To overcome this issue and keep the same excluded volume for all side chain beads, we model the overall interaction by using a Lennard-Jones potential for the attractive part linked to a purely repulsive WCA potential for smaller distances. We join the two potentials at the minimum value of the interaction in such a way that both the potential and its first derivative are continuous. Overall, the interaction will have the following form:

\[ V_{\mathrm{hp}}(r) = \begin{cases} 4\epsilon_{\mathrm{hp}} \left[ \left( \frac{\sigma_{C_\beta}}{r} \right)^{12} - \left( \frac{\sigma_{C_\beta}}{r} \right)^{6} \right] + \epsilon_{\mathrm{hp}} (1 - \epsilon_{ij}'), & r \le r_c, \\ 4\epsilon_{\mathrm{hp}} \epsilon_{ij}' \left[ \left( \frac{\sigma_{C_\beta}}{r} \right)^{12} - \left( \frac{\sigma_{C_\beta}}{r} \right)^{6} \right], & r_c \le r \le r_{\mathrm{hp,cut}}, \\ 0, & r > r_{\mathrm{hp,cut}}. \end{cases} \tag{5} \]
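The side chain recipe described above (normalize the per-residue strengths, mix them by a geometric mean, then join an attractive Lennard-Jones tail to a WCA core at the potential minimum) can be sketched as follows. This is a minimal illustration under stated assumptions: the repulsive branch is taken to be a WCA term of strength ε_hp shifted by ε_hp(1 − ε′_ij), which is the shift that makes the potential and its derivative continuous at r_c = 2^{1/6} σ; the default σ, ε_hp, and cutoff are the hydrophobicity entries of Table III; the function names and the three-residue input are hypothetical.

```python
import math


def normalize(eps):
    """Eq. (4): rescale per-residue strengths linearly onto [0, 1]."""
    lo, hi = min(eps.values()), max(eps.values())
    return {name: (value - lo) / (hi - lo) for name, value in eps.items()}


def mix(eps_i, eps_j):
    """Geometric-mean cross term, eps'_ij = sqrt(eps'_i * eps'_j)."""
    return math.sqrt(eps_i * eps_j)


def lennard_jones(r, sigma, eps):
    """Plain 12-6 Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def v_hp(r, eps_ij, sigma=5.0, eps_hp=4.5, r_cut=10.0):
    """Side chain potential in units of E; r, sigma, r_cut in Angstrom.

    Assumed form: WCA-like core (independent of eps_ij up to a constant
    shift) joined at r_c to a Lennard-Jones tail of depth eps_hp * eps_ij.
    """
    r_c = 2.0 ** (1.0 / 6.0) * sigma
    if r <= r_c:
        return lennard_jones(r, sigma, eps_hp) + eps_hp * (1.0 - eps_ij)
    if r <= r_cut:
        return lennard_jones(r, sigma, eps_hp * eps_ij)
    return 0.0


if __name__ == "__main__":
    # Hypothetical raw strengths for three residue types (illustration only).
    weights = normalize({"polar": 1.0, "medium": 2.5, "apolar": 4.0})
    eps_ij = mix(weights["medium"], weights["apolar"])
    r_c = 2.0 ** (1.0 / 6.0) * 5.0
    for r in (4.5, r_c, 7.0, 11.0):
        print(f"r = {r:5.2f} A   V_hp = {v_hp(r, eps_ij):8.3f} E")
```

Because only the tail is scaled by ε′_ij, two weakly attracting side chains still feel the full excluded volume at short range, which is the design goal stated in the text.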
TABLE II. Normalized scale of amino acid hydrophobicities ε_i using the Lorentz–Berthelot mixing rule for the cross terms, as well as relative and absolute error, Δε_i and Δε_ij, from the diagonal elements of the MJ matrix [...]

Residue     ε_i (E)   Relative error   Absolute error (E)
Leu (L)     1.00      0.05             −0.38
Phe (F)     0.97      0.04             −0.32
Ile         0.84      0.02             −0.12
[...]       0.35      −0.11            [...]
Gly*        0.17      −0.05            0.11
Gln         0.13      0.20             −0.31
Glu         0.05      0.50             −0.45
Lys         0.00      4.00             −0.48

[...]

\[ c = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sigma_X} \right) \left( \frac{Y_i - \bar{Y}}{\sigma_Y} \right), \tag{7} \]

where n is the number of data points in each set, X̄ and Ȳ are their averages, and σ_X and σ_Y are their standard deviations, respectively. Our inferred 210 ε_ij values and their original ε^MJ_ij [...] local details.

3. Hydrogen bonds

\[ V_{\mathrm{hb}} = \epsilon_{\mathrm{hb}} \left[ 5 \left( \frac{\sigma_{\mathrm{hb}}}{r} \right)^{12} - 6 \left( \frac{\sigma_{\mathrm{hb}}}{r} \right)^{10} \right] \times \{ \ldots \}, \tag{8} \]

where r is the distance between the two beads N and C′, σ_hb [...]
TABLE III. Nonbonded interactions. The length σ represents the diameter of a bead. Most parameters were determined after parameter tuning, except the ones denoted by an asterisk. See Sec. IV.

Backbone excluded volume
σ_N (Å)    σ_Cα (Å)   σ_C′ (Å)      ε_bb (E)
2.9        3.7        3.5           0.02

Hydrophobicity
σ_Cβ (Å)   ε_hp (E)   r_hp,cut (Å)
5.0        4.5        10*

Hydrogen bonding
σ_hb (Å)   ε_hb (E)   r_hb,cut (Å)
4.11*      6          8*

5. Dipole interaction

The interactions described above were sufficient to fold and stabilize α-helices but not β-sheets. Chen et al. (Ref. 22) pointed out that there is an important contribution usually neglected in generic models: carbonyl and amide groups at the peptide bond form dipoles that interact with each other. Mu and Gao (Ref. 34) showed that the nearest-neighbor interaction is enough to favor β over α content. Effectively, all dipoles along a helix are parallel compared to more favorable antiparallel neighboring dipoles on a β-sheet.

From a computational standpoint, a dipole-dipole interaction,

\[ V_{\mathrm{dd}}(\mathbf{p}_i, \mathbf{p}_j) = \frac{\epsilon_{\mathrm{dd}}}{r^3} \left[ \mathbf{p}_i \cdot \mathbf{p}_j - 3 (\mathbf{p}_i \cdot \hat{\mathbf{r}})(\mathbf{p}_j \cdot \hat{\mathbf{r}}) \right], \tag{9} \]
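As an illustration of how the dipole-dipole expression of Eq. (9) distinguishes relative orientations, the sketch below evaluates it for unit dipoles placed side by side. The prefactor `eps_dd = 1.0`, the unit dipoles, and the separation of 2 length units are placeholder assumptions chosen only to make the sign pattern visible; they are not parameters of the model.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def v_dd(p_i, p_j, r_vec, eps_dd=1.0):
    """Dipole-dipole energy: eps_dd / r^3 * [p_i.p_j - 3 (p_i.r_hat)(p_j.r_hat)]."""
    r = math.sqrt(dot(r_vec, r_vec))
    r_hat = [x / r for x in r_vec]
    return eps_dd / r ** 3 * (dot(p_i, p_j) - 3.0 * dot(p_i, r_hat) * dot(p_j, r_hat))


if __name__ == "__main__":
    separation = (2.0, 0.0, 0.0)              # dipoles displaced along x
    up, down = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)
    print("side by side, parallel:    ", v_dd(up, up, separation))
    print("side by side, antiparallel:", v_dd(up, down, separation))
```

For dipoles perpendicular to the line joining them, the parallel arrangement is repulsive and the antiparallel one attractive, which matches the qualitative argument in the text about neighboring dipoles in helices versus sheets.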
[Figure: two panels with axes φ (horizontal) and ψ (vertical), each from −180° to 180°; the β region is labeled; color scale from −0.06 to 0.06.]

TABLE IV. Table of free parameters in this CG model. The main test that was used to determine a given parameter is denoted in the second column.

Free parameters              Tuning method
σ_N, σ_Cα, σ_C′, σ_Cβ, ε_bb  Ramachandran plot

[...] the “training set” would incorporate more information, presumably leading to a better founded force field. There exist various successful parametrization schemes that rest on large ensembles of data (Refs. 37, 12, 38). This, however, needs to be balanced against the need to test how reliably a given force field handles proteins that were not part of its training set, a point we deemed more relevant.

Table IV lists the eight free parameters that need to be determined. Because of time constraints and to obtain some intuition and feeling of each interaction involved, we made a [...]
[...] information contained in these histograms is used to calculate a consistent set of free energy differences between simulations. Converging these free energies was done by using a recently developed highly efficient algorithm (Ref. 47). Once the density of states is reconstructed, one can obtain continuous approximations to all thermodynamic observables. By combining WHAM with parallel tempering, we effectively improve sampling by reducing correlations between data points.

A. Local conformations: Ramachandran plot

The Ramachandran plot (Ref. 48) records the occurrence and frequency of successive (φ, ψ) angles in a protein. Since backbone flexibility is almost exclusively due to these two coordinates, the Ramachandran plot is an ideal reporter of local (secondary) structure: α-helices and β-sheets belong to peaks in different regions of the plot. And since proteins are highly constrained systems, low energy points on the Ramachandran plot are rather well localized. Their accurate sampling is therefore prerequisite to the formation and stabilization of reliable structures on larger scales. In the following we will be concerned with the (thermal) distribution of the (φ, ψ) angles surrounding some particular amino acid and, in a slight stretch of standard terminology, also refer to this probability density as a Ramachandran plot.

The free parameters that most directly constrain the Ramachandran plot are the different bead sizes (σ_N, σ_Cα, σ_C′, σ_Cβ) and, to a lesser extent, the excluded volume energy prefactor ε_bb. We disentangled hydrogen bond and hydrophobicity effects from the Ramachandran plot by studying systems made of only three amino acids. From a steric point of view we only distinguish between glycine and nonglycine amino acid, by either not having a side chain bead at all (Gly) or by using a generic bead representing the 19 other amino acids (Ala, for the sake of concreteness). It is then sufficient to study the two Ramachandran plots of Gly-Gly-Gly and Gly-Ala-Gly tripeptides, the smallest systems that contain relevant information on successive dihedral angles φ and ψ. The reason why we surround the amino acid of interest with two Gly is to avoid hydrophobic interactions between neighboring side chains. As a result, we solely probe steric effects. The Ramachandran plots derived from the final set of parameters are shown in Fig. 4 as free energy plots obtained from using parallel tempering at temperatures k_BT/E ∈ {0.5, 0.7, 1.0, 1.3, 1.6, 1.9, 2.2, 2.5} and reconstructing the density of states with WHAM. The free energy plot is calculated at our reference temperature k_BT/E = 1. The shading represents the free energy difference with respect to the lowest conformation, in units of k_BT. Notice the inherent asymmetry in the Gly-Ala-Gly system, which reflects the chirality of the α-carbon. Both α-helix (−60°, −60°) and β-sheet (−60°, 130°) regions are well populated, in agreement with Ho et al. (Ref. 49). Proper balance and connectivity between the two regions are crucial for protein folding. This is tuned by the bead sizes and excluded volume energy but also depends on the dipole interaction k_dip (see below). The achiral glycine, on the other hand, has no side chain and permits many more conformations. One therefore often finds glycine residues at the ends of helices.

A particular challenge was the fact that we model neither the amide-hydrogen nor the carbonyl-oxygen explicitly, yet their steric effects strongly shape the Ramachandran plot (Ref. 49). This required subtle adjustments of the bead sizes of the N and C′ atoms compared to their conventional van der Waals radii.

A poor sampling of local conformations can thwart the formation of realistic secondary structure. Moreover, the relative weight of characteristic regions of the Ramachandran plot determines to a large extent the α/β content. Even though the analysis of abovementioned tripeptides accounts for steric effects and the dipole interaction, it does not consider hydrogen bonds and side chain interactions which are also important to stabilize secondary structure. For this reason it is difficult to ascertain the quality of conformational distributions without studying larger structures.

B. Folding of a three-helix bundle

In this section we study full size proteins to parametrize large-scale interactions. We used proteins found in the Protein Data Bank that were resolved experimentally in aqueous solvent.

[Figure: two panels with axes φ (horizontal) and ψ (vertical), each from −180° to 180°; color scale from 0 to 5.]
FIG. 4. (Color) Free energy plots of tripeptides Gly-Gly-Gly (top) and Gly-Ala-Gly (bottom) as a function of successive dihedrals φ and ψ, calculated at our reference temperature T = 1E/k_B. The coloring represents the free energy difference with the lowest conformation, in units of k_BT.
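The free energy surfaces of Fig. 4 report, for each (φ, ψ) region, the free energy difference to the most favorable conformation in units of k_BT. A simplified way to obtain such a map from dihedral samples at a single temperature is F/k_BT = −ln(P/P_max), as sketched below. Note the assumption: the paper reconstructs the density of states with WHAM from several temperatures, whereas this sketch uses a plain histogram of samples taken at one temperature. The function name, the 30° bin width, and the toy samples are hypothetical.

```python
import math
from collections import Counter


def free_energy_map(samples, bin_width=30.0):
    """Return {(phi_bin, psi_bin): F / k_B T}, with the most populated bin at 0.

    `samples` is an iterable of (phi, psi) pairs in degrees, assumed to be
    drawn from the canonical ensemble at the temperature of interest.
    """
    n_bins = int(round(360.0 / bin_width))
    counts = Counter()
    for phi, psi in samples:
        # Map angles to bin indices; the modulo wraps +180 deg onto -180 deg.
        i = int(math.floor((phi + 180.0) / bin_width)) % n_bins
        j = int(math.floor((psi + 180.0) / bin_width)) % n_bins
        counts[(i, j)] += 1
    top = max(counts.values())
    return {cell: -math.log(c / top) for cell, c in counts.items()}


# Hypothetical toy data: 30 helix-like, 10 sheet-like, 1 left-handed sample.
toy = [(-70.0, -50.0)] * 30 + [(-70.0, 140.0)] * 10 + [(60.0, 50.0)]
fmap = free_energy_map(toy)

if __name__ == "__main__":
    for cell, f in sorted(fmap.items(), key=lambda item: item[1]):
        print(cell, f"{f:.2f} kT")
```

Bins that were never visited are simply absent from the returned dictionary; in a plot they would correspond to regions of effectively infinite free energy at the given sampling depth.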
TABLE V. Structure and amino acid sequence of all proteins studied in this paper.
β-hairpin VVVVVDPGVVVVV
Our choice of reference protein is constrained by the limitations of our model. For instance, salt or disulfide bridges cannot yet be represented and should thus play no role in the reference protein either. Also, it was important to start with a simple structure rather than a globular protein for which packing and cooperativity are more important. Following Irbäck et al. (Ref. 18) and Takada et al. (Ref. 19), we also tuned our force field on a three-helix bundle. Direct comparisons with their models are difficult, though. First, these authors do not incorporate specificity on every amino acid and only represent a few amino acid types (e.g., hydrophobic, polar, glycine residue). Second, they only compared their simulations to the lowest energy structure found during the simulation rather than experimental data. In contrast, we use the de novo protein 2A3D (73 residues) and systematically compare our results with the real structure resolved experimentally (using NMR) (Ref. 50). The amino acid sequence is given in Table V. A similar protocol was followed by Favrin et al. (Ref. 20) in order to study a different three-helix bundle (1BDD).

A first attempt in tuning parameters consisted of simulating proteins starting from their native structure. Testing for stability is a rapid means to constrain parameter space but not sufficiently so as to actually determine their values. This is consistent with the picture of a deep funnel-like free energy landscape (Ref. 6): the free energy minimum of a native state is sufficiently deep compared to unfolded states that a folded protein is very stable against force field parameter variations. Further tuning was therefore mainly achieved by studying folding events using a set of trial runs with different parameters. Observation of three-dimensional structures with VMD (Ref. 51) was well suited to characterize simulations. The software was also used to render protein images in this paper.

Folding was studied in the following way: The only input into our simulations was the sequence of amino acids and the temperature. The initial conformation (determined by the collection of dihedral angles φ and ψ) was chosen randomly, and the integration started by warming up nonbonded interactions to relax high energy steric clashes. We used parallel tempering for all simulations to avoid kinetic traps. Structural observables were measured at k_BT = 1E, the temperature at which the force field was tuned. Simulations were set at eight different temperatures: k_BT/E ∈ {1.0, 1.1, 1.2, 1.3, 1.4, 1.6, 1.9, 2.2}. MC swaps between different temperatures were attempted every 10 τ; the average acceptance rate was around 10%. We tested convergence to a global minimum by checking that different initial conditions consistently equilibrate to the same structure. A combination of thermodynamic and kinetic studies (see below) will allow us to show two important features. First, the temperature used for parameter tuning, k_BT = 1E, is below the folding temperature T_f of 2A3D, above which the unfolded conformation becomes the most stable state. Second, k_BT = 1E is above the glass transition temperature T_g, below which the energy landscape becomes very rugged and creates severe kinetic traps. It was indeed possible to observe folding events in conventional (i.e., not using parallel tempering) simulations within this range of temperature.

Quantitative comparison between the CG and the experimental structures can be made by calculating the root-mean-square deviation (RMSD) between corresponding α-carbons on the two chains (after optimal mutual alignment). Figure 5 reports the RMSD of a protein in the lowest (k_BT = 1E) replica of a parallel tempering MD run as a function of time, using the RMSD trajectory tool within the VMD package (Ref. 51). These results were obtained with the parameters reported in Tables I–III. The average error between the equilibrated simulation and the NMR structure is around 4 Å after about 100 000 τ and at k_BT/E = 1, temperature at which the native conformation represents the free energy minimum. A superposition of the simulated structure with the experimental one is shown in Fig. 6. The STRIDE algorithm (Ref. 52) was used to assign [...]
[Figure: RMSD traces versus time t, in units of 1000 τ, from 0 to 300.]
FIG. 5. (Color online) RMSD of the CG proteins 2A3D (full line) and 1P68 (dashed line) compared with experimentally resolved structures. Both simulations were run at T = 1E/k_B.

[Figure: free energy profiles versus order parameter Q, from 0.3 to 0.7; curves labeled k_BT = 1.1 E and k_BT = 1.3 E.]
FIG. 7. (Color online) Free energy profile as a function of a nativeness order parameter Q below (T = 1.1E/k_B), at, and above (T = 1.3E/k_B) the folding temperature T_f = 1.2E/k_B.
\[ Q = \left\langle \exp\left[ -\frac{1}{9\,[\ldots]^{2}} \left( r_{ij}^{\mathrm{NMR}} - r_{ij}^{\mathrm{CG}} \right)^{2} \right] \right\rangle, \tag{11} \]

where the average goes over all pairs ij. The folded conformation lies in the basin Q ≳ 0.6 whereas all unfolded conformations [...]

[...] measured the average time it took to fold the protein to its native conformation, if it ever did in the time scale of the simulation (2 × 10^6 τ). The results are reported in Fig. 9. Temperatures 0.7E/k_B and 0.8E/k_B did not yield a single folding event, suggesting the onset of glassy behavior (Refs. 6, 53). The glass transition temperature T_g can be estimated following a simple pragmatic scheme suggested by Socci and Onuchic (Ref. 53): it is the temperature where the mean folding time is the average of the minimum folding time τ_min (lowest point in the graph) and the largest time scale one is willing to
FIG. 6. (Color) Equilibrated structures of (a) 2A3D and (b) 1P68 sampled at T = 1E/k_B. Superposition of simulated structure (opaque) with experimental data (transparent) is displayed. The STRIDE algorithm (Ref. 52) was used for secondary structure assignment (thick ribbons represent α-helices on the figure).

[Figure: order parameter Q, from 0.2 to 0.8, versus time t, in units of 10^6 τ, from 0 to 7.]
FIG. 8. (Color online) Conventional (i.e., not using parallel tempering) simulation of 2A3D at T = 1.2E/k_B. The nativity parameter Q is plotted against time. The protein alternates between folded (Q ≳ 0.6) and unfolded conformations (Q ≲ 0.5).
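The nativeness order parameter Q of Eq. (11), used in Figs. 7 and 8, compares pairwise distances in the simulated structure with those in the NMR reference. A minimal sketch is given below. Two points are assumptions rather than facts from the text: the average is taken over all distinct bead pairs, and the Gaussian width in the exponent is exposed as a hypothetical parameter `width_sq` (set to 9 Å² purely for illustration). The three-bead coordinates are toy data.

```python
import math
from itertools import combinations


def pair_distances(coords):
    """Distances between all distinct pairs of points, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def nativeness(ref_coords, cg_coords, width_sq=9.0):
    """Average of exp(-(r_ref - r_cg)^2 / width_sq) over all pairs.

    `width_sq` (squared length units) is an illustrative assumption.
    """
    d_ref = pair_distances(ref_coords)
    d_cg = pair_distances(cg_coords)
    terms = [math.exp(-((a - b) ** 2) / width_sq) for a, b in zip(d_ref, d_cg)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Toy example: three collinear beads, then the same chain with the
    # last bead displaced by 3 Angstrom.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    stretched = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
    print("Q(identical) =", nativeness(reference, reference))
    print("Q(stretched) =", round(nativeness(reference, stretched), 3))
```

Since Q depends only on internal distances, it needs no structural alignment, in contrast to the RMSD comparison described earlier; it equals 1 for a perfect match and decreases as pair distances deviate from the reference.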
FIG. 9. (Color online) Kinetic studies of the 2A3D three-helix bundle CG protein. The average folding time t_f is plotted against temperature. For temperatures ranging from T = 0.7E/kB to T = 1.2E/kB, about ten simulations were run and we measured the first passage time to the native state. The line represents the average between the minimum folding time and the time scale of the simulation. This can be used to estimate the glass transition temperature (see text).

invest in the simulation, τ_max (highest boundary in the graph): τ_g = (τ_min + τ_max)/2. This average time is plotted as a horizontal line in the graph. One can then estimate what temperature this folding time corresponds to (Fig. 9). In our case, we can safely assume that Tg < 0.9E/kB, meaning that the protein does not experience glassy behavior when simulating at our reference temperature T = 1E/kB. Moreover, combining results from thermodynamic calculations and kinetic studies shows that there is a range of temperatures Tg < T < Tf in which the system is not experiencing glassy behavior but is still "cold" enough such that the native state is the most stable conformation.

Irbäck et al.18 as well as Takada et al.19 reported a degeneracy in the CG structures of their helix bundles: there are two ways three helices can pack (see Fig. 10), and their models were not able to discriminate the two different tertiary structures. NMR experiments on 2A3D found a ratio between clockwise and counterclockwise topologies of several percent, leading to a free energy difference of a few kBT at room temperature.54 Of the 15 independent simulations we ran, one did not fold within 300 000 τ, 13 converged to the NMR structure, a counterclockwise topology [Fig. 10(a)], and only one had the other topology [illustrated in Fig. 10(b)]. While it is encouraging to see that our model is able to distinguish these topologies, it is not guaranteed that this will work equally well for other proteins.

V. SIMULATION DETAILS

MD simulations were performed with the ESPRESSO package.55 Simulations in the canonical ensemble (NVT)

VI. APPLICATIONS AND TESTS

A. Folding

None of the simulations mentioned from this point onward were part of the parameter tuning training set. They serve as independent checks of the force field and illustrate its features. Thermodynamic and kinetic studies were not performed for the different proteins of this section. Here, we study the equilibrium conformations of various sequences at a temperature of kBT/E = 1, which lies between Tg and Tf for our reference protein, 2A3D. In this respect, we expect to avoid glassy behavior for similarly complex proteins whose native state is folded.

In order to test the folding features of the model, we first studied another de novo three-helix bundle, 1LQ7. Even though the fold is very similar to 2A3D, it has 67 amino acids and a completely different primary sequence. Also, the native structure, obtained from NMR,56 has the opposite topology (clockwise) compared to 2A3D. Of ten independent parallel tempering runs, each 300 000 τ long, one did not fold within this amount of time (helices formed but did not arrange properly). Of the nine remaining structures, five folded consistently to the native clockwise topology [Fig. 10(b)] and four to the other one [Fig. 10(a)]. It should be noted that this sequence had been designed such that its native structure leads to favorable salt-bridge interactions.56 As we do not incorporate electrostatics (and thus salt bridges) explicitly, we expect the CG model to have difficulties in discriminating between the two tertiary structures.

In order to further probe the folding features of different α-helix-rich folds, we studied a four-helix bundle, 1P68, consisting of 102 amino acids.57 Even though the secondary structure is overall rather similar to the above-mentioned three-helix bundle, the tertiary structure and amino acid sequence are completely different. Again, the reference structure is taken from experimental data.57 In six independent parallel tempering runs, each 600 000 τ long, our force field successfully folded the protein into a four-helix bundle in every simulation except one, which did not have time to properly align its fourth helix. The RMSD is shown in Fig. 5 for a simulation which converged to the right topology. As can be seen, the RMSD went below 4 Å, which, again, is very satisfactory considering the level of resolution and the complete absence of structure bias in the force field. It should be noted that what appear as large fluctuations on the graph are actually frequent MC swaps between replicas of the parallel tempering ladder. Even though their potential energies are comparable (which is the reason why they swap temperatures), their structures are fairly different, as can be seen on the RMSD plot. Just as in the three-helix bundle case, this protein can fold into several different topologies. Out of the five simulations which converged to a four-helix

FIG. 10. Schematic figure of the two possible topologies in forming a three-helix bundle. (a) The native fold of protein 2A3D corresponds to a counterclockwise topology and (b) that of 1LQ7 is clockwise.
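The link between the population ratio of the two topologies in Fig. 10 and their free energy difference follows from Boltzmann weighting, ΔF = kBT ln(p_major/p_minor). The short Python sketch below makes the arithmetic explicit. The function name and the 5% example population are illustrative assumptions, not values taken from the NMR data.

```python
import math


def free_energy_gap(p_major, p_minor):
    """Return Delta F / kBT = ln(p_major / p_minor) for two populated states."""
    if p_major <= 0 or p_minor <= 0:
        raise ValueError("populations must be positive")
    return math.log(p_major / p_minor)


# Hypothetical minor population of 5%: a gap of roughly 3 kBT.
gap_from_fractions = free_energy_gap(0.95, 0.05)

# Counts work as well, since only the ratio enters (here 13 runs versus 1).
gap_from_counts = free_energy_gap(13, 1)

print(round(gap_from_fractions, 2), round(gap_from_counts, 2))
```

A minor population of a few percent thus corresponds to a gap of a few kBT. Applying the same formula to the 13:1 outcome of the folding runs should be read with caution, because 14 folding events are a small sample and first-passage outcomes need not reflect equilibrium populations.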
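Returning to the kinetic study of Fig. 9, the pragmatic estimate of the glass transition temperature can be sketched as follows: compute τ_g = (τ_min + τ_max)/2 and locate, on the low-temperature side of the fastest-folding temperature, where the mean folding time crosses τ_g. The sketch is only illustrative. The function name, the linear interpolation between neighboring temperatures, and the numbers in the usage example are assumptions and not data from this work.

```python
def estimate_tg(temps, fold_times, tau_max):
    """Estimate Tg as the temperature where the mean folding time equals
    tau_g = (tau_min + tau_max) / 2, on the cold side of the minimum.

    temps      -- temperatures in ascending order
    fold_times -- mean folding time at each temperature (use tau_max
                  where no folding event was observed)
    tau_max    -- largest time scale one is willing to simulate
    Returns None if the folding time never reaches tau_g.
    """
    tau_min = min(fold_times)
    tau_g = 0.5 * (tau_min + tau_max)
    k_min = fold_times.index(tau_min)
    # Walk from the fastest-folding temperature toward colder ones.
    for k in range(k_min, 0, -1):
        tau_warm, tau_cold = fold_times[k], fold_times[k - 1]
        if tau_warm <= tau_g <= tau_cold:
            if tau_cold == tau_warm:
                return temps[k]
            # Linear interpolation between the two bracketing temperatures.
            frac = (tau_g - tau_warm) / (tau_cold - tau_warm)
            return temps[k] - frac * (temps[k] - temps[k - 1])
    return None


# Made-up folding times (in units of tau), for illustration only.
temps = [0.8, 0.9, 1.0, 1.1, 1.2]
times = [2.0e6, 4.0e5, 2.0e5, 1.0e5, 3.0e5]
tg = estimate_tg(temps, times, tau_max=2.0e6)
```

With these made-up numbers the crossing falls between the two lowest temperatures, so the estimate lies below 0.9 in the same units. Interpolating in the logarithm of the folding time would be an equally defensible choice.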
aggregation, the authors also simulated a water-soluble control peptide SQNGNQQRG and found a difference in the amount of β-sheets formed. We compared the phase behavior of GNNQQNY and SQNGNQQRG by using WHAM on both sequences but did not find statistically significant differences over the studied temperature range. This suggests that some of the details necessary to distinguish the thermodynamics of these two peptides are too subtle for our force field to represent. Since the simulation temperature of Gsponer et al. in our case maps to T = 1E/kB, which is where we essentially find the phase transition (Fig. 11), effects only captured by the atomistic force field can indeed be expected to lead to substantial differences. Previous studies have shown how differences in CG force field parameters affect structure, β-sheet propensity, and aggregation behavior of different sequences.66

All of these aggregation results were obtained using the same force field with no additional parameter adjustment. Other CG models have previously demonstrated aggregation events on a larger scale.67,68 Here our goal was to show that we can study aggregation events using a force field that is tuned to reproduce simple folding features without biasing secondary or tertiary structure. This is important when looking at spontaneous aggregation or misfolding pathways, where one aims to reproduce general behavior without constraining the protein's structure toward a certain state that might not even be known or well defined.

VII. CONCLUSION

We have presented a new CG implicit solvent peptide model. Its intermediate resolution of four beads per amino acid permits accurate sampling of local conformations and thus secondary structure. Following cautious parameter tuning, the CG model is able to fold simple proteins such as helix bundles. Folding of a three-helix bundle was used to incorporate large-scale aspects of the force field, whereas the successful folding of other helical bundles provided independent checks of reliability. Thermodynamic and kinetic studies of the three-helix bundle were carried out to make sure the folding temperature Tf was above the glass transition temperature Tg for this protein. The model was systematically compared to NMR data in order to optimize parameter tuning and precisely determine how much fine-scale information this CG model still contains. Of course, our model is not intended to compete with atomistic simulations, which is not the point of CG models; yet, carefully balancing several key contributions to the force field is a prerequisite for performing meaningful studies involving secondary and tertiary structure formation. Globular proteins have proven more difficult to stabilize, presumably because accurate packing and strong cooperativity are not captured well enough. We also observe aggregation events of small β-sheets without retuning the force field. A realistic α/β balance, coupled with basic folding features, makes the CG model very suitable for the large-scale and long-term regime that many biological processes require. Indeed, a force field that is not biased toward the protein's native conformation will likely give rise to insightful thermodynamic and kinetic studies when the structure is not known, not well defined, strongly perturbed from the native state, or adjusts during aggregation events.

ACKNOWLEDGMENTS

We thank Zunjing Wang, Christine Peter, Cem Yolcu, and Bill DeGrado for valuable discussions and useful comments.

1 C. N. Pace, Trends Biochem. Sci. 15, 14 (1990).
2 C. N. Pace, B. A. Shirley, M. McNutt, and K. Gajiwala, FASEB J. 10, 75 (1996).
3 Coarse-Graining of Condensed Phase and Biomolecular Systems, edited by G. A. Voth (Taylor & Francis, New York, 2008).
4 G. S. Ayton, W. G. Noid, and G. A. Voth, Curr. Opin. Struct. Biol. 17, 192 (2007).
5 V. Tozzini, Curr. Opin. Struct. Biol. 15, 144 (2005).
6 J. D. Bryngelson, J. N. Onuchic, N. D. Socci, and P. G. Wolynes, Proteins: Struct., Funct., Genet. 21, 167 (1995).
7 A. Arkhipov, P. L. Freddolino, and K. Schulten, Structure (London) 14, 1767 (2006).
8 A. Arkhipov, Y. Yin, and K. Schulten, Biophys. J. 95, 2806 (2008).
9 F. Ding, J. M. Borreguero, S. V. Buldyrev, H. E. Stanley, and N. V. Dokholyan, Proteins: Struct., Funct., Genet. 53, 220 (2003).
10 J. M. Sorenson and T. Head-Gordon, J. Comput. Biol. 7, 469 (2000).
11 A. Voegler Smith and C. K. Hall, Proteins: Struct., Funct., Genet. 44, 344 (2001).
12 Y. Fujitsuka, S. Takada, Z. A. Luthey-Schulten, and P. G. Wolynes, Proteins: Struct., Funct., Genet. 54, 88 (2004).
13 T. Head-Gordon and S. Brown, Curr. Opin. Struct. Biol. 13, 160 (2003).
14 K. F. Lau and K. A. Dill, Macromolecules 22, 3986 (1989).
15 N. Go, Annu. Rev. Biophys. Bioeng. 12, 183 (1983).
16 L. Monticelli, S. K. Kandasamy, X. Periole, R. G. Larson, D. P. Tieleman, and S.-J. Marrink, J. Chem. Theory Comput. 4, 819 (2008).
17 L. Thøgersen, B. Schiøtt, T. Vosegaard, N. C. Nielsen, and E. Tajkhorshid, Biophys. J. 95, 4337 (2008).
18 A. Irbäck, F. Sjunnesson, and S. Wallin, Proc. Natl. Acad. Sci. U.S.A. 97, 13614 (2000).
19 S. Takada, Z. Luthey-Schulten, and P. G. Wolynes, J. Chem. Phys. 110, 11616 (1999).
20 G. Favrin, A. Irbäck, and S. Wallin, Proteins: Struct., Funct., Genet. 47, 99 (2002).
21 A. S. Yang and B. Honig, J. Mol. Biol. 252, 366 (1995).
22 N. Y. Chen, Z. Y. Su, and C. Y. Mou, Phys. Rev. Lett. 96, 078103 (2006).
23 H. Lodish, A. Berk, S. L. Zipursky, P. Matsudaira, D. Baltimore, and J. Darnell, Molecular Cell Biology (Freeman, New York, 2000).
24 P. T. Lansbury and H. A. Lashuel, Nature (London) 443, 774 (2006).
25 A. V. Finkelstein and O. B. Ptitsyn, Protein Physics (Academic, New York, 2002).
26 H. Meyer, O. Biermann, R. Faller, D. Reith, and F. Müller-Plathe, J. Chem. Phys. 113, 6264 (2000).
27 J. L. Fauchere and V. Pliska, Eur. J. Med. Chem. 18, 369 (1983).
28 S. Miyazawa and R. L. Jernigan, J. Mol. Biol. 256, 623 (1996).
29 J. Skolnick, L. Jaroszewski, A. Kolinski, and A. Godzik, Protein Sci. 6, 676 (1997).
30 S. Miyazawa and R. L. Jernigan, Proteins: Struct., Funct., Genet. 36, 357 (1999).
31 M. R. Betancourt and D. Thirumalai, Protein Sci. 8, 361 (1999).
32 Z. H. Wang and H. C. Lee, Phys. Rev. Lett. 84, 574 (2000).
33 M. S. Cheung, A. E. Garcia, and J. N. Onuchic, Proc. Natl. Acad. Sci. U.S.A. 99, 685 (2002).
34 Y. Mu and Y. Q. Gao, J. Chem. Phys. 127, 105102 (2007).
35 E. H. Yap, N. L. Fawzi, and T. Head-Gordon, Proteins: Struct., Funct., Bioinf. 70, 626 (2008).
36 C. L. Guo, M. S. Cheung, H. Levine, and D. A. Kessler, J. Chem. Phys. 116, 4353 (2002).
37 L. A. Mirny and E. I. Shakhnovich, J. Mol. Biol. 264, 1164 (1996).
38 S. Matysiak and C. Clementi, J. Mol. Biol. 363, 297 (2006).
39 P. Das, M. Moll, H. Stamati, L. E. Kavraki, and C. Clementi, Proc. Natl. Acad. Sci. U.S.A. 103, 9885 (2006).
40 C. Micheletti, A. Laio, and M. Parrinello, Phys. Rev. Lett. 92, 170601 (2004).
41 G. M. Torrie and J. P. Valleau, J. Comput. Phys. 23, 187 (1977).
42 S. Park and K. Schulten, J. Chem. Phys. 120, 5946 (2004).
43 R. H. Swendsen and J. S. Wang, Phys. Rev. Lett. 57, 2607 (1986).
44 A. M. Ferrenberg and R. H. Swendsen, Phys. Rev. Lett. 61, 2635 (1988).
45 S. Kumar, D. Bouzida, R. H. Swendsen, P. A. Kollman, and J. M. Rosenberg, J. Comput. Chem. 13, 1011 (1992).
46 S. Kumar, J. M. Rosenberg, D. Bouzida, R. H. Swendsen, and P. A. Kollman, J. Comput. Chem. 16, 1339 (1995).
47 T. Bereau and R. H. Swendsen, "Optimized convergence for multiple histogram analysis," J. Comput. Phys. DOI:10.1016/j.jcp.2009.05.011 (to be published).
48 G. N. Ramachandran, C. Ramakrishnan, and V. Sasisekharan, J. Mol. Biol. 7, 95 (1963).
49 B. K. Ho, A. Thomas, and R. Brasseur, Protein Sci. 12, 2508 (2003).
50 S. T. R. Walsh, H. Cheng, J. W. Bryson, H. Roder, and W. F. DeGrado, Proc. Natl. Acad. Sci. U.S.A. 96, 5486 (1999).
51 W. Humphrey, A. Dalke, and K. Schulten, J. Mol. Graphics 14, 33 (1996).
52 D. Frishman and P. Argos, Proteins: Struct., Funct., Genet. 23, 566 (1995).
53 N. D. Socci and J. N. Onuchic, J. Chem. Phys. 101, 1519 (1994).
54 W. F. DeGrado (personal communication).
55 H. J. Limbach, A. Arnold, B. A. Mann, and C. Holm, Comput. Phys. Commun. 174, 704 (2006).
56 Q. H. Dai, C. Tommos, E. J. Fuentes, M. R. A. Blomberg, P. L. Dutton, and A. J. Wand, J. Am. Chem. Soc. 124, 10952 (2002).
57 Y. N. Wei, S. Kim, D. Fela, J. Baum, and M. H. Hecht, Proc. Natl. Acad. Sci. U.S.A. 100, 13270 (2003).
58 A. Go, S. Kim, J. Baum, and M. H. Hecht, Protein Sci. 17, 821 (2008).
59 A. Mondragon, S. Subbiah, S. C. Almo, M. Drottar, and S. C. Harrison, J. Mol. Biol. 205, 189 (1989).
60 S. Cho and D. W. Hoffman, Biochemistry 41, 5730 (2002).
61 M. T. Pastor, M. L. de la Paz, E. Lacroix, L. Serrano, and E. Perez-Paya, Proc. Natl. Acad. Sci. U.S.A. 99, 614 (2002).
62 S. H. Gellman, Curr. Opin. Chem. Biol. 2, 717 (1998).
63 P. Ferrara, J. Apostolakis, and A. Caflisch, J. Phys. Chem. B 104, 5000 (2000).
64 I. F. Thorpe, J. Zhou, and G. A. Voth, J. Phys. Chem. B 112, 13079 (2008).
65 J. Gsponer, U. Haberthur, and A. Caflisch, Proc. Natl. Acad. Sci. U.S.A. 100, 5154 (2003).
66 G. Bellesia and J.-E. Shea, J. Chem. Phys. 126, 245104 (2007).
67 S. Peng, F. Ding, B. Urbanc, S. V. Buldyrev, L. Cruz, H. E. Stanley, and N. V. Dokholyan, Phys. Rev. E 69, 041908 (2004).
68 N. L. Fawzi, Y. Okabe, E. H. Yap, and T. Head-Gordon, J. Mol. Biol. 365, 535 (2007).