You are on page 1of 9

Chemical Physics Letters 446 (2007) 182–190

www.elsevier.com/locate/cplett

On-the-fly string method for minimum free energy paths calculation


1
Luca Maragliano *, Eric Vanden-Eijnden
Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, United States

Received 21 May 2007; in final form 3 August 2007


Available online 10 August 2007

Abstract

An improved and simplified version of the string method in collective variables for computing minimum free energy paths is proposed.
The string is discretized into a finite number of images and in the new method these images are evolved concurrently with replicas of the
original system while keeping their inter-distances equal via reparametrization. There is no need to compute the mean force by time-aver-
aging, nor to project the force perpendicularly to the string. In this Letter, the algorithmic aspects of the on-the-fly string method are
presented in detail and the method is tested on the solvated alanine dipeptide molecule.
 2007 Elsevier B.V. All rights reserved.

1. Introduction reparametrize it [5,6]. As shown in Ref. [1] (see also Section


2), this two-step procedure guarantees that the string con-
In the study of reactive events by molecular dynamics verges to a MFEP. The method proposed in Ref. [1] has the
(MD) the minimum free energy path (MFEP) plays an advantage over other free energy mapping approaches
important role. Given a set of collective variables to [7–11] that it can be used to compute the MFEP even when
describe a reaction, the MFEP is the reaction path of max- the number of collective variables is very large, as was dem-
imum likelihood in these variables. This was shown in Ref. onstrated in Ref. [12]. On the other hand, the two-step pro-
[1] using the framework of transition path theory (TPT) [2– cedure requires to re-initialize the MD system after every
4]. An algorithm was also proposed in Ref. [1] to compute update of the string and to adjust several parameters such
the MFEP. The main purpose of the present Letter is to as the length of the time-averaging window. This adds
give an improved and simplified version of this algorithm. some extra complexity to the method.
The algorithm in Ref. [1] iterates upon the following In this Letter, we propose a new algorithm for the com-
two-step procedure: (i) use short restrained or constrained putation of the MFEP which is simpler in two key aspects.
MD simulations to compute the gradient of the free energy First, we use the simplified and improved string method
(mean force) $F and a metric tensor M locally along a dis- proposed in Ref. [6], which does not require any perpendic-
cretized curve (or string) via time-averaging while keeping ular projection (this projection is made implicitly by
the string fixed, then (ii) utilize this data to update the posi- reparametrization). This leads to an algorithm which is
tion of the string using the string method, i.e. move every not only simpler, but also more stable and more efficient.
point (or image) along the string in the direction of the Second, we evolve the images along the string concurrently
component of M$F perpendicular to the string, and then with a collection of replicas of the original MD system.
Each replica is attached to a particular image and provides
on-the-fly the data needed to evolve this image. This is
*
achieved by enforcing that the string evolves slower than
Corresponding author. Fax: +1 212 995 4121.
E-mail addresses: maraglia@cims.nyu.edu (L. Maragliano), eve2@
the replicas and relying on averaging theorems for systems
cims.nyu.edu (E. Vanden-Eijnden). with multiple time scales [13–15], which have already been
1
Fax: +1 212 995 4121. exploited in the context of free energy calculations in [16].

0009-2614/$ - see front matter  2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.cplett.2007.08.017
L. Maragliano, E. Vanden-Eijnden / Chemical Physics Letters 446 (2007) 182–190 183

The procedure allows to bypass the computation of the Given two minima in F(z) located at, say, za and zb, the
mean force via time-averaging. More importantly, it elimi- MFEP was defined in [1] (see also [4]) as the curve f
nates the problem of re-initializing the MD systems after connecting za and zb such that the tangent vector along f
every update of the string. is parallel to the vector M(z)$F(z) at every point z along
Besides its simplicity, the new on-the-fly string method f; here $F(z) is the gradient of the free energy (1) (i.e. the
also has several other advantages worth mentioning up- mean force), and M(z) is a z-dependent tensor whose com-
front: ponents are given by
Z X N
1 oha ðxÞ ohb ðxÞ
 Each image along the string is coupled to one or two M ab ðzÞ ¼ Z 1 eF ðzÞ=kB T
replicas of the MD system but no direct communication RN k¼1 mk oxk oxk
between the different replicas is needed, which makes the Ym

method trivially parallelizable.  eV ðxÞ=kB T dðhm ðxÞ  zm Þdx


m¼1
 The two end-points of the string evolve freely towards * +
minima of the free energy, which facilitates the initializa- X
N
1 oha ðxÞ ohb ðxÞ
 ; a; b ¼ 1; . . . ; m;
tion of the string (the location of these minima do not k¼1
mk oxk oxk
hðxÞ¼z
need to be known exactly beforehand).
 A single configuration of the original MD system can be ð2Þ
used as initial condition for all the replicas at the very where mk is the mass of xk. Equivalently, the MFEP f sat-
beginning of the simulation, which also facilitates isfies the following equation (using compact notations for
initialization. the product of the tensor M and the vector $F)
 As a byproduct, the method permits to compute the free
?
energy along the MFEP or, in fact, at every point visited ðMðfÞrF ðfÞÞ ¼ 0; ð3Þ
by the string during its evolution towards the MFEP. ^
where (Æ) denotes projection perpendicular to f.
We believe that the above features of the on-the- The tensor M(z) in (3) arises due to the fact that the col-
fly string method make it a powerful method for comput- lective variables h(x) are, in general, curvilinear coordinates
ing the MFEP. We note, however, that several techniques [1]. The MFEP is reminiscent of the intrinsic reaction coor-
closely related to the one originally proposed in Ref. dinate (IRC) in internal coordinates defined by Fukui [20]
[1] have since been introduced to compute the MFEP (see also [21]), except that the MFEP usually involves pro-
[17–19]. jection onto a smaller set of collective variables (i.e. m < N
The remainder of this Letter is organized as follows. In in general), whereas the IRC is defined in a space of inter-
Section 2, we recall the definition of MFEP and how to nal coordinates whose dimensionality is the same as that of
move a string to identify it (Section 2.1); we introduce a the original Cartesian configurational space (except maybe
new set of dynamical equations to evolve concurrently for removing degrees of freedom redundant due to symme-
the string and a collection of replicas of the MD system tries by translation, rotation, etc.). As mentioned in the
(Section 2.2); and finally we introduce the new on-the-fly introduction, the MFEP naturally arises within transition
string method and discuss its properties (Section 2.3). path theory [2–4] as the reaction path of maximum likeli-
Then, in Section 3, we apply the on-the-fly string method hood when the reaction is described in the collective vari-
to the study of the conformational transition of alanine able space [1].
dipeptide (AD) in water. The basic idea behind the string method is to evolve a
parametrized curve in the collective variable space until
convergence to a solution of (3) [5,6,1]. Specifically, let
2. String method for computing minimum free energy paths f(t) be a curve parametrized as f(t) = {z(s, t) : s 2 [0, 1]};
then a simple and efficient way to evolve this curve in free
2.1. MFEP and curve dynamics energy space so that it converge to a solution of (3) is to use
z_ ðs; tÞ ¼ Mðzðs; tÞÞrF ðzðs; tÞÞ þ kðs; tÞz0 ðs; tÞ: ð4Þ
Consider a system whose configurational state is speci-
fied by x 2 RN , and define a set of collective variables Here z_ ¼ oz=ot and z 0 = oz/os. The term k(s, t)z 0 (s, t), being
h(x) = (h1(x), . . . , hm(x)). Given z = (z1, . . . , zm), the free tangent to the curve f(t) = {z(s, t) : s 2 [0, 1]}, does not
energy F(z) of the system is defined as (assuming that no influence the evolution of this curve, but it affects its
molecular constraints are present) parametrization; this term is added to enforce the particu-
Z Y
m lar parametrization chosen for the curve (for example by
F ðzÞ ¼ k B T ln Z 1 eV ðxÞ=kB T dðha ðxÞ  za Þdx; ð1Þ normalized arclength, in which case we impose that
RN a¼1
R jz 0 (s, t)j = cst(t) and this fixes k(s, t)). At steady state, the
V ðxÞ=k B T
where Z ¼ RN e dx, V(x) is the potential energy of right hand-side of (4) is equal to 0, which, after decompos-
the system, kB is Boltzmann constant, and T is the ing it into its components perpendicular and parallel to the
temperature. string, can be written as
184 L. Maragliano, E. Vanden-Eijnden / Chemical Physics Letters 446 (2007) 182–190

?
0 ¼ ðMðzðsÞÞrF ðzðsÞÞÞ ; are concerned in (6), the only difference with respect to
0 k standard MD is that these replicas each feel, instead of
kðsÞz ðsÞ ¼ ðMðzðsÞÞrF ðzðsÞÞÞ : ð5Þ
the bare atomic force fi(x), the effective force
The first equation is identical to (3), showing that the stea- Xm
dy state solution of (4) is indeed a MFEP. The second ohb ðxÞ
fieff ðx; zðs; tÞÞ ¼ fi ðxÞ  j ðhb ðxÞ  zb ðs; tÞÞ : ð8Þ
equation in (5) is irrelevant for the position of the curve b¼1
oxi
f = {z(s) : s 2 [0, 1]} (it simply allows to control its param-
The key result of this paper is that, for proper choices of
etrization by fixing k(s)).
c  1 and j  1, the equation for z(s, t) in (6) approxi-
mates (4). This follows from standard averaging theorems
2.2. Concurrent evolution of the string and a collection of
for systems with multiple time scales [13–15], which have
MD replicas
already been exploited in free energy calculations in Ref.
[16], and can be summarized as follows [23].
In Ref. [1], an algorithm based on the string method [5]
By taking c sufficiently large, we artificially slow down
was proposed to solve (4). This algorithm requires to com-
the dynamics of z(s, t) compared to that of x(s, t) and
pute the quantities $F(z) and M(z) along the string at every
y(s, t), and guarantee that the x(s, t) and y(s, t) variables
update of the string. Both $F(z) and M(z) are conditional
have time to equilibrate before the collective variables
expectations on h(x) = z (see (2) for M(z); a similar expres-
z(s, t) move significantly. Assuming that a proper thermo-
sion can be written for $F(z) [1]). As a result, these quan-
stat is used in (6), like e.g. Langevin, isokinetic, Nosé-Hoo-
tities can be computed by performing constrained or
ver, etc., the variables x(s, t) and y(s, t) thermalize on the
restrained simulations of the original MD system at the
following canonical probability density function:
proper h(x) = z values, by using for example the blue moon
sampling method [22], or some other biased sampling tech- .j ðx; y; zðs; tÞÞ ¼ qj ðx; zðs; tÞÞqj ðy; zðs; tÞÞ: ð9Þ
nique, as proposed in [1]. However, after each update of the Here
string, the configuration of the MD system may be far from
satisfying the new constraints (since z has changed but h(x) qj ðx; zðs; tÞÞ ¼ Z 1
j expðU j ðx; zðs; tÞÞ=k B T Þ; ð10Þ
has not), which may lead to re-initialization problems. It R
where Z j ¼ RN expðU j ðx; zðs; tÞÞ=k B T Þdx and Uj(x; z(s, t))
also requires to determine the length of the time-averaging is the extended potential
window used to compute $F(z) and M(z) at every step,
which is an extra parameter to adjust in the method. 1 Xm
U j ðx; zðs; tÞÞ ¼ V ðxÞ þ j ðhb ðxÞ  zb ðs; tÞÞ2 ; ð11Þ
In order to simplify things, we propose to use, instead of 2 b¼1
(4), the following dynamics in which the string z(s, t) is
evolved concurrently with two independent one-parameter in which z(s, t) enters as a mere parameter (the effective
families of replicas of the MD system, {x(s, t) : s 2 [0, 1]} force (8) is minus the gradient of this potential with respect
and {y(s, t) : s 2 [0, 1]}: to x). Correspondingly, when c is large enough, the equa-
8 tion for z(s, t) in (6) can be replaced by an effective equation
> Pm
where the right hand-side has been averaged with respect to
>
> c_za ðs; tÞ ¼ e ab ðxðs; tÞÞjðhb ðyðs; tÞÞ  zb ðs; tÞÞ
M
>
> the probability density (9). A direct calculation shows that
>
> b¼1
>
> Z
>
> þkðs; tÞz0a ðs; tÞ; Xm
>
> e ab ðxÞjðhb ðyÞ  zb ðs; tÞÞ.j ðx; y; zðs; tÞÞdx dy
> M
< m €x ðs; tÞ ¼ f ðxðs; tÞÞ  j P ðh ðxðs; tÞÞ  z ðs; tÞÞ ohb ðxðs; tÞÞ
> m
i i i b b RN RN b¼1
b¼1 oxi m Z Z
>
> X
>
> þthermostat; ¼ ~ ab ðxÞqj ðx; zðs; tÞÞdx
M jðhb ðyÞ  zb ðs; tÞÞ
>
>
>
> Pm ohb ðyðs; tÞÞ RN RN
>
>
b¼1
> mi €y i ðs; tÞ ¼ fi ðyðs; tÞÞ  j ðhb ðyðs; tÞÞ  zb ðs; tÞÞ
>  X
m
>
> b¼1 oy i oF j ðzðs; tÞÞ
>
:  qj ðy; zðs; tÞÞdy  M jab ðzðs; tÞÞ ;
þthermostat: b¼1
ozb
ð6Þ
ð12Þ
Here i = 1, . . . , N, a = 1, . . . , m (for clarity we wrote the where we defined
equations componentwise), j > 0 and c > 0 are parameters Z
whose role is explained next, f(x) = $V(x) is the atomic F j ðzÞ ¼ k B T ln Z 1 eU k ðx;zÞ=kB T dx; ð13Þ
RN
force, and we defined
and
XN
1 oha ðxÞ ohb ðxÞ Z
e ab ðxÞ ¼
M : ð7Þ
mk oxk oxk M jab ðzÞ ¼ e ab ðxÞqj ðx; zÞdx:
M ð14Þ
k¼1
RN

The reason for introducing two independent sets of MD Note that (12) relies on the statistical independence of the x
replicas in (6) will become clear shortly (see equations (9) and y variables and explains why we introduced two inde-
and (12) below). Notice that, as far as the MD replicas pendent MD replicas of the system in (6). (12) implies that,
L. Maragliano, E. Vanden-Eijnden / Chemical Physics Letters 446 (2007) 182–190 185

when c is large, the dynamics of the collective variables in details of the MD step are given explicitly in (22)
(6) can be approximated by the following effective equation: and (23) below, using Langevin and isokinetic,
Xm respectively, as thermostats.
oF j ðzðs;tÞÞ
c_za ðs;tÞ ¼  M jab ðzðs;tÞÞ þ kðs; tÞz0a ðs;tÞ: ð15Þ (2) Reparametrization of the string. In order to enforce
b¼1
ozb the equal-arclength parametrization on the evolved
Finally, to see that (15) reduces to (4) when j is large, images we interpolate a piecewise linear curve across
notice that the functions Fj(z) and M jab ðzÞ are smoothed the images zp;H a , p = 1, . . . , R, measure the total length

versions of the functions F(z) and M(z) defined in equa- L of this curve, and re-distribute new images along
tions (1) and (2). In particular, when j is large, we have the piecewise linear curve equidistantly, i.e. every
L/(R  1) (see the illustration in Fig. 1). In formula:
X m
oF j ðzðs; tÞÞ X m
oF ðzðs; tÞÞ denote by L(p) the length up to image p of the piece-
M jab ðzðs; tÞÞ  M ab ðzðs; tÞÞ :
b¼1
oz b b¼1
ozb wise linear curve interpolated across the images zp;H a ,
p = 1, . . . , R, i.e. L(1) = 0 and
ð16Þ
X
p
The calculation leading to (16) is performed explicitly LðpÞ ¼ jzq;H  zq1;H j; p ¼ 2; . . . ; R: ð18Þ
in Ref. [16], and we refer the reader to this paper for details. q¼2
Summarizing: Eq. (6) can be used with large but finite c
Then L(R) is the total length of the piecewise linear
and j to compute an approximation to the MFEP. How to
curve and ‘(p) = L(R)(p  1)/(R  1), p = 2, . . . , R is
choose these parameters in a practical application will be
the distance along this curve from the end point
illustrated in Section 3. Let us simply note that taking c
z1,w up to the point at which we need to put the
and j too small impedes the accuracy of the calculation
new pth image in order to make these images equidis-
since then (6) may not approximate (4) (for explicit expres-
tant along the curve. This is done by setting
sions of the errors this introduces, see [16,1]); but taking
z1(tn+1) = z1,w and zR(tn+1) = zR,w (by construction
them too big slows down the calculation since the larger
the end points are unaffected by the reparametriza-
j, the stiffer the MD variables, and the larger c, the slower
tion step), and for p = 2, . . . , R  1, taking
the collective variables. In other words, j and c should be
taken as small as the desired accuracy allows. zp ðtnþ1 Þ ¼ zqðpÞ1;H
zqðpÞ;H  zqðpÞ1;H
2.3. On-the-fly string method þ ð‘ðpÞ  LðqðpÞ  1ÞÞ ;
jzqðpÞ;H  zqðpÞ1;H j
ð19Þ
We now describe the algorithm of the on-the-fly string
method which is based on (6). First, we discretize the curve where q(p) = 2, . . . , R is such that L(q(p)  1) < ‘(p) 6
z(s, t) using R images, and denote zp(t) = z(pDs, t), with L(q(p)).
Ds = 1/(R  1) and p = 1, . . . , R. Correspondingly, we
introduce 2R independent replicas of the MD system,
xp(t) = x(pDs, t) and yp(t) = y(pDs, t), p = 1, . . . , R. Then
we iterate upon the following two steps, whose details are
given here after: (i) concerted evolution of zp(t), xp(t) and
yp(t) consistent with (6), and (ii) reparametrization of the
zp’s to ensure that they remain equidistant (which, at con-
tinuous level, corresponds to using normalized arc-length
to parametrize z(s, t)).

(1) Concerted evolution of the string and the replicas of the


original system. For each image and associated rep-
lica at the nth iteration, we use
8
> Pm
> p;H p
> za ¼ za ðtn Þ  c
1 e ab ðxp ðtn ÞÞjðzpb ðtn Þ  hb ðy p ðtn ÞÞÞDt;
M
>
>
>
> b¼1
>
< xp ðtnþ1 Þ ¼ standard MD update using the effective force
i
>
> ð8Þ evaluated at zðs; tÞ ¼ zp ðtn Þ;
>
>
>
> p
y i ðtnþ1 Þ ¼ standard MD update using the effective force Fig. 1. Schematic illustration of the reparametrization step. The images
>
>
: zp;H
a , p = 1, . . . , R (here with R = 5) are shown as empty diamonds, and
ð8Þ evaluated at zðs; tÞ ¼ zp ðtn Þ;
they are not equidistant; the polygonal curve is the curve obtained by
ð17Þ linear interpolation between these images; the full circles are the new
zpa ðtnþ1 Þ, p = 1, . . . , R, obtained by re-distributing the images equidistantly
where tn = nDt with n 2 N and Dt is some appropri- along this curve using (19) (the end points are not affected by this
ate time-step. For the reader’s convenience, the operation, so the diamonds and the circles superimpose there).
186 L. Maragliano, E. Vanden-Eijnden / Chemical Physics Letters 446 (2007) 182–190

This algorithm makes the images zp(t) converge toward Equation (20) needs the estimation of the mean force oF(z)/
the corresponding R equidistant discretization points along oz. This can be done at the end of the calculation when the
the MFEP, except for one caveat: the finiteness of c intro- z variables have converged, i.e. when z(s, t)  z(s), via time
duces statistical fluctuations in the dynamics of zp(t), which averages along the trajectories of the MD replicas: for T
means that these variables do not actually reach a fixed and j large enough (see [1])
point but keep oscillating around it. We will discuss in Sec- Z T
tion 3 a simple criterion to determine when the algorithm 1 oF ðzðsÞÞ
jðza ðsÞ  ha ðxðs; tÞÞÞdt  : ð21Þ
has converged, i.e. when the remaining motion of zp(t) is T 0 oza
due solely to statistical fluctuations. These spurious fluctu-
ations can be minimized either by increasing the value of c, These equation will be used in Section 3 to compute the
or by time-averaging the position of zp(t) at the end of the free energy along the MFEP.
calculation when their remaining motion is due solely to Finally, we want to stress out that, with respect to the
statistical fluctuations. The latter option is actually selection of the collective variables, the method here
cheaper, and is the one we will use in Section 3. described is the same as the original formulation in [1], in
A few additional comments are in order. that it does not tell a priori how to pick these variables.
The on-the-fly string method is simple. The 2R replicas of This shortcoming is, however, somewhat alleviated by the
the MD system are updated independently in a step which fact that the string method, unlike other free energy map-
involves standard MD except for the presence of the ping techniques [7–11], allows one to use a very large num-
restraining force. This means that this calculation can be ber of collective variables, making their selection less
distributed over 2R different computing nodes and per- restrictive. For example, 129 000 collective variables were
formed in parallel. The only step in the method which used in [12]. In addition, the collective variables can be
requires communications between the nodes is the repara- tested a posteriori with a committor test. Indeed, having
metrization step, described in equations (19), and it only picked an appropriate set of collective variables and identi-
involves the collective variables z. This step can be per- fied the MFEP in these variables, in principle the isocom-
formed by a straightforward message passing procedure mittor surfaces are the lift up in state space of the
where the set of z’s is gathered to a node, processed, and hyperplanes whose normal is parallel to $F along the
then scattered back to the rest of nodes. Since on some MFEP. (How to identify such hypersurfaces was explained
clusters communication between nodes can be an issue, in [1], and can also be found in [4]). Hence, given a con-
we mention that the reparametrization step can also be verged string in a set of collective variables, we can identify
performed at given time-intervals, rather than at every these hypersurfaces and test them as isocommittor ones, in
time-step, provided that during such intervals the images the same way as done in [1,12]. If the chosen set of vari-
have not drifted much in order to not compromise the res- ables is not appropriate to describe the reaction, the com-
olution of the string. In general, we expect the reparamet- mittor test will fail. In this case, it will be necessary to
rization/communication step to be of negligible time cost include more collective variables to describe the reaction.
with respect to the calculation of atomic forces in the If, on the contrary, the collective variables are redundant,
MD code. the committor test will succeed, but the normal vector to
The initialization of the procedure is rather simple as well. the hyperplanes will have non-zero components only in a
The end points of the string evolve freely towards minima subset of the collective variables. The variables in which
of the free energy, which means that the locations of these such components are nearly zero are mostly irrelevant for
minima do not have to be known exactly beforehand. In the reaction, and can then be removed from the set of col-
addition, we may use as a starting condition for all the lective variables.
MD replicas a single configuration, and start the simula- We also note that, even if the right collective variables
tions with a smaller j such that each MD replica is are used, there might be more than one reaction channel.
smoothly dragged to the proper configuration consistent In this case, in order to identify the different (local) MFEPs
with the constraints h(x) = z and h(y) = z. Then, the j associated with them, it will be necessary to vary the initial
can be progressively increased up to the desired value. This condition for the string. The relative probability of the dif-
procedure will be illustrated in Section 3. ferent channels can be deduced from the free energy barrier
The free energy along the string can be easily computed. along the associated MFEPs, the MFEP with the lowest
Even if the number of collective variables is very large, free energy barrier being the most probable one. The case
the string method can be used to compute the MFEP in which there are also intermediate metastable states along
and we can calculate the free energy along it by thermody- the channels is not a problem, since the string is parame-
namic integration, as was done in Ref. [1]: trized by arclength, not time, and hence it can go through
all the intermediate states without getting trapped or even
Z Z sX
m
s
dF ðzðs0 ÞÞ 0 oF ðzðs0 ÞÞ 0 slowing down.
F ðzðsÞÞ  F ðzð0ÞÞ ¼ ds ¼ z0a ðs0 Þ ds : Details of the MD update in (17). If a Langevin thermo-
0 ds0 0 a¼1
oza
stat with friction coefficient c is used in (6), the MD update
ð20Þ step in (17) reads explicitly:
L. Maragliano, E. Vanden-Eijnden / Chemical Physics Letters 446 (2007) 182–190 187
8 p  
>
> vi ðtnþ12 Þ ¼ 1  12 Dtc  18 Dt2 c2 vpi ðtn Þ Letter. However, let us note that collective variables for
>
>  
>
> the description of the solvent can and have been used in
>
> þ 12 Dt  18 Dt2 c m1 eff p p
i fi ðx ðt n Þ; z ðt n ÞÞ
>
> pffiffiffiffiffi   [12] in the context of the string method. Hence, taking into
>
>
>
> þ 12 Dtri npi ðtn Þ  14 Dt3=2 cri 12 npi ðtn Þ þ p1ffiffi3 gpi ðtn Þ ; consideration the solvent would represent no principle dif-
>
< ficulty for the on-the-fly string method.
xpi ðtnþ1 Þ ¼ xpi ðtn Þ þ Dtvpi ðtnþ12 Þ þ Dt3=2 ri 2p1 ffiffi3 gpi ðtn Þ; A first calculation of the MFEP using the algorithm
>
> vp ðt Þ ¼ 1  1 Dtc  1 Dt2 c2 vp ðt 1 Þ
>
> based on (6) was performed starting from a string of eight
>
> i nþ1 2 8 i nþ
>
> 1  1 eff p2 images on a straight line between (/1, w1) = (40, 130)
>
> 1 2
þ 2 Dt  8 Dt c mi fi ðx ðtnþ1 Þ; zp ðtn ÞÞ
>
> and (/8, w8) = (40, 45). All the replicas of the MD sys-
>
> pffiffiffiffiffi  
>
: þ 12 Dtri npi ðtn Þ  14 Dt3=2 cri 12 npi ðtn Þ þ p1ffiffi3 gpi ðtn Þ ; tem were initiated from a single configuration. A small
value of the coupling parameter, j = 40 kcal/mol/rad2,
ð22Þ was used for the very first steps of the simulation in order
and similarly for y pi ðtnþ1 Þ. Here we have introduced to allow the various MD replicas to adjust to their respec-
the velocity vpi ðtÞ of the pth MD replica xpi ðtÞ, ri ¼ tive constraints. After 10 ps of simulation, j was linearly
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi p p increased up to j = 1000 kcal/mol/rad2 in 100 ps. Through-
2k B T cm1 i and n (tn) and g (tn) are independent Gaussian
out all the simulations, we also used c = 500 kcal · s/mol/
variables with mean zero and covariance hnpi ðtn Þnpj ðtn Þi ¼
rad. Recall that the value of this parameter influences the
hgpi ðtn Þgpj ðtn Þi ¼ dij . (22) is the second-order integrator
fluctuations of the images during their evolution. Smaller
introduced in Ref. [24] and it can be easily modified if
values of c make the motion of the images more noisy,
molecular constraints are present by applying an extra step
but make them converge faster to the MFEP. Note that
of SHAKE [25].
the parameters j and c were not optimized for efficiency.
If the isokinetic scheme is used in (6), the MD update
In Fig. 2 we report four successive snapshots of the
step in (17) reads explicitly
string during its evolution (black circles) from its initial
8 p
>
> x ðt 1 Þ ¼ xpi ðtn Þ þ Dtvpi ðtn Þ; condition to the MFEP. As already mentioned, once con-
> i nþ2
>   verged the string essentially oscillates due to statistical
>
< vp;H ¼ vp ðt Þ þ 1 Dtm1 f eff xp ðt 1 Þ; zp ðt Þ ;
i i n 2 i i nþ 2
n
ð23Þ errors introduced by the finiteness of c. In order to repre-
> pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi p;H p;H sent a converged string, we compute the average position
>
> p
vi ðtnþ1 Þ ¼ ðN  1Þk B T vi =jv j;
>
> of the images at the latest stage of the calculation. The
: p
xi ðtnþ1 Þ ¼ xpi ðtn Þ þ Dtvpi ðtnþ1 Þ; average positions of the images are plotted as black dia-
P 1=2 monds in Fig. 2. We also report the extent of the statistical
N p;H 2
and similarly for y pi ðtnþ1 Þ. Here jvp;H j ¼ ðv
i¼1 i Þ .
fluctuations by plotting the histograms of the values of the
(23) is a simplified version of the scheme proposed in
Ref. [26] (which could be used as well).
Other thermostats, such as Nosé-Hoover can be used as
well but, for economy, we will refrain writing the corre-
150
sponding schemes explicitly.

100
3. Example: the solvated alanine dipeptide

As an application of the on-the-fly string method we 50


ψ

chose the transition between the b and aR equilibrium con-


formers of solvated alanine dipeptide at 300 K. This is a
0
widely studied example [27–30], and it gives us the possibil-
ity to check the accuracy of our results by superimposing
the MFEP identified by the string method on the free −50
energy landscape calculated with the temperature acceler-
ated method [16].
−100
We implemented the on-the-fly string method in the MD −150 −100 −50 0
code MOIL [31], by parallelizing it in such a way that each φ
image of the string is run on two different nodes (one per Fig. 2. MFEP calculation for alanine dipeptide in water at 300 K. The
MD replica), and one node is used for the reparametriza- black circles are the initial condition and snapshots during the evolution of
tion step (which we performed at every time-step). the string (which occurs from right to left). The black diamonds show the
We chose as collective variables the backbone dihedral average string computed at convergence (which is achieved in less than
200 ps). The fluctuations of the images due to the finiteness of c are also
angles, (z1, z2) = (/, w). It is known that these angles are
reported as a density histogram. The results are overimposed on the free
not the optimal variables to describe the transition, because energy map in the variables / and w at the same temperature obtained
solvent molecules are actively involved in it (see [28,32]), with the method described in [16]. The contour levels of the free energy
but such consideration goes beyond the purpose of this map are drawn every 0.55 kcal/mol.
188 L. Maragliano, E. Vanden-Eijnden / Chemical Physics Letters 446 (2007) 182–190

string’s images at convergence. The data are overimposed


on the free energy map of the system with respect to the 150

/ and w angles which was calculated via the temperature


accelerated method described in Ref. [16] with the same 100
value of j.
The evolution of the string was monitored by computing 50
the following distance from the initial condition,
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

ψ
u R 0
1u X
dðtÞ ¼ t jzp ðtÞ  zp ð0Þj2 ; ð24Þ
R p¼1 −50

where zp(0) is the initial value of the p image and R is the −100
number of images. A plateau in the profile of d(t) is an indi-
cation that the calculation has converged. −150
The values of d(t) along a 1.1 ns simulation are reported
−150 −100 −50 0
in Fig. 3. As it can be seen, convergence is obtained in less
φ
than 200 ps, after which the string essentially oscillates
according to statistical fluctuations coming from the finite- Fig. 4. MFEP calculation using a periodic string. The diamonds are the
ness of c. The simulation was extended up to 1.1 ns to ver- average images at convergence, and their fluctuations are reported as
ify stability, but all the data necessary to compute the mean density histogram. The results are overimposed on the free energy map in
the variables / and w at the same temperature. The contour levels of the
location of the zp(t) (and hence the MFEP) can be collected free energy map are drawn every 0.55 kcal/mol.
in less than 200 ps.
In Fig. 4 we report the result of a MFEP calculation
using a periodic string of 20 images, i.e. a string which loops
were used. As can be seen, the agreement with the previous
on itself (no end points) using the periodicity of the collec-
result is excellent.
tive variables / and w. As before the black diamonds are the
In Fig. 5 we show the free energy along the periodic
average values of the string images at convergence, and
string computed using (20) and (21) with T = 400 ps.
their fluctuations are represented via a density histogram.
Details of the MD simulation set up. The AMBER/OPLS
The data are overimposed on the free energy map of the sys-
[33] force field was used. A starting structure for the alanine
tem with respect to the / and w angles which was calculated
dipeptide molecule was inserted at the center of a box con-
via the temperature accelerated method described in Ref.
taining 522 water molecules, and of size (25 Å)3. Periodic
[16] with the same value of j. For this calculation, a differ-
boundary conditions were used. Van der Waals interac-
ent initial condition was also tested, a straight line between
tions were truncated at 9 Å. Electrostatic interactions were
(/1, w1) = (160, 162) and (/20, w20) = (160, 180).
treated with the Smooth Particle Mesh Ewald method [34]
The same values of c and j as in the previous simulation

30
1

25
Free Energy (kcal/mol)

0
20
distance (degs)

15 -1

10
-2

-3
2 4 6 8 10 12 14 16 18 20
0
0 0.2 0.4 0.6 0.8 1 Image index
time (ns)
Fig. 5. Free energy calculated along the MFEP shown in Fig. 4 using (20)
Fig. 3. Distance of the string from the initial condition during the and (21). The images index used here corresponds to following the string
simulation measured according to (24). As can be seen, convergence is from bottom to top in Fig. 4: image 8 is (/8, w8)  (102, 41) while
achieved in less than 200 ps – afterwards, the motion is solely due to image 19 is (/19, w19)  (115, 152), and these are the two minima of the
fluctuations due to the finiteness of c. free energy along the MFEP.
L. Maragliano, E. Vanden-Eijnden / Chemical Physics Letters 446 (2007) 182–190 189

with real space cutoff 9 Å, a grid of 323 points, and fourth- term kBT div M(z(s, t)) in (26) arises because, when s = 0,
order spline interpolation (in order to be in the high accu- we loose the statistical independence between M e ab ðxðs; tÞÞ
racy range, according to [34]). The TIP3 model [35] was and j(hb(x(s, t))  zb(s, t)). As a result, letting both c and
used for the water molecules. Non-bonded interaction lists j be large enough, the dynamics of the collective variables
were updated every 10 steps. All chemical bonds in the sys- in (25) with s = 0 can be approximated by the following
tem were kept fixed with SHAKE algorithm [25]. The effective equation (using compact notations again):
Velocity Verlet algorithm was used for the dynamics of
c_zðs; tÞ ¼ Mðzðs; tÞÞrF ðzðs; tÞÞ þ k B T div Mðzðs; tÞÞ
the original system, with time step 1 fs. In order to obtain
the initial configuration, the system was first equilibrated þ kðs; tÞz0 ðs; tÞ: ð27Þ
for 50 ps by keeping the solute molecule fixed (i.e. by zero-
This effective equation differs from (4) by the second term
ing forces and velocities of its atoms) and by assigning to
at the right hand-side, kBT div M(z(s, t)). The steady state
all water atoms at every step velocities sampled from a
solution of (27) is the curve f = {z(s) : s 2 [0, 1]} such that
Maxwell distribution at 300 K. Then, the full system was
(compare (5))
equilibrated for 500 ps with random assignment of veloci-
ties at every step. During the production runs, all velocities 0 ¼ ðMðzðsÞÞrF ðzðsÞÞ  k B T div MðzðsÞÞÞ? ;
were scaled at every step to keep the temperature at 300 K. ð28Þ
kðsÞz0 ðsÞ ¼ ðMðzðsÞÞrF ðzðsÞÞ  k B T div MðzðsÞÞÞk ;
4. Addendum: a modified approach where again, only the first equation matters as far as the
location of the curve is concerned. In other words, the stea-
Suppose that instead of (6), we introduce the following dy state solution of (27) is not exactly the MFEP, even
system in which there is only one one-parameter family though it should be close to it since the correction term is
of replicas of the MD system: only of order kBT. In practical applications, we suspect
8 that the difference between the MFEP and the solution of
> Pm
>
> c_
z ðs; tÞ ¼ e ab ðxðs; tÞÞjðhb ðxðs; t  sÞÞ
M the first equation in (28) will be small, so much so in fact
>
>
a
>
> b¼1
that it may be difficult to distinguish from other numerical
>
>
>
< zb ðs; tÞÞ þ kðs; tÞz0a ðs; tÞ; errors. The example in Section 3 supports this conclusion.
P
m ð25Þ We repeated the calculation using the simplified algorithm
>
> mi€xi ðs; tÞ ¼ fi ðxðs; tÞÞ  j ðhb ðxðs; tÞÞ
>
> based on (25) with s = 0. The results (not shown) were
>
>
b¼1
>
> essentially the same. This indicates that, in this example
>
> oh b ðxðs; tÞÞ
: zb ðs; tÞÞ þ thermostat: at least, the term kBT div M(z(s, t)) in (27) is negligible
oxi
(which may be either because kBT is small enough or be-
Here s is a lagging time which should be chosen large cause the tensor M(z) is quasi-constant in z).
enough so that x(s, t) and x(s, t  s) be uncorrelated, but It is straightforward to modify the algorithm (17) if (25)
small enough so that z(s, t)  z(s, t  s) – satisfying both is used instead of (6).
conditions simultaneously is possible when c is large since
z(s, t) is much slower than x(s, t) in this case. With such a Acknowledgments
s, and when c and j are large enough, the dynamics of
the collective variables in (25) is again given by (4), so The ideas presented here were discussed during a
(25) can also be used to compute the MFEP. This approach meeting organized at CECAM in February 2007 with
is cheaper in since it uses only one MD replica per image Giovanni Ciccotti, Mauro Ferrario, Maddalena Venturoli
along the string, but it requires to adjust one more para- and Rodolphe Vuilleumier. We thank them all for their
meter, s. useful suggestions, and we are grateful to Berend Smit
It is also worth pointing out that if we take s = 0 in (25) for making the meeting possible. We also want to warmly
and let c be large enough, the equation for z(s, t) in (25) can thank Ron Elber for useful discussions and help with
be replaced by an effective equation where the right hand- MOIL code, Giovanni Ciccotti for his careful reading of
side has been averaged as (compare (12)) the manuscript and Weinan E. for discussions about
Z X m concurrent coupling. L.M. is grateful to COST for partial
e ab ðxÞjðhb ðxÞ  zb ðs; tÞÞqj ðx; zðs; tÞÞdx dy
M support under its Action P13. This work was partially sup-
RN b¼1
ported by NSF Grants DMS02-09959 and DMS02-39625,
X
m
oF j ðzðs; tÞÞ Xm
oM jab ðzðs; tÞÞ and by ONR Grant N00014-04-1-0565.
¼ M jab ðzðs; tÞÞ þ kBT ;
b¼1
ozb b¼1
ozb
ð26Þ References
where qj(x; z(s, t)) is the probability density function de- [1] L. Maragliano, A. Fischer, E. Vanden-Eijnden, G. Ciccotti, J. Chem.
fined in (10) and Fj(z) and M jab ðzÞ are given by (13) and Phys. 125 (2006) 024106.
(14), respectively (the expression at the left hand-side for [2] W. E, W. Ren, E. Vanden-Eijnden, Chem. Phys. Lett. 413 (2005) 242.
the average in (26) follows by direct calculation). The extra [3] W. E, E. Vanden-Eijnden, J. Stat. Phys. 503 (2006) 123.
190 L. Maragliano, E. Vanden-Eijnden / Chemical Physics Letters 446 (2007) 182–190

[4] E. Vanden-Eijnden, in: M. Ferrario, G. Ciccotti, K. Binder (Eds.), [21] T. Lazaridis, D.J. Tobias, C.L. Brooks III, M.E. Paulaitis, J. Chem.
Computer Simulations in Condensed Matter: From Materials to Phys. 95 (1991) 7612.
Chemical Biology, Springer-Verlag, Berlin, 2006. [22] E.A. Carter, G. Ciccotti, J.T. Hynes, R. Kapral, Chem. Phys. Lett.
[5] W. E, W. Ren, E. Vanden-Eijnden, Phys. Rev. B 66 (2002) 52301. 156 (1989) 472.
[6] W. E, W. Ren, E. Vanden-Eijnden, J. Chem. Phys. 126 (2007) 164103. [23] E. Vanden-Eijnden, Commun. Math. Sci. 5 (2007) 495.
[7] C. Bartels, M. Karplus, J. Comput. Chem. 18 (1997) 1450. [24] E. Vanden-Eijnden, G. Ciccotti, Chem. Phys. Lett. 429 (2006) 310.
[8] E. Darve, A. Pohorille, J. Chem. Phys. 115 (2001) 9169. [25] J.P. Ryckaert, G. Ciccotti, H.J.C. Berendsen, J. Comput. Phys. 23
[9] L. Rosso, P. Mináry, Z. Zhu, M.E. Tuckerman, J. Chem. Phys. 116 (1977) 327.
(2002) 4389. [26] F. Zhang, J. Chem. Phys. 106 (1997) 6102.
[10] A. Laio, M. Parrinello, Proc. Nat. Acad. Sci. USA 99 (2002) 12562. [27] P.J. Rossky, M. Karplus, J. Am. Chem. Soc. 101 (1979) 1913.
[11] M. Iannuzzi, A. Laio, M. Parrinello, Phys. Rev. Lett. 90 (2003) [28] P.G. Bolhuis, C. Dellago, D. Chandler, Proc. Natl. Sci. USA 97
238302. (2000) 5877.
[12] T.F. Miller III, E. Vanden-Eijnden, D. Chandler, Proc. Natl. Acad. [29] W. Ren, E. Vanden-Eijnden, P. Maragakis, E. Weinan, J. Chem.
Sci. USA, in press. Phys. 123 (2005) 134109.
[13] R.Z. Khasminsky, Theory Prob. Applicat. 11 (1966) 211. [30] A.M.A. West, R. Elber, D. Shalloway, J. Chem. Phys. 126 (2007)
[14] T.G. Kurtz, J. Funct. Anal. 12 (1973) 55. 145104.
[15] G.C. Papanicolaou, Rocky Mountain J. Math. 6 (1976) 653. [31] R. Elber et al., Comput. Phys. Commun. 91 (1994) 159.
[16] L. Maragliano, E. Vanden-Eijnden, Chem. Phys. Lett. 426 (2006) 68. [32] T. McCormick, D. Chandler, J. Phys. Chem. B 107 (2003) 2796.
[17] D. Branduardi, F.L. Gervasio, M. Parrinello, J. Chem. Phys. 126 [33] W.L. Jorgensen, J. Tirado-Rives, J. Am. Chem. Soc. 110 (1988) 1666.
(2007) 054103. [34] U. Essmann, L. Perera, M.L. Berkowitz, T. Darden, L.G. Pedersen, J.
[18] A. van der Vaart, M. Karplus, J. Chem. Phys. 126 (2007) 164106. Chem. Phys. 103 (1995) 8577.
[19] I.V. Khavrutskii, C.L. Brooks III, J. Chem. Phys., submitted for [35] W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, M.L.
publication. Klein, J. Chem. Phys. 79 (1983) 926.
[20] K. Fukui, Acc. Chem. Res. 14 (1981) 363.

You might also like