
Chapter 11

The Adaptive Path Collective Variable: A Versatile Biasing Approach
to Compute the Average Transition Path and Free Energy of Molecular Transitions
Alberto Pérez de Alba Ortíz, Jocelyne Vreede, and Bernd Ensing

Abstract
In the past decade, great progress has been made in the development of enhanced sampling methods, aimed
at overcoming the time-scale limitations of molecular dynamics (MD) simulations. Many sampling schemes
rely on adding an external bias to favor the sampling of transitions and to estimate the underlying free
energy landscape. Nevertheless, sampling molecular processes described by many order parameters, or
collective variables (CVs), such as complex biomolecular transitions, often remains very challenging. The
computational cost scales prohibitively with the dimensionality of the CV-space. Inspiration can be
taken from methods that focus on localizing transition pathways: the CV-space can be projected onto a
path-CV that connects two stable states, and a bias can be exerted onto a one-dimensional parameter that
captures the progress of the transition along the path-CV. In principle, such a sampling scheme can handle
an arbitrarily large number of CVs. A standard enhanced sampling technique combined with an adaptive
path-CV can then locate the mean transition pathway and obtain the free energy profile along the path. In
this chapter, we discuss the adaptive path-CV formalism and its numerical implementation. We apply the
path-CV with several enhanced sampling methods—steered MD, metadynamics, and umbrella sampling—
to a biologically relevant process: the Watson–Crick to Hoogsteen base-pairing transition in double-
stranded DNA. A practical guide is provided on how to recognize and circumvent possible pitfalls during
the calculation of a free energy landscape that contains multiple pathways. Examples are presented on how
to perform enhanced sampling simulations using PLUMED, a versatile plugin that can work with many
popular MD engines.

Key words Enhanced sampling, Metadynamics, Path sampling, Path collective variable, Free energy,
Molecular dynamics, PLUMED, DNA, Hoogsteen base-pairing

Massimiliano Bonomi and Carlo Camilloni (eds.), Biomolecular Simulations: Methods and Protocols, Methods in Molecular Biology,
vol. 2022, https://doi.org/10.1007/978-1-4939-9608-7_11, © Springer Science+Business Media, LLC, part of Springer Nature 2019

1 Introduction

Enhanced sampling methods have expanded the accessible time-scale
of molecular dynamics (MD) simulations, and, with that, our
understanding of complex biomolecular phenomena. Molecular
processes that are very slow or infrequent with respect to the
molecular vibrations and thermal motions, such as most chemical
reactions, conformational changes, and nucleation events (all
characterized by a transition over a free energy barrier), can now, in
principle, be tackled by a wide range of enhanced sampling techniques.
Popular approaches include: adding an external bias to the
system, e.g., using metadynamics [1–3], steered MD [4, 5],
umbrella sampling [6], the adaptive biasing force method [7], etc.
[8–14]; increasing the temperature, e.g., by parallel tempering
[15, 16], multi-canonical sampling [17], the temperature acceler-
ated method [18], etc. [19–21]; or finding transition pathways,
e.g., through transition path sampling (TPS) [22, 23], the string
method [24–27], nudged elastic band (NEB) [28], and so forth
[29–34]; and combinations of these [35, 36]. Many of these tech-
niques, and in particular those applying an external bias, allow for
estimating the Landau free energy of the process, from which
transition rates and equilibrium constants can be computed. The
key challenge in these schemes is choosing an adequate, and prefer-
ably small, set of collective variables (CVs), which are the key order
parameters that describe the transition. For relatively simple transi-
tions, a few well-chosen CVs can be used to steer the process of
interest without problems of hysteresis or degeneracy. However,
many interesting biomolecular transitions involve concerted dis-
placements of many groups of atoms, requiring large sets of CVs for
the description of the process. This gives rise to high-dimensional
free energy landscapes that are very cumbersome to sample and
converge. Sometimes, the problem can still be handled by sheer
computational power, or by cleverly combining several CVs into
fewer, more complex ones. But generally, sampling high-dimensional
free energy landscapes poses a notoriously difficult
problem in computational studies of biomolecular processes.
A promising route to tackle the problem of sampling high-
dimensional CV-spaces is to map the CV-space onto a so-called
path-CV [26, 33, 34, 37–41], a parameterized curve in the
CV-space that describes the transition between the reactant and
the product states. By performing the sampling along this path—
for example, with the path-metadynamics (PMD) method
[37, 41]—the dimensionality problem is circumvented. The chal-
lenge now consists in optimizing the path-CV in the space spanned
by the original set of descriptive CVs, such that it “falls” into the
channel corresponding to the average transition path. Optimizing a
parameterized curve as a string of nodes to locate the minimum free
energy path (MFEP) can be done by computing and following the
gradients of the nodes in the perpendicular direction to the path
[26, 33]. However, a further speedup can be realized by optimizing
via the average sampled density of the CVs, which—under reason-
able assumptions—leads to the average transition flux density
[37, 41]. Irregularities on the free energy landscape, such as
ill-defined or multiple transition channels, can be managed by
restricting the sampling to the neighborhood of the path and by
tuning the path flexibility.

In this chapter, we present a concise introduction to the theory
of the path-CV and its use in PMD. We also include the necessary
scripts to run a case study using the path-CV implementation in the
PLUMED software [42], a plugin that can be linked with several
popular MD engines to perform enhanced sampling simulations.
The chapter is organized as follows: In Subheading 2, we introduce
the basic theory behind the path-CV definition (Subheading 2.1),
its use in a biasing method (Subheading 2.2), the path optimization
algorithm (Subheading 2.3), and the implementation in PLUMED
(Subheading 2.4). In Subheading 3, we list the computational tools
used to run simulations. Subheading 4 consists of a step-by-step
guide on how to study a complex biomolecular process, using an
illustrative transition in double-stranded DNA as an example: the
Watson–Crick to Hoogsteen base-pairing transition. The set of
CVs for this process is defined in Subheading 4.2; the stable states
in this CV-space are discussed in Subheading 4.3. An initial steered
MD simulation along an evolving path-CV is performed in
Subheading 4.4, followed by an exploratory multiple-walker meta-
dynamics [43] run in Subheading 4.5. The Watson–Crick to
Hoogsteen transition involves multiple mechanisms, and we discuss
which measures are taken to converge on a specific path, using a
multiple-walker metadynamics simulation (Subheading 4.6). We
also perform umbrella sampling along the optimized path, and
reconstruct a free energy profile using the weighted histogram
analysis method (WHAM) [44] (Subheading 4.7). In Subheading
4.8, we show how to carry out an a posteriori path-CV optimization
from a pre-existing trajectory along a different pathway, from which
the free energy profile can also be computed. We end this chapter
with a compilation of “notes” (Subheading 5) on how to use the
path-CV to compute the average transition pathway and the free
energy of a molecular transition, including several “rules of thumb”
on how to choose parameters and how to monitor for, and avoid,
possible pitfalls.

2 Theory

2.1 Defining the Path Collective Variable

Let us consider an L-particle system with positions $q(t) \in \mathbb{R}^{3L}$ and
velocities $v(t) \in \mathbb{R}^{3L}$. The dynamics of the system is governed by a
potential U(q) and follows a canonical distribution at a temperature
T. Let us also assume that the system has an underlying free energy
surface (FES) with two stable states A and B. The FES can be fully
described by a set of N key descriptive degrees of freedom, the
collective variables (CVs) $\{z_i(q)\}$, with $i = 1, \ldots, N$. We aim to find
the average transition path between A and B in the space of the
CVs, $z_i$, and define the progress along it as a reaction coordinate.
Provided that the CVs are sufficiently good descriptors of
the system, the reaction coordinate is well-defined in terms of
transition path theory [45] and the committor distribution
[46, 47]. That is, along the path, we can determine the committor
probability pB(q) that a trajectory starting with random Maxwell–
Boltzmann distributed velocities arrives in state B before going
through state A. As the system moves near the A or B basins in
CV-space, pB(q) approaches, respectively, 0 or 1. In this CV-space it
is possible to define an isocommittor surface comprising all points
where pB(q) = 0.5. Furthermore, the isocommittor surfaces spanning
all committor values from 0 to 1 provide a continuous foliation
of CV-space from A to B. In these hyperplanes, we can define
the transition flux density ρ as the number of trajectories going
through the surface per unit area. This flux density peaks at the
transition channel in the FES between A and B, resulting in the
average transition path that we wish to localize.
To locate the average transition path, we make the following
assumptions [37]:
1. The average transition path can be represented in CV-space by
a path-CV: a curve $s(\sigma): \mathbb{R} \to \mathbb{R}^N$, where the parameter
$\sigma(z)|_s : \mathbb{R}^N \to [0,1]$ yields the progress along the path from
A to B, such that $s(0) \in A$ and $s(1) \in B$. This quantity can
in principle be connected to the committor value.
2. In the vicinity of the path, the isocommittor planes $S_\sigma$ are
perpendicular to $s(\sigma)$.
3. In the vicinity of the path, the normalized transition flux density
$\rho$ can be represented by the configurational probability
$p(z) = \exp(-F(z)/k_B T)$, where F is the free energy, and $k_B$ is
the Boltzmann constant.
Given the first and second assumptions, it is possible to project
any point z in CV-space onto its closest point on the path s(σ), and
derive the path progress parameter σ(z). Moreover, since we wish
the curve s(σ) to follow the transition channel of maximum flux
density, we can take the third assumption and approximate the
average transition path as:
$$
s(\sigma) = \int dS_\sigma \, z' \, p_\sigma(z'_1, \ldots, z'_N), \quad \text{with} \quad
p(z'_1, \ldots, z'_N) = \frac{1}{Z} \int dq \, e^{-\beta U(q)} \, \delta(z_1 - z'_1) \cdots \delta(z_N - z'_N),
\tag{1}
$$
where $p_\sigma(z'_1, \ldots, z'_N)$ is the flux probability density at the
isocommittor surface perpendicular to $s(\sigma)$, with the Dirac delta function $\delta$,
the inverse temperature $\beta = 1/(k_B T)$, and the partition function Z.

2.2 Finding the Average Transition Path

In principle, Eq. 1 enables the calculation of the average transition
path by sampling and making a histogram of z during an MD run,
or a Monte Carlo simulation. However, the transition is a rare event
on the time-scales accessible to standard simulations. The flux
density away from the neighborhood of the A or the B basins will
be poorly sampled. This problem is typically overcome with
enhanced sampling methods (e.g., metadynamics) by directly
biasing the dynamics of the CVs, z. But in practice, such an
approach is limited to low-dimensional CV-spaces. The path-CV
aims to overcome this limitation.
Let us exert a time-dependent metadynamics bias,
$$
V_{\mathrm{bias}}(\sigma_g, t) = \sum_{t' \le t} H(t') \exp\left( -\frac{(\sigma_g - \sigma_g(t'))^2}{2W^2} \right), \tag{2}
$$
onto the one-dimensional path progress parameter, $\sigma(t)$, along a
guess transition path $\sigma_g = \sigma(z)|_{s_g}$, with Gaussian height H and
width W. This growing repulsive potential drives the system back
and forth along the path, away from already visited configurations.
If the path remains fixed, the metadynamics bias potential converges,
eventually, to (minus) the free energy along the guess path,
$F_g = -V_{\mathrm{bias}}(\sigma_g, t)$ [1].
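As an illustration of Eq. 2, the bias at a given value of the path progress parameter can be evaluated as a plain sum over the Gaussians deposited so far. The following is a minimal sketch in Python, not the PLUMED implementation; the function name `metadynamics_bias` and the deposition history are hypothetical.

```python
import math

def metadynamics_bias(sigma, centers, heights, width):
    """Sum of Gaussians (Eq. 2) evaluated at a path progress value sigma.

    centers: sigma values at which Gaussians were deposited
    heights: Gaussian height H for each deposition
    width:   Gaussian width W (same for all Gaussians in this sketch)
    """
    return sum(
        h * math.exp(-((sigma - c) ** 2) / (2.0 * width ** 2))
        for c, h in zip(centers, heights)
    )

# Hypothetical deposition history: two hills at sigma = 0.2, one at 0.5.
centers = [0.2, 0.2, 0.5]
heights = [1.0, 1.0, 0.5]
bias_at_hill = metadynamics_bias(0.2, centers, heights, width=0.05)
bias_far_away = metadynamics_bias(0.9, centers, heights, width=0.05)
```

Regions of sigma that have been visited often accumulate a high bias, while unvisited regions (here sigma = 0.9) remain essentially unbiased, which is what pushes the system along the path.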
To improve our guess path and use it to locate the average
transition path, we replace in Eq. 1 the ensemble average of transition
points through the hyperplanes by a time average:
$$
\langle z \rangle_{\sigma_g} = \lim_{t \to \infty} \int_0^t \int_{S_{\sigma_g}} z(t') \, dS_{\sigma_g} \, dt'. \tag{3}
$$
We can now optimize the path toward the average transition path
by iteratively relocating the guess path to the cumulative average
density $s_g(\sigma_g) = \langle z \rangle_{\sigma_g}$ (Fig. 1). Simultaneously, the metadynamics
algorithm will adapt the bias potential after each path update as it
keeps adding layers of Gaussian potentials to the total bias, overwriting
the free energy at previous trial paths [48], and continuously
converging toward the free energy at the average transition
path, $F = -V_{\mathrm{bias}}(\sigma, t)$.
Most of the extensive machinery developed in recent years for
metadynamics can be applied directly with the PMD method. For
example, the well-tempered method [2, 3] can aid in converging a
free energy profile by gradually reducing the height of the Gaus-
sians on the fly. The same effect can be obtained by manually
reducing the Gaussian height at every recrossing from A to
B [37] (see Note 1). Another interesting feature is multiple-walker
metadynamics [43], which can be applied with the path-CV to
speed up the simulation. Here, several replicas of the system are
simulated simultaneously in parallel for the exploration of different
regions of CV-space, while each replica communicates its updates
on both the path and the bias potential to the other replicas. By
initializing walkers both in the A and B basins, the shape of the path
can be rendered significantly faster (see Note 2).

Fig. 1 Graphical representation of an initial guess path-CV section (red) converging to the average transition
path (green) between basins A and B. The curve points $s_g(\sigma_g)$ are relocated to the cumulative average density
$\langle z \rangle_{\sigma_g}$ in CV-space, which peaks at the valley in the free energy F (yellow) at each hyperplane $S_{\sigma_g}$

The presented recipe to locate the average transition path with
an adaptive path-CV is not exclusively coupled to metadynamics.
Other enhanced sampling methods, such as steered MD, con-
strained MD, umbrella sampling, or even TPS, can be used with
the path-CV, without changes to the path formalism. In this chap-
ter, we will exemplify the use of the path-CV with steered MD,
metadynamics, and with umbrella sampling.

2.3 Path-CV Optimization Algorithm

2.3.1 Projection of CV-space onto a String of Nodes

In order to implement the method numerically, we must provide a
discrete definition of the path-CV as a parameterized curve. This is
done by representing the curve as a string of M ordered nodes,
$s_g(\sigma_g, t) \to s_j^{t_i}$, with $j = 1, \ldots, M$ labeling the nodes on the string
and $t_i$ representing the discrete time parameter at path update
step i. Then, the projection of a point in CV-space, z, onto the
path (which yields the value of the path progress parameter $\sigma$) is
given by:
$$
\sigma_g(z) = \frac{m}{M} \pm \frac{\sqrt{(v_1 \cdot v_3)^2 - |v_3|^2 \left( |v_1|^2 - |v_2|^2 \right)} - (v_1 \cdot v_3) + |v_3|^2}{2M |v_3|^2},
\quad \text{with} \quad
\begin{aligned}
v_1 &= s_m - \langle z \rangle, \\
v_2 &= \langle z \rangle - s_{m-1}, \\
v_3 &= s_{m+1} - s_m,
\end{aligned}
\tag{4}
$$
where $s_m$ is the closest path node to z, and $s_{m-1}$ and $s_{m+1}$ are its
neighboring nodes. This expression implies that points beyond the
first or last nodes are mapped to values of $\sigma_g < 0$ and $\sigma_g > 1$,
respectively. To have control over this mapping at and outside the
stable states, extra trailing nodes can be added at both ends of the
original path. If necessary, wall potentials can be added to restrict
the sampling to a particular $\sigma_g$-region. Note also that the projection
in Eq. 4 requires that the nodes are equidistant. This requirement
is imposed by a reparameterization step [26] after each path
update.
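The reparameterization step mentioned above can be sketched as a redistribution of the nodes at equal arc-length intervals along the current piecewise-linear path. This is a simplified Python illustration under the assumption of linear interpolation between nodes; it is not the code used in the PLUMED implementation, the function name `reparameterize` is hypothetical, and a path of non-zero length is assumed.

```python
import math

def reparameterize(nodes):
    """Redistribute nodes at equal arc-length spacing along a piecewise-linear path.

    nodes: list of points in CV-space (each a list of floats); end nodes stay fixed.
    """
    # Cumulative arc length at each original node.
    arc = [0.0]
    for a, b in zip(nodes[:-1], nodes[1:]):
        arc.append(arc[-1] + math.dist(a, b))
    n = len(nodes)
    new_nodes = [list(nodes[0])]
    seg = 0
    for j in range(1, n - 1):
        target = arc[-1] * j / (n - 1)
        # Advance to the segment that contains the target arc length.
        while arc[seg + 1] < target:
            seg += 1
        frac = (target - arc[seg]) / (arc[seg + 1] - arc[seg])
        a, b = nodes[seg], nodes[seg + 1]
        new_nodes.append([x + frac * (y - x) for x, y in zip(a, b)])
    new_nodes.append(list(nodes[-1]))
    return new_nodes

# Unevenly spaced nodes on a straight line become evenly spaced.
even = reparameterize([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
```

On a curved path, equal spacing along the old polyline gives nodes that are only approximately equidistant in the Euclidean sense, so in practice such a step is repeated after every path update.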

2.3.2 Evolution of the Path-CV

The path update step, which sets $s_g(\sigma_g) = \langle z \rangle_{\sigma_g}$, uses the time
averaged distance between the sampled z-points and their projected
points on the path, $s_g(\sigma_g(z))$. This distance is weighted by a weight,
w, which is only non-zero for the two closest nodes, giving the
following path node propagation equation:
$$
s_j^{t_{i+1}} = s_j^{t_i} + \frac{\sum_k w_{j,k} \cdot \left( z_k - s^{t_i}(\sigma(z_k)) \right)}{\sum_k w_{j,k}}, \quad \text{with} \quad
w_{j,k} = \max\left[ 0, \left( 1 - \frac{\left\| s_j^{t_i} - s^{t_i}(\sigma(z_k)) \right\|}{\left\| s_j^{t_i} - s_{j+1}^{t_i} \right\|} \right) \right],
\tag{5}
$$
where k is the current MD step and $\Delta t = t_{i+1} - t_i$ is the time
interval between two path updates. See Fig. 2 for an illustration
of the path update calculation. In order to slow down or accelerate
the convergence of the path, an additional fade factor,
$\xi = \exp(-\ln(2)/\tau)$, can be introduced, with a half-life parameter,
$\tau$, being the number of MD steps after which a distance measured
from the path contributes only 50% of its original value to the
average. We reformulate:
$$
s_j^{t_{i+1}} = s_j^{t_i} + \frac{\sum_k w_{j,k} \cdot \left( z_k - s^{t_i}(\sigma(z_k)) \right)}{\sum_k \xi^{t-k} \, w_{j,k}}. \tag{6}
$$
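A single application of Eq. 5 can be sketched as follows. This Python illustration assumes that the projected points on the path have already been computed for every sample, uses no fade factor and no fixed nodes, and lets the last node reuse the spacing to its preceding neighbor; these choices, as well as the function names, are assumptions made for the example and not part of the original implementation.

```python
import math

def update_nodes(nodes, samples, projections):
    """Apply Eq. 5 once: shift each node by the weighted mean displacement.

    nodes:       current path nodes s_j (list of points in CV-space)
    samples:     sampled CV-space points z_k
    projections: points on the path s(sigma(z_k)), one per sample
    """
    m = len(nodes)
    new_nodes = []
    for j, node in enumerate(nodes):
        # Spacing to the next node; the last node uses its previous neighbor (assumption).
        neighbor = nodes[j + 1] if j + 1 < m else nodes[j - 1]
        spacing = math.dist(node, neighbor)
        shift = [0.0] * len(node)
        total_w = 0.0
        for z, p in zip(samples, projections):
            w = max(0.0, 1.0 - math.dist(node, p) / spacing)
            total_w += w
            shift = [s + w * (zi - pi) for s, zi, pi in zip(shift, z, p)]
        if total_w > 0.0:
            new_nodes.append([x + s / total_w for x, s in zip(node, shift)])
        else:
            new_nodes.append(list(node))  # no nearby samples: leave the node in place
    return new_nodes

def fade_factor(half_life):
    """xi = exp(-ln 2 / tau), so that xi ** tau equals 0.5."""
    return math.exp(-math.log(2.0) / half_life)

# One sample displaced by 0.2 from the midpoint between the first two nodes.
nodes = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
samples = [[0.5, 0.2]]
projections = [[0.5, 0.0]]
updated = update_nodes(nodes, samples, projections)
```

In this toy case only the two nodes adjacent to the projected point receive a non-zero weight, so they move toward the sample while the third node stays where it is.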

Fig. 2 Graphical explanation of a path node update. The sampled average density $\langle z \rangle_{\sigma_g}$ is projected onto the
path at $s_g(\sigma_g)$. The two closest path nodes $s_{m-1}$ and $s_m$ are relocated according to weights that depend on
their distances from $s_g(\sigma_g)$. A subsequent reparameterization step redistributes the nodes along the path to
make them equidistant again

2.3.3 Tube Potential, CV-space Scaling, Multiple Walkers, and Corner-Cutting

When facing landscapes with multiple or ill-defined transition
valleys, it can be beneficial to not only bias the sampling along the
path, but also restrain the sampling to the path vicinity. A harmonic
restraint potential on $\| z_k - s^{t_i}(\sigma(z_k)) \|$, set either at zero distance or
allowing some freedom, can help in converging a transition path.
We refer to this restraint as a “tube” potential. In the limit of an
infinitesimally narrow tube potential, the path is optimized by
following the local free energy gradient, in a similar fashion to
the string method [26], and PMD converges to the MFEP closest
to the initial guess path instead of to the average transition path.
Thus, the tube potential allows one to control the behavior of PMD by
switching between a path optimization based on CV density, quick
for well-defined landscapes with a single channel, and a path
optimization based on the free energy gradient, suitable for more
complex scenarios with multiple channels. This versatility is key to
the adaptive path framework presented in this chapter, especially
when considering that the essential distinction between different
path-CV implementations is the optimization rule [26, 33, 34,
37–39, 41]. Of course, when using a tube potential, care has to
be taken regarding its effect on the entropic contribution to the free
energy (see Note 3).
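One possible functional form for the tube potential is a one-sided harmonic restraint on the distance from the path, which is zero within a chosen tube width and grows quadratically outside it. The Python sketch below is only an assumption about the form (including the conventional factor of 1/2); the names `tube_potential`, `kappa`, and `width` are hypothetical.

```python
def tube_potential(distance, kappa, width=0.0):
    """Harmonic restraint on the distance from the path.

    distance: ||z - s(sigma(z))||, the displacement from the path
    kappa:    force constant of the restraint
    width:    distance up to which the system moves freely (0 restrains at the path)
    """
    excess = max(0.0, distance - width)
    return 0.5 * kappa * excess ** 2

on_path_restraint = tube_potential(0.3, kappa=100.0)            # restrained at zero distance
inside_tube = tube_potential(0.05, kappa=100.0, width=0.1)      # free motion inside the tube
```

A larger `kappa` or a smaller `width` narrows the tube and moves the optimization toward the gradient-following limit described above.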
Another useful algorithmic extension is the scaling of
CV-space. Imagine a set of CVs with numerical ranges differing by
several orders of magnitude. In order to keep an equidistant set of
nodes under these conditions, the node distribution across dimen-
sions would need to be severely unbalanced. As a consequence, the
path progress parameter σ would also be defined mostly by the
most widely ranging CV. To avoid this imbalance, one may rescale
the CVs in a manner that the space to be sampled is in all dimen-
sions normalized to one (see Note 4). This is particularly helpful
when dealing with CVs of different units (e.g., rad and deg) or
dimensions (e.g., rad and nm). To rescale the CV-space, it is useful
to have a priori knowledge of the minimum and maximum values
that each CV can have.
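The rescaling can be sketched as follows, assuming that the minimum and maximum of each CV are known beforehand. The ranges used here are illustrative only, and whether a given implementation expects these factors or their inverses should be checked against its documentation.

```python
import math

def scale_factors(cv_min, cv_max):
    """One factor per CV such that (max - min) * factor equals 1."""
    return [1.0 / (hi - lo) for lo, hi in zip(cv_min, cv_max)]

def rescale(z, factors):
    """Rescale a point in CV-space component by component."""
    return [x * f for x, f in zip(z, factors)]

# Illustrative ranges: a distance between 0.2 and 1.2 nm, a torsion between -pi and pi rad.
factors = scale_factors([0.2, -math.pi], [1.2, math.pi])
span = [hi - lo for lo, hi in zip(rescale([0.2, -math.pi], factors),
                                  rescale([1.2, math.pi], factors))]
```

After rescaling, both CVs span a range of one, so neither dominates the node spacing or the path progress parameter.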
A final remark on the algorithm concerns a side effect of the
reparameterization step to ensure node equidistance. The imple-
mented reparameterization algorithm [26] turns out to favor
straight paths somewhat and displays a tendency to “cut corners”
while redistributing the nodes. While this tendency is often beneficial,
as it maintains a smooth curve that neither curls nor self-intersects,
there are obvious drawbacks. In particular, when the metadynamics
is temporarily sampling one end of the path, the repeated
reparameterization after every path update also redistributes the
nodes at the other end of the path, moving them gradually back to a
straight path and thus undoing previous path optimization. This
side effect is much reduced in the multiple-walker PMD implemen-
tation, because in that case the sampling is more continuous along
the entire path. Apart from preventing the information loss, of
course the multiple-walker option also results in an almost trivial
parallelization speedup for the sampling of the path and the free
energy, thus providing a powerful extension to the original method
[41] (see Note 5).
In summary, the path-CV consists of a set of ordered nodes
describing the transition from basin A to basin B in the high-
dimensional CV-space. The system can be biased to move along
the path, while the positions of the nodes in CV-space can be
optimized by following the average density of the sampled points,
which peaks at the free energy valley. By means of this sampling
along and around the path, we can converge the average transition
path and the free energy along it. Additional actions can be taken to
control the extent of the sampling and the flexibility of the path
when facing challenging, forking free energy landscapes. Namely,
we can add a tube potential to restrain the sampling in the direction
perpendicular to the path and switch from a density-based optimi-
zation toward the average transition path, to a gradient-based
optimization toward the closest MFEP to the initial guess path.

2.4 Path-CV Implementation in PLUMED

The theoretical and numerical framework discussed above has been
implemented into the PLUMED software [42] as a function of
CVs. Invoking the action PATHCV in PLUMED requires that the
following keywords are specified:
• LABEL: sets the identifier for this instance.
• ARG: sets the list of (priorly defined) CVs that span the space in
which the path exists.
• GENPATH: generates a straight path between two points in
CV-space; the two points typically marking the stable states. It
takes 3 integers as arguments, corresponding to the number of
anterior trailing nodes, actual transition nodes, and posterior
trailing nodes, followed by the CV-space coordinates of the
initial and final transition nodes separated by commas.
• INFILE: points to a file containing an input path.
• FIXED: indicates the two fixed nodes corresponding to the
initial and final states. The default values are the first and last
nodes, thus assuming no trailing nodes.
• OUTFILE (PATH): points to a file where the path updates are
printed, concatenated one after the other.
• SCALE: lists the scaling factors to normalize the CV-space. The
default value is one for each CV.
• STRIDE (PACE): indicates the frequency for printing the path
in MD steps.
• PACE (0): indicates the frequency for optimizing the path
nodes in MD steps.
• HALFLIFE (-1): indicates the number of MD steps after which
a previously measured path distance weighs only 50% in the
average. A negative number sets it to infinity.
The parenthesized arguments indicate the default values for the
keywords when relevant. The format of the INFILE and OUTFILE
comprises a first column with the numbering of the nodes,
followed by N columns with the value of each CV at each node
position, which in turn are followed by N columns showing the
cumulative measured displacement of the system from the given
node along each CV. The final column contains the cumulative
weight wj,k for the corresponding node, such that the cumulative
displacement divided by the cumulative weight gives the average
distance between the path and the measured average transition
density. After each path update, the cumulative displacement is
reset to zero for the non-fixed nodes, but the cumulative weights
remain.
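The column layout described above can be turned into average displacements per node with a few lines of Python. This sketch assumes whitespace-separated columns, a single path block, and comment lines starting with `#`; these details, the function name, and the example numbers are assumptions for illustration and are not taken from an actual output file.

```python
def average_displacements(path_text, n_cvs):
    """Return {node index: average displacement per CV} for one path block.

    Columns: node index, n_cvs node coordinates, n_cvs cumulative
    displacements, and one cumulative weight.
    """
    result = {}
    for line in path_text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        index = int(fields[0])
        values = [float(x) for x in fields[1:]]
        cum_disp = values[n_cvs:2 * n_cvs]
        cum_weight = values[2 * n_cvs]
        if cum_weight > 0.0:
            result[index] = [d / cum_weight for d in cum_disp]
        else:
            result[index] = [0.0] * n_cvs  # node has not collected any weight yet
    return result

# Made-up two-CV example with two nodes.
example = """
0 0.10 0.50 0.02 -0.04 2.0
1 0.20 0.40 0.00  0.00 0.0
"""
avg = average_displacements(example, n_cvs=2)
```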
To use the multiple-walker implementation, one should also
provide:
• WALKERS_ID: indicates the ID of the current walker, starting
from zero.
• WALKERS_N: indicates the total number of walkers.
• WALKERS_DIR: points to the directory where all walkers write
and read each other’s files.
• WALKERS_RSTRIDE: indicates the reading frequency for walkers
in MD steps.
Two quantities can be extracted, and biased, from the path-CV:
the components s and z, corresponding to the progress along
the path, $\sigma(z_k)|_{s^{t_i}}$, and the displacement from the path,
$\| z_k - s^{t_i}(\sigma(z_k)) \|$. In the PLUMED syntax, the components are
called by LABEL.s and LABEL.z, respectively.

3 Materials

Simulations of the Watson–Crick to Hoogsteen DNA base-pairing
transition are carried out using GROMACS version 5.1.4 [49]
compiled with PLUMED version 2.1.3 [42] with the added
PMD code available at http://www.acmm.nl/ensing/software/PathCV.cpp.
Figures have been rendered with Gnuplot [50]. All
simulations were performed on Carbon, the local computing cluster
of the Van ’t Hoff Institute for Molecular Sciences at the University
of Amsterdam. The reader is referred to the documentation of
GROMACS, PLUMED, and our implemented path-CV for details
about how to install and execute these software codes.

4 Methods

In this section, we present an application of path-CV enhanced
sampling to study the transition between Watson–Crick
(WC) [51] and Hoogsteen (HG) [52] base pairs (bps) in double-
stranded DNA. HG bps form via a 180° rotation of the purine base
around the glycosidic bond from an anti to a syn conformation with
respect to WC [53–56] (Fig. 3). This conformational change has
attracted attention since the discovery that DNA presents a dynamical
equilibrium between the two base-pairings, and that the alternative
HG conformation can be involved in biological functionalities
related to recognition, replication, damage induction, and repair
[53–56].
Experimental [53, 55] and computational [53, 57, 58] studies
have been carried out to elucidate the mechanistic pathways, the
free energy differences, and the barriers between the WC and HG
bps. Previous simulations have focused, among others, on the
transition of the A16–T9 bp of A6-DNA, which we will also do in
this chapter. In these studies, the transition was described by two
CVs: the glycosidic torsion, χ, and the base opening angle, θ. These
two CVs can in principle distinguish between two suggested transi-
tion mechanisms involving the base rotation (Fig. 3c) and base
flipping outside of the double helix (Fig. 3d). In these previous
investigations, the pathways and free energy profiles were deter-
mined by conjugate peak refinement (CPR), connecting the two
stable conformers on the adiabatic free energy surface [53], and by
umbrella sampling simulations, restrained to specific points on the
(χ, θ)-plane followed by an a posteriori search for possible pathways
on the free energy surface [57]. Six different transition pathways
have been identified, as one can distinguish two directions for the
rotation around the glycosidic bond (clockwise and counter-clockwise)
and three kinds of base flipping (opening to the DNA
major groove, opening to the minor groove, or without base
opening).

Fig. 3 (a) Double-stranded A6-DNA; (b) WC A16–T9 bp; (c) HG A16–T9 bp after the 180° rotation of the
adenine; (d) Base flipping of the adenine in the A16–T9 bp

In the following sections, we will describe the application of the
adaptive path-CV to study this transition in DNA. The path-CV
allows us to include a larger set of descriptive CVs to enhance the
sampling, giving a more detailed picture of the process, including
conformational changes in the DNA backbones, reorganization of
neighboring bps, and the overall shape of the DNA double helix.
Moreover, previous work on this system in our Computational
Chemistry Group using TPS simulations [59, 60] (manuscript in
preparation) provides us not only with a prepared and tested setup
for the molecular model system, but also a large set of reactive
trajectories between the WC and HG states that serve as compari-
son for the path-CV enhanced sampling results.
This section is organized as follows: Subheading 4.1 deals with
the equilibration of the system in each of the two stable states
(WC and HG). Subheading 4.2 provides a description of the CVs.
Subheading 4.3 describes the localization of the stable states in the
CV-space. In Subheading 4.4 we describe how to use the steered
MD method along the path-CV to test the CV set. In Subheading
4.5, we illustrate an exploratory multiple-walker PMD simulation,
which reveals key properties of the studied transition. Subheading
4.6 shows a more exhaustive multiple-walker PMD run to converge
a transition path without base flipping. In Subheading 4.7,
umbrella sampling is performed along this path to extract a free
energy profile using WHAM [44]. In Subheading 4.8, we perform
an a posteriori path optimization for a TPS trajectory of the base-
flipping transition and compute from this another free energy
profile.

4.1 System Preparation

The setup of the aqueous double-stranded DNA system for classical
MD simulation with the GROMACS software was done as part of a
previous work [59, 60] and will be described in full detail elsewhere.
In brief, an ideal B-DNA duplex structure is created with
the make_na webtool [61]. The nucleotide sequence
5′-CGATTTTTTGGC-3′ (and its complementary strand) is reproduced
from ref. 53. The chain is placed in a periodic dodecahedron
box and solvated in 6691 TIP3P water molecules [62] with 28 Na+
and 6 Cl− ions (25 mM NaCl), resulting in a charge-neutral system of
20868 atoms. The AMBER03 force field [63] is used to describe
atomic interactions. Non-bonded interactions are treated with a
cut-off at 0.8 nm and long-range electrostatics are handled by the
particle mesh Ewald method [64, 65]. Energy minimization is
performed by conjugate gradient (with a threshold of 100 N), and
followed by a 1 ns position restrained run (with a force constant of
1000 kJ/(mol nm²) for DNA heavy atoms). Equilibration is performed
in nine 200 ns long MD runs starting with different random
Maxwell–Boltzmann distributed velocities and using a time step of
2 fs, the v-rescale thermostat [66] at 300 K, and the Parrinello–Rahman
barostat [67] at 1 bar. These equilibrations are used to obtain an
initial metadynamics-biased transition to initialize the TPS algorithm.
For our path-CV simulations, two 100 ns re-equilibrations,
one in WC and one in HG, are performed after changing the force
field to parmbsc1 [68], which contains special parameters for DNA.
The same setup is employed for production runs.

4.2 Collective Variables

Determining a good set of CVs to describe the slow dynamics of a
system is a highly non-trivial problem. For a long time, chemical
intuition was the main way to identify geometric order parameters
to describe a molecular transition. Today, state-of-the-art efforts in
automating the discovery of CVs include: the spectral gap optimization
of order parameters (SGOOP) [69], the time-structure
based independent component analysis (tICA) [70], and the harmonic
linear discriminant analysis (HLDA) [71]. Regardless of the
method for choosing CVs, one big advantage of the path approach
is that it enables the use of a large set of CVs (i.e., more than the usual
two or three). We may even have some redundancy or include some
irrelevant CVs, without running immediately into problems. Here,
we constructed the set of CVs after a careful analysis of available
TPS trajectories of the transition and using a trial-and-error
approach based on test runs with steered MD and PMD.
Key aspects aimed for in the test runs included: (1) structural
consistency of the stable states and (2) of the overall DNA double helix,
(3) consistent capture of the transition mechanism, and (4) minimizing
hysteresis effects in the path finding and free energy estimation.
The set contains the following 7 CVs (see Fig. 4 for a graphical
illustration):
• dWC: The distance of the characteristic WC H-bond between
  A16 (N1) and T9 (N3).
• dHG: The distance of the characteristic HG H-bond between
  A16 (N7) and T9 (N3).
• dHB: The distance of the conserved H-bond, present both in
  WC and in HG, between A16 (N6) and T9 (O4).
• dCC: The distance between A16 (C1’) and T9 (C1’).
• dNB: The distance between the centers of mass P1’ and P2’.
• tGB: The torsion around the glycosidic bond defined by the
  pseudo-dihedral angle formed by the axis A16 (C1’-N9) and
  the vectors P2-P1 and P5-P6.
• tBF: The base-flipping torsion defined by the pseudo-dihedral
  angle (P1+P2)-P3-P4-P5.
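As a rough illustration of how a torsion-type CV such as tBF can be evaluated, the following Python sketch computes the dihedral angle defined by four points (for tBF these would be the centers of mass (P1+P2), P3, P4, and P5). The function name and the example coordinates are assumptions made for illustration only, and the sign convention is not guaranteed to match the one used by PLUMED, so only magnitudes should be compared.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in radians, in [-pi, pi], defined by four points.

    Hypothetical helper: p1-p2 is the central axis, p0 and p3 are the
    outer points (e.g., centers of mass for a pseudo-dihedral).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Components of b0 and b2 perpendicular to the central axis
    proj0 = _dot(b0, b1)
    proj2 = _dot(b2, b1)
    v = [b0[i] - proj0 * b1[i] for i in range(3)]
    w = [b2[i] - proj2 * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.atan2(y, x)


# Illustrative coordinates (not taken from the DNA system)
angle = dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
print("dihedral magnitude (rad):", abs(angle))
```

Because the outer points are projected onto the plane perpendicular to the central axis, the same routine applies to real atoms and to centers of mass alike.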
The first two CVs—dWC and dHG—discriminate the H-bond
forming and breaking that distinguishes each of the stable states.
On the other hand, dHB and dCC ensure the alignment and
complete forming of both types of bps. By including these CVs in
the path definition we make sure that the bps are well-formed when
running the biased dynamics, preventing stacking and dislocations.
During the transition without base flipping, dNB represents a slight
displacement of the neighboring bps, which provides the required
space for the rotation of the base. The last two CVs, tGB and tBF,
which have already been introduced in previous work [53, 57],
describe the rotation of the purine base, A16, and its flipping out
of the helix, respectively.

Fig. 4 (a) WC bp with graphical representations of the CVs: dWC, dHB, and dCC; (b) HG bp with graphical
representations of the CVs: dHG, dHB, and dCC; (c) A16–T9 bp and its two neighboring bps with graphical
representations of the centers of mass involved in the calculation of CVs: dNB, tGB, and tBF

It should be noted that the definition of the glycosidic torsion,
tGB, involves the centers of mass of the neighboring bps rather than
just the immediate sugar atoms used in refs. 53 and 57. This
alteration was introduced after the observation of sugar and back-
bone perturbations when running biased dynamics. Such deforma-
tions prevent an exact measurement of the base rotation, as the
atoms used as reference also rotate. The extra flexibility that we
observe in DNA can be a consequence of the chosen force field, or
of the fact that the sampling is done by evolving dynamics, instead
of minimization steps [53] or restrained windows [57].
Our final remark on this CV set concerns the non-periodic
nature of the path-CV. The WC-to-HG transition can occur either
with a clockwise or counter-clockwise rotation around the glyco-
sidic torsion, tGB. However, the current formulation of the path can
only capture one direction, as it does not capture a periodic crossing
from π to −π. This implies that a particular definition for tGB is also
bound to a specific kind of rotation. In this chapter, we focus on the
rotation which crosses the zero radian line in our definition of tGB
(with the A16 six-ring pointing toward A17), as it corresponds to
the MFEP reported in ref. 53, and the second MFEP in ref. 57. We
refer to this rotation as clockwise. In Note 6, further discussion is
provided about possible ways to handle periodic CVs in a path. We
do not consider this issue for tBF, for which a periodic crossing would
correspond to the base opening toward the DNA major groove and
returning through the minor groove. In our definition, negative values of tBF imply base
flipping to the major groove, while positive values do so to the
minor groove.

4.3 Stable States

Before we can embark on introducing and optimizing a transition
path in CV-space, we need to define the two stable states. To this
end, we perform two 100 ns equilibration simulations of the WC
and HG states with the parmbsc1 force field. The resulting trajec-
tories are stored for analysis using the PLUMED driver (see Note
7). The PLUMED driver is a stand-alone tool of the
PLUMED package: rather than being linked to an MD
code and acting on configurations as they are generated,
it operates on configurations read from an input trajectory. This
tool allows us to compute and print CVs from a pre-existing trajectory
(.xtc file). The inclusion of a protein data bank (PDB) file
[72] is required to provide atomic masses to the PLUMED driver.
We execute:

plumed driver --pdb dna.pdb --mf_xtc traj.xtc --plumed plumed.dat

with the plumed.dat file, which contains the CV definitions and
some printing instructions.

# set units
UNITS LENGTH=A TIME=ps ENERGY=kcal/mol

# define centers of mass
p1: CENTER ATOMS=234,235,237,238,242,243,244,246,247,518,519,521,522,523,524,527,528,530,531 MASS
p2: CENTER ATOMS=456,457,459,461,462,465,466,467,298,299,301,302,303,304,305,307,308,311,312 MASS
p1p2: CENTER ATOMS=234,235,237,238,242,243,244,246,247,518,519,521,522,523,524,527,528,530,531,456,457,459,461,462,465,466,467,298,299,301,302,303,304,305,307,308,311,312 MASS
p1_prime: CENTER ATOMS=518,519,521,522,523,524,527,528,530,531 MASS
p2_prime: CENTER ATOMS=456,457,459,461,462,465,466,467 MASS
p3: CENTER ATOMS=505,506,507,508,509 MASS
p4: CENTER ATOMS=473,474,475,476,477 MASS
p5: CENTER ATOMS=486,487,489,490,499 MASS
p6: CENTER ATOMS=490,491,495,499,496,498 MASS

# define CVs
dWC: DISTANCE ATOMS=495,276
dHG: DISTANCE ATOMS=489,276
dHB: DISTANCE ATOMS=275,492
dCC: DISTANCE ATOMS=264,484
dNB: DISTANCE ATOMS=p1_prime,p2_prime
tGB: TORSION VECTOR1=p2,p1 AXIS=484,486 VECTOR2=p5,p6
tBF: TORSION ATOMS=p1p2,p3,p4,p5

# output
PRINT ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF STRIDE=10 FILE=COLVAR

Note that, depending on the MD engine and setup, periodic
boundary conditions (PBC) might need to be treated using the
WHOLEMOLECULES command in PLUMED.
Table 1 shows the average values of the CVs in the stable
states obtained from this analysis. These values are used in the
following sections to place the two fixed nodes, initial and final,
of the path. The path progress parameter, σ,
is equal to 0 in the WC state, and equal to 1 in the HG state.
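The averaging step can be sketched as a small post-processing script. The example below is only a minimal illustration: it assumes the usual PLUMED COLVAR layout (a "#! FIELDS" header followed by whitespace-separated columns), treats tGB and tBF as periodic by using a circular mean, and uses hypothetical helper names and made-up sample lines.

```python
import math


def read_colvar(lines):
    """Parse COLVAR-style text lines into {field name: list of floats}."""
    fields = []
    data = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("#"):
            # Header lines; only the FIELDS line defines the columns
            if tokens[:2] == ["#!", "FIELDS"]:
                fields = tokens[2:]
                for name in fields:
                    data.setdefault(name, [])
            continue
        for name, token in zip(fields, tokens):
            data[name].append(float(token))
    return data


def circular_mean(angles):
    """Mean of angles (rad) that respects the periodicity at +/- pi."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def cv_averages(data, periodic=("tGB", "tBF")):
    """Average every column except 'time'; periodic CVs use a circular mean."""
    averages = {}
    for name, values in data.items():
        if name == "time" or not values:
            continue
        if name in periodic:
            averages[name] = circular_mean(values)
        else:
            averages[name] = sum(values) / len(values)
    return averages


# Made-up sample; for a real file use: read_colvar(open("COLVAR"))
sample = ["#! FIELDS time dWC tGB", "0.0 2.8 3.1", "1.0 3.0 -3.1"]
print(cv_averages(read_colvar(sample)))
```

The circular mean matters only when a torsion fluctuates near ±π; for values well inside the interval it coincides closely with the arithmetic mean.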

4.4 Steered MD

We employ steered MD simulations [4, 5] to perform a first assessment
of the CVs. The goal is to validate whether the CV set
provides an accurate enough descriptor to drive the transition and
successfully arrive at each of the stable states. From this relatively
fast analysis one can already gain an approximate idea of the shape
of the path curve, the features of the immediate landscape, the
number of required nodes (see Note 8), and the free energy difference
between states.

Table 1
Average values of the seven CVs in the stable WC and HG states

bp/CV   dWC (Å)   dHG (Å)   dHB (Å)   dCC (Å)   dNB (Å)   tGB (rad)   tBF (rad)
WC      2.9       6.4       3.0       10.6      7.9       1.5         −0.1
HG      5.9       3.0       3.0       9.0       7.8       −1.7        0.0

We start a simulation in the WC state and define an initial
straight path (that is, a linear interpolation in the CV-space between
the fixed nodes defining the stable states) to the HG state, contain-
ing 50 nodes. No trailing nodes are necessary at this stage, as we do
not pull the system beyond the stable states. We apply a harmonic
restraint on the path, with a force constant of 2000 kcal/mol per
squared normalized path unit. During a simulation of 1 ns, we
move the center of the restraint gradually, at constant velocity,
from σ = 0 to σ = 1, thus steering the system toward
the final HG state. We also include a tube potential with a force
constant of 50 kcal/mol per squared normalized path unit, to keep
the system close to the path. Simultaneously, we update the path
nodes every 0.2 ps, so that the system can find its way toward the
transition valley. A short half-life (0.2 ps) increases the path flexibil-
ity; a convenient choice when starting from an uninformed first
guess (see Note 9).
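The initial straight path described above is a linear interpolation between the two fixed nodes. A minimal sketch of such an interpolation is given below, assuming equally spaced nodes and using the stable-state CV values that appear in the GENPATH keyword; whether PATHCV spaces its nodes in exactly this way is an assumption, and the function name is hypothetical.

```python
def straight_path(start, end, n_nodes):
    """Return n_nodes equally spaced points on the line from start to end."""
    if n_nodes < 2:
        raise ValueError("a path needs at least two nodes")
    if len(start) != len(end):
        raise ValueError("start and end must have the same number of CVs")
    path = []
    for i in range(n_nodes):
        fraction = i / (n_nodes - 1)
        path.append([a + fraction * (b - a) for a, b in zip(start, end)])
    return path


# CV order: dWC, dHG, dHB, dCC, dNB, tGB, tBF (distances in Angstrom, torsions in rad)
WC = [2.9, 6.4, 3.0, 10.6, 7.9, 1.5, -0.1]
HG = [5.9, 3.0, 3.0, 9.0, 7.8, -1.7, 0.0]

nodes = straight_path(WC, HG, 50)
print("first node:", nodes[0])
print("last node: ", nodes[-1])
```

Note that tGB is interpolated from 1.5 to −1.7 rad and therefore passes through zero, which corresponds to the rotation direction selected in Subheading 4.2.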
We add the following commands to the CV definitions in the
plumed.dat file.

# define path-CV
PATHCV ...
LABEL=pcv
ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF
GENPATH=0,50,0,2.9,6.4,3.0,10.6,7.9,1.5,-0.1,5.9,3.0,3.0,9.0,7.8,-1.7,0.0
HALFLIFE=100 PACE=100
... PATHCV

# set tube potential
RESTRAINT LABEL=tube ARG=pcv.z AT=0.0 KAPPA=50.0

# do steered MD
MOVINGRESTRAINT ...
LABEL=steer
ARG=pcv.s
STEP0=50000 AT0=0.00 KAPPA0=2000.0
STEP1=550000 AT1=1.00 KAPPA1=2000.0
... MOVINGRESTRAINT

# output
PRINT ...
ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF,pcv.s,pcv.z,steer.pcv.s_cntr,steer.bias,steer.work,steer.force2,tube.bias
STRIDE=10 FILE=COLVAR
... PRINT

and start the simulation with the command:

gmx mdrun -plumed plumed


[Figure 5: three panels. Left: σ versus t (ps); center: tBF (rad) versus tGB (rad); right: dHG (Å) versus dWC (Å). Legend: sampled configurations, restraint center]

Fig. 5 Left: time evolution of the path progress parameter, σ (gray) and of the target restraint value (purple)
during the steered MD simulation; Center: sampled configurations projected onto the torsion angle CVs, tGB
and tBF (gray); Right: sampled configurations projected onto the distance CVs, dWC and dHG (gray). The stable
states are indicated by crosses

Steering along the path successfully achieves the WC-to-HG
transition, as seen in Fig. 5, left. The initial, intermediate, and final
structures show a conserved double helix and the bps are properly
H-bonded and aligned. From the two selected projections of the
7-dimensional CV-space (Fig. 5, center and right), characteristic
features of the path can be observed. In particular, we observe the
sampling of the major groove base flipping from the WC state, seen
as a negative value of tBF and an increase of both dWC and dHG. In
the steered MD simulation, the base is driven to flip before com-
pleting a rotation without much base opening. We also note that
the number of nodes (further discussed hereafter) appears adequate
for the transition, as the path can capture the curvature as dictated
by the mechanism, and does not loop or overlap (see Note 8).
The average path obtained after this short steered MD run,
sampling only one crossing between the stable states, is still far from
a converged result. Still, by iteratively repeating this procedure,
feeding the last path optimization of each iteration as the initial
guess of the next, an optimal path can be converged. Furthermore,
by applying the Jarzynski method [5] a free energy difference can
be calculated after collecting enough statistics. However, in this
chapter we will not follow this route and instead continue with the
application of the path-CV with other sampling schemes.
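For reference, the Jarzynski estimate mentioned above reads ΔF = −kT ln⟨exp(−W/kT)⟩, where the average runs over repeated steered MD runs. The sketch below evaluates this expression for a list of total work values (for instance, the final steer.work value of each run). The function name and the numbers in the example are illustrative assumptions, not results from this chapter.

```python
import math


def jarzynski_free_energy(works, kT):
    """Estimate dF = -kT * ln( mean( exp(-W / kT) ) ).

    works: total work of each independent steered MD run
    kT:    thermal energy in the same units as the work values
    """
    if not works:
        raise ValueError("at least one work value is required")
    # Subtract the smallest work first to keep the exponentials well behaved
    w_min = min(works)
    acc = sum(math.exp(-(w - w_min) / kT) for w in works)
    return w_min - kT * math.log(acc / len(works))


# kT at 300 K in kcal/mol (0.0019872 kcal/(mol K) * 300 K)
KT_300K = 0.0019872 * 300.0

# Made-up work values in kcal/mol, for illustration only
example_works = [14.2, 15.8, 13.9, 17.1]
print("Jarzynski estimate (kcal/mol):",
      jarzynski_free_energy(example_works, KT_300K))
```

Since the exponential average is dominated by the lowest work values, the estimate never exceeds the mean work and converges slowly when the work distribution is broad.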

4.5 Metadynamics

After the steered MD simulation, we set out for an exploratory
metadynamics run. We use the multiple-walker PMD approach,
using 16 walkers—8 starting in the WC state and 8 in the HG
state—all probing and updating the same path and biasing poten-
tial. Gaussians with a height of 0.05 kcal/mol and a width of 0.05
normalized path units are deposited every 0.5 ps (see Notes 2 and
10). Since the range of the biased variable is known a priori, it is
easy and computationally efficient to use a grid to store the potential.
For the initial guess path, we take the same linear interpolation as
used before in the steered MD case, but now add 20 trailing nodes
at the beginning and at the end of the original 50 transition nodes
(see Notes 8 and 11). The path update has a half-life of 1 ps (see
Note 9) and is performed every 0.5 ps (see Note 10). No tube
potential is set, so that the CV-space perpendicular to the path
is sampled freely. Harmonic walls with force constants of 1000
kcal/mol per squared normalized path unit are set on σ to limit
the sampling to the [−0.2, 1.2] interval. Similarly, harmonic
walls with a force constant of 1000 kcal/(mol rad²) are set on tGB
to prevent counter-clockwise rotations, which are mapped by the
path as sudden jumps from negative values to values greater than
one. Note that, when putting a wall on a periodic CV, it is necessary
to define a non-periodic instance of it to avoid force artifacts.
We generate 16 plumed.{ID}.dat files that include the pre-
vious CV definitions, followed by:

# define non-periodic tGB
tGB_np: COMBINE ARG=tGB COEFFICIENTS=1. PERIODIC=NO

# define path-CV
PATHCV ...
LABEL=pcv
ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF
GENPATH=20,50,20,2.9,6.4,3.0,10.6,7.9,1.5,-0.1,5.9,3.0,3.0,9.0,7.8,-1.7,0.0
FIXED=21,70 HALFLIFE=500 PACE=250
WALKERS_RSTRIDE=250 WALKERS_ID={ID} WALKERS_N=16 WALKERS_DIR=.
... PATHCV

# do metadynamics
METAD ...
LABEL=metadyn ARG=pcv.s SIGMA=0.05 HEIGHT=0.05 PACE=250
GRID_MIN=-1.0 GRID_MAX=2.0 WALKERS_MPI
... METAD

# set walls on pcv.s and non-periodic tGB
LOWER_WALLS LABEL=s_lwall ARG=pcv.s AT=-0.2 KAPPA=1000.0
UPPER_WALLS LABEL=s_uwall ARG=pcv.s AT=1.2 KAPPA=1000.0
LOWER_WALLS LABEL=tGB_lwall ARG=tGB_np AT=-2.2 KAPPA=1000.0
UPPER_WALLS LABEL=tGB_uwall ARG=tGB_np AT=2.0 KAPPA=1000.0

# output
PRINT ...
ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF,pcv.s,pcv.z,metadyn.bias,s_lwall.bias,s_uwall.bias,tGB_lwall.bias,tGB_uwall.bias
STRIDE=10 FILE=COLVAR
... PRINT

and run 16 parallel GROMACS simulations, 8 starting in the WC
state and 8 in the HG state, each with different random Maxwell–Boltzmann
distributed velocities. The PMD multiple-walker simulation
is started with the command:

gmx mdrun -plumed plumed -multi 16
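The 16 input files can be produced from a single template in which the literal string {ID} marks the walker index. The following sketch shows one possible way to do this. The function names are hypothetical, the template shown is abbreviated, and numbering the walkers from 0 is an assumption that should be checked against the MD engine and PLUMED setup in use.

```python
import os


def render_walker_inputs(template, n_walkers, placeholder="{ID}"):
    """Return {file name: file content}, one entry per walker."""
    if placeholder not in template:
        raise ValueError("template does not contain " + placeholder)
    files = {}
    for walker in range(n_walkers):
        name = "plumed.%d.dat" % walker
        files[name] = template.replace(placeholder, str(walker))
    return files


def write_walker_inputs(files, directory="."):
    """Write the rendered inputs to disk."""
    for name, text in files.items():
        with open(os.path.join(directory, name), "w") as handle:
            handle.write(text)


# Abbreviated template for illustration; a real one holds the full input
template = "PATHCV LABEL=pcv WALKERS_ID={ID} WALKERS_N=16 WALKERS_DIR=.\n"
files = render_walker_inputs(template, 16)
print(sorted(files)[:3])
# write_walker_inputs(files)  # uncomment to create the files
```

Plain string replacement is used instead of str.format so that any other braces in the input file are left untouched.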

The distribution of the walkers along the path in time (Fig. 6,
left) shows how the deposited Gaussians generate a repulsive effect
among the walkers. The walkers quickly de-correlate and sample
different sections of the path. This makes the path updates more
efficient, as most nodes are now sampled and relocated at each path
update. The sampled CV values, illustrated in the middle and right
panels in Fig. 6, show evidence for three competing pathways: one
with a small degree of flipping (tBF ≈ 0.5 rad) toward the minor
groove, one with no flipping, and one which opens up to tBF ≈ −π/2 rad
toward the major groove. Due to the partial overlap of these
three pathways, the PMD simulation does not succeed in converging
to a single average transition path. Instead, the nodes oscillate
between the path of no base flipping and several paths with various
degrees of base opening. In the following
section, we will show how to converge the path-CV to a
specific mechanistic pathway.

[Figure 6: three panels. Left: σ versus t (ps); center: tBF (rad) versus tGB (rad); right: dHG (Å) versus dWC (Å), with an additional tBF (rad) scale]

Fig. 6 Left: time evolution of the path progress parameter, σ, for each of the 16 PMD walkers (green-pink
colormap); Center: sampled configurations by all walkers projected onto tGB and tBF; Right: sampled
configurations by all walkers projected onto dWC and dHG. The stable states are indicated by crosses

4.6 Converging a Path

Starting again from the guess path based on the linear interpolation
between the stable states, we will now aim to converge the path-CV
to a specific mechanistic transition pathway and its corresponding
free energy profile. To do so, we use a tube potential to control the
sampling in the neighborhood of the path. This restraint prevents
bifurcation, such as that seen in Fig. 6, and keeps all walkers
exploring the same valley. The force constant for the tube, after
several trials, is set to 20 kcal/mol per squared normalized path unit
(see Note 3). The number of walkers is decreased to eight (4 starting
in each stable state), and the Gaussian height to 0.04 kcal/mol, as
smaller and more infrequent increases of the bias potential favor the
ability of metadynamics to overwrite and self-heal a free energy
profile [48]. We also reduce the force constant of the wall potentials
on σ, which restrict the sampling to the [−0.2, 1.2] range, to
500 kcal/mol per squared normalized path unit. The rest of the
parameters are the same as in Subheading 4.5. Eight parallel runs
are started with different random Maxwell–Boltzmann distributed
velocities.
[Figure 7: three panels. Left: σ versus t (ps); center: tBF (rad) versus tGB (rad); right: dHG (Å) versus dWC (Å). Legend: sampled configurations, path at t = 3500 ps]

Fig. 7 Left: time evolution of the path progress parameter, σ, for each of the 8 PMD walkers (in colors);
Center: sampled configurations by all walkers projected onto tGB and tBF (gray), and the optimized path
(purple) with trailing nodes (black); Right: sampled configurations by all walkers projected onto dWC and dHG
(gray), and the optimized path (purple) with trailing nodes (black). The stable states are indicated by crosses

Similar to before, we observe that the walkers repel each other
and explore different sections of the path (Fig. 7, left). The full
range along the path is sampled, although most walkers do not
actually explore σ over the entire range. The walkers are only able to
diffuse above and below a narrow region close to σ = 0.3 that is not
crossed as much; we will comment on this feature at the end of the
section. The middle and right panels of Fig. 7 show that the
sampling is now mostly focused on a single transition pathway.
The path shows only a very small degree of base flipping toward
the minor groove, and rotates mainly inside of the DNA double
helix, with the bases remaining at a close distance from each other
for the entire transition.
In Fig. 8, we show the sampling along the path projected onto
each of the 7 CVs. Apart from the slight flipping to the minor
groove seen from tBF, we observe an increase in the neighboring bp
distance, dNB, to provide space for the rotation. The H-bond
distances, dWC, dHG, and dHB, as well as the inter-strand C1’–C1’
distance, dCC, show that the bp is well-formed and aligned during
the sampling, and that there is no significant dislocation of the pair.
The rotation of the adenine, seen from tGB, occurs early in the
transition: the mid-rotation, which is expected to coincide
with the peak of the free energy barrier, occurs at σ = 0.3. This
contrasts somewhat with previous work, which
reported a late transition state, more similar in structure to the HG
state [55, 57]. The discrepancy with our results can be explained
either by the different force field, or by our redefinition of the
glycosidic torsion CV, tGB, which is based on centers of mass, rather
than single atomic positions (Subheading 4.2).
The convergence of the path-CV to the found transition path is
rather robust and invariant upon modest variations of the PMD
parameters. Reducing the Gaussian height from
0.04 to 0.02 kcal/mol leads to a slower buildup of the bias potential
and, with that, a slower diffusion over the path, but the same
mechanism is found. Increasing the number of nodes from 50 to
70 makes the path-CV somewhat more flexible, but the final result
is not different. Interestingly, increasing the number of trailing
nodes from 20 to 30 causes them to capture the pathway with
base flipping toward the major groove, described in Subheading
4.5, while the transition nodes remain in the same path with no bp
opening, a behavior pointed out in Note 11. Finally, increasing
the force constant of the harmonic tube potential from 20 to
30 kcal/mol per squared normalized path unit still leads to the
same results; however, a decrease to 10 kcal/mol per squared
normalized path unit is not restrictive enough to prevent bifurcations
to other pathways, which hinders the convergence.

[Figure 8: seven stacked panels showing dWC (Å), dHG (Å), dHB (Å), dCC (Å), dNB (Å), tGB (rad), and tBF (rad) versus σ (bottom axis) and node index (top axis)]

Fig. 8 Sampled configurations by the 8 PMD walkers projected onto each of the
7 CVs and σ (gray), and the optimized path nodes for each CV (in colors)

To assess the convergence of the path-CV, we show in Fig. 9
the time evolution of the distance of the system from its projection
onto the path in CV-space, $\lVert z_k - s^{t_i}(\sigma(z_k)) \rVert$, averaged over the
walkers. At the start of the simulation, we observe peaks at both
stable states, as the walkers attempt to find their way out of the
basins. Next, we observe smaller peaks up to around 1200 ps,
corresponding to the first crossings of the walkers. Eventually, as
the path adapts to the sampled transition density, the sampled
distance from the path continues to decrease.

[Figure 9: surface plot of the weighted average distance to the path versus t (ps) and σ]

Fig. 9 Time evolution of the distance from the path, $\lVert z_k - s^{t_i}(\sigma(z_k)) \rVert$, with
respect to the path progress parameter, σ, averaged over the 8 PMD walkers.
The color map on the z-axis shows the average distance from the path,
$\lVert z_k - s^{t_i}(\sigma(z_k)) \rVert$, weighted according to the inverse distance to the grid points

The convergence of PMD can also be assessed by the time evolution of the
root-mean-square deviation (RMSD) of the path, as well as of the free energy
profile, with respect to a reference (e.g., the initial linearly
interpolated path and flat profile). Increasingly smaller RMSD
changes reflect a convergent simulation, as shown in ref. 41.
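The RMSD-based convergence check mentioned above can be sketched as follows. The example compares two paths stored as lists of nodes, each node being a list of CV values. The optional per-CV scaling (useful because the CV set mixes distances in Å with torsions in rad) is an assumption of this sketch, as are the function name and the toy paths.

```python
import math


def path_rmsd(path_a, path_b, scale=None):
    """RMSD between two paths with the same number of nodes and CVs.

    scale: optional list with one divisor per CV (defaults to 1 for all).
    """
    if not path_a or len(path_a) != len(path_b):
        raise ValueError("paths must be non-empty and of equal length")
    n_cv = len(path_a[0])
    if scale is None:
        scale = [1.0] * n_cv
    total = 0.0
    for node_a, node_b in zip(path_a, path_b):
        total += sum(((a - b) / s) ** 2
                     for a, b, s in zip(node_a, node_b, scale))
    return math.sqrt(total / len(path_a))


# Toy two-CV paths for illustration only
reference = [[0.0, 0.0], [1.0, 1.0]]
current = [[0.0, 1.0], [1.0, 2.0]]
print("RMSD:", path_rmsd(reference, current))
print("scaled RMSD:", path_rmsd(reference, current, scale=[1.0, 2.0]))
```

Evaluating this quantity between successive saved paths, or against the initial straight path, gives a single number per update whose plateau indicates that the nodes have stopped moving.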
Although the path is seen to nicely converge, the estimation of
the free energy profile along the path is not as well-behaved. Gen-
erally, there are several approaches to converge the free energy
estimation of a metadynamics simulation. One can gradually reduce
the height of the Gaussian potentials after each recrossing [37], or
use the well-tempered approach [2, 3] to reduce the Gaussian size
in an automatic manner, or one can compute a running average of
the free energy starting from the moment of the first recrossing
back to the initial state [41] (see Note 1). First, however, we check
for hysteresis in the construction of the metadynamics bias poten-
tial, which would be an indication that our set of CVs is incomplete,
or that the path-CV has not properly converged to the average
transition path. Unfortunately, we indeed observe hysteresis in the
time evolution of the bias potential, despite our rather extensive set
of CVs to describe the base-rotation transition. This hysteresis
explains the already observed depletion around σ = 0.3 in Fig. 7
(left), which indicated the difficulty all walkers have in crossing from WC
to HG, and vice versa, even after the minima had been filled. The
hysteresis is even more evident in a single-walker metadynamics run
along the fixed optimized path: during each crossing, the minima
are over-filled and the previously constructed profile is undone
again. By analyzing the structures before and after crossing, we
hypothesize that our CV set is missing a CV describing a large-scale
conformational change in the DNA strands. This is consistent with
the tilting and rotation of DNA chains around HG bps reported in
ref. 56. To test this possibility, we restrain the position of all DNA
atoms except for those in the rotating bp and its immediate neigh-
boring bps, and run metadynamics along the optimized path. Now,
we observe that the system is unable to cross from one state to the
other, until eventually the simulation crashes due to excessive forces
on the atoms. By (1) monitoring for hysteresis,
(2) comparing configurations before and after crossing, and
(3) testing candidate “hidden CVs” by restraining them, we can
systematically discover the essential CVs to describe the process.
Further PMD simulations with a more extensive set of CVs are
described in a forthcoming publication, in which we focus on the
mechanistic, thermodynamic, and kinetic details of the WC-to-HG
transition and their DNA sequence-dependence. For this chapter
instead, we proceed with two other illustrations of the path-CV: in
an umbrella sampling simulation and in an a posteriori path
optimization.

4.7 Umbrella Sampling

In this section, we compute the free energy profile of the WC-to-HG
transition, using the path-CV optimized in the previous section
in combination with umbrella sampling simulations. By dividing
the sampling into regions along a CV (here, the path progress
parameter, σ) using window potentials, umbrella sampling is less
sensitive to hysteresis due to a “hidden CV” (i.e., an imperfect
reaction coordinate) than metadynamics, owing to the dynamic nature
of the latter. Of course, the resulting free energy would still suffer
from the incomplete description of the process, giving too low a
barrier, which would become evident in an a posteriori transmission
coefficient calculation or a committor analysis.
We run ten 20 ns long umbrella sampling simulations. The
harmonic window potentials are placed at every 0.1 normalized
path units to restrain the sampling to different regions along the
optimized path, from σ = 0 to σ = 1. The initial molecular structures
for each window are obtained via steered MD along the
optimized path. The force constant of the window potentials is
2000 kcal/mol per squared normalized path unit. An additional
window at σ = 0.34 was added to fill a gap in the distribution
overlap. We also apply a tube potential with a force constant of
30 kcal/mol per squared normalized path unit.
The PLUMED input file for the path-CV umbrella sampling
simulations is as follows:

# define path-CV
PATHCV LABEL=pcv ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF INFILE=PMD_path.input HALFLIFE=-1 PACE=0

# set tube potential
RESTRAINT LABEL=tube ARG=pcv.z AT=0.0 KAPPA=30.0

# set umbrella
RESTRAINT LABEL=umbrella ARG=pcv.s AT={WINDOW} KAPPA=2000.0

# output
PRINT ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF,pcv.s,pcv.z,umbrella.bias,tube.bias STRIDE=10 FILE=COLVAR

To construct the free energy profile, we apply WHAM [44] to
the last 10 ns of each simulation. The result, shown in Fig. 10,
confirms the early barrier (σ = 0.3) already observed in the PMD
simulation, which coincides with the halfway point of the base
rotation (Fig. 8). An indistinct metastable state can be identified
at σ = 0.7, which signifies the completion of the base rotation, but
with the HG bp still not fully aligned.
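To make the WHAM step more concrete, the sketch below implements the standard self-consistent WHAM equations for a one-dimensional CV. It is a minimal illustration under stated assumptions, not the WHAM code used for Fig. 10: it assumes uniform bins, harmonic windows of the form (κ/2)(σ − σ_i)², and σ samples read beforehand (for example from the pcv.s column of each window's COLVAR file). All names and the toy data are hypothetical.

```python
import math


def wham_1d(samples, centers, kappa, kT, x_min, x_max, n_bins,
            max_iter=10000, tol=1e-10):
    """Self-consistent WHAM for windows with bias 0.5*kappa*(x - center)**2.

    samples: one list of CV values per window
    Returns (bin centers, free energy with its minimum shifted to zero).
    """
    width = (x_max - x_min) / n_bins
    mids = [x_min + (b + 0.5) * width for b in range(n_bins)]
    counts = [0] * n_bins      # histogram summed over all windows
    n_in = []                  # number of in-range samples per window
    for window in samples:
        n = 0
        for x in window:
            if x_min <= x < x_max:
                b = min(int((x - x_min) / width), n_bins - 1)
                counts[b] += 1
                n += 1
        n_in.append(n)
    # Boltzmann factor of each window bias at each bin center
    boltz = [[math.exp(-0.5 * kappa * (m - c) ** 2 / kT) for m in mids]
             for c in centers]
    f = [0.0] * len(centers)   # free energy offset of each window
    prob = [0.0] * n_bins
    for _ in range(max_iter):
        prob = []
        for b in range(n_bins):
            denom = sum(n_in[i] * math.exp(f[i] / kT) * boltz[i][b]
                        for i in range(len(centers)))
            prob.append(counts[b] / denom if denom > 0.0 else 0.0)
        f_new = []
        for i in range(len(centers)):
            z = sum(prob[b] * boltz[i][b] for b in range(n_bins))
            if z <= 0.0:
                raise ValueError("window %d has no usable samples" % i)
            f_new.append(-kT * math.log(z))
        shift = max(abs(a - b) for a, b in zip(f, f_new))
        f = f_new
        if shift < tol:
            break
    free = [-kT * math.log(p) if p > 0.0 else float("inf") for p in prob]
    f_min = min(free)
    return mids, [v - f_min for v in free]


# Toy check: one unbiased window with twice as many samples in the first bin
mids, profile = wham_1d([[0.1, 0.1, 0.3]], [0.0], 0.0, 1.0, 0.0, 0.4, 2)
print(mids, profile)
```

With a single unbiased window the procedure reduces to a plain histogram estimate, which is a convenient sanity check before applying it to overlapping restrained windows.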
Here we performed the umbrella sampling along a fixed path
that was already optimized with PMD. Of course, the path can
also be evolved with umbrella sampling using multiple walkers; the
required PLUMED keywords to do so are found in Subheadings
4.5 and 4.6.

[Figure 10: FE (kcal/mol) versus σ]

Fig. 10 WHAM free energy profile obtained from the umbrella sampling simulations
along the WC-to-HG path without base flipping optimized by PMD

4.8 A Posteriori Path Optimization

As a final illustration of optimizing the adaptive path-CV to find a
transition mechanism, we will employ the path-CV as an a posteriori
analysis tool on a pre-existing trajectory (see Note 12). Of
course, this trajectory should contain configurations from at least
one transition between two stable states, for example, from a very
long (or very lucky) brute-force MD or Monte Carlo simulation, or
from an enhanced sampling simulation. Here, we will analyze a
reactive trajectory obtained from a TPS simulation [59], and opti-
mize a path for one WC-to-HG transition involving the base open-
ing to the major groove (described in more detail in Subheading
4.5). We start with a straight path and use the PLUMED driver to
optimize it using the trajectory as input. The driver can compute
and print CVs, and, for the current purpose, also compute the
distance to the path-CV in each trajectory frame to optimize the
path. Typically, a single run of the PLUMED driver over the
trajectory is not enough to converge the path-CV. But by running
the driver several times on the trajectory, while every time providing
the previously optimized path as the initial guess for the next run,
the path-CV can be optimized efficiently in a few iterations.
The plumed.dat file contains the CV definitions and the
following commands:

#INIT PATHCV LABEL=pcv ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF GENPATH=0,50,0,2.9,6.4,3.0,10.6,7.9,1.5,-0.1,5.9,3.0,3.0,9.0,7.8,-1.7,0.0 SCALE=0.1,0.1,0.1,0.3,0.2,0.6,0.5 HALFLIFE=250 PACE=100 OUTFILE=path.out
#RESTART PATHCV LABEL=pcv ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF INFILE=path.input SCALE=0.1,0.1,0.1,0.3,0.2,0.6,0.5 HALFLIFE=250 PACE=100 OUTFILE=path.out

PRINT ARG=dWC,dHG,dHB,dCC,dNB,tGB,tBF,pcv.s,pcv.z STRIDE=1 FILE=colvar.out

and we execute a simple bash script to automate the iterative
procedure for optimizing the path-CV with the PLUMED driver:

#!/bin/bash

# run first optimization, starting from the straight path
sed 's/#INIT//' plumed.dat > plumed.input
plumed driver --pdb dna.pdb --mf_xtc traj.xtc --plumed plumed.input

# do 100 iterations
for (( i=0; i<100; i++ )); do

    # set last optimized path (last 56 lines of path.out) as input
    tail -n 56 path.out > path.input

    # save input and output files
    cp path.input path.input_$i
    mv path.out path.out_$i

    # run optimization, restarting from the previous path
    sed 's/#RESTART//' plumed.dat > plumed.input
    plumed driver --pdb dna.pdb --mf_xtc traj.xtc --plumed plumed.input
done

# clean up (after the loop, i equals 100)
mv path.out path.out_$i
rm -f bck.*

After 100 iterations a good fit is obtained for all CVs. Notice that
this simulation was done without a tube, which explains the fluc-
tuating sampling (Fig. 11).
To obtain the free energy along the optimized path, we use
umbrella sampling, as we did before for the mechanism
without base flipping found by PMD. Harmonic window potentials
are placed at every 0.1 normalized path units, from σ = 0 to σ = 1,
with force constants of 2000 kcal/mol per squared normalized path
unit. Additional windows with the same force constant are placed at
σ values of 0.35, 0.65, and 0.95, and one with a force constant of 3000
kcal/mol per squared normalized path unit at σ = 0.1.
Figure 12 shows the resulting free energy profile for the base-flipping
transition, which is characterized by additional small barriers
close to σ = 0.1 and σ = 0.9. These barriers mark the flipping
of the nucleotide out of, and back into, the confines of the double
helix. In between these steps, the base rotation takes place. In this
mechanism, the top of the central barrier is not located at the
mid-rotation state, but at a stage in which the base starts to
re-enter the double helix. As expected, the H-bond distances
between bps, dWC, dHG, and dHB, increase more than in the mechanism
without base opening, and the distance of the neighboring
bps, dNB, does not play a role. The inconsistency in the free energy
difference between the WC and HG states in both profiles could be
a consequence of the missing CV related to the DNA large-scale
motion.

[Figure 11: seven stacked panels showing dWC (Å), dHG (Å), dHB (Å), dCC (Å), dNB (Å), tGB (rad), and tBF (rad) versus σ (bottom axis) and node index (top axis)]

Fig. 11 Sampled configurations by the TPS trajectory projected onto each of the
7 CVs and σ (gray), and the optimized path nodes for each CV (in colors)

[Figure 12: FE (kcal/mol) versus σ. Legend: no base-flipping path, base-flipping path]

Fig. 12 WHAM free energy profiles obtained from umbrella sampling simulations along WC-to-HG transition
paths with and without base flipping
The free energy barriers for both mechanisms, with and without
base flipping, are similar according to our calculations. This
differs from previous results [53, 57], and is likely rooted in the
chosen force field and the different CV definitions. The similarity of
the barriers, and the greater depth and width of the WC basin,
explain the base opening escapades observed in both steered MD
and metadynamics simulations. For a system starting in the WC
basin, it is indeed more favorable to climb the first base-flipping
barrier than the base-rotation one. It is not until later in the
pathway that the barrier of the base-flipped mechanism becomes
higher. On the other hand, for a system starting in the HG valley,
base rotation without opening is consistently the most favorable
mechanism.
Naturally, the path-CV can be adapted for the other transition
mechanisms, either on the fly or by a posteriori optimization. In
the search for other transition channels, one can fully exploit the
versatility of the path-CV, and of the enhanced sampling methods
applied to it. Moreover, the WC-to-HG transition in DNA contains
a complexity due to the involvement of various local and non-local
degrees of freedom that is rather common to conformational
changes in biomolecules and bio-assemblies. In that sense, we
believe that the path-CV framework is a promising tool to unravel
the mechanisms and obtain the free energy profiles of all sorts of
biomolecular phenomena (see Note 13).
5 Notes

1. It is common practice in metadynamics to gradually reduce
the Gaussian height to converge a free energy profile. This can
be done manually after each recrossing on the path, or via the
well-tempered method. When using the latter option, one
should use a somewhat larger bias factor than advertised for
“normal” (non-adaptive) CVs, because path optimization
generally requires the sampling of several barrier crossings,
before which the Gaussian height should not have already
been reduced too much. In our experience, a well-tempered
bias factor of around 10–15 times the height of the barrier is a
working choice. Another very interesting option is to use
transition-tempered metadynamics (TTMetaD), a method
that first fills the valleys in a non-tempered phase, and then
converges the free energy profile in a well-tempered
fashion [73].
2. When using multiple-walker metadynamics, it is always recom-
mended to keep Gaussians narrow and small. Otherwise, we
risk the walkers not actually exploring the underlying free
energy profile, but only feeling each other’s repulsive poten-
tials. This can be assessed by analyzing the diffusion of σ over
time for all walkers. There should indeed be some repulsion,
but also crossings between walkers.
3. The tube potential is a convenient handle to temper a too
flexible path evolution and to restrict the sampling to a specific
transition pathway by blocking bifurcations. However, it
should be noted that, as long as the path is not yet optimal,
the tube potential acts as an additional hurdle, forcing the
system to cross outside the intrinsic transition valley. Secondly,
after the path optimization is converged, the tube potential
affects the sampling of the degrees of freedom orthogonal to
the path, thus affecting the entropic contribution to the profile.
This biasing by a tube potential can be relaxed somewhat by
setting the harmonic wall at a non-zero value of
$\lVert \mathbf{z}_k - \mathbf{s}^{t_i}(\sigma(\mathbf{z}_k)) \rVert$. In this spirit, it can be convenient to first
measure the widths of the stable state basins, and then use
this to set a tube potential that does not affect the stable state
valleys.
4. Scaling of CV-space can be done once the range of each CV
during the transition is known. We simply calculate each scaling
factor as $1/(z_{i,\max} - z_{i,\min})$ for each CV, $z_i$.
5. One can use multiple walkers to continuously explore all
regions of the path and avoid the reparameterization step
from undoing the node optimization in temporarily unsampled
regions. When doing this, it is recommended to have at least as
many walkers as expected stable and metastable states along the
path. Another very interesting way to use multiple walkers is to
include a dummy walker—which updates the path, but not the
metadynamics bias—restrained to a particular point in
CV-space. Thus, we can find the optimal path which crosses
that region.
6. The current numerical implementation of the path-CV does
not support periodic crossings. This implies that periodic CVs
must always be handled in a specific direction. An alternative
way to circumvent this is to map the periodic CV to its sine and
cosine, and include those two in the path-CV. In this way, the
periodic movement can be represented by a curve in the given
2D space. Nevertheless, one should be cautious when
performing metadynamics on this CV set, as the hard boundary
of the sine and cosine can generate artifacts.
7. It is highly recommended to save the trajectory files of the
equilibration runs in each stable state. In this way, they can
always be re-analyzed using the PLUMED driver tool. After
adding a new CV, we can quickly determine the location and
width of the stable state basins in the new CV-space.
8. Determining a good number of nodes to capture a transition is
a trial-and-error procedure. As a rule of thumb, one can start
with a small number of transition nodes (20–30). If the result-
ing curve is able to capture all CV fluctuations of the transition,
then one has succeeded. Otherwise, one can gradually add
more nodes until all features of the transition are represented.
In general, the path is resilient to changes in this parameter. We
have increased the number of transition nodes by up to 40%
without affecting the final result. However, when too many
nodes are added, the path tends to coil or loop around the
stable states, and oscillate excessively around small fluctuations
of the CVs [41].
9. There is no satisfactory default value for the half-life parameter.
The general rule of thumb is to start with a relatively short
half-life if the initial guess path is likely to be outside the intrinsic
transition valley, and then switch to a large value, possibly even
infinity, once the (neighborhood of the) intrinsic transition
valley is found. However, too small a half-life may yield a very
flexible and dynamical evolution, which leads to curvy paths
that do not guide the biased system over the transition barrier.
With too large a half-life, the path evolves ever more slowly
during the simulation, as each newly sampled transition density
weighs less due to the ever-growing history of previous samples.
To check whether the path has stopped evolving because
of the large half-life, or because it has actually found the
transition valley, one should analyze the time evolution of the node
weights from the path output files. If very large weight values
appear early on in the simulation, a shorter half-life is probably
needed.
10. In a PMD simulation, the parameters for the Gaussian deposi-
tion pace and the path update pace should be somewhat bal-
anced for optimal efficiency. Setting a high metadynamics
deposit frequency with a low path update frequency leads to
many recrossings before finding an optimum path, while at the
same time the initial crossings may take place at high-energy
states outside the intrinsic transition valley. On the other hand,
if one sets a slow metadynamics deposit pace and a high path
update frequency, it may take a long time for the barrier
crossing to occur, but it will most likely take place over paths
that are already close to optimal. Note that the effect of increas-
ing the metadynamics deposit frequency is generally the same
as increasing the size of the Gaussian potentials. A sensible
initial setup is to have the same pace for the metadynamics
deposit frequency and for the path updates; the Gaussian
height can be set to around two orders of magnitude less
than the expected barrier (generally smaller than kBT to allow
for self-healing) and the width to around 0.05 normalized path
units. Then, the path flexibility can be controlled using the
half-life parameter.
11. Typically, we set a small number of additional trailing nodes
(10–20), as we require just enough of them to capture the
valleys at both ends of the path. However, sometimes the
trailing nodes can be exploited in other clever ways. For exam-
ple, one can intentionally direct them to steep regions in the
free energy landscape and get a natural wall effect to restrain
the sampling. This works as long as the trailing nodes are not
relocated. Alternatively, one can have the trailing nodes probe a
secondary relevant transition channel. We have performed
simulations on other systems, in which the transition nodes
capture the optimal path, while the trailing nodes fall into the
second optimal path (although both ends do not touch and
therefore that second path is not fully captured). In these cases,
care must be taken that the trailing nodes do not approach the
primary channel of the transition nodes. If this occurs, points in
CV-space lying close to both sets of nodes can be suddenly
mapped from one σ value to the other, leading to ill-defined
sampling. In some calculations where the sampling does not go
beyond the stable basins (as we saw in the application of steered
MD), trailing nodes are not needed.
12. A posteriori path optimization can be a very powerful tool
when a TPS ensemble is already available. We collaborated
with the developers of the OpenPathSampling (OPS) Python
library [74, 75], to include a PLUMED patch that enables the
integrated use of the path-CV, as well as the rest of the
PLUMED CVs [76].
13. In this chapter, we focused our attention on a non-chemical,
conformational transition, modeled by classical MD simulation.
However, applications that require quantum mechanical
simulations (e.g., density functional theory (DFT) MD as
implemented in Car–Parrinello MD or Born–Oppenheimer
MD) may also benefit greatly from the use of a path-CV. The
computationally costly dynamics can be focused on a specific
chemical transition of interest. In addition, physical insight can
be extracted from the properties of the system along the opti-
mized path [41].
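
As a rough illustration of Notes 4 and 6, the shell sketch below
computes per-CV scaling factors and maps a periodic CV to its sine
and cosine. It assumes a plain whitespace-separated table of CV
values (comment lines starting with "#", time in the first column),
which is only one possible layout. The helper names scaling_factors
and sincos_map and all numbers in the demo are made up for
illustration.

```shell
#!/bin/sh
# Sketch for Notes 4 and 6: per-CV scaling factors and a sine/cosine mapping.
export LC_ALL=C   # keep "." as the decimal separator

# scaling_factors FILE  (hypothetical helper)
# FILE: whitespace-separated table; lines starting with "#" are skipped,
# column 1 is assumed to hold the time and every later column one CV.
# Prints "column min max scale" with scale = 1 / (max - min).
scaling_factors() {
    awk '
        /^#/ || NF == 0 { next }
        {
            for (c = 2; c <= NF; c++) {
                v = $c + 0
                if (!(c in lo) || v < lo[c]) { lo[c] = v }
                if (!(c in hi) || v > hi[c]) { hi[c] = v }
            }
            if (NF > ncol) { ncol = NF }
        }
        END {
            for (c = 2; c <= ncol; c++) {
                if (hi[c] > lo[c]) {
                    printf "%d %g %g %g\n", c, lo[c], hi[c], 1 / (hi[c] - lo[c])
                } else {
                    printf "%d %g %g undefined\n", c, lo[c], hi[c]
                }
            }
        }
    ' "$1"
}

# sincos_map  (hypothetical helper)
# Reads one angle in radians per line on stdin and prints "sin cos".
sincos_map() {
    awk '{ printf "%.6f %.6f\n", sin($1), cos($1) }'
}

# Demo with made-up numbers: time, a distance-like CV, an angle-like CV.
demo=$(mktemp)
printf '# time d t\n0 2.0 -1.0\n1 12.0 1.5\n2 7.0 0.0\n' > "$demo"
scaling_factors "$demo"
rm -f "$demo"

# Angles just below +pi and just above -pi end up close in (sin, cos) space.
printf '3.14\n-3.14\n' | sincos_map
```

The last two lines of output illustrate the point of Note 6: the two
angles differ by almost 2π as raw numbers, yet their (sin, cos) pairs
nearly coincide, so the periodic jump disappears from the CV set.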

Acknowledgements

We wish to acknowledge our fellow group members Peter
G. Bolhuis and David W. H. Swenson for their previous TPS
work on the DNA WC-to-HG transition. They provided us with
readied structures and MD protocols, as well as valuable and moti-
vating comparison and discussion points. We also acknowledge
Davide Branduardi for his support in coding the first version of
the PMD method in PLUMED. We thank the Mexican National
Council for Science and Technology (CONACYT), which provided
funding for Alberto Pérez de Alba Ortı́z during his PhD research at
the University of Amsterdam.

References
1. Laio A, Parrinello M (2002) Escaping free-energy minima. Proc Natl Acad Sci USA 99(20):12562–12566
2. Barducci A, Bussi G, Parrinello M (2008) Well-tempered metadynamics: a smoothly converging and tunable free-energy method. Phys Rev Lett 100(2):020603
3. Bonomi M, Barducci A, Parrinello M (2009) Reconstructing the equilibrium Boltzmann distribution from well-tempered metadynamics. J Comput Chem 30:1615–1621
4. Grubmüller H, Heymann B, Tavan P (1996) Ligand binding: molecular mechanics calculation of the streptavidin-biotin rupture force. Science 271(5251):997–999
5. Jarzynski C (1997) Nonequilibrium equality for free energy differences. Phys Rev Lett 78(14):2690
6. Torrie GM, Valleau JP (1977) Nonphysical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling. J Comput Phys 23(2):187–199
7. Darve E, Pohorille A (2001) Calculating free energies using average force. J Chem Phys 115:9169
8. Carter EA, Ciccotti G, Hynes JT, Kapral R (1989) Constrained reaction coordinate dynamics for the simulation of rare events. Chem Phys Lett 156:472
9. den Otter WK, Briels WJ (1998) The calculation of free-energy differences by constrained molecular dynamics simulations. J Chem Phys 109:4139
10. Huber T, Torda A, van Gunsteren W (1994) Local elevation: a method for improving the searching properties of molecular dynamics simulation. J Comput Aided Mol Des 8:695–708
11. Grubmüller H (1995) Predicting slow structural transitions in macromolecular systems: conformational flooding. Phys Rev E 52:2893
12. Voter A (1997) Hyperdynamics: accelerated molecular dynamics of infrequent events. Phys Rev Lett 78:3908
13. Babin V, Roland C, Sagui C (2008) Adaptively biased molecular dynamics for free energy calculations. J Chem Phys 128:134101
14. Wang F, Landau DP (2001) Efficient, multiple-range random walk algorithm to calculate the density of states. Phys Rev Lett 86:2050
15. Hansmann UH (1997) Parallel tempering algorithm for conformational studies of biological molecules. Chem Phys Lett 281(1):140–150
16. Sugita Y, Okamoto Y (1999) Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett 314:141–151
17. Berg B, Neuhaus T (1992) Multicanonical ensemble: a new approach to simulate first-order phase transitions. Phys Rev Lett 68:9–12
18. Maragliano L, Vanden-Eijnden E (2006) A temperature accelerated method for sampling free energy and determining reaction pathways in rare events simulations. Chem Phys Lett 426:168–175
19. Kirkpatrick S, Gelatt C, Vecchi M (1983) Optimization by simulated annealing. Science 220:671–680
20. Sorensen M, Voter A (2000) Temperature-accelerated dynamics for simulation of infrequent events. J Chem Phys 112:9599–9606
21. Rosso L, Minary P, Zhu Z, Tuckerman M (2002) On the use of the adiabatic molecular dynamics technique in the calculation of free energy profiles. J Chem Phys 116:4389–4402
22. Dellago C, Bolhuis PG, Csajka FS, Chandler D (1998) Transition path sampling and the calculation of rate constants. J Chem Phys 108(5):1964–1977
23. Bolhuis PG, Chandler D, Dellago C, Geissler PL (2002) Transition path sampling: throwing ropes over rough mountain passes, in the dark. Annu Rev Phys Chem 53:291
24. Weinan E, Ren W, Vanden-Eijnden E (2002) String method for the study of rare events. Phys Rev B 66(5):052301
25. Weinan E, Ren W, Vanden-Eijnden E (2005) Finite temperature string method for the study of rare events. J Phys Chem B 109(14):6688–6693
26. Maragliano L, Fischer A, Vanden-Eijnden E, Ciccotti G (2006) String method in collective variables: minimum free energy paths and isocommittor surfaces. J Chem Phys 125(2):024106
27. Vanden-Eijnden E, Venturoli M (2009) Revisiting the finite temperature string method for the calculation of reaction tubes and free energies. J Chem Phys 130(19):194103
28. Jónsson H, Mills G, Jacobsen KW (1998) Nudged elastic band method for finding minimum energy paths of transitions. In: Berne B, Ciccotti G, Coker DF (eds) Classical and quantum dynamics in condensed phase simulations. World Scientific, Singapore, pp 385–404
29. Crooks GE, Chandler D (2001) Efficient transition path sampling for nonequilibrium stochastic dynamics. Phys Rev E 64:026109
30. Van Erp TS, Moroni D, Bolhuis PG (2003) A novel path sampling method for the calculation of rate constants. J Chem Phys 118:7762
31. Faradjian AK, Elber R (2004) Computing time scales from reaction coordinates by milestoning. J Chem Phys 120:10880
32. Allen RJ, Frenkel D, ten Wolde PR (2006) Simulating rare events in equilibrium or nonequilibrium stochastic systems. J Chem Phys 124:94111
33. Branduardi D, Gervasio FL, Parrinello M (2007) From A to B in free energy space. J Chem Phys 126:054103
34. Pan AC, Sezer D, Roux B (2008) Finding transition pathways using the string method with swarms of trajectories. J Phys Chem B 112(11):3432–3440
35. Bussi G, Gervasio FL, Laio A, Parrinello M (2006) Free-energy landscape for beta hairpin folding from combined parallel tempering and metadynamics. J Am Chem Soc 128:13435–13441
36. Piana S, Laio A (2007) A bias-exchange approach to protein folding. J Phys Chem B 111:4553–4559
37. Díaz Leines G, Ensing B (2012) Path finding on high-dimensional free energy landscapes. Phys Rev Lett 109(2):020601
38. Gallet GA, Pietrucci F, Andreoni W (2012) Bridging static and dynamical descriptions of chemical reactions: an ab initio study of CO2 interacting with water molecules. J Chem Theory Comput 8:4029–4039
39. Pietrucci F, Saitta AM (2015) Formamide reaction network in gas phase and solution via a unified theoretical approach: toward a reconciliation of different prebiotic scenarios. Proc Natl Acad Sci USA 112:15030–15035
40. Chen C (2017) Fast exploration of an optimal path on the multidimensional free energy surface. PLoS One 12(5):e0177740
41. Pérez de Alba Ortíz A, Tiwari A, Puthenkalathil R, Ensing B (2018) Advances in enhanced sampling along adaptive paths of collective variables. J Chem Phys 149(7):072320
42. Tribello GA, Bonomi M, Branduardi D, Camilloni C, Bussi G (2014) PLUMED 2: new feathers for an old bird. Comput Phys Commun 185(2):604–613
43. Raiteri P, Laio A, Gervasio FL, Micheletti C, Parrinello M (2006) Efficient reconstruction of complex free energy landscapes by multiple walkers metadynamics. J Phys Chem B 110(8):3533–3539
44. Grossfield A (2013) WHAM: the weighted histogram analysis method, version 2.0.9. http://membrane.urmc.rochester.edu/content/wham
45. Ferrario M, Ciccotti G, Binder K (2007) Computer simulations in condensed matter: from materials to chemical biology, vol 1. Springer, Berlin
46. Onsager L (1938) Initial recombination of ions. Phys Rev 54(8):554
47. Bolhuis PG, Dellago C, Chandler D (2000) Reaction coordinates of biomolecular isomerization. Proc Natl Acad Sci USA 97(11):5877–5882
48. Ensing B, Laio A, Parrinello M, Klein ML (2005) A recipe for the computation of the free energy barrier and the lowest free energy path of concerted reactions. J Phys Chem B 109(14):6676–6687
49. Berendsen HJC, van der Spoel D, van Drunen R (1995) GROMACS: a message-passing parallel molecular dynamics implementation. Comput Phys Commun 91(1–3):43–56
50. Williams T, Kelley C et al (2013) Gnuplot 4.6: an interactive plotting program. http://gnuplot.sourceforge.net/
51. Watson JD, Crick FH et al (1953) Molecular structure of nucleic acids. Nature 171(4356):737–738
52. Hoogsteen K (1959) The structure of crystals containing a hydrogen-bonded complex of 1-methylthymine and 9-methyladenine. Acta Crystallogr 12(10):822–823
53. Nikolova EN, Kim E, Wise AA, O’Brien PJ, Andricioaei I, Al-Hashimi HM (2011) Transient Hoogsteen base pairs in canonical duplex DNA. Nature 470(7335):498–502
54. Nikolova EN, Zhou H, Gottardo FL, Alvey HS, Kimsey IJ, Al-Hashimi HM (2013) A historical account of Hoogsteen base-pairs in duplex DNA. Biopolymers 99(12):955–968
55. Alvey HS, Gottardo FL, Nikolova EN, Al-Hashimi HM (2014) Widespread transient Hoogsteen base-pairs in canonical duplex DNA with variable energetics. Nat Commun 5:4786
56. Zhou H, Hintze BJ, Kimsey IJ, Sathyamoorthy B, Yang S, Richardson JS, Al-Hashimi HM (2015) New insights into Hoogsteen base pairs in DNA duplexes from a structure-based survey. Nucleic Acids Res 43(7):3420–3433
57. Yang C, Kim E, Pak Y (2015) Free energy landscape and transition pathways from Watson–Crick to Hoogsteen base pairing in free duplex DNA. Nucleic Acids Res 43(16):7769–7778
58. Chakraborty D, Wales DJ (2017) Energy landscape and pathways for transitions between Watson–Crick and Hoogsteen base pairing in DNA. J Phys Chem Lett 9(1):229–241
59. Vreede J, Bolhuis PG, Swenson DW (2016) Predicting the mechanism and kinetics of the Watson-Crick to Hoogsteen base pairing transition. Biophys J 110(3):563a–564a
60. Vreede J, Bolhuis PG, Swenson DW (2017) Path sampling simulations of the mechanisms and rates of transitions between Watson-Crick and Hoogsteen base pairing in DNA. Biophys J 112(3):214a
61. Macke TJ, Case DA (1998) Modeling unusual nucleic acid structures. In: Leontes NB, SantaLucia J Jr (eds) Molecular modeling of nucleic acids. American Chemical Society, Washington, DC, pp 379–393
62. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) Comparison of simple potential functions for simulating liquid water. J Chem Phys 79:926–935
63. Duan Y, Wu C, Chowdhury S, Lee MC, Xiong G, Zhang W, Yang R, Cieplak P, Luo R, Lee T, Caldwell J, Wang J, Kollman P (2003) A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J Comput Chem 24:1999–2012
64. Darden T, York D, Pedersen L (1993) Particle mesh Ewald: an NLog(N) method for Ewald sums in large systems. J Chem Phys 98:10089–10092
65. Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, Pedersen LG (1995) A smooth particle mesh Ewald method. J Chem Phys 103:8577–8593
66. Bussi G, Donadio D, Parrinello M (2007) Canonical sampling through velocity rescaling. J Chem Phys 126:014101
67. Parrinello M, Rahman A (1981) Polymorphic transitions in single crystals: a new molecular dynamics method. J Appl Phys 52:7182–7190
68. Ivani I, Dans PD, Noy A, Pérez A, Faustino I, Walther J, Andrio P, Goñi R, Balaceanu A, Portella G et al (2016) Parmbsc1: a refined force field for DNA simulations. Nat Methods 13(1):55–58
69. Tiwary P, Berne B (2016) Spectral gap optimization of order parameters for sampling complex molecular systems. Proc Natl Acad Sci USA 113:2839–2844
70. Sultan MM, Pande VS (2017) tICA-metadynamics: accelerating metadynamics by using kinetically selected collective variables. J Chem Theory Comput 13(6):2440–2447
71. Mendels D, Piccini G, Parrinello M (2018) Collective variables from local fluctuations. J Phys Chem Lett 9(11):2776–2781
72. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242
73. Dama JF, Rotskoff G, Parrinello M, Voth GA (2014) Transition-tempered metadynamics: robust, convergent metadynamics via on-the-fly transition barrier estimation. J Chem Theory Comput 10(9):3626–3633
74. Swenson D, Prinz JH, Noe F, Chodera JD, Bolhuis PG (2019) OpenPathSampling: a Python framework for path sampling simulations. I. Basics. J Chem Theory Comput 15:813–836
75. Swenson D, Prinz JH, Noe F, Chodera JD, Bolhuis PG (2019) OpenPathSampling: a Python framework for path sampling simulations. II. Building and customizing path ensembles and sample schemes. J Chem Theory Comput 15:837–856
76. Pérez de Alba Ortíz A (2017) PLUMED Wrapper for OpenPathSampling. https://e-cam.readthedocs.io/en/latest/Classical-MD-Modules/modules/OpenPathSampling/ops_plumed_wrapper/readme.html
