Professional Documents
Culture Documents
Tine Curk
CAS Key Laboratory of Soft Matter Physics, Beijing National Laboratory for Condensed Matter Physics,
Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China &
Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
Thomas E. Ouldridge
Department of Bioengineering & Centre of Synthetic Biology,
Imperial College London, London SW7 2AZ, UK
During the last decade coarse-grained nucleotide models have emerged that allow us to study
DNA and RNA on unprecedented time and length scales. Among them is oxDNA, a coarse-grained,
sequence-specific model that captures the hybridisation transition of DNA and many structural
properties of single- and double-stranded DNA. oxDNA was previously only available as standalone
software, but has now been implemented into the popular LAMMPS molecular dynamics code.
This article describes the new implementation and analyses its parallel performance. Practical
applications are presented that focus on single-stranded DNA, an area of research which has been so
far under-investigated. The LAMMPS implementation of oxDNA lowers the entry barrier for using
the oxDNA model significantly, facilitates future code development and interfacing with existing
LAMMPS functionality as well as other coarse-grained and atomistic DNA models.
0 0.7 0.46 3.141592653589793 0.7 4.0 & only one bond type.
1.5707963267948966 0.45 4.0 & 4 atom types
1.5707963267948966 0.45 1 bond types
pair coeff 1 4 oxdna2/hbond seqdep 1.0678
8.0 & The dimensions of the simulation box are defined as
0.4 0.75 0.34 0.7 1.5 0 0.7 1.5 0 0.7 follows:
1.5 & -20.0 20.0 xlo xhi
0 0.7 0.46 3.141592653589793 0.7 4.0 & -20.0 20.0 ylo yhi
1.5707963267948966 0.45 4.0 & -20.0 20.0 zlo zhi
1.5707963267948966 0.45 Although already stated in the input file, we need to
pair coeff 2 3 oxdna2/hbond seqdep 1.0678 provide again the masses of the nucleotides.
8.0 &
Masses
0.4 0.75 0.34 0.7 1.5 0 0.7 1.5 0 0.7
1 3.1575
1.5 &
2 3.1575
0 0.7 0.46 3.141592653589793 0.7 4.0 &
3 3.1575
1.5707963267948966 0.45 4.0 &
4 3.1575
1.5707963267948966 0.45
pair coeff * * oxdna2/xstk 47.5 0.575 0.675 The nucleotides are defined after the keyword Atoms.
& Each row contains the atom-ID (1,2,3 in the example
0.495 0.655 2.25 0.791592653589793 0.58 below), the atom type (1,1,4), the position (x,y,z), the
& molecule ID (all 1 in this case), an ellipsoidal flag (1)
1.7 1.0 0.68 1.7 1.0 0.68 1.5 0 0.65 1.7 and a density (1).
& Atoms
0.875 0.68 1.7 0.875 0.68 1 1 0.00000 0.00000 0.00000 1 1 1
pair coeff * * oxdna2/coaxstk 58.5 0.4 0.6 & 2 1 0.13274 -0.42913 0.37506 1 1 1
0.22 0.58 2.0 2.891592653589793 0.65 1.3 3 4 0.48461 -0.70835 0.75012 1 1 1
& .
.
.
0 0.8 0.9 0 0.95 0.9 0 0.95 40.0 &
3.116592653589793 Next we set the initial velocities to the desired value,
pair coeff * * oxdna2/dh 0.1 1.0 0.815 here all equal to 0. The first column contains the
atom-ID (1,2,3), the following three columns the
Please note that according to the LAMMPS parsing translational velocity and the last three columns the
rules the ampersands (&) represent line breaks. angular momentum.
Visit the LAMMPS online documentation and manual Velocities
for more information and for information on oxDNA2. 1 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0
D. Data File and Setup Tool .
.
.
Note that this is our special choice in the setup tool.
The data file contains all relevant structural The velocities can be generally initialised to any value.
parameters for the simulation, i.e. details about the Large values will lead to the FENE springs becoming
number of atoms, the topology of the molecules, overstretched and may provoke an early abortion of
the size of the simulation box, initial velocities, etc. the run.
The LAMMPS implementation of oxDNA follows the The ellipsoids are defined with atom-ID, shape
standard form as discussed in the LAMMPS user (1.17398 to produce the correct moment of inertia)
manual. We outline the relevant parts below. and initial quaternion (last four columns).
At the beginning of the data file the total number Ellipsoids
of particles and bonds has to be given. As we are 1 1.17398 1.17398 1.17398 1.00000 0. 0. 0.
using hybrid particles, we need to set the same number 2 1.17398 1.17398 1.17398 0.95534 0. 0.
of ellipsoids. For a standard DNA duplex consisting 0.29552
of 8 complementary base pairs we need 16 atoms, 16 3 1.17398 1.17398 1.17398 0.82534 0. 0.
ellipsoids and 14 bonds, 7 on each of the two single 0.56464
.
.
strands. If the strands are nicked, which we do not .
assume here, the number of bonds would be reduced. Finally, we specify the bond topology. The first col-
16 atoms umn contains the bond-ID (1,2,3), the second one the
16 ellipsoids bond type (1) and the third and fourth the IDs of the
14 bonds two bond partners.
We use four atom types to represent the four different Bonds
nucleotides in DNA (A=1, C=2, G=3, T=4). We use 1 1 1 2
6
integrators or the integrity of the pair interactions at standard integrator fluctuates more around the cor-
a given timestep size ∆t. rect value. This, however, is more or less a transient
The C integrator in Ref. [31], to which we refer effect as longer runs show there is no permanent drift
as “DOT-C integrator”, is invoked by replacing the away from the correct result. For Langevin dynam-
standard NVE integrator for aspherical particles and ics, it is not possible to evaluate the accuracy and
the fix for Langevin dynamics stability in the same way. We opted instead for an
fix 1 all nve/asphere estimate based on the average kinetic, rotational, po-
fix 2 all langevin 0.1 0.1 0.03 457145 tential and total energy of the benchmark. Again,
angmom 10 we performed runs of τ = 10000 Lennard-Jones time
units length, this time thermalised, and averaged the
with one single fix results over the time interval. The number of MD-
fix 1 all nve/dotc/langevin 0.1 0.1 0.03 & timesteps and the output frequency for each timestep
457145 angmom 10 size were adapted so that the total physical simula-
To measure the accuracy of the new integrators, tion time and the statistical basis of the error calcu-
we run a test case consisting of a short, nicked du- lations were consistent. The temperature in reduced
plex with 8 base pairs (16 nucleotides). Fig. 3 shows LJ-units was set to T = 0.1, whereas the translational
the accuracy measured through the normalised dif- and rotational friction or damping coefficients were
ference between the total energy Etot for this partic- set to γ = 1/0.03 and Γ = 1/0.3, respectively. The
ular benchmark and the total energy at the begin- results are summarised in Tab. I. These values were
∗
ning of the run Etot . We compared the standard fix used during the verification of the LAMMPS imple-
nve/asphere integrator, which is based on a Richard- mentation because they produced relatively smooth
son iteration in the update of the quaternion degrees trajectories that could be easily followed. For actual
of freedom, to the new DOT integrator, which uses a production runs it may be more appropriate to use
rotation sequence to update the quaternions. Shown different values to allow a better and more efficient
are results for two different timestep sizes ∆t = 10−3 sampling of the configuration space.
and ∆t = 10−4 . Both simulations were run for the Based on three translational and three rotational
same physical simulation time to allow direct com- degrees of freedom per nucleotide and 8 base pairs we
parison of the deviations of a dynamical run. As this expect kinetic and rotational energies Ekin = Erot =
is done in the NVE ensemble and without noise, the 2.4 for a temperature settings T = 0.1. This is
energy should be exactly conserved. This corresponds very well achieved for all timestep sizes and both
to a straight, horizontal line at 0. integrators, the standard LAMMPS integrator fix
nve/asphere & fix langevin and the DOT-C inte-
grator fix nve/dotc/langevin. However, there ap-
16
nve/asphere dt=1e-3 pears to be a slight decrease in the DOT-C integrator
14 nve/asphere dt=1e-4 for very large step sizes (∆t = 2·10−2 ). The deviation
nve/dot dt=1e-3
12 nve/dot dt=1e-4 of the total energy between all timestep sizes, admit-
tedly an ad hoc criterion to quantify the stability of
(Etot / E*tot -1) · 10-5
10
the integrators, but one that is rather hard for the
8 integrators to get exactly right, is in the sub-percent
6 range. It is actually slightly better for the DOT-C in-
4 tegrator than for the standard LAMMPS integrator.
The statistical errors, reported in Tab.I, are the stan-
2
dard deviations of a linear least square fit and show
0 that the deviations are well above the uncertainty of
-2 the fits.
-4 Remarkably, for the DOT-C integrator the limit for
0 2000 4000 6000 8000 10000
τ (LJ time units)
a stable integration is ∆t = 2 · 10−2 , which represents
a very large timestep size. This is about 4 times larger
∗ ∗ than the maximum timestep size for which the stan-
FIG. 3. Relative normalised accuracy (Etot −Etot )/Etot of
dard LAMMPS Langevin integrator produces sound
the standard LAMMPS NVE integrator for aspherical par-
ticles and the NVE DOT integrator from Ref. [31]. Etot ∗ results. Because of the more complex rotations in
is the total free energy at the beginning of the simulation quaternion space and various additional transforma-
runs. tions that the DOT-C integrator requires there is a
small overhead of about 15% compared to the stan-
dard LAMMPS integrator. Nevertheless, this small
It is obvious that above a certain timestep size the overhead of the DOT-C integrator is very well com-
accuracy of the new DOT integrator is slightly infe- pensated by the computational efficiency and possi-
rior compared to the standard integrator. Up to a bility to increase the timestep size by 400% (from a
certain point the DOT integrator actually seems to maximum of ∆t = 5 · 10−3 for the standard LAMMPS
deviate further from the correct result, whereas the integrator to ∆t = 2·10−2 for the DOT-C integrator).
8
TABLE I. Average kinetic, rotational, potential and total energy for the standard LAMMPS integrator fix nve/asphere
& fix langevin and the DOT-C integrator nve/dotc/langevin for different timestep sizes.
V. PERFORMANCE ANALYSIS
1
We devised a few simple benchmarks to study the 103
Parallel Efficiency
Speedup vs 24 MPI-tasks
parallel performance of the LAMMPS implementa-
tion. The size of each benchmark is well beyond
the current capabilities of the standalone version, so
each demonstrates as well a minimal performance re- 102 MPI-tasks
quirement. The benchmarks consisted of arrays of 0
100 1000 10000
101
Ideal performance
60 kbp
960 kbp
100
100 1000 10000
MPI-tasks
marks relative to the single node performance with factor 1.6 with a performance of roughly 100 µs per
24 MPI-tasks. The code performs well for the LD day. Note that comparing the wall times is only an
benchmark up to about 128 MPI-tasks with a parallel approximate way to compare the performance as there
efficiency around 95% (see the inset). Beyond sev- is no guarantee that similar processes take a similar
eral hundred MPI-tasks a gradual performance degra- simulation time in the two models.
dation is observed. At 2048 MPI-tasks the parallel Apart from the enhanced stability of the rigid
efficiency has decreased to about 45% and the total body integrator, this difference in performance will be
speedup is roughly 930-fold compared to the single caused by the different number of degrees of freedom
core performance (39-fold compared to the single node that both models require: oxDNA/oxDNA2 uses only
performance). 13 degrees of freedom per nucleotide (3 coordinate po-
A look at the ratio of the number of local atoms, sitions, 3 translational momenta, 3 angular momenta
i.e. those that are inside a process boundary, to the and 4 quaternion degrees of freedom), whereas 3SPN.2
number of ghost atoms, i.e. those which need to be uses 18 degrees of freedom per nucleotide (3 particles
communicated via neighbour lists, proves that the ob- with each 3 coordinate positions and translational mo-
served performance degradation is due to the com- menta).
parably small size of the problem. At the largest Unfortunately, we could not measure the parallel
core counts there are on average only about 60 local performance of 3SPN.2. But this conceptual differ-
atoms present on each process, whereas the number ence between the two models is very likely to entail
of ghost atoms is with about 225 atoms almost four further detrimental effects when running in parallel.
times larger. LAMMPS is known to require at least a With the larger number of degrees of freedom per
few hundred local atoms or more for a good parallel nucleotide in 3SPN.2, communication overheads are
performance [35]. The speedup is still relatively good likely to build up more quickly and neighbour lists
because the fraction of time that the algorithm spends are longer and probably have to be rebuilt more fre-
in the force calculation is still comparably large. For quently. On the other hand, the current LAMMPS
the HD benchmark, 16 times larger than the LD case, implementation of oxDNA offers further potential for
the performance degradation is more or less mirrored optimisation as it spends a good part its time com-
at core counts that are about 16 times larger. For puting the inverse cosine (around 12%, see Appendix
the HD benchmark the total speedup at 16384 MPI- A). This could be alleviated for instance through the
tasks is 9680-fold with respect to the single core per- introduction of appropriate lookup tables for trigono-
formance (400-fold compared to the single node per- metric functions.
formance) and the parallel efficiency is still at around
60%.
These two examples are of course slightly idealised VI. APPLICATIONS
in the sense that both benchmarks fulfil easily the
requirement of good load-balancing, which is neces- The structural properties of DNA such as the persis-
sary to obtain a good scaling performance. LAMMPS, tence length, radius of gyration and torsional rigidity
however, features sophisticated load-balancing algo- play an important role in its function. Characteris-
rithms which permit good scaling behaviour also for ing these properties and their dependence on different
very inhomogeneous systems. We are planning to ex- conditions is therefore fundamental for highly complex
tend the existing implementation to benefit further processes such as DNA packaging, replication and de-
from recent developments pertaining to threaded par- naturation. Experimentally, however, making these
allelisation on shared memory architectures such as measurements is not an easy task as it requires subtle
many-core chips and general purpose graphical pro- manipulation of single molecules and direct measure-
cessing units (GPGPUs). ment of their response to applied forces or displace-
One of the major advantages of the new LAMMPS ments, which can then be related to the elasticity of
implementation is that it can be directly compared DNA. By using coarse-grained computational models
with other coarse-grained models that are also based like oxDNA, we can study these systems in more de-
on the LAMMPS code. To this end, we compared tail. These simulations can in turn provide insights
the single core performance of oxDNA2 with that of into experimental data or the performance of other
3SPN.2 [14]. The benchmark consisted of two comple- theoretical approaches.
mentary dsDNA duplexes of 8 bps with implicit ions. The radius of gyration is a particularly useful de-
In order to compare both models we set the transla- scriptor of the structure and compactness of macro-
tional friction coefficient γ to about (300 fs)−1 . We molecules. For ssDNA the radius of gyration Rg can
opted for the maximum timestep size that provided be defined as
a stable integration, which was ∆t = 35 fs (3SPN.2)
and ∆t = 48 fs (oxDNA2 + DOT-C integrator), re- N
spectively. 1 X
Rg2 = (ri − r̄)2 (6)
On a single Intel Core i7 2.8 GHz processor us- N i=1
ing the latest version of LAMMPS (16 March 2018)
3SPN.2 delivered a performance of about 60 µs per where N is the number of nucleotides, ri is the po-
PN
day. oxDNA2 was able to surpass this by about a sition of the i − th nucleotide and r̄ = N1 i=1 ri is
10
Contact Fraction fh
oxDNA2 model for two different systems: a sequence
of ssDNA from a λ-bacteriophage that has a multi- 0.3
Rg [nm]
14
tude of applications in microbial and molecular ge- Contact Fraction λ-ssDNA
Rg λ-ssDNA
netics and serves e.g. as cloning vector, as well as 13
0.2
complete ssDNA sequence of the pUC19 plasmid, an-
other model organism and cloning vector, which con- 12
veys antibiotic resistance. 0.1
We performed Langevin dynamics simulations of 11
18
ing strength (see poly-A-avstk and poly-T-avstk). Fi-
16 nally, for higher temperatures self-hybridisation be-
comes less important and the radius of gyration ap-
14 proaches the same plateau value for all sequences.
12
10
0 10 20 30 40 50 60
Temperature [°C]
In Fig.6 the initial 500 nucleotide long sequence of FIG. 8. Simulation snapshots of the λ-ssDNA sequence for
ssDNA λ-DNA is compared with different linear DNA three different temperatures, at 0◦ C, 20◦ C and 60◦ C. The
molecules of the same length, namely poly-A and poly- type of each nucleotide is represented by a colour scheme:
T strands. The radius of gyration as a function of A (white), T (Cyan), G (blue) and C (red).
temperature is shown. For λ-DNA we observe that Rg
increases with temperature until a plateau is reached
at around 50◦ C. While the λ-DNA sequence allows A fraction of complementary nucleotides (A-T or
hybridisation along the ssDNA (see Fig.7), the same G-C) on the single-stranded λ-DNA chain are close
11
Contact Fraction fh
DNA and significantly more rigid hairpins of dsDNA
(the persistence length of dsDNA is 50 nm, around 0.3
Rg [nm]
45
thirty times larger than that of ssDNA) is mediated
by e.g. changes in the temperature, salt concentra- 40 0.2
tion or pH value. In Fig.7 we show the radius of gy-
ration Rg and the contact fraction (the number of
35
contacts Nc normalised by half the number of nu- 0.1
cleotides in the ssDNA strand, which is the maxi-
30
mum number of possible base pairs) for the single-
0
stranded λ-DNA versus temperature. A contact was 0 10 20 30 40 50 60
defined when the hydrogen-bonding interaction sites Temperature [°C]
of any two nucleotides were less than 0.45 length units
apart, regardless of the individual bases. In princi- FIG. 9. Dependence of the radius of gyration Rg and
ple, this criterium cannot prevent stacked, nearest- the fraction of direct intra-chain nucleotide contacts in the
neighbour nucleotides from being counted as a con- ssDNA pUC19 plasmid sequence on the temperature.
tact. Nevertheless it proved sufficiently accurate for
a perfect dsDNA duplex where the number of con-
tacts Nc = N/2. Additionally, this definition will
tend to include mismatched base pairs in a duplex
as contacts. It will thus overestimate the number of
correctly-formed Watson-Crick base pairs, but for our
purposes it is more important that 2Nc provides a
good estimate of the number of bases incorporated
into hairpin structures.
At 0◦ C, around 44% of the nucleotides are involved
in contacts. When the temperature increases, the sys-
tem destabilises and the number of contacts decreases
significantly until it flattens out at 50◦ C (the same
temperature at which Rg has a plateau). However,
while the contact fraction changes dramatically (more
than a factor 40 from about 0.44 to 0.01) in this tem-
perature range, there is only a small change in the
radius of gyration (around a factor 1.45 from 11.3 nm
to 16.4 nm). Related snapshots from simulations at
selected temperatures are given in Fig. 8.
We apply the same protocol as before to the pUC19
plasmid, consisting of a ssDNA sequence of 2686 nuc-
leotides. For simplicity we opted for a linear molecule FIG. 10. Simulation snapshots of the pUC19-ssDNA se-
with freely rotating ends. The radius of gyration as quence for two different temperatures, at 20◦ C and 60◦ C.
a function of temperature is shown in Fig.9. The be- The type of each nucleotide is represented by a colour
haviour is very similar to the one of λ-ssDNA, par- scheme: A (white), T (Cyan), G (blue) and C (red).
ticularly the minor effect that temperature changes
have on Rg despite dramatic changes in the number
of contacts between nucleotides. While for λ-ssDNA
the radius of gyration at 20◦ C equals 4.5% of its total same temperature. As reference we also modelled the
contour length, in the case of the plasmid Rg repre- double-stranded linear pUC19 plasmid, for which we
sents only 2.2%. Using the theoretical expression for measured values of Rg in the region of 130 nm at 20◦ C
Rg in Eq. 8 below and monomer length a =0.65 nm, and 170 nm at 60◦ C, respectively, so about a factor 3
Kuhn segment length b =2 nm and Flory exponent to 4 larger than the values of Rg we obtained for the
ν = 0.588, this gives Rg /aN = 5% (λ-DNA) and ssDNA sequence.
2.5% (plasmid), respectively. Hence, the computa- In Fig.10 we can see that at 20◦ C several nucleotides
tional values are about 10% smaller than the theoret- have hybridised, forming hairpin structures of 20-
ical values, but generally consistent with the latter. At 30 bp located along the plasmid. When we increase
around 50◦ C Rg reaches a plateau, which is at least the temperature of the system up to 60◦ C the hair-
constant within the error bars. It is interesting to see pins disappear as self-hybridisation is suppressed, ac-
that the λ-ssDNA exhibits the same tendency at the counting for the substantial reduction of intra-chain
12
VII. CONCLUSIONS
fh2 fh2
hRg2 i = Rg,ss
2
1− 2
+ Rg,h 4fh − 3 (13)
The implementation of the oxDNA model for
k k
coarse-grained DNA modelling into a community
with Rg,ss and Rg,h given by Eqs. (9) and (10), re- molecular dynamics code such as LAMMPS reduces
spectively, and the contact fraction fh = 2Nc /N . the entry barrier of using the model significantly.
We have assumed that hairpins do not interact Moreover, it allows to combine this coarse-grained
with the ssDNA polymer, or with other hairpins, and force field with different features that are already en-
that all k hairpins are of the same length. How- abled in LAMMPS.
ever, even this relatively simple, idealised derivation The Langevin-type rigid-body integrators that are
demonstrates that the radius of gyration depends on distributed together with the LAMMPS USER-
both the number of contacts and the hairpin length. package, particularly the DOT-C integrator, offer ad-
This is shown in Fig. 11 for a sequence of N = 500 ditional advantages over the existing standard rigid-
nucleotides, i.e. the length of our λ-ssDNA. The de- body integrators for Langevin dynamics. They show
pendence on the number of contacts is obviously non- improved stability at the costs of a very small over-
monotonous. The values k = 1 and k = Nc /2 (assum- head. This permits larger timesteps and therefore
ing a hairpin needs at least 2 contacts to be labelled larger physical simulation times.
as a hairpin) are the limits of the possible hairpin The parallel performance of the MPI-only imple-
distribution and corresponding values for hRg2 i pro- mentation, as demonstrated through scaling tests us-
vide the upper and lower physical limit for the ex- ing a simple benchmark, is excellent provided there
pected value of the radius of gyration. The simula- are at least a few dozen particles per MPI-task. These
tions, Fig. 7, result in a radius of gyration around results show effectively that the oxDNA model is well
12.6 nm and 11.3 nm at the observed contact frac- suited for large and extremely large problems in DNA
tion of around 32% and 45%, respectively, in good and RNA modelling. It can tackle problem sizes that
agreement with the theoretical prediction shown on were well beyond the reach of the original standalone
Fig. 11. We also see that for an even larger number of implementation of the model. It is worth mentioning
contacts the possible range of values of hRg2 i is quite that the GPU-accelerated version of the standalone
wide. This is of course a much idealised and simpli- code is also limited to speedups of typically a factor
fied reasoning, but it elucidates the non-trivial nature 30 compared to the single core performance. Based
of these interdependencies.The theory neglects the ex- on the scaling analysis of the benchmarks it could be
cluded volume of regions cut out of the contour length said that this is matched by the performance of a sin-
by hybridisation; taking this into would increase the gle multi- or many core chip.
Rg2 in Eq. (12), while additional bases cut out by hy- The applications we opted for, a sequence of lin-
bridisation but not contributing to Nc would decrease ear, single-stranded λ-bacteriophage and pUC19 plas-
it. We speculate that the two effects cancel out, to a mid DNA, are motivated primarily by currently ongo-
degree, resulting in a good agreement between theory ing projects in the under-investigated area of single-
and simulations. stranded DNA, rather than by an attempt to harvest
the performance of the new
������ �� �������� (<� � �>)�/� [��]
remain there. However, by using a number of inde- computing Service (eCSE05-10). YAGF acknowl-
pendent configurations we have presumably reached edges support from the Mexican National Coun-
states that are representative, although these are not cil for Science and Technology (CONACyT, PhD
guaranteed to be the most stable ones. Grant 384582). TC acknowledges support from the
In the future it may be possible to focus on ring mo- Herchel Smith Scholarship and the CAS PIFI Fel-
lecules that contain superhelical twist and have differ- lowship. TEO acknowledges his Royal Society Uni-
ent number of helical turns compared to their natural versity Research Fellowship. OH acknowledges sup-
form. These rings may be opened by introducing a port from the EPSRC Early Career Fellowship Scheme
single strand break, which releases the superhelical (EP/N019180/2).
twist, a mechanism that is known to be highly rele-
vant during gene replication and expression.
AUTHORS CONTRIBUTIONS
[39] P.-G. de Gennes, Scaling concepts in polymer physics The large fraction of the inverse cosine is more dif-
(Cornell University Press, 1979). ficult to optimise. It emerges in the stacking, cross-
and coaxial stacking and hydrogen bonding interac-
tions through a partial derivative with respect to the
APPENDIX
relative distances. A previous version of the imple-
mentation spent a whopping 29% of its time calculat-
ing the inverse cosine. This prohibitively large figure
Appendix A: Profiling could be cut down to the current 12% by introduc-
ing appropriate early-rejection criteria in each force
Profiling allows a detailed analysis of the imple- calculation. Further improvements might be possible
mentation and gives an overview of how much time through small-argument approximations of the inverse
the code spends in each individual subroutine. We cosine. This will be tested in a future version of the
used the Craypat Performance Tools on the ARCHER code (e.g. for the upgrade to oxDNA 2.0).
UK National Supercomputing Service to conduct sam- At 2048 MPI-tasks, shown on the right of Fig. 12,
pling experiments of the high and low density bench- the code spends more than 50% of its time in call
marks. Although the experiments where actually per- to the MPI-library. The percentage of time in the
formed with the oxDNA model, the results are rep- force calculation has fallen to about 43%. As stated
resentative as well for oxDNA2 as the only difference above, this is primarily the consequence of an insuffi-
between the two is a different local geometry of the cient number of local atoms with respect to the num-
interaction sites and an additional pair interaction in ber of ghost atoms, and does not reflect a problem
form of a Debye-Hückel potential. with the parallel performance of the implementation.
Fig. 12 shows a pie chart of the low density (LD) For the HD benchmark on a single node, shown on
run. The image on the left shows the results on a sin- the left in Fig. 13, calls to the MPI-library are below
gle node with 24 MPI-tasks, whereas the image on the 3%. The conversion of quaternions to 3-vectors
right is for 2048 MPI-tasks. Focussing first on a single (MathExtra::q to exyz) and the calculation of the in-
node, calls to the MPI-library are below 5% and do not verse cosine (acos) are constant at about 12%. At 2048
appear with an individual pie section. The total time MPI-tasks we observe a parallel efficiency of about
spent in the force calculation is around 86% (according 85%. The time spent in the force calculation is still
to the LAMMPS breakdown). Interestingly, a signifi- about 82% (according to the LAMMPS breakdown)
cant fraction of the time is spent on calculating the lo- with calls to the MPI-library amounting to just below
cal body coordinate system of the nucleotide from the 13%. The CPU time of the quaternion conversion to
quaternion degrees of freedom (MathExtra::q to exy, the local body frame of the nucleotide and the inverse
11.3%). cosine each at are around 9% due to the larger share
A significant portion falls also on the calculation of of the calls to the MPI-library.
the inverse cosine (acos, 12.1%). The conversion from
quaternions to 3-vectors is done separately in every
Appendix B: Derivation of hRg2 i
single interaction. This has been done for simplic-
ity, but represents a 6-fold overhead as it could be
optimised by calculating the 3-vectors only once per We assume that the position of hairpin bases, as
timestep, then saving the for later use by the inter- well as the orientation of hairpins, is uniformly ran-
actions. This optimisation would come at increased dom and uncorrelated along the ssDNA contour, for-
communication as the additional nine components of mally: hxi i = 0, hx2i i = Rg,ss2
, hxi xj i = 0 for i 6= j,
the three unit vectors would have to be communi- and similarly for the orientation: hn̂i i = 0, hn̂2i i = 1,
cated across the process boundaries. Another pos- hn̂i n̂j i = 0 for i 6= j, hxi n̂j i = 0. These properties
sibility, and a major adaptation, would be to formu- result in
late the entire force calculation in generalised quater-
hcm i = 0
nion forces and torques, therefore avoiding the trans-
formation in the first place. We decided deliberately and
against this possibility as this would require calcula-
fh2 2
tion of four force and torque components in quaternion hc2m i = (Rg,ss + l2 /4).
space. The calculation with 3-vectors on the other k
hand, as currently implemented, requires only three The average Rg2 becomes
force and torque components. Perhaps most impor-
tantly, they can be made available directly to the other hRg2 i = (1 − fh )Rg,ss
2
+ (1 − fh )hc2m i + fh Rg,h
2
+
LAMMPS routines. It is thus very likely that a perfor- fh D X l E
mance gain from avoiding the transformation would + (xi + n̂i − cm )2 . (B1)
k i
2
be outweighed either by the larger number of addi-
tional components and generalised quaternion forces The average of the sum is
and torques which also had to be communicated across k k n
the process boundaries or by disadvantages from a
DX l E X l2
(xi + n̂i − cm )2 = hx2i i + hc2m i + hn̂2i i +
software engineering point of view. i=1
2 i=1
4
16
FIG. 12. Craypat performance analysis of a sampling experiment for the low density benchmark (60 kbp) on
a single node (left, 24 MPI-tasks) and for 2048 MPI-tasks (right). Note that the assigned colour code for the
functions is different in both cases.
FIG. 13. Craypat performance analysis of a sampling experiment for the high density benchmark (960 kbp) on
a single node (left, 24 MPI-tasks) and for 2048 MPI-tasks (right). Note that the assigned colour code for the
functions is different in both cases.
17
o
2
+ lhxi n̂i i − 2hxi cm i − lhn̂i cm i 12Rg,h .
Using these relations the expected value for the
l2 l2 squared radius of gyration is obtained
= kRg,ss + fh2 (Rg,ss
2
+ l2 /4) + k 2
− 2fh Rg,ss − fh
4 2
(B2)
f2 f2
hRg2 i = 2
Rg,ss 1− h + 2
Rg,h 4fh − 3 h , (B3)
k k
P fh P l
using that i hxi cm i = k ij hxi (xj + 2 n̂j )i = with Rg,ss and Rg,h given by Eqs. (9) and (10), re-
2 l 2
P
fh Rg,ss and i hn̂i cm i = fh 2 . Furthermore, l = spectively, and the contact fraction fh = 2Nc /N .