Professional Documents
Culture Documents
Molecular Dynamics
with GROMACS
Molecular Modeling Course 2007
For most purposes these approximations work great, but they cannot reproduce quantum effects like
bond formation or breaking. All common force fields subdivide potential functions in two classes.
Bonded interactions cover covalent bond-stretching, angle-bending, torsion potentials when rotating
around bonds, and out-of-plane “improper torsion” potentials, all which are normally fixed throughout
a simulation – see Fig. 2. The remaining nonbonded interactions consist of Lennard-Jones repulsion and
dispersion as well as electrostatics. These are computed from neighbor lists updated every 5-10 steps.
Given the potential and force (negative gradient of potential) for all atoms, the coordinates are up-
dated for the next step. For energy minimization, the steepest descent algorithm simply moves each atom
a short distance in direction of decreasing energy, while molecular dynamics is performed by integrating
Newton’s equations of motion:
∂V (r1,...,rN )
Fi = −
∂ri
∂ 2ri
mi = Fi
∂t 2
The updated coordinates are then used to evaluate the potential energy again, as shown in the flow-
chart of Fig. 3.
Typical biomolecular simulations
€ apply periodic boundary conditions to avoid surface artifacts, so that
a water molecule that exits to the right reappears on the left; if the box is sufficiently large the molecules
will not interact significantly with their periodic copies. This is intimately related to the nonbonded in-
teractions, which ideally should be summed over all neighbors in the resulting infinite periodic system.
Simple cut-offs can work for Lennard-Jones interactions that decay very rapidly, but for Coulomb inter-
actions a sudden cut-off can lead to large errors. One alternative is to “switch off” the interaction before
the cut-off as shown in Fig. 4, but a better option is to use Particle-Mesh-Ewald summation (PME) to
calculate the infinite electrostatic interactions by splitting the summation into short- and long-range
parts. For PME, the cut-off only determines the balance between the two parts, and the long-range part
is treated by assigning charges to a grid that is solved in reciprocal space through Fourier transforms.
Cut-offs and rounding errors can lead to drifts in energy, which will cause the system to heat up dur-
ing the simulation. To control this, the system is normally coupled to a thermostat that scales velocities
during the integration to maintain room temperature. Similarly, the total pressure in the system can be
adjusted through scaling the simulation box size, either isotropically or separately in x/y/z dimensions.
The single most demanding part of simulations is the computation of nonbonded interactions, since
millions of pairs have to be evaluated for each time step. Extending the time step is thus an important
way to improve simulation performance, but unfortunately errors are introduced in bond vibrations
already at 1 fs. However, in most simulations the bond vibrations are not of interest per se, and can be
The purpose of this tutorial is not to master all parts of Gromacs’ simulation and analysis tools in
detail, but rather to give an overview and “feeling” for the typical steps used in practical simulations.
Since the time available for this exercise is rather limited we will focus on a sample simulation system
and perform some simplified analyses - in practice you would typically use one or several weeks for the
production simulation.
Fig. 2: Typical molecular mechanics interactions used in Gromacs.
removed entirely by introducing bond constraint algorithms (e.g. SHAKE or LINCS). Constraints make
it possible to extend time steps to 2 fs, and in addition the fixed-length bonds are actually better ap-
proximations of the quantum mechanical grounds state than harmonic springs!
In principle, the most basic system we could simulate would be water or even a gas like Argon, but
to show some of the capabilities of the analysis tools we will work with a protein: Lysozyme. This is a
fascinating enzyme that has ability to kill bacteria (kind of the body’s own antibiotic), and is present e.g.
in tears, saliva, and egg white. It was discovered by Alexander Fleming in 1922, and one of the first pro-
tein X-ray structures to be determined (David Phillips, 1965). The two sidechains Glu35 and Asp52 are
crucial for the enzyme functionality. It is fairly small by protein standards (although intermediate to
large by simulation standards...) with 164 residues and a molecular weight of 14.4 kdalton - including
hydrogens it consists of 2890 atoms (see illustration on front page), although these are not present in
the PDB structure since they only scatter X-rays weakly (few electrons).
In the tutorial, we are first going to set up your Gromacs environments (might already be done for
you), have a look at the structure, prepare the input files necessary for simulation, solvate the structure
in water, minimize & equilibrate it, and finally perform a short production simulation. After this we’ll
test some simple analysis programs that are part of Gromacs.
1. You will need a Unix-type system with an X11 window environment (look at the install disks if you
have a mac). You can use CygWin under windows, although this is a bit more cumbersome, and you
will also need to install a separate X11 server.
2. Gromacs uses FFTs (fast fourier transforms) for some interactions, in particular PME. While
Gromacs comes with the slower FFTPACK libraries built-in, it is a very good idea to install the freely
available and faster FFT libraries from http://www.fftw.org. If you have an Intel mac you’re in luck,
since the Gromacs package comes with this included. You can also find finished RPMs for x86 Linux on
the Gromacs or FFTW sites. If you need to install this from source, unpack the distribution tarball with
“tar -xzvf fftw-3.1.2.tar.gz”, go into the directory and issue “./configure --enable-float” (since Gromacs
uses single precision by default) followed by “make” and finally become root and issue “make install”. If
you do not have root access you can install it under your home directory, using the switch “--prefix=/
path/to/somewhere” when you run the configure script.
3. Install Gromacs from a package, or compile
it the same way as FFTW. Initial input data:
Interaction function V(r) - "force field"
4. Install PyMol from the web site
coordinates r, velocities v
http://www.pymol.org, or alternatively VMD.
No
The PDB Structure - think Done!
before you simulate! Fig. 3: Simplified flowchart of a standard molecular
The first thing you will need is a structure of dynamics simulation, e.g. as implemented in Gromacs.
the protein you want to simulate. Go to the
Protein Data Bank at http://www.pdb.org, and
search for “Lysozyme”. You should get lots of hits, many of which were determined with bound ligands,
mutations, etc. One possible alternative is the PDB entry 1LYD, but feel free to test some other alterna-
tive too.
It is a VERY good idea to look carefully at the structure before you start any simulations. Is the reso-
lution high enough? Are there any missing residues? Are there any strange ligands? Check both the PDB
web page and the header in the actual file. Finally, try to open it in a viewer like PyMOL, VMD, or
RasMol and see what it looks like. In PyMOL you have some menus on the right where you can show
(“S”) e.g. cartoon representations of the backbone!
There are several places where you can find documentation for the Gromacs commands. You should
always get some help with “pdb2gmx -h”, which first includes a brief description of what the program
does, then a list of (optional) input/output files, and finally a list of available flags. The same informa-
tion is present with “man pdb2gmx” (if it was properly installed), and you can also use the
www.gromacs.org website: Choose “documentation” in the top menu, and then “manuals” in the left-
side menu. There is both an extensive book describing everything and a shorter online reference to the
commands and file formats.
For the first command we’ll help you. pdb2gmx should read its input from a coordinate file, and just
for fun we’ll pick a specific water model and force field. Type
The program will ask you for a force field - for this tutorial I suggest that you use “OPLS-AA/L”, and
then write out lots of information. The important thing to look for is if there were any errors or warn-
ings (it will say at the end). Otherwise you are probably fine. Read the help text of pdb2gmx (-h) and
try to find out what the different files it just created do, and what options are available for pdb2gmx!
The important output files are (find out yourself what they do)
If you issue the command several times, Gromacs will back up old files so their names start with a
hash mark (#) - you can safely ignore (or erase) these for now.
However, before adding solvent we have to determine how big the simulation box should be, and
what shape to use for it. There should be at least half a cutoff between the molecule and the box side,
and preferably more. Due to the limited time you be cheap here and only use 0.5nm margin between
the protein and box. To further reduce the volume of the box you could explore the box type options
and for instance test a “rhombic dodecahedron” box instead of a cubic one. The program you should use
to alter the box is “editconf”. This time you will have to figure out the exact command(s) yourself,
but the primary options to look at in the documentation are “–f”, “–d”, “–o”, and “–bt”.
As an exercise, you could try a couple of different box sizes (the volume is written on the output!)
and also other box shapes like “cubic” or “octahedron”. What effect does it have on the volume?
The last step before the simulation is to add water in the box to solvate the protein. This is done by
using a small pre-equilibrated system of water coordinates that is repeated over the box, and overlapping
water molecules removed. The Lysozyme system will require roughly 6000 water molecules with 0.5nm
to the box side, which increases the number of atoms significantly (from 2900 to over 20,000).
GROMACS does not use a special pre-equilibrated system for TIP3P water since water coordinates can
be used with any model – the actual parameters are stored in the topology and force field. For this rea-
son the program only provides pre-equilibrated coordinates for a small 216-water system using a water
model called “SPC” (simple point charge). The best program to add solvent water is “genbox”, i.e.
you will generate a box of solvent. You can select the SPC water coordinates to add with the flag “–cs
spc216.gro”, and for the rest of the choices you can use the documentation. Remember that your
topology needs to be updated with the new water molecules! The easiest way to do that is to supply the
topology to the genbox command.
Before you continue it is once more a good idea to look at this system in PyMOL. However, Py-
MOL and most other programs cannot read the Gromacs gro file directly. There are two simple ways to
create a PDB file for PyMOL:
1. Use the ability of all Gromacs programs to write output in alternative formats, e.g.:
genbox –cp box.gro –cs spc216.gro –p topol.top –o solvated.pdb
2. Use the Gromacs editconf program to convert it (use -h to get help on the options):
editconf -f solvated.gro -o solvated.pdb
If you just use the commands like this, the resulting structure might look a bit strange, with water in
a rectangular box even if the system is triclinic/octahedron/dodecahedron. This is actually fine, and de-
pends on the algorithms used to add the water combined with periodic boundary conditions. However,
to get a beautiful-looking unit cell you can try the commands (backslash means the entire command
should be on a single line):
We’re not going to tell you what they do - play around and use the -h option to learn more :-)
GROMACS uses a separate preprocessing program grompp to collect parameters, topology, and
coordinates into a single run input file (e.g. em.tpr) from which the simulation is then started (this
makes it easier to prepare a run on your workstation but run it on a separate supercomputer).
As we just said, the command grompp prepares run input files. You will need to provide it with
the parameter file above, the topology, a set of coordinates, and give the run input file some name (oth-
erwise it will get the default topol.tpr name). The program to actually run the energy minimization is
mdrun - this is the very heart of Gromacs. mdrun has a really smart shortcut flag called “-
deffnm” to use a string you provide (e.g. “em”) as the default name for all options. If you want to
get some status update during the run it is also smart to turn on the verbosity flag (guess the letter,
or consult the documentation). The minimization should finish in a couple of minutes.
------pr.mdp------
integrator = md
nsteps = 5000
dt = 0.002
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
tcoupl = Berendsen
tc-grps = protein non-protein
tau-t = 0.1 0.1
ref-t = 298 298
Pcoupl = Berendsen
tau-p = 1.0
compressibility = 5e-5 5e-5 5e-5 0 0 0
ref-p = 1.0
nstenergy = 100
define = -DPOSRES
------------------
For a small protein like Lysozyme it should be more than enough with 100 ps (50,000 steps) for the
water to fully equilibrate around it, but in a large membrane system the slow lipid motions can require
several nanoseconds of relaxation. The only way to know for certain is to watch the potential energy,
and extend the equilibration until it has converged. To run this equilibration in GROMACS you
should once more use the programs grompp and mdrun, but (i) apply the new parameters, and (ii)
start from the final coordinates of the energy minimization.
------run.mdp------
integrator = md
nsteps = < select yourself! >
dt = 0.002
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
tcoupl = Berendsen
tc-grps = protein non-protein
tau-t = 0.1 0.1
ref-t = 298 298
nstxout = < select yourself! >
nstvout = < select yourself! >
nstxtcout = < select yourself! >
nstenergy = < select yourself! >
------------------
Storing full precision coordinates/velocities (nstxout/nstvout) enables restart if runs crash (power
outage, full disk, etc.). The analysis normally only uses the compressed coordinates (nstxtcout). Use the
documentation to figure out the names of the corresponding files. Perform the production run with the
commands grompp and mdrun (remember the verbosity flag to see how the run is progressing).
make_ndx –f run.gro
At the prompt, create a group for GLU22 with “r 22”, ARG137 with “r 137” (for residue 22 and
137), and then “q” to save an index file as index.ndx. The distance and number of hydrogen bonds can
now be calculated with the commands g_dist and g_hbond. Both use more or less the same input
(including the index file you just created). For the first program you might e.g. want the distance as a
function of time as output, and for the second the total number of hydrogen bonds vs. time. In both
cases you should select the two groups you just created. Two hydrogen bonds should be present almost
throughout the simulation, at least for 1LYD.
Making a movie
A normal movie uses roughly 30 frames/second, so a 10-second movie requires 300 simulation tra-
jectory frames. To make a smooth movie the frames should not be more 1-2 ps apart, or it will just ap-
pear to shake nervously. To create a PDB-format trajectory (which can be read by PyMOL and VMD)
with roughly 300 frames you can use the command trjconv. The option -e will set an end time (to
get fewer frames), or you can use -skip e.g. to only use every fifth frame. Write the output in PDB
format by using .pdb as extension, e.g. movie.pdb.
Choose the protein group for output rather than the entire system (water move too fast). If you open
this trajectory in PyMOL you can immediately play it using the VCR-style controls on the bottom
right, adjust visual settings in the menus, and even use photorealistic ray-tracing for all images in the
movie. With MacPyMOL you can directly save the movie as a quicktime file, and on Linux you can
save it as a sequence of PNG images for assembly in another program. Rendering a movie only takes a
few minutes, and if that doesn’t work you can still export PNG files. PyMOL can also create (rendered)
PNG files with transparent background if you do: set ray_opaque_background, off
If you want to continue we’d like to recommend the Gromacs manual which is available from the
website: It goes quite a bit beyond just being a program manual, and explains many of the algorithms
and equations in detail! There are also a few videos & slides available online - Google is your friend!