Assignment 4: Amber Tutorial 1a

Due date: Sep. 30, 2014


AMBER (Assisted Model Building with Energy Refinement) is a suite of programs for
molecular modeling and molecular simulations, particularly of biomolecules. The Amber
software suite is divided into two parts: AmberTools14, a collection of freely available programs
mostly under the GPL license, and Amber14, which is centered around the sander and pmemd
simulation programs and continues to be distributed, as before, under a more restrictive
license.
Website: ambermd.org (please explore; there is a great deal of information there).
Manuals: http://ambermd.org/doc12/Amber14.pdf (please read at least Section 1.1 of the
Reference Manual).
On mercer, Amber 14 and AmberTools 14 have been installed. They can be
accessed by first loading the module:

> module load amber/mvapich2/intel/14.03
More information about modules on the NYU HPC clusters can be found at:
https://wikis.nyu.edu/display/NYUHPC/Modules
Any terminal on Linux or Cygwin should support X forwarding; Mac users need to run "Applications
> Utilities > XTerm".
An alternative is to use Xmanager to access the HPC clusters. It can display GUI programs
running on the cluster on your local laptop. Although it is a commercial program, a beta version can be downloaded for free.

Attention: The general rule on the bowery cluster is that jobs are submitted to a queue
with qsub, so running programs directly, without going through the queue, is not appropriate.
Instead, please check out a compute node from the interactive queue for one hour with the command:

> qsub -I -q interactive -l nodes=1:ppn=8,walltime=1:00:00

Another general rule is that users should run jobs from their /scratch/YOUR NetID/ folder.

The purpose of this tutorial is to provide an initial introduction to setting up and running
simulations using the AMBER software. In this tutorial we run a series of simulations of
human neutrophil elastase (HNE). We will first figure out how to generate a starting
structure and then use this structure to construct the necessary input files for running
sander, the main molecular dynamics engine supplied with AMBER 14. In order to run a
classical molecular dynamics simulation with sander, a number of files are required. These
are (using their default filenames):

prmtop: a file containing a description of the molecular topology and the necessary force
field parameters.

inpcrd (or a restrt from a previous run): a file containing a description of the atom coordinates and, optionally, velocities and current periodic box dimensions.

mdin: the sander input file, consisting of a series of namelists and control variables that
determine the options and type of simulation to be run.

The approximate order of this tutorial will be as follows:

(1) Create the prmtop and inpcrd files: how to generate the
initial structure and set up the molecular topology/parameter and coordinate files
necessary for performing minimization or dynamics with sander.

(2) Minimization and molecular dynamics in explicit solvent: setting up and running
equilibration and analysis.

1. Preparation of the human neutrophil elastase (HNE) system


The first stage of this tutorial is to create the input files required by sander for
performing minimization and molecular dynamics. There are a number of methods for creating
these input files. In this example we will use a program written specifically for this purpose:
LEaP. This program reads in force field, topology and coordinate information and
produces the files necessary for production calculations (e.g. minimization, molecular
dynamics, analysis). Two versions of this program are provided with AMBER 14
and AmberTools 14: a graphical version called xleap and a terminal interface called tleap.

The first step in any modeling project is developing the initial model structure. For
HNE, we first need to fetch the PDB file, check its contents, parse it, and
subsequently build the necessary Amber files. Understanding the contents of the PDB file is
crucial for performing a successful MD simulation.

1.1 Obtain and parse the PDB file
The HNE structure we are interested in has PDB ID 2Z7F. It was crystallized with an inhibitor
protein and two non-covalently bound ligands, NAG and FUC. Moreover, several residues
have alternative conformations, denoted A and B alongside the residue number in the PDB
file. The PDB file also contains several water molecules which might be of structural
importance. Finally, we need to investigate the protonation of the residues and identify
disulfide bonds in the structure. We wish to perform the simulation on the apo-protein,
without the two ligands. We thus have to extract it from the original PDB file. You can do
this with several programs, such as Chimera. Alternatively, the Linux command "grep" will also
do the job:
> grep "ATOM " 2Z7F.pdb | egrep -v " I " > 2z7f_chainE.pdb
> grep "HOH " 2Z7F.pdb | egrep -v " I " > 2z7f_chainE_XWAT.pdb
In each command, grep keeps the lines of one record type and egrep -v " I " then drops the
lines that belong to chain I, leaving chain E. These commands produce two new PDB files called
2z7f_chainE.pdb (protein only) and 2z7f_chainE_XWAT.pdb (crystal waters). Please use
Chimera to examine these two PDB files as well as the original PDB file. Pay special
attention to His, Cys, Glu, Asp, Lys and Arg.
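
The grep filters above depend on the chain letter appearing as a separate, space-delimited token. As a cross-check, the same selection can be sketched in Python using the fixed-column PDB layout, in which columns 1-6 hold the record name, columns 18-20 the residue name, and column 22 the chain identifier. This is only an illustrative sketch: the function name select_chain and the sample records (with made-up coordinates) are assumptions, not part of the tutorial files.

```python
def select_chain(lines, chain_id, record="ATOM", resname=None):
    """Return the PDB lines of one record type that belong to one chain.

    PDB is a fixed-column format: columns 1-6 hold the record name,
    columns 18-20 the residue name, and column 22 the chain identifier.
    """
    selected = []
    for line in lines:
        if line[0:6].strip() != record:
            continue
        if len(line) < 22 or line[21] != chain_id:
            continue
        if resname is not None and line[17:20].strip() != resname:
            continue
        selected.append(line)
    return selected


# Hypothetical sample records; the coordinates are invented for illustration.
sample = [
    "ATOM      1  N   ILE E  16      10.000  20.000  30.000  1.00 20.00           N",
    "ATOM      2  CA  ILE E  16      11.000  20.500  30.500  1.00 20.00           C",
    "ATOM   1700  N   ALA I   1       5.000   6.000   7.000  1.00 25.00           N",
    "HETATM 2000  O   HOH E 301       1.000   2.000   3.000  1.00 30.00           O",
    "HETATM 2001  O   HOH I 302       4.000   5.000   6.000  1.00 30.00           O",
]

protein_E = select_chain(sample, "E", record="ATOM")
waters_E = select_chain(sample, "E", record="HETATM", resname="HOH")
print(len(protein_E), len(waters_E))
```

Comparing the number of lines selected this way with the number of lines in the grep output is a quick sanity check that no record was dropped or kept by accident.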

1.2 Protonate the protein
The second task is to add hydrogen atoms to our system. One option is to use the
PDB2PQR webserver (http://nbcr-222.ucsd.edu/pdb2pqr_1.8/). The pKa of the
histidine side chain is close to neutral pH, so histidine can have either its delta nitrogen or its epsilon
nitrogen protonated, or both can be protonated, yielding an amino acid with a +1 charge.
The webserver optimizes the hydrogen bonding network within the protein to
determine where the hydrogens should go in the optimal case. Go to the PDB2PQR site and
submit the structure. Use the AMBER naming scheme, and ensure that you have checked
these settings: (1) "Ensure that new atoms are not rebuilt too close to existing atoms", and
(2) "Optimize the hydrogen bonding network". Start the job and wait for your results.
Retrieve the newly generated PQR file and save it in your "HNE_MD" folder. Open this file in
a text editor, or with the command
> less 2z7f_chainE.pqr
The histidines are now called HID, HIE, or HIP, in order to specify whether they have a
hydrogen at the delta nitrogen, at the epsilon nitrogen, or at both (charged).
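
To summarize which protonation state each histidine received, the residue names in the PQR file can be tallied. The sketch below is a minimal example and makes one assumption that should be checked against the actual file: that each ATOM line is whitespace-separated as "ATOM serial atom resname resnum x y z charge radius", with no chain column. The function name histidine_states and the sample lines (residue numbers, coordinates, charges and radii) are hypothetical.

```python
from collections import Counter


def histidine_states(lines):
    """Count HID/HIE/HIP (and unassigned HIS) residues in ATOM records.

    Assumes whitespace-separated fields in the order
    'ATOM serial atom resname resnum ...' with no chain column.
    """
    histidine_names = {"HID", "HIE", "HIP", "HIS"}
    state_by_residue = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] != "ATOM":
            continue
        resname, resnum = fields[3], fields[4]
        if resname in histidine_names:
            # One entry per residue number, however many atoms it has.
            state_by_residue[resnum] = resname
    return Counter(state_by_residue.values())


# Hypothetical PQR-style lines; all numbers are invented for illustration.
sample = [
    "ATOM    401  N   HIE    40       1.000   2.000   3.000 -0.4157 1.8240",
    "ATOM    402  CA  HIE    40       1.500   2.500   3.500  0.0188 1.9080",
    "ATOM    560  N   HID    57       4.000   5.000   6.000 -0.4157 1.8240",
    "ATOM    700  N   HIP    71       7.000   8.000   9.000 -0.3479 1.8240",
    "ATOM    800  N   ALA    80       9.000   9.000   9.000 -0.4157 1.8240",
]

counts = histidine_states(sample)
print(dict(counts))
```

If the PQR file was written with a chain identifier, the field positions shift by one and the indices in the function need to be adjusted accordingly.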

1.3 Add crystal waters, disulfide bonds, solvate the system and prepare prmtop and inpcrd files
In the next step, we will use tleap to add the crystal waters, bond the disulfides, solvate the
system, and generate the prmtop and inpcrd files for simulation studies. Since the
2z7f_chainE.pqr file only contains standard amino acid residues, their force field
parameters and corresponding topology information are already available. (On bowery,
this information is stored at:
/share/apps/amber/14.03/mvapich2/intel/amber14/dat/leap/parm ). Here we will use the
ff99SBildn force field. On bowery, you first need to load the module:
> module load amber/mvapich2/intel/14.03
Then type tleap, and then type in the following (lines starting with # are
explanations, and you do not need to type them in):


# Load the forcefield ff99SBildn
> source leaprc.ff99SBildn
# load the protein
> prot = loadpdb 2z7f_chainE.pqr
# save the protein to pdb
> savepdb prot 2z7f_chainE_renum.pdb
> quit

Then you will find 2z7f_chainE_renum.pdb in your current folder.


Examine 2z7f_chainE.pqr and 2z7f_chainE_renum.pdb using a text editor. You will find that the
residue numbers in 2z7f_chainE.pqr start from 16 and are not continuous. There are 9
insertions, like 62, 62A, 62B, and about 20 breaks. The residue numbers in
2z7f_chainE_renum.pdb run continuously from 1 to 218. You can also check this with
Chimera (version 1.8). For example, open 2z7f_chainE.pdb in Chimera and go to Tools > Sequence > Sequence; it will show you the sequence with starting number 16. Exit this session with File > Close Session. Open 2z7f_chainE_renum.pdb and check the residue numbers again.
Notes: In files from the Protein Data Bank, the residue numbering does not always start from 1, and it
often has insertions, alternative conformations for some residues, and breaks. However, AmberTools
renumbers the residues continuously, starting from 1, when it saves the structure.
When we do the simulation and analysis, we will use the residue numbers as in
2z7f_chainE_renum.pdb. When we finally report our analysis results, we need to use the residue
numbers in 2z7f_chainE.pqr or 2z7f_chainE.pdb. So it is important to keep both of these files and
to know how to convert the residue numbers in 2z7f_chainE_renum.pdb to those in the 2z7f_chainE.pqr file. For
more details, read AmberTools13 Manual page 181.
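
One way to keep track of the two numbering schemes is to build a lookup table from the original PDB file. The sketch below assumes the fixed-column PDB layout (residue number in columns 23-26, insertion code in column 27) and numbers residues in order of first appearance, which mirrors the continuous numbering described above. The function name build_renumber_map and the sample records are hypothetical; the residue names and coordinates are invented.

```python
def build_renumber_map(lines):
    """Map sequential 1-based residue indices to original PDB labels.

    The original label is the residue number (columns 23-26) followed by
    the insertion code (column 27), for example '62A'.
    """
    new_to_old = {}
    last_label = None
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        label = line[22:26].strip() + line[26:27].strip()
        if label != last_label:
            new_to_old[len(new_to_old) + 1] = label
            last_label = label
    return new_to_old


# Hypothetical records illustrating a start at 16, insertions, and a break.
sample = [
    "ATOM      1  N   ILE E  16      10.000  20.000  30.000",
    "ATOM      2  CA  ILE E  16      11.000  20.500  30.500",
    "ATOM      9  N   VAL E  17      12.000  21.000  31.000",
    "ATOM    300  N   ALA E  62      13.000  22.000  32.000",
    "ATOM    305  N   GLY E  62A     14.000  23.000  33.000",
    "ATOM    309  N   ASN E  62B     15.000  24.000  34.000",
    "ATOM    317  N   PRO E  65      16.000  25.000  35.000",
]

new_to_old = build_renumber_map(sample)
old_to_new = {old: new for new, old in new_to_old.items()}
print(new_to_old)
print(old_to_new["62A"])
```

With such a table, a residue index taken from an analysis of the renumbered system can be translated back to the original label before it is reported.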

From here on, for 2z7f chain E, we will use the structure 2z7f_chainE_renum.pdb. Open
2z7f_chainE_renum.pdb in Chimera and go to Tools > General Controls > Command Line; a
command panel will appear at the bottom of the Chimera window. Type
show :cyx
CYX is the Amber residue name for a cysteine that takes part in a disulfide bridge. The CYX
residues will be shown as sticks, with the sulfur atoms colored yellow. There
are four Cyx-Cyx pairs: cyx26-cyx42, cyx122-cyx179, cyx169-cyx194, cyx152-cyx158. We
need to bond these disulfides in tleap in the next step.
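
The pairing seen in Chimera can also be checked numerically: two SG atoms that form a disulfide bridge sit roughly 2 angstroms apart. The sketch below pairs sulfur atoms by distance and prints the corresponding tleap bond commands. The 2.5 angstrom cutoff, the function names, and the coordinates are illustrative assumptions; in practice the SG coordinates would be read from 2z7f_chainE_renum.pdb.

```python
import math


def find_disulfides(sg_atoms, cutoff=2.5):
    """Pair SG atoms whose separation is below `cutoff` (in angstroms).

    sg_atoms maps a residue number to the (x, y, z) position of its SG atom.
    """
    residues = sorted(sg_atoms)
    pairs = []
    for i, first in enumerate(residues):
        for second in residues[i + 1:]:
            if math.dist(sg_atoms[first], sg_atoms[second]) < cutoff:
                pairs.append((first, second))
    return pairs


def bond_commands(pairs, unit="prot"):
    """Format one tleap 'bond' command per disulfide pair."""
    return [f"bond {unit}.{a}.SG {unit}.{b}.SG" for a, b in pairs]


# Invented coordinates: two bonded pairs placed far from each other.
sg_atoms = {
    26: (0.0, 0.0, 0.0),
    42: (2.04, 0.0, 0.0),
    122: (10.0, 10.0, 10.0),
    179: (10.0, 12.05, 10.0),
}

pairs = find_disulfides(sg_atoms)
for command in bond_commands(pairs):
    print(command)
```

The unit name "prot" matches the variable used in the tleap session of this tutorial; any other unit name can be passed in its place.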
Type tleap again:

> source leaprc.ff99SBildn


# load the protein
> prot = loadpdb 2z7f_chainE_renum.pdb
# First check the bonding information of one sulfur atom, for example Cys26@SG
> desc prot.26.SG
# The last line of the output lists the bonded atoms; at this point SG is bonded to only one atom
# create the disulfide bond
> bond prot.26.SG prot.42.SG
# check the bonding information again
> desc prot.26.SG
# Note the difference
# create the other disulfide bonds
> bond prot.169.SG prot.194.SG
> bond prot.152.SG prot.158.SG
> bond prot.122.SG prot.179.SG
# save the pdb
# load the crystal waters
> xwat = loadpdb 2z7f_chainE_XWAT.pdb
# combine the protein and waters
> complex = combine {prot xwat}
# save Amber topology and coordinate files for the protein without waters
> saveamberparm prot HNE_noWAT.prmtop HNE_noWAT.inpcrd
# check the charge of the complex; we need to add ions to neutralize it
> charge complex
# add Cl- ions to neutralize the system. There are two commands for this, addions and addions2.
# Check page 112 of the AmberTools13 manual and explain the difference between them.
# We will use addions2. Why?
> addions2 complex Cl- 0
# solvate the protein in a TIP3P water box, with at least 12 A between the protein
# surface and the edge of the water box.
> solvatebox complex TIP3PBOX 12.0
# check the structure; this prints information, warnings, or errors.
> check complex
# save the solvated complex
> saveamberparm complex HNE_sol.prmtop HNE_sol.inpcrd
# save a PDB file of the solvated HNE system
> savepdb complex HNC_sol0.pdb
# exit the tleap session
> quit
Please use Chimera to visualize the solvated HNE system, check all the His and Cys residues,
and prepare figures that illustrate your prepared system and demonstrate that you have prepared the His
and Cys residues properly. Note that HNE_sol.prmtop is the file containing the
description of the molecular topology and the necessary force field parameters, while
HNE_sol.inpcrd is the file containing the atom coordinates and, optionally, velocities
and current periodic box dimensions.

2. Minimization of the system.


Prior to the MD simulation we need to relax the system by performing energy minimization,
as our current geometry may be quite far from a stable structure. For example, some
atoms in the system may be too close together. It is always a good idea to minimize the structure
before performing molecular dynamics simulations. Failure to successfully minimize the
structure may lead to instabilities when we run MD. So, given the initial solvated HNE
system that we have prepared, we will first carry out some initial minimizations with sander to remove
the largest strains in the system. The basic usage of sander is as follows:
sander [-O] -i mdin -o mdout -p prmtop -c inpcrd -r restrt [-ref refc] [-x mdcrd] [-v mdvel]
[-e mden] [-inf mdinfo]
Arguments in []'s are optional.
If an argument is not specified, the default name will be used.
-O overwrite all output (the default behavior is to quit if any output files already
exist)
-i the name of the input file (which describes the simulation options), mdin by default.
-o the name of the output file, mdout by default.
-p the parameter/topology file, prmtop by default.
-c the set of initial coordinates for this run, inpcrd by default.
-r the final set of coordinates from this MD or minimization run, restrt by default.
-ref reference coordinates for positional restraints, if this option is specified in the input,
refc by default.
-x the molecular dynamics trajectory file (if running MD), mdcrd by default.
-v the molecular dynamics velocities file (if running MD), mdvel by default.
-e a summary file of the energies (if running MD), mden by default.
-inf a summary file written every time energy information is printed in the output file for
the current step of the minimization or MD, useful for checking on the progress of the
simulation, mdinfo by default.
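
Because every run repeats the same flag pattern, it can help to assemble the command line programmatically and print it before running it. The helper below is a hypothetical convenience sketch, not part of AMBER; it uses only the flags listed above, and switches to the mpirun / sander.MPI form when a process count is given.

```python
def sander_command(mdin, mdout, prmtop, inpcrd, restrt,
                   refc=None, overwrite=True, nproc=None):
    """Assemble a sander command line as a list of arguments.

    With nproc set, the parallel form 'mpirun -np N sander.MPI' is used.
    """
    if nproc is None:
        command = ["sander"]
    else:
        command = ["mpirun", "-np", str(nproc), "sander.MPI"]
    if overwrite:
        command.append("-O")
    command += ["-i", mdin, "-o", mdout, "-p", prmtop,
                "-c", inpcrd, "-r", restrt]
    if refc is not None:
        # Reference coordinates are only needed for positional restraints.
        command += ["-ref", refc]
    return command


serial = sander_command("mdin", "mdout", "prmtop", "inpcrd", "restrt")
print(" ".join(serial))

parallel = sander_command("mdin", "mdout", "prmtop", "inpcrd", "restrt",
                          refc="refc", nproc=6)
print(" ".join(parallel))
```

The returned list can be passed to subprocess.run if the job is to be launched from a script rather than typed at the prompt.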

We will do our minimization in two steps. The first step relaxes the water
molecules only. In this part, we will hold the protein in place by adding a restraint to the
protein residues (1-218). Such restraints work by specifying a reference structure, in this
case our starting structure, and then restraining the selected atoms to conform to this
structure via a harmonic potential. This can be visualized as attaching a spring to
each of the solute atoms that connects it to its initial position. Moving the atom from this
position results in a force which acts to restore it to the initial position. By varying the
magnitude of the force constant, the restraining effect can be increased or decreased. This
can be especially useful with structures based on homology models, which may be a long
way from equilibrium and so require a number of stages of minimization / MD with the
force constant being reduced each time. The second step minimizes the
whole system. In order to perform the energy minimization we need to build an input
file (mdin). This file specifies the options for the run, chosen from the many that are available. The
input files we will use for our system are given below.
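
The spring picture of the restraint can be made concrete with a few lines of Python. The sketch assumes the restraint energy has the form E = k * |x - x_ref|^2, with k in kcal/(mol A^2) and no factor of 1/2, so that the restoring force is F = -2k (x - x_ref); please confirm the exact convention in the Amber manual. The function name and the example displacement are illustrative.

```python
def restraint_energy_and_force(position, reference, k=2.0):
    """Harmonic positional restraint on a single atom.

    Assumed form: E = k * |x - x_ref|^2, hence F = -dE/dx = -2k (x - x_ref).
    Positions are in angstroms and k is in kcal/(mol A^2).
    """
    displacement = [x - r for x, r in zip(position, reference)]
    energy = k * sum(d * d for d in displacement)
    force = [-2.0 * k * d for d in displacement]
    return energy, force


reference = (1.0, 1.0, 1.0)
moved = (1.5, 1.0, 1.0)  # atom displaced by 0.5 A along x

energy, force = restraint_energy_and_force(moved, reference, k=2.0)
print(energy, force)

# A stiffer spring penalizes the same displacement more strongly.
stiff_energy, _ = restraint_energy_and_force(moved, reference, k=10.0)
print(stiff_energy)
```

The force points back toward the reference position, and its size grows linearly with the displacement, which is why a larger force constant holds the solute more tightly.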

The min_sol.in file for minimization of waters:

Minimization of water molecules
&cntrl
imin = 1, # perform energy minimization
maxcyc=5000, # 5000 steps in total
ncyc=3000, # the first 3000 of these are steepest descent
ntb=1, # constant volume
cut=10, # non-bonded cut-off (in angstroms)
ntpr=5, # write energy information every 5th step
ntr=1, restraintmask=':1-218', # restrain residues 1-218 (the protein)
restraint_wt=2.0, # restraint force constant, in kcal/(mol A^2)
/
Remove the comments above when you write this input to the min_sol.in file, or you will get an error
message.
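
If you prefer to keep an annotated copy of the input, the comments can be stripped automatically before the file is handed to sander. The sketch below is a minimal example; the function name strip_comments is hypothetical, and the approach assumes that the character # never appears inside a real value (for example inside a restraint mask).

```python
def strip_comments(annotated):
    """Remove '#' comments and empty lines from annotated mdin text."""
    cleaned = []
    for line in annotated.splitlines():
        code = line.split("#", 1)[0].rstrip()
        if code:
            cleaned.append(code)
    return "\n".join(cleaned) + "\n"


ANNOTATED = """\
Minimization of water molecules
&cntrl
imin = 1, # perform energy minimization
maxcyc=5000, # 5000 steps in total
ncyc=3000,# 3000 of these are steepest descent
ntb=1, # Constant volume
cut=10, # Non-bonded cut-off
ntpr=5,# Write energy information every 5th step
ntr=1, restraintmask=':1-218',
restraint_wt=2.0,
/
"""

clean = strip_comments(ANNOTATED)
print(clean)
```

The cleaned text can then be written out, for example with open("min_sol.in", "w"), while the annotated version is kept for your lab report.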

The min_all.in file for minimization of the whole system:

Minimization of the whole system
&cntrl
imin = 1,
maxcyc=10000, ncyc=7000,
ntb=1, ntr=0, cut=10, ntpr=5,
/
Please check the Amber manual to see what other keywords are available and what they
control.

2.1 Running sander for the first time


Please check out a compute node from the interactive queue for two hours, and load the module as
indicated above. It is helpful to create a new folder named min just for the minimization. Put all the
*.prmtop and *.inpcrd files into the min folder, and write the min_sol.in and min_all.in files there.
You can run the minimization with the commands below. They start "sander" and run the
two minimization jobs specified in the input files (assuming a multi-core CPU):

mpirun -np 6 sander.MPI -O -i min_sol.in -o min_sol.out -p HNE_sol.prmtop -c HNE_sol.inpcrd -r min_sol.rst -ref HNE_sol.inpcrd


mpirun -np 6 sander.MPI -O -i min_all.in -o min_all.out -p HNE_sol.prmtop -c min_sol.rst -r min_all.rst -ref min_sol.rst


The first one will take about 7 minutes to complete and the second about 14 minutes.
Take a look at the output files produced during the minimizations (min_sol.out and min_all.out). How does the
total energy change between the first and last steps of each minimization? Generate
PDB files from the two minimized structures, and compare them with each other as well
as with the initial structure. What are the RMSDs? In which part do you
observe the largest change? A PDB file can be created from the prmtop topology and
coordinates (inpcrd or restrt) using the program ambpdb. For example:

> ambpdb -p HNE_sol.prmtop < min_all.rst > min_all.pdb
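
Once the structures are available as PDB files, an RMSD can be computed from matched lists of atom coordinates. The sketch below is a minimal version that does no superposition: it assumes both structures contain the same atoms in the same order and are already in the same frame of reference. Chimera and other tools can fit the structures first if that is needed. The function name and the coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    No fitting is performed; atom i of coords_a is compared directly
    with atom i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for atom_a, atom_b in zip(coords_a, coords_b):
        total += sum((a - b) ** 2 for a, b in zip(atom_a, atom_b))
    return math.sqrt(total / len(coords_a))


# Invented coordinates: each atom is displaced by exactly 1 A.
initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
minimized = [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]

print(rmsd(initial, minimized))
```

Restricting the coordinate lists to one subset at a time (for example backbone atoms, side chains, or waters) shows which part of the system moved most during minimization.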


The program process_minout.perl (AmberTools13 Manual page 22) can be used to parse the
output files from minimization. (There is also a program process_mdout.perl for
the MD part.) To use it, go to the folder where min_all.out is located:
> mkdir minall
> cd minall
> process_minout.perl ../min_all.out
It will give you summary files; summary.ENERGY is the one with the total energy. You can
plot it with xmgrace on bowery or with other programs.
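
The summary file can also be read in Python to quantify the energy drop. The sketch below assumes that summary.ENERGY contains two whitespace-separated columns, step number and energy; please check the actual file before relying on this. The function name read_summary and the sample numbers are invented for illustration.

```python
def read_summary(text):
    """Parse two-column 'step energy' text into two lists of floats.

    Lines that do not start with two numeric fields are skipped.
    """
    steps, energies = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            step, energy = float(fields[0]), float(fields[1])
        except ValueError:
            continue
        steps.append(step)
        energies.append(energy)
    return steps, energies


# Invented values standing in for the contents of summary.ENERGY.
sample_text = """\
1 -50000.0
5 -62000.5
10 -70000.25
"""

steps, energies = read_summary(sample_text)
change = energies[-1] - energies[0]
print(len(steps), change)
```

For the real data, the text would come from reading the file, for example open("summary.ENERGY").read(), and the two lists can be passed directly to a plotting program.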
More information can be found in Section 3.3.1 of
http://ambermd.org/tutorials/basic/tutorial1/section3.htm

Items to hand in:

A lab report on this tutorial.

If you happen to have any suggestions, comments or revisions on this tutorial, I would
very much like to know.

Based on the reading list below on the AMBER force field, please
a) Write down its force field functional form.
b) Summarize and discuss its parameterization strategy.
c) Find and list the atom types and force field parameters necessary to model the molecule
N-methylacetamide.
d) Discuss the main differences among the Amber94, Amber99, Amber99SB, and
Amber99SBildn force fields. If you were to simulate a protein, which force field would you use?
Why?

W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, Jr., D. M. Ferguson,
D. C. Spellmeyer, T. Fox, J. W. Caldwell and P. A. Kollman, "A Second Generation
Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules", J.
Am. Chem. Soc., 117, 5179-5197 (1995). (amber94.pdf)

J. Wang, P. Cieplak and P. A. Kollman, "How Well Does a Restrained Electrostatic
Potential (RESP) Model Perform in Calculating Conformational Energies of Organic
and Biological Molecules?", J. Comput. Chem., 21, 1049-1074 (2000). (amber99.pdf)

V. Hornak, R. Abel, A. Okur, B. Strockbine, A. Roitberg and C. Simmerling,
"Comparison of multiple Amber force fields and development of improved
protein backbone parameters", Proteins, 65, 712-725 (2006). (amber99SB.pdf)

K. Lindorff-Larsen et al., "Improved side-chain torsion potentials for the Amber
ff99SB protein force field", Proteins, 78, 1950-1958 (2010). (amber99SBildn.pdf)
