
# Introduction to Molecular Modeling Techniques

## Molecular Modeling Techniques are a Critical Component of Determining a Protein Structure by NMR
• Protein structures are calculated by augmenting traditional modeling functions with
experimental NMR data
Molecular Modeling/Molecular Mechanics is a method to calculate the structure and energy of
molecules based on nuclear motions.
• electrons are not considered explicitly
• electrons are assumed to find their optimum distribution once the positions of the nuclei are known
• Born-Oppenheimer approximation of the Schrödinger equation
– nuclei are heavier and move slower than electrons

Molecular modeling treats a molecule as a collection of weights connected with springs, where the weights represent the nuclei and the springs represent the bonds.

## Force Field used to calculate the energy and geometry of a molecule.

• Collection of atom types (to define the atoms in a molecule), parameters (for bond lengths,
bond angles, etc.) and equations (to calculate the energy of a molecule)
• In a force field, a given element may have several atom types.
– For example, phenylalanine contains both sp3-hybridized carbons and aromatic carbons.
– aromatic carbons have a trigonal bonding geometry.
– C-C bond in the ethyl group differs from a C-C bond in the phenyl ring
– C-C bond between the phenyl ring and the ethyl group differs from all other C-C bonds in ethylbenzene. The force field contains parameters for these different types of bonds.

## Force Field used to calculate the energy and geometry of a molecule.

• Total energy of a molecule is divided into several parts called force potentials, or potential
energy equations.
• Force potentials are calculated independently, and summed to give the total energy of the
molecule.
– Examples of force potentials are the equations for the energies associated with bond stretching, bond bending, torsional strain and van der Waals interactions.
– These equations define the potential energy surface of a molecule.

## Potential Energy Equation (Bond Length)
• Whenever a bond is compressed or stretched the energy goes up.
• The energy potential for bond stretching and compressing is described by an equation
similar to Hooke's law for a spring.
• Sum over two atoms
– $l_0$: expected/natural bond length
– $k_l$: force constant
– $l$: actual/observed bond length, from the structure
– $l_0$ and $k_l$ come from what we know about protein structures (what we have been discussing up to this point)

[Figure: plot of the potential energy function for bond length]
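
The Hooke's-law description above can be sketched in a few lines of Python. This is a minimal illustration, assuming the common harmonic form E = Σ k_l (l − l0)² (some force fields include an extra factor of 1/2). The function name `bond_energy` and the two-atom coordinates are hypothetical; the 410 kcal/mol force constant and 1.1 Å natural length are the C-H values quoted later in these notes.

```python
import math

def bond_energy(coords, bonds):
    """Sum k_l * (l - l0)**2 over all bonded atom pairs (assumed harmonic form)."""
    total = 0.0
    for i, j, k_l, l0 in bonds:
        l = math.dist(coords[i], coords[j])  # actual bond length from the structure
        total += k_l * (l - l0) ** 2
    return total

# Hypothetical example: a single C-H bond stretched 0.1 A beyond its natural length.
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
bonds = [(0, 1, 410.0, 1.1)]  # (atom i, atom j, k_l, l0 in A)
energy = bond_energy(coords, bonds)
print(f"{energy:.2f} kcal/mol")
```

Compressing the same bond by 0.1 Å gives the same penalty, which is what makes the potential a symmetric parabola around the natural length.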

## Potential Energy Equation (Angles)
• As the bond angle is bent from the norm, the energy goes up.
• Sum over three atoms
– $\phi_0$: expected/natural bond angle
– $k_\phi$: force constant
– $\phi$: actual/observed bond angle, from the structure
– $\phi_0$ and $k_\phi$ come from what we know about protein structures (what we have been discussing up to this point)

[Figure: plot of the potential energy function for bond angle]
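
By analogy with bond stretching, the angle term is usually written as a harmonic sum over all bond angles. The expression below is an assumed standard form written with the symbols defined above, not necessarily the exact expression on the original slide; some force fields include an additional factor of 1/2.

```latex
E_{\text{angle}} = \sum_{\text{angles}} k_{\phi}\,(\phi - \phi_0)^2
```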

## Potential Energy Equation (Improper Dihedrals)
• As the improper dihedral is bent from the norm, the energy goes up.
• Sum over four atoms
• Sum over all improper dihedrals in the structure
– $\omega_0$: expected improper dihedral (usually set to 0°)
– $k_\omega$: force constant
– $\omega$: actual/observed improper dihedral

[Figure: plot of the potential energy function for improper dihedrals ($\omega_0 = 0$)]

## Potential Energy Equation (Dihedral Angles)
• As the dihedral angle is bent from the norm, the energy goes up.
• The torsion potential is a Fourier series that accounts for all 1-4 through-bond
relationships
• Sum over four atoms
• Sum over all dihedrals in the structure
– $\phi_0$: expected dihedral
– $A_n$: force constant for each Fourier term
– $\phi$: actual/observed dihedral
– $n$: multiplicity (same parameter seen in the XPLOR constraint file)

[Figure: plot of the potential energy function for dihedrals, showing multiple minima]

## Potential Energy Equation (Dihedral Angles)
• Need to include higher terms for non-symmetric bonds
– Distinguish trans, gauche conformations
• Different multiplicities identify which torsion angles are energetically equivalent
– For χ1, 60°, -60° and 180° are all equivalent and should yield 0 torsion energy
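
The role of the multiplicity can be checked numerically. The sketch below assumes the widely used Fourier form E = Σ A_n [1 + cos(nφ − φ0)]; the exact expression on the original slide is not recoverable, and the function name `torsion_energy` and the force constant of 1 kcal/mol are hypothetical. With a single threefold term (n = 3, φ0 = 0), the three staggered χ1 values all give zero energy.

```python
import math

def torsion_energy(phi_deg, terms):
    """Fourier torsion energy: sum of A_n * (1 + cos(n*phi - phi0)) over the terms."""
    phi = math.radians(phi_deg)
    return sum(a_n * (1.0 + math.cos(n * phi - math.radians(phi0_deg)))
               for a_n, n, phi0_deg in terms)

# Hypothetical single threefold term for a chi1-like angle.
chi1_terms = [(1.0, 3, 0.0)]  # (A_n in kcal/mol, multiplicity n, phase phi0 in degrees)
staggered = [torsion_energy(a, chi1_terms) for a in (60.0, -60.0, 180.0)]
eclipsed = torsion_energy(0.0, chi1_terms)
print(staggered, eclipsed)
```

Adding a second term with n = 1 would break the equivalence and let the function distinguish trans from gauche.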

## Potential Energy Equation (Nonbonded interactions)
• van der Waals interaction
– Acts only at very short distances
– Attractive interaction by induced dipoles between uncharged atoms, ~$1/r^6$
– When atoms get too close, valence shells start to overlap and repel, ~$1/r^{12}$

[Figure: van der Waals potential energy function; the interaction is first attractive, then becomes repulsive]
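
The attractive 1/r⁶ and repulsive 1/r¹² terms are commonly combined in the Lennard-Jones 12-6 form, sketched below. The well depth `epsilon` and contact distance `sigma` are hypothetical values chosen only for illustration, and the text does not say which parameterization the original slide used.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 potential: repulsive r**-12 term minus attractive r**-6 term."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

epsilon, sigma = 0.1, 3.4      # hypothetical well depth (kcal/mol) and contact distance (A)
r_min = 2 ** (1 / 6) * sigma   # separation at the energy minimum
for r in (3.0, sigma, r_min, 5.0, 8.0):
    print(f"r = {r:4.2f} A  E = {lennard_jones(r, epsilon, sigma):8.4f} kcal/mol")
```

The energy is positive (repulsive) below `sigma`, reaches its minimum of `-epsilon` at `r_min`, and decays toward zero at larger separations.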

## Potential Energy Equation (Nonbonded interactions)

• electrostatic interaction
– Electrostatic interaction of charged atoms
– Long-range forces
– Coulomb’s Law

[Figure: plot of Coulomb’s law; the interaction energy is positive for charges of the same sign and negative for charges of opposite sign, and its magnitude falls off inversely with distance]
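
Coulomb's law for a pair of point charges can be written directly in code. This is a minimal sketch: the function name is hypothetical, charges are in units of the electron charge, distances are in Å, and 332.06 is the usual conversion factor to kcal/mol. The `dielectric` argument is the dielectric constant of the medium and defaults to vacuum.

```python
COULOMB_CONSTANT = 332.06  # kcal*A/(mol*e^2): converts q1*q2/r to kcal/mol

def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Coulomb's law for two point charges (units of e) separated by r (in A)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)

like = coulomb_energy(+1.0, +1.0, 4.0)       # same sign: positive (repulsive)
opposite = coulomb_energy(+1.0, -1.0, 4.0)   # opposite sign: negative (attractive)
far = coulomb_energy(+1.0, -1.0, 8.0)        # doubling r halves the magnitude
print(like, opposite, far)
```

Because the energy falls off only as 1/r, it is still half its value when the separation doubles, which is why electrostatics is described as long-range.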

## Potential Energy Equation (Nonbonded interactions)
• electrostatic interaction
– Problem → defining dielectric constant (ε)
– dielectric constant differs in solvent and protein interior
– $\varepsilon_{protein}$ ~ 2-4
– $\varepsilon_{solvent}$ ~ 80
– For protein calculations using NMR constraints, typically turn electrostatics off
– How to properly define solvent, buffers, salts, etc?
– Can explicitly define solvent → increases complexity of calculations.
– With electrostatics off during the structure calculations, can use the potential energy calculation after the fact to determine the quality of the NMR structure

## Potential Energy Equation (Nonbonded interactions)
• electrostatic interaction

Problem → defining dielectric constant (ε)
1) Don’t use electrostatic potential energy during structure calculation
2) Use a single dielectric constant
– $\varepsilon_{protein}$ ~ 2-4; $\varepsilon_{solvent}$ ~ 80
3) Use explicit solvent in structure calculation
– Improved structure quality
– Increased computational time
– Properly defining solvent
– Properly defining force field behavior in solvent

PROTEINS: Structure, Function, and Genetics 50:496–506 (2003)


## Why Is the Potential Energy Function Not Sufficient to Fold A Protein?
• It is not a complete function
– primarily short-range geometry with many equal solutions
– VDW and electrostatics only contribute over short distances
– How do you bring distant regions of the primary chain into contact?
• Too many possible conformations
– $3^N$ where N is the number of amino acids
• Other factors that drive the protein folding process
– hydrophobic interactions, hydrogen-bond formation, secondary structure interactions (helix dipole), effects of solvent, compactness of structure, etc
– How do you define a mathematical equation defining these contributions?

Improving the Potential Energy Function, improving the parameters and defining alternative ab initio methods of folding a protein are major areas of Molecular Modeling research.

## Potential Energy Equation (NMR Constraints)
• hydrogen bond constraint
– based on an empirical formula derived from high quality X-ray structures in the PDB
– violation energy is based on deviations from expected h-bond length (R) and angle (φ)
– a violation occurs if the term in parentheses (the relationship between R and φ) is not zero

$$E_{HB} = k_{HB} \sum_{m=1}^{N_{restraints}} \left( \frac{1}{R^3} - A - \frac{B}{\{2.07 + \cos\phi_{NHO}\}^3} \right)^2$$

where A = 0.019 and B = 0.21 Å⁻³
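
The hydrogen-bond violation term can be evaluated as sketched below. The constants A and B are taken from the text (both treated as Å⁻³, which the units of 1/R³ require); the function names and the default `k_hb = 1.0` are hypothetical. The helper `ideal_length` solves the bracketed relationship for the R at which a given angle produces no violation.

```python
import math

A = 0.019  # A^-3, from the text
B = 0.21   # A^-3, from the text

def expected_inverse_r3(phi_deg):
    """Right-hand side of the empirical relationship: A + B / (2.07 + cos(phi))**3."""
    return A + B / (2.07 + math.cos(math.radians(phi_deg))) ** 3

def hbond_energy(restraints, k_hb=1.0):
    """k_HB times the sum of squared deviations from the R/phi relationship."""
    return k_hb * sum((1.0 / r ** 3 - expected_inverse_r3(phi)) ** 2
                      for r, phi in restraints)

def ideal_length(phi_deg):
    """H-bond length R at which the violation term vanishes for a given angle."""
    return expected_inverse_r3(phi_deg) ** (-1.0 / 3.0)

r_linear = ideal_length(180.0)  # ideal length for a linear hydrogen bond
print(round(r_linear, 2))
print(hbond_energy([(r_linear, 180.0)]), hbond_energy([(2.5, 180.0)]))
```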


## For a Given Set of Atomic Coordinates, An Energy for the Structure Can Be Calculated Based on the Set of Potential Energy Functions

$$
\begin{aligned}
E_{TOTAL} &= E_{chem} + w_{exp}E_{exp} \\
E_{exp} &= E_{NOE} + E_{torsion} + E_{H\text{-}bond} + E_{gyr} + E_{rama} + E_{RDC} + E_{CSA} + E_{para} \\
E_{chem} &= E_{bond} + E_{angle} + E_{dihedral} + E_{vdw} + E_{electr}
\end{aligned}
$$

What Relationship Does This Energy Value Have to Any Experimental Observation?
NOTHING!

The energy value simply indicates how well the structure conforms to the expected parameters.
It does not indicate the relative stability of one protein to another.
It does not indicate the stability of the protein (∆G).
Calculating a ∆E between the protein with/without a ligand does not indicate the binding affinity of the ligand or the induced stability of the complex.

Do Not Over Interpret the Meaning of this Energy Function!


## What Relationship Is There Between the Force Constants and Experimental Observations?

For geometric parameters (bonds, angles), force constants come from IR, Raman spectroscopy and ab initio calculations.

Experimental force constants have been determined by “trial & error” or empirically to obtain a proper balance and weighted contribution of each experimental parameter to the calculated structure.

## What Do We Mean By a Proper Balance?

– C-H bond length of 1.1 Å with 410 kcal/mol force constant
– H-H distance constraint of 3.0 Å with 25 kcal/mol force constant (ceiling of 100 kcal/mol)
– [Figure: two C-H groups with the hydrogens 10 Å apart]
– Distance constraint is violated (properly) with no distortion in bond lengths

– C-H bond length of 1.1 Å with 10 kcal/mol force constant
– H-H distance constraint of 3.0 Å with 500 kcal/mol force constant
– [Figure: two C-H groups with the hydrogens 3 Å apart]
– Distance constraint is satisfied (improperly) with large distortion in bond lengths

Want to Keep All Known Geometric Values Within Proper Ranges
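
The two cases above can be reproduced with a deliberately simple one-dimensional toy model, sketched here under stated assumptions. The carbons are held fixed, each C-H bond stretches by the same amount s, the bond and restraint terms are plain harmonic, and the ceiling is treated as a hard cap on the restraint energy (a capped restraint exerts no force). None of this is taken from XPLOR itself, and the function name is hypothetical.

```python
def relaxed_geometry(k_bond, k_restraint, ceiling=None, start_distance=10.0, target=3.0):
    """Minimize 2*k_bond*s**2 + k_restraint*(d - target)**2 with d = start_distance - 2*s.

    s is the stretch of each C-H bond; setting dE/ds = 0 gives the closed form below.
    Returns (bond stretch, final H-H distance).
    """
    gap = start_distance - target
    stretch = k_restraint * gap / (k_bond + 2.0 * k_restraint)
    energy = 2.0 * k_bond * stretch ** 2 + k_restraint * (gap - 2.0 * stretch) ** 2
    if ceiling is not None and ceiling < energy:
        # Paying the capped restraint penalty is cheaper than distorting the bonds,
        # so the bonds stay at their natural length and the restraint stays violated.
        return 0.0, start_distance
    return stretch, start_distance - 2.0 * stretch

stiff_bonds = relaxed_geometry(k_bond=410.0, k_restraint=25.0, ceiling=100.0)
soft_bonds = relaxed_geometry(k_bond=10.0, k_restraint=500.0)
print(stiff_bonds, soft_bonds)
```

In this toy model the first parameter set leaves the bonds undistorted and the restraint violated, while the second stretches each bond by several Å to pull the hydrogens to roughly 3 Å apart.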

$$
\begin{aligned}
E_{TOTAL} &= E_{chem} + w_{exp}E_{exp} \\
E_{exp} &= E_{NOE} + E_{torsion} + E_{H\text{-}bond} + E_{gyr} + E_{rama} + E_{RDC} + E_{CSA} + E_{para} \\
E_{chem} &= E_{bond} + E_{angle} + E_{dihedral} + E_{vdw} + E_{electr}
\end{aligned}
$$

Molecular Minimization: starting from some structure (R), find its potential energy using the potential energy function given above. The coordinate vector R is then varied using an optimization procedure so as to minimize the potential energy $E_{TOTAL}(R)$.

Molecular Dynamics: the motion of a molecule is simulated as a function of time. Newton's second law of motion is solved to find how the position of each atom of the system varies with time. To find the forces on each atom, the derivative vector (or gradient) of the potential energy function given above is calculated. Factors such as the temperature and pressure of the system can be included in the treatment.

## Anfinsen's Thermodynamic Hypothesis
The native conformation of a protein is the conformation with the lowest free energy (∆G)
– global minimum of the free energy surface.
– Rather difficult (and expensive) to calculate free energies

In 1957, Anfinsen showed that denatured ribonuclease A (124 amino acids, 4 disulphides) produced in 8 M urea and reducing agent (β-mercaptoethanol) could be re-activated by dialysing out the denaturant in oxidizing conditions.

If the entire folding process was a random search, it would require too much time

Initial stages of folding must be nearly random.
– Conformational changes occur on a time scale of $10^{-13}$ seconds.
– Consider a 100 residue protein:
– if each residue has only 3 possible conformations (far less than reality)
– $3^{100}$ conformations × $10^{-13}$ seconds ≈ $10^{27}$ years
– Even if a significant number of these conformations are sterically disallowed, the folding time would still be astronomical

Energy barriers probably cause the protein to fold along a definite pathway
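
The random-search estimate above is easy to verify. The short calculation below uses only the numbers given in the text (3 conformations per residue, 100 residues, 10⁻¹³ seconds per conformational change); the choice of a 365.25-day year is an assumption that does not affect the order of magnitude.

```python
conformations = 3 ** 100                 # 3 states per residue, 100 residues
seconds = conformations * 1e-13          # 10^-13 s per conformational change
years = seconds / (365.25 * 24 * 3600)   # convert seconds to years
print(f"{conformations:.2e} conformations, {years:.1e} years")
```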

## Molecular Minimization
• moves the Cartesian coordinates (X,Y,Z position) for each atom to obtain minimal energy

$$
\begin{aligned}
E_{TOTAL} &= E_{chem} + w_{exp}E_{exp} \\
E_{exp} &= E_{NOE} + E_{torsion} + E_{H\text{-}bond} + E_{gyr} + E_{rama} + E_{RDC} + E_{CSA} + E_{para} \\
E_{chem} &= E_{bond} + E_{angle} + E_{dihedral} + E_{vdw} + E_{electr}
\end{aligned}
$$

• result is dependent on the starting structure
• finds local not global minima
• typically, only small movements in atom position are made
– starting structure looks similar to ending structure
– large changes may occur for significantly distorted structures (stretched bonds)
– a large bond change could invert chirality

## Molecular Minimization
• minimization will fail for severely distorted structures
– a poorly docked ligand onto a protein where bonds or atoms are overlapped
– [Figure: it is highly unlikely that this structure would minimize since the Cδ of the Leu side-chain penetrates the center of the benzene ring]
– In order to properly refine this poor structure, the minimization protocol would need to pull the Cδ back through the ring, which would require first going to a higher energy structure.
– This will not occur since the trend for minimization is to move towards a lower energy.
– The “minimized” structure will probably end up with a stretched and distorted Cδ-Cγ bond as it moves the Cδ away from the ring from the other direction
– the benzene ring and the remainder of the Leu side-chain will also be distorted in an effort to accommodate the overlapped structure


• Structural landscape is filled with peaks and valleys.
• Minimization protocol always moves “down hill”.
• No means to “see” the overall structural landscape
• No means to pass through higher intermediate structures to get to a lower minimum.

The initial structure determines the results of the minimization!


Another perspective of the Structural Landscape is a 3D funnel view that leads to the global minimum at the base of the funnel.

## Molecular Minimization
• Process Overview
The molecular potential U depends on two types of variables:

The necessary condition for a minimum is that the function gradient is zero:

$$\nabla U = 0$$

or

$$\frac{\partial U}{\partial x_i} = 0, \quad i = 1, \ldots, 3N$$

where $x_i$ denote atomic Cartesian coordinates and N is the number of atoms.

The sufficient condition for a minimum is that the second derivative matrix is positive definite, i.e. for any non-zero 3N-dimensional vector u:

$$u^T F u > 0$$

A simpler operational definition of this property is that all eigenvalues of F are positive at a minimum. The second derivative matrix is denoted by F in molecular mechanics and H in mathematics, and is defined as:

$$F_{ij} = \frac{\partial^2 U}{\partial x_i \partial x_j}$$

One measure of the distance from a stationary point is the rms gradient:

$$g_{rms} = \sqrt{\frac{1}{3N} \sum_{i=1}^{3N} \left( \frac{\partial U}{\partial x_i} \right)^2}$$

## Molecular Minimization
• Process Overview
– a minimum occurs when the first derivative is zero and the second derivative is positive.
• U(Q) is a complicated function varying quickly with atomic coordinates Q
– molecular energy minimization is often performed in a series of steps
– the coordinates at step n+1 are determined from the coordinates at the previous step n: $Q_{n+1} = Q_n + \delta_n$, where $\delta_n$ is called a step.
– the initial step is a guess
– a systematic or random search is not practical (Levinthal Paradox)
• Steepest Descent Method
– the search step ($\delta_n$) is performed in the direction of fastest decrease of U, that is, along the negative gradient; α is a factor determining the length of the step.
– not efficient, but good for initial distorted structures
– may be very slow near a solution
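
A minimal steepest descent loop is sketched below. It assumes the simplest update, x(n+1) = x(n) − α·g(n) with a fixed α, and stops either after a maximum number of steps or when the rms gradient drops below a tolerance. The double-well test surface, the function names, and the parameter values are all hypothetical.

```python
import math

def steepest_descent(gradient, x0, alpha=0.01, max_steps=10000, tolerance=1e-6):
    """Iterate x_{n+1} = x_n - alpha * g_n until the rms gradient falls below tolerance."""
    x = list(x0)
    for step in range(max_steps):
        g = gradient(x)
        rms = math.sqrt(sum(gi * gi for gi in g) / len(g))
        if rms < tolerance:
            return x, step
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x, max_steps

# Hypothetical double-well surface U(x) = (x^2 - 1)^2 with minima at x = -1 and x = +1.
def double_well_gradient(x):
    return [4.0 * x[0] * (x[0] ** 2 - 1.0)]

left, left_steps = steepest_descent(double_well_gradient, [-0.5])
right, right_steps = steepest_descent(double_well_gradient, [0.5])
print(left, left_steps, right, right_steps)
```

The two runs end in different minima purely because of where they started, which is the dependence on the starting structure described earlier.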

## Molecular Minimization
• Process Overview
– modify steepest descent to increase efficiency
– Initial steps are steepest descent
– current step vector is not similar to previous step vectors
– accumulates information about the energy function from one iteration to the next

One of two factors determines when a minimization calculation is completed:
• the defined number of steps ($\delta_n$) has been calculated.
• a predefined value of the gradient (g) has been reached (the gradient rarely reaches exactly zero)

## Molecular Dynamics (MD)
• moves the Cartesian coordinates (X,Y,Z position) for each atom by integrating their equations of motion
– change in position with time gives velocity
– change in velocity with time (acceleration) gives force
– follows the laws of classical mechanics, most notably Newton's Second law: $F_i = m_i a_i$
– The force on atom i can be computed directly from the derivative of the potential energy function (U) with respect to the coordinates $r_i$: $F_i = -\partial U / \partial r_i$.
– Initial velocities are assigned using a random number generator constrained to the Maxwell-Boltzmann distribution.

where:
Hamiltonian H(Γ) where Γ represents the set of positions and momenta
Target Temperature (T)
Boltzmann constant (kB)
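
Integrating Newton's second law numerically can be sketched with the velocity Verlet scheme, one common integrator; the text does not say which integrator is actually used. The example follows a single particle on a hypothetical harmonic "bond" in arbitrary reduced units, where the exact answer x(t) = cos(t) is available for comparison.

```python
import math

def velocity_verlet(force, x, v, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = F(x) for one particle in one dimension."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # F = -dU/dx at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory

# Hypothetical harmonic potential U(x) = 0.5 * k * x^2 in reduced units.
k, mass = 1.0, 1.0
trajectory = velocity_verlet(lambda x: -k * x, x=1.0, v=0.0, mass=mass, dt=0.01, n_steps=1000)
energies = [0.5 * mass * v * v + 0.5 * k * x * x for x, v in trajectory]
drift = max(energies) - min(energies)
print(trajectory[-1][0], math.cos(10.0), drift)
```

The total energy stays nearly constant over the run, which is a useful sanity check on the choice of time step.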

## Molecular Dynamics (MD)
– The temperature is defined by the average kinetic energy of the system according to the kinetic theory of gases.
– internal energy of the system is U = 3/2 NkT
– kinetic energy is U = 1/2 Nmv²
where:
N is the number of atoms
v is the velocity
m is the mass
T is the temperature
k is the Boltzmann constant
– By averaging over the velocities of all of the atoms in the system the temperature can be estimated.
– Maxwell-Boltzmann velocity distribution will be maintained throughout the simulation.
• If the system has been energy minimized → it sits at a potential energy minimum with no kinetic energy, so the temperature is zero
• Need to “heat” system up to desired temperature
– scale velocities: $v = (3kT/m)^{1/2}$
• Calculate a trajectory in a 6N-dimensional phase space (3N positions and 3N momenta)
– measure trajectories in small time steps, usually 1 femtosecond (fs)
– typical duration of dynamics run is 10-100 picoseconds (ps)
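
The relationship between velocities and temperature can be illustrated as follows. This sketch draws each velocity component from a Gaussian with variance kT/m (which is the Maxwell-Boltzmann distribution), estimates the temperature by equating 3/2 NkT with the summed kinetic energy, and then rescales the velocities to hit a target temperature exactly. The system size, masses, seed, and function names are hypothetical, and velocities are left in internal units of sqrt(kcal/mol/amu).

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)

def sample_velocities(masses, temperature, rng):
    """Draw each Cartesian velocity component from a Gaussian with variance kT/m."""
    return [[rng.gauss(0.0, math.sqrt(K_B * temperature / m)) for _ in range(3)]
            for m in masses]

def instantaneous_temperature(masses, velocities):
    """Invert (3/2) N k T = sum(1/2 m v^2) for T."""
    kinetic = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return 2.0 * kinetic / (3.0 * len(masses) * K_B)

rng = random.Random(0)      # fixed seed so the run is reproducible
masses = [12.0] * 5000      # hypothetical system of 5000 carbon-mass atoms
velocities = sample_velocities(masses, 300.0, rng)
t_est = instantaneous_temperature(masses, velocities)

# Velocity rescaling: multiply every component by sqrt(T_target / T_current).
scale = math.sqrt(300.0 / t_est)
rescaled = [[c * scale for c in v] for v in velocities]
t_rescaled = instantaneous_temperature(masses, rescaled)
print(t_est, t_rescaled)
```

The sampled temperature fluctuates around the target because the system is finite; rescaling removes that difference.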

## Energy Only (Univariate) Method
• Simplest to implement
• Proceeds in one direction until energy increases, then turns 90º, etc.
• Least efficient
– many steps
– steps are not guided
• Not used very much.

## Steepest Descent Method
• Simplest method in use
• Follows most negative
• Fastest method from a poor starting geometry
• Converges slowly near the energy minimum
• Can skip back and forth across a minimum.
of steepest descent method to implicitly gather 2nd derivative information to guide the search.
• Variations on this procedure are the Fletcher-Reeves, the Davidon-Fletcher-Powell and the Polak-Ribiere

## Second Derivative Methods
• The 2nd derivative of the energy with respect to X,Y,Z [the Hessian] determines the pathway.
• Computationally more involved, but generally fast and reliable, esp. near the minimum.
• Quasi-Newton, Newton-Raphson, block diagonal Newton-Raphson

## Approaches to Locating the Global Minimum Energy Structure
• Dihedral driving (systematic)
• Randomization-minimization (Monte Carlo)
• Molecular dynamics (Newton’s laws of motion)
• Simulated Annealing (reduce T during MD run)
conformations; modify slightly; retain lowest energy ones, repeat)
• Trial & error (poor)

Methods are tedious, but absolutely necessary if the result is to be meaningful!
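
Of the approaches listed, dihedral driving is the easiest to illustrate: the torsion angle is stepped systematically over its full range and the lowest-energy value is kept. The sketch below uses a hypothetical one-dimensional torsion profile with a onefold and a threefold cosine term, so that trans (180°) is the global minimum and the two gauche wells (near ±60°) are local minima. In a real molecule each grid point would normally be followed by minimization of the remaining coordinates.

```python
import math

def torsion(phi_deg):
    """Hypothetical torsion profile: a onefold plus a threefold cosine term."""
    phi = math.radians(phi_deg)
    return 0.5 * (1.0 + math.cos(phi)) + 1.0 * (1.0 + math.cos(3.0 * phi))

# Dihedral driving: evaluate the energy on a regular 10-degree grid.
grid = range(-180, 180, 10)
best_angle = min(grid, key=torsion)
print(best_angle, round(torsion(best_angle), 3))
print(round(torsion(60), 3), round(torsion(-60), 3))
```

A downhill minimizer started near +60° or -60° would stay in the gauche well, whereas the systematic scan finds the trans minimum (reported as -180°, which is the same conformation as 180°) regardless of any starting point.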