
Finding the Transition State of a Chemical Reaction of Interest Using Avogadro, ORCA, and IboView
Kate Boyd

Table of Contents
Example Reaction - Aspirin
Methods of Finding the Transition State
  Relaxed Surface Scan
  Nudged Elastic Band (NEB)
“Run Times” Table
Choosing a basis set
Geometry Optimizations
Making a Relaxed Surface Scan Input
Understanding the Relaxed Surface Scan Output
Making a NEB Input
  Types of NEB
  The Input File
Understanding the NEB Output
Running a Transition State Optimization and Numerical Frequency Calculation
  OptTS
  NumFreq
Visualizing Orbitals in IboView
  Using the GPU
  Finding Orbitals
Continuing Calculations That Have Been Terminated Early
Works Cited

Example Reaction - Aspirin
In this walkthrough we will go over how to find the transition state geometry and bond orbitals
of a chemical reaction of interest using Avogadro, ORCA, and IboView. This walkthrough will
focus on the behavior of aspirin, one of the most common anti-inflammatory agents.

Aspirin works by targeting the enzyme cyclooxygenase-2 (COX2). It covalently modifies COX2
by acetylating (transferring an acetyl group to) Serine 530, a residue near the active site of COX2,
thereby blocking the binding of the native substrate and leading to irreversible inhibition.

COX2 is responsible for the conversion of arachidonic acid to prostaglandin H2, which is
eventually involved in the expression of inflammation. For this walkthrough we don’t need to
know the specifics of the pathway by which COX2 results in inflammation, but we should
recognize that aspirin’s binding to the serine side chain is a reaction of interest for further
research into how aspirin works, possible side effects, and selectivity.

Image 1. The full reaction mechanism of aspirin reacting with the Serine 530 side chain on the COX2
enzyme. Aspirin is the molecule on the left with a ring structure. Serine is the molecule on the right with
an amine functional group. The Ser530 side chain has been modeled as an independent serine; it is much
too computationally expensive and unreasonable to attempt to model the entire enzyme. In the true
reaction, Ser530 would have been attached to another molecule at the far right carboxylic acid. It would
also have been reasonable to truncate Ser530 into ethanol for these calculations.

Modeling the transition state of a reaction can tell us about some qualities of the reaction. For
instance, chemical reactivity is largely determined by the relative energies of molecules.
Reaction favorability and viability can be quickly assessed by optimizing the structures of the
reactant and product in question and calculating the energy difference between the two, and a
minimum energy path or activation energy graph connecting reactants and products shows the
barrier that lies between them.

Similarly, one could test proposed mechanisms for reactions of interest by optimizing the
structure of a proposed transition state and comparing it to the optimized reactants and products.
The goal is to see if the energy differences between the transition state and reactant match
experimentally observed activation energies, which can give an idea of how accurate the
proposed transition state is. If the calculated activation energy is very close to the experimental
value, this supports the proposed mechanism.

Thus, we set out to find the transition state of our aspirin/serine reaction of interest, in the greater
context of aspirin irreversibly inhibiting the enzyme COX2 to prevent inflammation.

Methods of Finding the Transition State
To find the transition state (TS), we ultimately want to run a transition state optimization and
numerical frequency calculation (commands OptTS and NumFreq in ORCA; more on these
later). To do this, your starting geometry must already be very close to the transition state.
There are two main ways to find the geometry closest to the transition state for your
reaction:

Relaxed Surface Scan

The relaxed surface scan is the first of two methods for finding an approximate minimum energy
path connecting two minima (i.e. your reactants and products). It works by scanning a vector
between atoms on the reactants, bringing the atoms closer together and optimizing the geometry
at every step to find the energy of that conformation.

The relaxed surface scan is good for simple bond making or breaking reactions or
conformational changes. However, since the relaxed surface method can only scan along one
vector at a time, it will only find the transition state for one elementary step of the reaction per
vector specification.

For example, with my aspirin/serine reaction, I used the relaxed surface scan for the first step of
the reaction, where the carboxylic oxygen on aspirin attacks the hydroxyl hydrogen on serine:

Image 2. The first step in the aspirin/serine reaction. The relaxed surface scan operated between the
carboxylic oxygen on aspirin (left reactant) and the hydroxyl hydrogen on serine (right reactant).

However, this is only the first of four elementary steps in the reaction. I only ran this first step
with the relaxed surface scan, but I would have had to run separate scans for the next three steps
as well, and compare the highest energy steps from each reaction to predict the transition state.
For a multi-step reaction like mine, the relaxed surface scan didn’t fit my criteria perfectly.
Instead, I used the nudged elastic band method.

A tutorial on how to run a relaxed surface scan can be found under Making a Relaxed Surface
Scan Input.

Nudged Elastic Band (NEB)

The nudged elastic band (NEB) method is a black box method that can find a guess for the
transition state geometry based upon the geometries of the reactants and products. Instead of
scanning along a bond or vector, for the NEB you input the starting and finishing coordinates,
and ORCA will find the minimum energy pathway between the two geometries. The NEB is
convenient for multi-step reactions, as it will combine multiple steps into one calculation and
guess the TS based upon the multi-step reaction path. However, because it combines multiple
steps it is also computationally expensive.

There are three different versions of the NEB command, which are covered under Types of NEB.

It is also worth noting that the NEB method is only available on the most recent version of
ORCA (as of the time of writing, that is ORCA 4.1). At the time of writing, the computers on
campus and the VPNs only have ORCA 4.0, and therefore could not run NEBs. I ran the NEB
for my aspirin/serine reaction on my own device, which I recommend considering for anyone
thinking about using this method. In my personal experience, this calculation ran the longest. A
table of calculations and run times for those computations I ran can be found in the next section
as a general reference to how long calculations may take.

Information on how to set up and run a NEB calculation can be found under Making a NEB Input.

“Run Times” Table
In the course of finding the transition state for my reaction, I ran through a number of ORCA
calculations, from geometry optimizations to relaxed scans, NEBs, numerical frequency
calculations, and numerous variations thereof. Here I have compiled some of the calculations I
ran and how long they took in hopes of giving future transition state-seekers an idea of how long
calculations may take.

Take note that run times depend heavily on the number of atoms involved and that there are
many ways of completing the same calculation by varying basis sets, methods, and number of
iterations.

Table 1. Purpose and run times for various ORCA calculations. Note that the atom labeling differs
between Avogadro and ORCA. A = Avogadro. O = ORCA. NEB = nudged elastic band. Only the most
relevant and/or completed calculations are listed here. The full table is in the linked Google spreadsheet.

Calculation 1
Command:
    ! RHF SP def2-SVP Opt
    %geom Scan
    B 13 20 = 1.534, 0.4, 8
    end
    end
    * xyz -1 1
    (coords go here)
Purpose: Relaxed surface scan and optimization between oxygen (14A, 13O) and hydrogen (21A, 20O)
Run time: 1 day 12 hours 24 minutes 56 seconds 786 msec
# of atoms: 34

Calculation 2
Command:
    ! RHF SP def2-SVP Opt
    %geom Scan
    B 3 4 = 3.084, 1.0, 12
    end
    end
    * xyz -1 1
    (coords go here)
Purpose: Relaxed surface scan and optimization between oxygen (4A, 3O) and hydrogen (5A, 4O) for the
TRUNCATED versions of aspirin and serine (deprotonated acetic acid and ethanol)
Run time: 2 hours 56 minutes 44 seconds 45 msec
# of atoms: 16

Calculation 3
Command:
    ! B3LYP def2-SVP Opt D4
    * xyz -1 1
    (coords here)
Purpose: Geometry optimization of the reactants for the NEB
Run time: 14 hours 48 minutes 40 seconds 296 msec
# of atoms: 34

Calculation 4
Command:
    ! B3LYP def2-SVP Opt D4
    * xyz -1 1
    (coords here)
Purpose: Geometry optimization of the products for the NEB
Run time: 9 hours 7 minutes 33 seconds 266 msec
# of atoms: 34

Calculation 5
Command:
    ! B3LYP def2-SVP D4 NEB-TS
    %NEB NEB_END_XYZFILE "NEB_prod_Opt.xyz" END
    * XYZfile -1 1 NEB_reac_Opt.xyz
Purpose: NEB-TS for aspirin/serine reaction. NEB-TS should run a NEB, identify a climbing image (CI)
and guess a transition state (TS), then optimize the transition state guess to converge on the real
transition state and run a numerical frequency (NumFreq) to identify the vibrational modes.
Run time: 4 days 12 hours 13 minutes 12 seconds 7 msec
# of atoms: 34

Calculation 6
Command:
    ! PM3 NEB
    %neb
    NEB_END_XYZFILE "NEB_prod_Opt.xyz"
    Nimages 6
    end
    * XYZfile -1 1 NEB_reac_Opt.xyz
Purpose: NEB with PM3 command
Run time: 10 hours 41 minutes 54 seconds 601 msec
# of atoms: 34

Calculation 7
Command:
    ! RHF OPT def2-SVP
    !Normalprint
    %output
    Print[ P_Basis] 2
    Print[ P_MOs ] 1
    end
    * xyz 0 1
    (coords here)
Purpose: Geometry optimization and orbital calculation for aspirin
Run time: 25 minutes 12 seconds 376 msec
# of atoms: 21

Calculation 8
Command:
    ! RHF OPT def2-SVP
    !Normalprint
    %output
    Print[ P_Basis] 2
    Print[ P_MOs ] 1
    end
    * xyz 0 1
    (coords here)
Purpose: Geometry optimization and orbital calculation for serine
Run time: 5 minutes 5 seconds 973 msec
# of atoms: 14

Calculation 9
Command:
    ! PM3 OptTS NumFreq
    %geom
    inhess Read
    InHessName "PM3_NEB_try2.042.hess"
    maxiter 2000
    end
    * xyzfile -1 1 input_im3.xyz
Purpose: Transition state geometry optimization and numerical frequency calculation (finding
vibrational modes) using PM3 from the TS guess from the NEB-TS calculation
Run time: 4 hours 59 minutes 54 seconds 375 msec
# of atoms: 34

Choosing a basis set

In the command lines above you may notice different keywords being used on the command line
(B3LYP, RHF, PM3, def2-SVP, etc). These are not all the same kind of keyword: RHF
(Hartree-Fock), B3LYP (a DFT functional), and PM3 (a semiempirical method) select the method,
while def2-SVP is the basis set. I used B3LYP and def2-SVP for most of my calculations. This
table from the ORCA Input Library may help you decide which basis set to use for your own
calculation. Note that different methods and basis sets have advantages and disadvantages for
different types of reactions.

I would recommend choosing one method and basis set and using them throughout your
calculations for consistency. When choosing a basis set, remember that while the very detailed
sets may be attractive for accuracy, they will likely be computationally expensive.

In addition, it is important to note that IboView, the program used in this walkthrough for
visualizing bond orbitals, does not work with the 6-31G basis set, so choose a different basis set
if you’re planning to model with IboView.

Geometry Optimizations
The first step for running either a relaxed surface scan or NEB in ORCA is to optimize the
geometry of the reactants (and, in the case of the NEB, the products too). This process should be
familiar to CBE 310 students, as we optimize geometries as part of the coursework. Therefore,
this walkthrough will assume a basic understanding of how to use Avogadro, with some
additional tips for organizing and streamlining the process.

Make your reactant geometries in Avogadro. Optimize them using “Extensions > Optimize
Geometry” until the atoms in Avogadro move very little. Use “Extensions > ORCA > Generate
ORCA Input” and change “Calculation” to “Geometry Optimization”. Choose your basis set (I
used def2-SVP for all my calculations; this table from the ORCA Input Library may help you
decide which basis set to use for your own calculation). Remember to set the charge if your
reactant has one. In my reaction, I removed the carboxylic acid hydrogen from the initial
acetylsalicylic acid molecule, so my reactant started with a charge of -1. If your calculations give
you errors, a good first step is to make sure your charge and multiplicity are correct.

Image 3. The ORCA input from Avogadro for a geometry optimization of reactants. Included is the
command for printing molecular orbitals. This step isn’t strictly necessary, but it may be interesting to
look at the starting geometry orbitals before beginning the reaction.

Once your ORCA input is saved, navigate to the folder in which it is saved. I recommend
allocating a new folder for every set of calculations you do (a lot of intermediate temporary files
will be created while ORCA works; they will clutter your workspace). Open a
PowerShell/command line in the folder, run ORCA on your input, and redirect its output into an
output file:

PS C:\YourFileLocation> orca aspirin_geomOpt.inp > aspirin_geomOpt.out

You will know the calculation has completed when a) your PowerShell/command line moves the
cursor to the next line awaiting a new command; b) the folder your input and output files are in
no longer has all the temporary intermediate files that were present during the calculation; and c)
when you open the .out file and scroll to the bottom, you will find the following:

****ORCA TERMINATED NORMALLY****


TOTAL RUN TIME: 0 days 0 hours 25 minutes 12 seconds 376 msec
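
If you end up with many output files, checking each one by hand gets tedious. Below is a minimal Python sketch (Python is my choice here, not something ORCA requires) that looks for the two lines shown above in the text of an .out file. The function name check_orca_output is made up for this example, and the pattern assumes the run time line is formatted exactly as in my output above.

```python
import re

def check_orca_output(text):
    """Return (finished, runtime_seconds) for the text of an ORCA .out file.

    finished is True when the normal-termination banner is present.
    runtime_seconds is None when no TOTAL RUN TIME line is found.
    """
    finished = "ORCA TERMINATED NORMALLY" in text
    match = re.search(
        r"TOTAL RUN TIME:\s*(\d+) days (\d+) hours (\d+) minutes "
        r"(\d+) seconds (\d+) msec",
        text,
    )
    if match is None:
        return finished, None
    days, hours, minutes, seconds, msec = (int(g) for g in match.groups())
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds + msec / 1000
    return finished, total

# The two lines from the end of my aspirin geometry optimization output.
sample = """
****ORCA TERMINATED NORMALLY****
TOTAL RUN TIME: 0 days 0 hours 25 minutes 12 seconds 376 msec
"""
finished, runtime = check_orca_output(sample)
print("finished:", finished, "run time (s):", runtime)
```

To use it on a real file, read the file first, e.g. check_orca_output(open("aspirin_geomOpt.out").read()).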

If your calculation terminates early for any reason, don’t panic. There are ways to continue
calculations that may have not been completed. More on this later, under Continuing
Calculations That Have Been Terminated Early.

From this ORCA calculation, as from most ORCA calculations, you will receive several
different types of files: .out, .opt, .engrad, .gbw, .xyz, and .prop, as well as the .inp file you began
with. For most of the TS calculations, we are only interested in the .out and .xyz files.

If you are curious to see the molecular orbitals of your reactants, you can change your .out file
into an .ascii file and open it in Avogadro. Otherwise, from here on out we will mostly be
operating from the .out and .xyz files, so make sure you have a way of opening these files (on
PC, Notepad is a good one).

Repeat this process for each reactant in your reaction. Then, open the optimized Avogadro files
for each reactant and move them both into one Avogadro window. You can open the optimized
ORCA output of the reactants by opening the .xyz files output from the ORCA calculation.
These .xyz files contain the optimized configuration of your molecule. You can run an additional
geometry optimization on this file with all reactants present, but this is largely optional as both
the relaxed surface scan and NEB will optimize the geometry at every step anyway.

If you are running a NEB, there is an additional step here.

The NEB operates by comparing the location of every atom in your reactants to the location of
every atom in your products. This means that the atom numbering in both reactant and product
geometries must be the same. The best way to do this is to start with your optimized reactant
Avogadro file and physically move the atoms, making and breaking bonds to form your products.
During this process, do not add or delete any atoms. Doing so will disrupt the automatic atom
numbering of Avogadro.

It is useful during this process to know the exact curved arrow mechanism of your reaction so
you may trace where exactly each atom moves.
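
A quick sanity check on the numbering can be scripted. The Python sketch below compares the element symbols, line by line, of a reactant and a product .xyz file and reports any position where they differ. It assumes the standard .xyz layout (atom count, comment line, then one atom per line); the function names and the tiny three-atom geometries are hypothetical and only there to make the example runnable. Note that matching elements only catches gross mistakes such as an added or deleted atom: two hydrogens swapped with each other would still pass, so tracing the mechanism by eye remains necessary.

```python
def read_xyz_elements(xyz_text):
    """Return the element symbols, in file order, from .xyz-formatted text.

    Assumes line 1 is the atom count, line 2 is a comment, and every
    following line starts with an element symbol.
    """
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    return [line.split()[0] for line in lines[2:2 + n_atoms]]

def find_order_mismatches(reactant_xyz, product_xyz):
    """List (atom_number, reactant_element, product_element) where they differ.

    atom_number is 1-based so it can be compared with Avogadro's labels.
    """
    reactant = read_xyz_elements(reactant_xyz)
    product = read_xyz_elements(product_xyz)
    if len(reactant) != len(product):
        raise ValueError("reactant and product have different atom counts")
    return [
        (i + 1, r, p)
        for i, (r, p) in enumerate(zip(reactant, product))
        if r != p
    ]

# Hypothetical three-atom geometries, for illustration only.
reactant_xyz = """3
hypothetical reactant
O  0.000  0.000  0.000
H  0.960  0.000  0.000
H -0.240  0.930  0.000
"""
product_xyz = """3
hypothetical product
O  0.000  0.000  0.000
H  1.500  0.000  0.000
H -0.240  0.930  0.000
"""
mismatches = find_order_mismatches(reactant_xyz, product_xyz)
print("mismatches:", mismatches)
```

An empty list means the element order agrees between the two files.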

Image 4. On the left are the reactants of the aspirin/serine reaction (acetylsalicylic acid and serine) and on
the right are the products of the reaction (salicylic acid and acetylated serine). Notice how the numbers of
each atom are consistent from reactants to products.

Beware that by default, Avogadro labels atoms by symbol and atom number, so labels will read
“Carbon 1, Carbon 2, Hydrogen 1, Oxygen 1” etc. This is not how you want to read the atoms.
To fix this, open “Display Settings” and check the box next to “Label,” then click on the tool
icon to the right. Change “Text” to “Atom number”. This will label every atom uniquely by
number. Another helpful tip is to turn off “Adjust Hydrogens” under the draw tool. This ensures
that when you add, delete, or move bonds, hydrogens will not automatically generate to fill in the
gaps.

Image 5. Change the way atoms are labeled in Avogadro to understand where atoms move and correctly
label atoms for ORCA input.

For the NEB, optimize your reactant file and the product geometry with a geometry optimization.
Once optimized, these files will be ready to be inputs for the NEB.

Making a Relaxed Surface Scan Input
The first step to using a relaxed surface scan is to decide what bond/vector you want to scan
over. The relaxed surface scan can scan over an existing bond, between two atoms, or between
two points in space (this last method requires dummy atoms, which will not be covered here but
can be found in this video tutorial by IaNuisha. Beware that some aspects of this tutorial are out
of date).

Image 6. The first step in the aspirin/serine reaction that I want to model with the relaxed surface scan.
The relaxed surface scan operated between the carboxylic oxygen on aspirin (left reactant) and the
hydroxyl hydrogen on serine (right reactant).

For my relaxed surface scan, I wanted to model the first step of my aspirin/serine reaction, where
the carboxylic oxygen on aspirin (atom 14) attacks the hydroxyl hydrogen on serine (atom 21).
Therefore, I defined my vector between these two atoms of interest. The vector will bring the
atoms closer together in a series of steps and evaluate the energy of each step/distance between
the atoms.

Image 7. The reactants for the first step of the aspirin/serine reaction. The bond I want to scan along is in
green. Note that the oxygen on the aspirin (atom 14) does not have a hydrogen and therefore harbors a -1
charge. Therefore, as the hydroxyl hydrogen of the serine (atom 21) approaches, the electronegativity of
the carboxylic oxygen should pull the hydrogen atom over.

The input for a relaxed surface scan should look like this:

# avogadro generated ORCA input file


# Basic Mode
#
! RHF SP def2-SVP Opt
%geom Scan
B 13 20 = 3.084, 1.0, 12
end
end

* xyz -1 1
C 0.94700 0.48119 1.00837
C 2.28751 0.54183 0.66342
C 2.81064 1.58708 -0.09803
C 0.09322 1.50238 0.61385
C 0.59282 2.56410 -0.12088
C 1.93568 2.60219 -0.48370
C 4.30402 1.53387 -0.45974
H 2.97913 -0.22365 0.98363
H 0.57275 -0.35070 1.59202
H -0.95498 1.48057 0.88585
H -0.04224 3.38952 -0.41411
O 2.35289 3.73365 -1.12762
O 5.02355 0.92802 0.30327
O 4.60485 2.08400 -1.53575
C 2.58271 3.85525 -2.44409
O 2.97504 4.88982 -2.85188
C 2.24206 2.68704 -3.32835
H 2.97974 1.90777 -3.15757
H 2.28772 3.02552 -4.35995
H 1.24899 2.29706 -3.10642
H 6.00682 1.99145 -2.15053
O 6.90392 2.02616 -2.55730
H 7.44607 3.92908 -3.09631
C 7.48315 3.23226 -2.24914
H 6.97426 3.71783 -1.40845
H 9.44380 2.54641 -2.72351
C 8.94768 2.97519 -1.84967
O 9.95068 4.96380 -2.64566
N 9.08505 2.07154 -0.73335
C 9.64087 4.27972 -1.54658
H 10.31631 5.79651 -2.37599
H 8.41915 1.32870 -0.86246
H 8.82467 2.54363 0.11528
O 9.86811 4.71557 -0.46776
*

Note that the hashtag (#) marks a comment line that ORCA will ignore. The exclamation mark (!)
marks the main command line for ORCA. The xyz coordinates are generated by Avogadro, and
the xyz -1 1 denotes that for this reaction the charge is -1 and the multiplicity is 1.

! RHF SP def2-SVP Opt


%geom Scan
B 13 20 = 3.084, 1.0, 12
end
end

The %geom tells ORCA there is an additional parameter to accompany the command. In this
case, the %geom block specifies a scan along the bond between atoms 13 and 20. ORCA and
Avogadro have different methods of labeling atoms. While Avogadro starts at atom 1, ORCA
starts at atom 0. This means that when you’re creating your ORCA input, you must remember to
subtract one from the atom numbers of the vector you want to scan along! So while the atoms in
Avogadro are atoms 14 and 21, ORCA will read them as atoms 13 and 20.
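
Since the off-by-one conversion is easy to forget, here is a small Python sketch that builds the %geom block from Avogadro atom numbers and does the subtraction for you. The function name scan_block and its arguments are made up for this example; the three numbers after the equals sign are simply passed through in the same order as in the input above.

```python
def scan_block(avogadro_atom_a, avogadro_atom_b, start, stop, steps):
    """Build a %geom Scan block for a bond scan from Avogadro atom numbers.

    Avogadro numbers atoms from 1 and ORCA from 0, so one is subtracted
    from each atom number before it is written into the block.
    """
    orca_a = avogadro_atom_a - 1
    orca_b = avogadro_atom_b - 1
    return (
        "%geom Scan\n"
        f"B {orca_a} {orca_b} = {start}, {stop}, {steps}\n"
        "end\n"
        "end"
    )

# Atoms 14 and 21 in Avogadro become atoms 13 and 20 in ORCA.
block = scan_block(14, 21, 3.084, 1.0, 12)
print(block)
```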

The first number on the right hand side of the equals sign is the initial distance between the
atoms/of the vector being scanned in angstroms. You can find this distance in Avogadro by using
the “Click to Measure” tool and selecting both your atoms. The second number is the distance in
angstroms you would like the bond to scan to. The third number is the number of steps you
would like the calculation to output for your viewing. When the calculation is completed, you
will have an .xyz and .gbw file for each step.

Essentially, your command will read:

Scan along:
Bond atom#1 atom#2 = initial_distance, target_distance, #steps
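
To preview which distances a scan will visit, the sketch below (Python, with a made-up function name) spaces the steps evenly between the initial and target distance, counting both endpoints. Even spacing is an assumption on my part, but it reproduces the distances printed in my own scan results for the 3.084, 1.0, 12 scan.

```python
def scan_distances(start, stop, steps):
    """Return the evenly spaced distances of a scan, endpoints included."""
    step_size = (stop - start) / (steps - 1)
    return [start + i * step_size for i in range(steps)]

distances = scan_distances(3.084, 1.0, 12)
for number, distance in enumerate(distances, start=1):
    print(f"step {number:2d}: {distance:.8f} angstroms")
```

This is a handy check before a long run: if the last few distances are shorter than a realistic bond length, the final steps will mostly show the atoms being forced together.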

Run this input file through ORCA from a PowerShell/command line in your folder, redirecting the
output to a .out file as before.

Understanding the Relaxed Surface Scan Output
After you’ve run the relaxed surface scan, the next step is to interpret the results ORCA has
given you. The files you will want to look at are the trajectory file and output file.

The trajectory file will be a .xyz file with the name of your file with _trj added to the end.

attempt4_inp_trj.xyz

Open this file in Avogadro and choose “Extensions > Animation”. Check the box for “Dynamic
Bonds”. This will allow you to see how the bonds are changing as the reaction proceeds. Click
the play button to see how ORCA brought the atoms along the vector closer together step by
step, optimizing at each new location. In my reaction, I have about 316 trajectory steps. The
hydrogen transfers from the hydroxyl oxygen to the carboxylic oxygen around step 287, at a
distance of 1.189 angstroms.

Next, open the output file (.out). Make sure the calculation is finished by scrolling to the bottom
and ensuring ORCA has output “ORCA terminated normally”. Just above this output is a
section:
**** RELAXED SURFACE SCAN DONE ***

SUMMARY OF THE CALCULATED SURFACE

----------------------------
RELAXED SURFACE SCAN RESULTS
----------------------------

Column 1: NONAME

The Calculated Surface using the 'Actual Energy'


3.08400000 -381.04985528
2.89454545 -381.05235586
2.70509091 -381.05214557
2.51563636 -381.05076658
2.32618182 -381.04917365
2.13672727 -381.04777320
1.94727273 -381.04876555
1.75781818 -381.04983778
1.56836364 -381.05145864
1.37890909 -381.04515182
1.18945455 -381.03233235
1.00000000 -381.01766581

The Calculated Surface using the SCF energy


3.08400000 -381.04985528
2.89454545 -381.05235586
2.70509091 -381.05214557
2.51563636 -381.05076658
2.32618182 -381.04917365
2.13672727 -381.04777320
1.94727273 -381.04876555
1.75781818 -381.04983778
1.56836364 -381.05145864
1.37890909 -381.04515182
1.18945455 -381.03233235
1.00000000 -381.01766581

The Calculated Surface using the ‘Actual Energy’ is what we are interested in for finding the TS.
The first column of numbers in this output is the distance between the atoms in the vector in
angstroms. The second column is the energy in Hartrees (Eh). Copy these numbers and bring
them into an application for graphing (e.g. Excel or MATLAB. I heavily recommend MATLAB
as copying these numbers from the Notepad output will not copy them in columns, and in
MATLAB it is easy to make the output into a matrix and graph it by column).

Graphing the distance between atoms against the energy can reveal where the energy of your
reaction is highest, therefore indicating a potential transition state configuration.
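
If you do not have MATLAB, the same inspection can be done with a few lines of Python. The sketch below parses the two-column ‘Actual Energy’ table from my output and lists the interior local maxima, meaning steps whose energy is higher than both neighbours. The function names are my own, and the first and last steps are deliberately never reported, since an endpoint cannot be compared on both sides.

```python
scan_table = """
3.08400000 -381.04985528
2.89454545 -381.05235586
2.70509091 -381.05214557
2.51563636 -381.05076658
2.32618182 -381.04917365
2.13672727 -381.04777320
1.94727273 -381.04876555
1.75781818 -381.04983778
1.56836364 -381.05145864
1.37890909 -381.04515182
1.18945455 -381.03233235
1.00000000 -381.01766581
"""

def parse_two_columns(text):
    """Return (distance, energy) pairs from whitespace-separated text."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            rows.append((float(parts[0]), float(parts[1])))
    return rows

def interior_maxima(points):
    """Return (step, distance, energy) for steps higher than both neighbours.

    Steps are numbered from 1 to match the step numbers used in the text.
    """
    found = []
    for i in range(1, len(points) - 1):
        energy = points[i][1]
        if energy > points[i - 1][1] and energy > points[i + 1][1]:
            found.append((i + 1, points[i][0], energy))
    return found

points = parse_two_columns(scan_table)
maxima = interior_maxima(points)
for step, distance, energy in maxima:
    print(f"step {step}: {distance:.8f} angstroms, {energy:.8f} Eh")
```

The points list can also be handed to any plotting tool to draw the distance against energy graph.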

Image 8. The MATLAB script and minimum energy path output from the relaxed surface scan. The
distance between atoms for this scan was 3.084-1.0 angstroms, as can be seen on the x-axis of the graph
above. Note that the steps run right to left, as ORCA starts at a larger distance and moves the vector
closer.

Here we can see that as the atoms were brought closer together, their energy increased,
decreased, and increased again (read right to left). We should take step 6 as our guess for the TS
geometry as it is a local maximum. It is likely that the dramatic rise in energy between steps 9
and 12 is due to the atoms being forced too close together (a result of the final bond distance
initially specified in the ORCA input) and can therefore be ignored.

There you have it - a guess for the TS geometry of your reaction using a relaxed surface scan.
This geometry can be inspected in Avogadro by loading the .xyz file from the ORCA calculation
corresponding to the step of your reaction which you have singled out as the TS guess. How to
confirm this TS with a numerical frequency and optimization calculation is covered under
Running a Transition State Optimization and Numerical Frequency Calculation.

Making a NEB Input
Once you have your optimized reactant and product files, you can jump right into writing the
NEB input. First, though, you will need to choose what type of NEB to run.

Types of NEB

There are three variations of the NEB command to be aware of: NEB, NEB-CI, and NEB-TS.

The NEB keyword will have the elastic band minimized until it converges on a minimum energy
path. This is the least computationally expensive but will also result in the most crude estimate of
the TS.

The NEB-CI keyword will use a climbing-image variant of NEB, where an approximate
minimum energy path is found and one step will be converged into a “climbing image” guess for
the TS geometry. This method trades converging the full minimum energy path for finding the
“climbing image” step with a low convergence threshold (the TS guess will be more accurate).

The NEB-TS keyword will use both the high convergence threshold for the minimum energy
path and a medium convergence threshold for the climbing image. In addition, once a climbing
image is found, the geometry of that step is fed into a transition state optimization calculation
(OptTS) and an optimized geometry for the transition state will be a part of your output. This
means that while the other methods (NEB, NEB-CI, and relaxed surface scan alike) require you to
run an OptTS command separately, the NEB-TS command will run it automatically after
converging on a likely TS geometry. However, since the NEB-TS combines multiple commands
and requires high convergence, it is computationally the most expensive.

For my reaction, having ORCA guess a TS and optimize it for me sounded appealing and I chose
to use the NEB-TS command.

The Input File

The input file for ORCA should look like this:

# avogadro generated ORCA input file


# Basic Mode
#
! B3LYP def2-SVP D4 NEB-TS
%NEB NEB_END_XYZFILE "NEB_prod_Opt.xyz" END

* XYZfile -1 1 NEB_reac_Opt.xyz

The important parts of this input are “NEB-TS”, which is the actual command to have ORCA
run a NEB with TS optimization, and the %NEB command block. In the %NEB command block,
we tell ORCA what the product of our reaction will look like: NEB_prod_Opt.xyz should be your
.xyz file from the ORCA geometry optimization of your products. Likewise, we load the
optimized reactants XYZfile as the starting geometry: NEB_reac_Opt.xyz. Thus, we have told
ORCA what the starting and ending geometries of our reaction are. ORCA will use the NEB to
find the minimum energy path between these two points.

It is worth noting that the default number of geometry optimizations the NEB-TS command will
allow is 102. I found this out after my NEB-TS did not converge on an optimized TS geometry
after 102 cycles. To increase the number of geometry optimization cycles, insert the following
command block into your input file:

%geom
maxiter 500
end

Here the number after “maxiter” is the maximum number of geometry optimization iterations. I
will caution that I experimented with this number quite a bit, and ended up needing many (>1,500
in some cases) cycles to get the TS optimization and numerical frequency calculation to converge
with the simplest level of theory. More on that later, under Running a Transition State
Optimization and Numerical Frequency Calculation.

Run this input file through ORCA from a PowerShell/command line in your folder, redirecting the
output to a .out file as before.

Understanding the NEB Output

After you’ve run the NEB, the next step is to interpret the results ORCA has given you. The files
you will want to look at are the trajectory file, output file, and interp file.

First, check the .out file for completion of the calculation. For my reaction, the NEB-TS
calculation took 4 and a half days, and notably did not converge on a TS geometry. However, it
did still give me a climbing image step, the step closest to the TS geometry, that I could then run
through an OptTS and NumFreq for confirmation of the TS later.

The elastic band and climbing image have converged successfully to a MEP in
108 iterations!

*********************H U R R A Y*********************
*** THE NEB OPTIMIZATION HAS CONVERGED ***
*****************************************************

---------------------------------------------------------------
PATH SUMMARY
---------------------------------------------------------------
All forces in Eh/Bohr.

Image Dist.(Ang.) E(Eh) dE(kcal/mol) max(|Fp|) RMS(Fp)


0 0.000 -1045.87714 0.00 0.00014 0.00003
1 3.675 -1045.86431 8.05 0.00298 0.00069
2 4.959 -1045.81680 37.87 0.00430 0.00127
3 5.743 -1045.77625 63.31 0.00187 0.00065 <= CI
4 6.613 -1045.82119 35.11 0.00376 0.00099
5 8.065 -1045.83431 26.88 0.01106 0.00223
6 9.808 -1045.85985 10.85 0.00314 0.00091
7 12.962 -1045.86533 7.41 0.00207 0.00056
8 16.631 -1045.86883 5.21 0.00105 0.00045
9 20.855 -1045.88803 -6.83 0.00059 0.00018

ORCA guessed that the TS geometry is closest to step 3 (“CI” stands for “Climbing Image”). We
will double check ORCA’s guess with the minimum energy pathway and Avogadro’s trajectory
animation.
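
The dE(kcal/mol) column in the path summary is just the energy of each image relative to image 0, converted from Hartree. The Python sketch below redoes that arithmetic from the E(Eh) column and picks out the highest image. The conversion factor of 627.5095 kcal/mol per Hartree is the standard value; the variable and function names are my own. Small differences in the last digit compared with ORCA's column are expected, because the printed energies are rounded to five decimals.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095

# E(Eh) column of the path summary above, images 0 through 9.
path_energies_eh = [
    -1045.87714, -1045.86431, -1045.81680, -1045.77625, -1045.82119,
    -1045.83431, -1045.85985, -1045.86533, -1045.86883, -1045.88803,
]

def relative_energies_kcal(energies_eh):
    """Return energies relative to the first image, in kcal/mol."""
    reference = energies_eh[0]
    return [(e - reference) * HARTREE_TO_KCAL_PER_MOL for e in energies_eh]

relative = relative_energies_kcal(path_energies_eh)
highest_image = max(range(len(relative)), key=relative.__getitem__)

for image, energy in enumerate(relative):
    marker = "  <= highest" if image == highest_image else ""
    print(f"image {image}: {energy:7.2f} kcal/mol{marker}")
```

The highest image is the one ORCA marks as the climbing image, and its relative energy is the barrier estimate along this path.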

The minimum energy path trajectory file will be a .xyz file with the name of your file with
_MEP_trj added to the end:

input_MEP_trj.xyz

“input” is the name of my file.

Open this file in Avogadro and choose “Extensions > Animation”. Check the box for “Dynamic
Bonds”. This will allow you to see how the bonds are changing as the reaction proceeds. Click
the play button to see how ORCA models the reaction pathway step by step. The NEB
automatically creates 10 trajectory steps.

Image 9. Here is step 4 (step 4 in Avogadro corresponds to step 3 in ORCA) of the NEB-TS minimum
energy reaction pathway, the geometry which ORCA guesses is closest to the TS. Here we can see the
base for aspirin on the left and the base for serine on the right, with an “ethanol” transitioning from the
aspirin to the serine. Note that when Avogadro enters animation mode, double bonds disappear and
present as single bonds, though they are still present and accounted for in calculations.

Next, we will make a minimum energy pathway graph. There is an .interp file output from
ORCA’s calculation that holds the reaction coordinate in the first column, distance in Bohr radii
in the second, and energy of the reaction in Hartree (Eh) in the third. Copy these numbers and
bring them into an application for graphing (e.g. Excel or MATLAB. I heavily recommend
MATLAB as copying these numbers from the Notepad output will not copy them in columns,
and in MATLAB it is easy to make the output into a matrix and graph it by column).

The NEB .interp file outputs two sets of data, one short set consisting of 10 “images” (steps) and
one longer set consisting of more steps (my file had 81 steps for the longer set). Both sets are
valid; ORCA just compresses the many steps into 10 evenly spaced steps for viewing purposes.

Note that for these calculations, where we are merely focused on finding the TS geometry, ORCA
assumes our reactants are floating in space, which essentially models them as gases. In a true
reaction, there would be solvent molecules around the reactants stabilizing them, as well as
interactions with other molecules. Thus be aware the calculated reaction energies may differ from
literature values.

Graphing the reaction coordinate against the energy can reveal where the energy of your
reaction is highest, therefore indicating a potential transition state configuration.
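
As an alternative to copying the numbers by hand, the Python sketch below splits text into blocks of consecutive three-column numeric rows and finds the highest-energy row in each block. It makes no assumption about the header lines in the .interp file beyond their not being three numbers, so check the blocks it returns against your own file. The sample text is not copied from a real .interp file: the header wording is a placeholder, and the three rows are rebuilt from images 0, 3, and 9 of my path summary (distance converted to Bohr, energy relative to image 0).

```python
def parse_numeric_blocks(text, n_columns=3):
    """Group consecutive rows of n_columns numbers into separate blocks.

    Any line that is not exactly n_columns numbers (headers, blank lines)
    closes the block that is currently being collected.
    """
    blocks, current = [], []
    for line in text.splitlines():
        try:
            row = tuple(float(part) for part in line.split())
        except ValueError:
            row = ()
        if len(row) == n_columns:
            current.append(row)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def highest_energy_row(block):
    """Return the row with the largest value in the last (energy) column."""
    return max(block, key=lambda row: row[-1])

# Placeholder text: made-up headers, rows rebuilt from the path summary.
sample_interp = """
placeholder header for the short set
0.0000   0.000   0.00000
0.2754  10.853   0.10089
1.0000  39.410  -0.01089

placeholder header for the longer set
0.0000   0.000   0.00000
0.2754  10.853   0.10089
1.0000  39.410  -0.01089
"""
blocks = parse_numeric_blocks(sample_interp)
for number, block in enumerate(blocks, start=1):
    print("set", number, "highest point:", highest_energy_row(block))
```

Each block is a list of (reaction coordinate, distance, energy) rows, ready to be plotted column against column.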

Image 10. The .interp file’s energy graphed against the reaction pathway. Unlike the relaxed surface scan,
the x-axis has no physical relation to the reaction; it is merely a representation of the reaction proceeding
with time.

The two graphs in Image 10 are the results of plotting the .interp file. The two graphs describe
the same behavior; one just has more points. Both graphs suggest a transition state at coordinates
(0.2754, 0.1009), which corresponds to step 3 of ORCA’s output (remember, ORCA starts at step
0, so our TS is step 3 rather than step 4 when reading ORCA files).
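
Picking the highest point off the graph can also be done programmatically. The sketch below is only an illustration: the energies are made-up values (not my actual .interp data), chosen so the maximum lands on step 3, and the function name is hypothetical. It returns the step number counted from zero, the way ORCA counts.

```python
def highest_energy_step(energies):
    """Return (zero_based_step, energy) for the highest-energy point on the path."""
    if not energies:
        raise ValueError("no energies given")
    step = max(range(len(energies)), key=lambda i: energies[i])
    return step, energies[step]


# Made-up energies in Eh for a short six-image path.
energies = [0.00, 0.03, 0.07, 0.10, 0.05, -0.01]

step, energy = highest_energy_step(energies)
print(f"Highest point: step {step} (point number {step + 1} if you count from 1), E = {energy} Eh")
```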

Thus, our three methods of directly interpreting the NEB-TS output all agree; step 3 is the TS
guess for our aspirin/serine reaction. With this knowledge, we can move on to the next step for
confirming the TS geometry for our reaction of interest: the optimization of the transition state
geometry and numerical frequency calculation.

Running a Transition State Optimization and Numerical Frequency
Calculation
Now that we have a guess for the transition state geometry, whether we came by that guess via
relaxed surface scan or NEB, we can confirm the geometry of the transition state via
optimization of our guess and a numerical frequency calculation.

OptTS

OptTS is the ORCA command that starts at the approximate TS geometry guess you provide and
optimizes it until it converges to the TS. There are two variations on the OptTS command to
know about: calculating the TS with an approximate Hessian and with an exact Hessian.

Calculating the TS with an approximate Hessian will have a command line that looks like this:

! RHF OptTS NumFreq

* xyz -1 1
(coordinates of your TS guess)
*

This method will only work if your guess is already close to a TS, and even then it is not
guaranteed to work - you may end up in a minimum instead of a TS.

Instead, it is usually necessary to use the OptTS command with an exact Hessian. The command
line will look like this:

! RHF OptTS NumFreq

%geom
Calc_Hess true # Calculate Hessian in the beginning
NumHess true # Request numerical Hessian (analytical not available)
Recalc_Hess 5 # Recalculate the Hessian every 5 steps
end

* xyz -1 1
(coordinates of your TS guess)
*

Calculating a Hessian is more expensive than approximating one, and this specific command line
has ORCA recalculate the Hessian every five geometry optimization cycles.

The reason I mention both methods here is that if your reaction is simple and your TS guess
happens to be very close to the true TS, the OptTS command with an approximate Hessian could
save you a lot of time. I personally ran both versions of OptTS - with and without the exact
Hessian - and both took a very long time and hundreds of geometry optimization cycles. I ran
both OptTS calculations for six days before my computer had a mishap and terminated both
optimizations.

It is worth noting that the extreme length of these calculations may have been due to the limits of
my own device, but I think it is more likely that the number of atoms involved in my reaction
required many more geometry optimizations and therefore a lot of time. Keep this in mind when
performing your own calculations, and consider starting very early to let your calculations have
time to complete!

NumFreq

NumFreq is the other command for confirming the TS geometry of your reaction. It is often
paired with OptTS, as shown in the command lines above. NumFreq calculates harmonic
vibrational frequencies numerically, by displacing the atoms slightly around the input geometry.
As we know from CBE 310, a transition state has exactly one imaginary vibrational frequency,
which corresponds to motion along the reaction coordinate. ORCA prints imaginary frequencies
as negative numbers. An output with exactly one negative (imaginary) frequency is ideal and
confirms that the optimized TS structure is indeed a first-order saddle point.

NumFreq must be run on optimized structures or you may get confusing or inaccurate results.
The best way to ensure the NumFreq runs on an optimized structure is to pair it in an OptTS or
Opt+Freq job, even if the geometry has already been optimized.

My OptTS + NumFreq command for ORCA looked like this:

# Basic Mode
# input from input_im3.xyz from NEB_1.
#

! PM3 OptTS NumFreq

%geom
Calc_Hess true # Calculate Hessian in the beginning
NumHess true # Request numerical Hessian (analytical not available)
Recalc_Hess 5 # Recalculate the Hessian every 5 steps
maxiter 2000
end

* xyzfile -1 1 input_im3.xyz

Here I used the TS guess from the NEB-TS calculation as my input geometry (“input_im3.xyz”
holds the coordinates of image/step 3). You can also use the TS guess from your relaxed surface
scan by pulling the .xyz file that corresponds to your TS guess step.
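
If your geometries are stored together in one multi-frame trajectory file rather than as one .xyz file per step, the step you want can be pulled out with a few lines of Python. This is a sketch that assumes the standard XYZ layout (a line with the atom count, a comment line, then one line per atom, repeated for each frame). The function name and the two-frame H2 sample are hypothetical.

```python
def split_xyz_frames(text):
    """Split multi-frame XYZ text into a list of single-frame XYZ strings."""
    lines = text.splitlines()
    frames, i = [], 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between frames
            continue
        n_atoms = int(lines[i].split()[0])
        frame = lines[i:i + n_atoms + 2]  # count line + comment line + atoms
        if len(frame) < n_atoms + 2:
            raise ValueError(f"truncated frame starting at line {i + 1}")
        frames.append("\n".join(frame) + "\n")
        i += n_atoms + 2
    return frames


# Hypothetical two-frame trajectory for illustration only.
trajectory = """\
2
frame 0 (hypothetical)
H 0.000 0.000 0.000
H 0.000 0.000 0.740
2
frame 1 (hypothetical)
H 0.000 0.000 0.000
H 0.000 0.000 0.900
"""

frames = split_xyz_frames(trajectory)
ts_guess = frames[1]  # frames are counted from zero, like ORCA's steps
print(ts_guess)
```

Writing `ts_guess` to its own file then gives you something you can point the `* xyzfile` line at.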

It is worth noting that I switched to PM3 here. PM3 is a semiempirical method rather than a
basis set, and it is much cheaper than an ab initio method like RHF, so I hoped it would speed
up the calculation. I tried this OptTS + NumFreq calculation in a variety of ways, and this
version was the only one that converged. It took 82 geometry optimization cycles and five hours
to run. However, some versions of this calculation took hundreds or thousands of geometry
optimization cycles. If you are unsure how many cycles your optimization will take, you can go
ahead and set maxiter high. ORCA will not keep running up to that number once the optimization
converges; it will just terminate. Setting a high maxiter initially may save you from having to
continue a calculation that terminates early (see “Continuing Calculations That Have Been
Terminated Early”).

The output gave me a fully optimized transition state geometry and confirmation in the form of
a single imaginary frequency:
-----------------------
VIBRATIONAL FREQUENCIES
-----------------------

Scaling factor for frequencies = 1.000000000 (already applied!)

0: 0.00 cm**-1
1: 0.00 cm**-1
2: 0.00 cm**-1
3: 0.00 cm**-1
4: 0.00 cm**-1
5: 0.00 cm**-1
6: -172.65 cm**-1 ***imaginary mode***
7: 19.60 cm**-1
8: 26.27 cm**-1
9: 47.47 cm**-1
10: 51.13 cm**-1
11: 73.96 cm**-1
12: 80.42 cm**-1
13: 90.91 cm**-1
14: 113.26 cm**-1
15: 161.59 cm**-1
16: 179.71 cm**-1
...

In an ideal calculation, there will be only one imaginary mode, indicating one TS. The ORCA
Input Library notes that small imaginary modes (below roughly 100 cm-1) can be indicative of
noise in the geometry optimization, and large imaginary modes (above roughly 450 cm-1) can be
indicative of a symmetric saddlepoint. The imaginary mode above, at -172.65 cm-1, falls between
those limits and is the only one present. This is ideal and confirms the optimized TS has been
found.
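
For long outputs, the same check can be sketched in Python instead of scrolling through the list. The sketch assumes frequency lines look like the ones printed above (mode number, colon, value, then cm**-1). The function names are hypothetical, and the rough 100 and 450 cm-1 limits are simply the ones quoted above. If your output file contains more than one frequency list, this simple version would report the imaginary modes from all of them.

```python
import re

# Matches lines such as "6: -172.65 cm**-1 ***imaginary mode***"
FREQ_LINE = re.compile(r"^\s*(\d+):\s+(-?\d+\.\d+)\s+cm\*\*-1")


def imaginary_modes(output_text):
    """Return [(mode_index, frequency)] for every negative frequency listed."""
    modes = []
    for line in output_text.splitlines():
        match = FREQ_LINE.match(line)
        if match and float(match.group(2)) < 0:
            modes.append((int(match.group(1)), float(match.group(2))))
    return modes


def describe(frequency):
    """Apply the rough size limits quoted in the text above."""
    size = abs(frequency)
    if size < 100:
        return "small: may be noise in the geometry optimization"
    if size > 450:
        return "large: may indicate a symmetric saddle point"
    return "between the two rough limits"


sample = """\
5: 0.00 cm**-1
6: -172.65 cm**-1 ***imaginary mode***
7: 19.60 cm**-1
"""

found = imaginary_modes(sample)
print(len(found), "imaginary mode(s)")
for index, frequency in found:
    print(f"mode {index}: {frequency} cm**-1 ({describe(frequency)})")
```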

Image 11. The transition state geometry from this OptTS + NumFreq calculation as represented in
Avogadro.

Now that we have confirmed the TS pathway and geometry, we can move on to visualizing
molecular orbitals with IboView.

Visualizing Orbitals in IboView
IboView is a great tool for visualizing the intrinsic bond orbitals (IBOs) of molecules. IboView
can show you the bond orbitals for a molecule in a single configuration, like the transition state
geometry, or for a series of geometries, like the minimum energy path produced by a relaxed
surface scan or NEB, to make a small animation of orbitals changing.

Using the GPU

IboView is a graphics-heavy program, and is therefore only available on GPU computers
(different from normal VPNs or computers in the labs on campus). I compiled a quick guide on
how to access the GPU via remote desktop:

How to Access IBOview

If you encounter any problems, I recommend contacting ETS.

Finding Orbitals

Using IboView is pretty easy once you know what to search for.

First, email yourself or otherwise download your trajectory files and .xyz file for your final
geometry optimization of the TS onto the remote desktop. Open IboView and select
“Load/Exec…” on the right hand side of the screen. Load your .xyz file for the TS geometry or
the minimum energy pathway.

Once your molecule is loaded, select “Compute/Analyze Wf…” on the right side of the screen.
This will let you calculate the wavefunction for your molecule geometry. It is critical at this point
that you put in any charge your molecule may have, as well as specify the basis set you used for
your optimization. Specifying these options will decrease the time taken for the wavefunction
calculation significantly.

Image 12. Selecting the correct options when calculating the wavefunction for your molecule in IboView
will save a lot of computational time. I used the def2-SVP basis sets for my calculations, and my
molecule has a -1 charge. These options have been input into IboView, as can be seen in the photo.

Click “OK”. It should take a few minutes for the wavefunction to compute; a little longer if
you’re calculating the wavefunction for a trajectory file that has multiple geometries within it.

When the computation is complete, you’ll get this output at the bottom of the pop-up window:
*** WAVE FUNCTION COMPUTATION FINISHED.

Hit “Close” and then “Data Sets” on the right-hand side of the screen. This will pull up a list of
orbitals available on your molecule. Double click on an orbital to see it appear on your molecule.

If you have uploaded a trajectory file, click “Frames” under the “Data Sets” tab. Notice how the
minimum energy pathway from your graph is displayed in the upper right-hand corner
under “Energy, Gradient, & Orbital Change”. Try changing the frame with the “Current Frame”
dial or the “#Frame” option at the bottom of the tab. Watch how your molecule follows the
minimum energy path you previewed in Avogadro. Next try spinning the “Track Orbital” dial.
Notice how as the orbitals change, pathways appear on the energy graph above.

Image 13. On the left-hand side of the screen we have our reactants for the aspirin/serine reaction. On the
right-hand side of the screen we see the minimum energy reaction pathway and the energies of some
selected bond orbitals.

Scroll through the orbitals with the “Track Orbital” dial and when you come across an orbital
with an energy more than zero, click “Show # __” to show the orbital on the molecule and retain
the energy pathway on the graph. This is a quick and easy way to find orbitals of interest (as
opposed to clicking through all of the hundreds of orbitals to find interesting ones). When you
have selected all orbitals of interest, hit Ctrl+T to trace the surfaces of each orbital. This
may take a second.

Now swivel the “Current Frame” dial again and watch your selected orbitals move and interact!
Follow the line on the energy graph that tells you what step you’re at.

Image 14. This is the transition state for my aspirin/serine reaction. You can see that in this configuration,
bond orbitals are in the middle of stretching and transitioning between molecules.

Save photos of your reaction by going back to the “Actions & Files” tab and hitting “Save
Picture as”. Take note that this button will save a photo of your current frame, not all of them.
You will need to save a new photo at every frame and compile them to get an animation.

Image 15. An animation of the aspirin/serine reaction’s bond orbitals transforming through the course of
the 10-step minimum energy pathway. Bonds calculated and visualized with IboView.

Also note that you cannot save IboView files to come back to later. You must either save your
molecule as images or recalculate the wavefunction for your reaction the next time you open
IboView.

Congratulations! You’ve successfully visualized the transition state orbitals of a chemical
reaction of interest. Enjoy your reaction!

Image 16. Some funky orbitals for a proposed transition state of the aspirin/serine reaction that ended up
being incorrect, but still looks rad!

Continuing Calculations That Have Been Terminated Early
If you had a calculation running that terminated early for any reason - computer crashing, VPN
connection failing, or running out of geometry optimization cycles - there are ways of picking up
where the calculation left off without having to start over again. The trick is usually just getting
very familiar with ORCA files and what they tell you.

Here’s an example of continuing a calculation that had been terminated early:

This OptTS + NumFreq calculation had to be terminated early after running for ~7 hours on a
campus computer when I had to move locations.

! PM3 OptTS NumFreq

%geom
Calc_Hess true # Calculate Hessian in the beginning
NumHess true # Request numerical Hessian (analytical not available)
Recalc_Hess 5 # Recalculate the Hessian every 5 steps
maxiter 2000
end

* xyzfile -1 1 input_im4.xyz

The calculation was 211 geometry cycles into optimizing the transition state and had calculated
42 Hessians (this command had ORCA recalculate the Hessian every five steps). Instead of
scrapping this calculation and finding another day to sit down for 7+ hours and re-run the OptTS
+ NumFreq, I emailed myself these key files to run on a different computer another time:

1. The input file (PM3_NEB_try2.inp). Having this file allows me to check the initial input
and remember what the goal of the calculation is.
2. The output file (PM3_NEB_try2.out). The output file allows me to remember exactly
where the calculation left off, how many geometry optimization cycles it had gone
through, how many Hessians calculated, and why the calculation stopped.
3. The latest Hessian file (PM3_NEB_try2.042.hess). This file will be key for resuming the
calculation where it left off.
4. The .xyz file (PM3_NEB_try2.xyz). This file contains the latest geometry optimization
coordinates. It will be key for resuming the calculation where it left off.

I recommend not deleting or overwriting any existing files. Instead, create a new folder for
the key files you need for your next run, and start over there. I personally found that when I
tried to re-run a calculation without changing anything (name, folders, etc.), PowerShell never
returned to the prompt to indicate completion of the calculation, and the timestamps were rarely
updated in the folder. It’s easier both for organization and peace of mind to copy those files
into another location and begin again.
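
Gathering the four key files into a fresh folder can be scripted. The sketch below is one possible way to do it, not part of ORCA: it assumes the files share a base name and that Hessian files are numbered like PM3_NEB_try2.042.hess, as in my run. The function name, the folder names, and the placeholder files in the demonstration are all hypothetical.

```python
import re
import shutil
import tempfile
from pathlib import Path


def collect_restart_files(run_dir, basename, new_dir):
    """Copy the .inp, .out, .xyz and highest-numbered .hess file into new_dir."""
    run_dir, new_dir = Path(run_dir), Path(new_dir)
    new_dir.mkdir(parents=True, exist_ok=True)

    # Find files named like <basename>.042.hess and keep the largest number.
    pattern = re.compile(re.escape(basename) + r"\.(\d+)\.hess$")
    latest_hess, latest_number = None, -1
    for path in run_dir.iterdir():
        match = pattern.match(path.name)
        if match and int(match.group(1)) > latest_number:
            latest_hess, latest_number = path, int(match.group(1))

    wanted = [run_dir / (basename + ext) for ext in (".inp", ".out", ".xyz")]
    if latest_hess is not None:
        wanted.append(latest_hess)

    copied = []
    for path in wanted:
        if path.exists():
            shutil.copy2(path, new_dir / path.name)  # originals stay untouched
            copied.append(path.name)
    return copied


# Demonstration on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    old_run = Path(tmp) / "old_run"
    old_run.mkdir()
    for name in ("job.inp", "job.out", "job.xyz", "job.005.hess", "job.042.hess"):
        (old_run / name).touch()
    copied = collect_restart_files(old_run, "job", Path(tmp) / "restart")
    print(copied)
```

Because the files are copied rather than moved, the original run folder stays exactly as the interrupted calculation left it.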

The new input from the incomplete calculation:

# written by me via relaxed scan/TS tutorial > ORCA input file. 12/12/20
# Basic Mode
# input from input_im3.xyz from NEB_2. Starting with Hessian 42 from
# PM3_NEB_try2 (PM3_NEB_try2.042.hess)
#

! PM3 OptTS NumFreq

%geom
inhess Read
InHessName "PM3_NEB_try2.042.hess"
maxiter 2000
end

* xyzfile -1 1 PM3_NEB_try2.xyz

Note that the command line (! line) is the same, but the initial Hessian and input coordinates are
different: they are the last Hessian the previous calculation made and the last known geometry.
By loading these files from the previous calculation into a new one, we are essentially starting
the calculation where the previous one left off, while resetting the count of geometry
optimization iterations back to zero.
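
If you restart calculations often, the restart input can be generated rather than typed by hand. The following sketch just fills the input shown above with your own file names. The function name and its parameters are hypothetical, and it only covers this particular OptTS + NumFreq layout.

```python
def build_restart_input(method, hess_file, xyz_file, charge, multiplicity, maxiter=2000):
    """Return ORCA input text that resumes an OptTS + NumFreq job from saved files."""
    return (
        f"! {method} OptTS NumFreq\n"
        "\n"
        "%geom\n"
        "inhess Read\n"
        f'InHessName "{hess_file}"\n'
        f"maxiter {maxiter}\n"
        "end\n"
        "\n"
        f"* xyzfile {charge} {multiplicity} {xyz_file}\n"
    )


text = build_restart_input("PM3", "PM3_NEB_try2.042.hess", "PM3_NEB_try2.xyz", -1, 1)
print(text)
```

Saving `text` to a new .inp file in your restart folder gives the same input as the one above, minus the comment header.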

Also notice I made some notes at the top of the input file to remind myself what I was doing
and where the files I am calling upon come from. With ORCA, the number of files can become
overwhelming quite easily, so it’s a good idea to come up with a system for
staying organized and stick to it throughout your calculations!

You can use this system of calling upon previous outputs for new calculations for most
calculations that terminate early.

Works Cited
Blobaum, Anna. “Structural and Functional Basis of Cyclooxygenase Inhibition.” Journal of
Medicinal Chemistry 50, no. 7 (April 5, 2007): 1425-1441.

Vane, J. R., and R. M. Botting. “The Mechanism of Action of Aspirin.” Thrombosis Research 110,
no. 5–6 (2003): 255–258.

Klein, Johannes E M N. “Electron Flow in Reaction Mechanisms-Revealed from First
Principles.” Angewandte Chemie (International Edition in English) 54, no. 18 (April 27,
2015): 5518–5522.

Knizia, Gerald. “Intrinsic Atomic Orbitals: An Unbiased Bridge Between Quantum Theory and
Chemical Concepts.” Journal of Chemical Theory and Computation 9, no. 11
(November 12, 2013): 4834–4843.

Zhang, Yingkai. “Mechanistic Insights into a Classic Wonder Drug—Aspirin.” Journal of the
American Chemical Society. 137, no. 1 (January 14, 2015): 70–73.

Other useful sites utilized in the making of this walkthrough:


Learning Avogadro Handbook
ORCA Manual
IBOView Website
ORCA Input Library
- Geometry Optimizations
- Saddlepoint
- Vibrational Frequencies
ORCA Tutorials Site
Brief Walkthrough of TS Search by the University of Waterloo
How to Access IBOView tutorial
Video Walkthrough of this Tutorial
