
How to Conduct a Computational Chemistry Research Project …


1. WHAT DO I WANT TO KNOW?

2. HOW ACCURATELY?
Search the literature for the latest information on the subject.
2. HOW ACCURATELY?
For example, compare experimental with computational chemistry

SUBJECT | OPERATION | ERROR ESTIMATION
Analytical chemistry | Repetition of measurements | Standard deviation
Computational chemistry | Repetition gives practically the same results (with the exception of Monte Carlo calculations) | Comparison with experimental answers or more rigorous computations
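The two kinds of error estimate contrasted above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and every number is made up for the example, not taken from any experiment or calculation.

```python
import statistics


def repeat_error(measurements):
    """Analytical-chemistry style: mean and sample standard deviation
    of repeated measurements of the same quantity."""
    return statistics.mean(measurements), statistics.stdev(measurements)


def mean_absolute_error(computed, reference):
    """Computational-chemistry style: average deviation of computed values
    from reference values (experiment or a more rigorous computation)."""
    if len(computed) != len(reference):
        raise ValueError("computed and reference must have the same length")
    return sum(abs(c - r) for c, r in zip(computed, reference)) / len(computed)


# Hypothetical repeated titration volumes in mL (made-up values).
titration_ml = [24.9, 25.1, 25.0, 25.2, 24.8]
mean_ml, sd_ml = repeat_error(titration_ml)

# Hypothetical computed vs. reference bond lengths in angstrom (made-up values).
computed_bond = [1.00, 1.10, 1.20]
reference_bond = [0.98, 1.09, 1.21]
mae = mean_absolute_error(computed_bond, reference_bond)

print(f"repeated measurement: {mean_ml:.2f} +/- {sd_ml:.2f} mL")
print(f"mean absolute error vs. reference: {mae:.4f} angstrom")
```

Repeating a deterministic calculation would give a standard deviation of zero, which is why the comparison against a reference set is the useful number on the computational side.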
2. HOW ACCURATELY?
What do I want to consider?

for example…
•Should I add a zero-point energy (ZPE) correction?

•Is there spin contamination?

•Is spin-orbit coupling important?

•Is the effect of the solvent relevant?


2. HOW ACCURATELY?
The choice of methodology…

•type of calculation

•basis sets

•ECPs

•force fields
2. HOW ACCURATELY?

In an imperfect world…

•Many methods exist because each is best for some situation.

•The trick is to determine which one is best for a given project.
2. HOW ACCURATELY?
The choice of methodology:

•Look at other computational research studies.

•Perform a short study to verify the method's accuracy.

•Know the merits and drawbacks of various methods and software packages.

All of this is needed in order to make an informed choice.


Look at other computational research studies.

A particular set of studies examines the accuracy of computational techniques for modelling a particular compound or set of related compounds.

•This is usually done by giving some sort of average error for a large collection of molecules.

•Most of them focus on collections of organic and light main group compounds.
Look at the following examples that compare the
methods (left column) with their calculated properties
Let’s take an example:
A researcher wants to start a study of a
polymer with complex monomers:

1. The researcher examines the literature and determines that an ab initio method with a moderately large basis set will give the desired accuracy of results.

2. Single point for the monomer: 2 minutes.

3. Geometry optimization for the monomer: 20 minutes.

4. The calculation scales as N^4.

5. Geometry optimization for the trimer: 3^4 x 20 minutes, or about 27 h.

6. She would like to model up to a 15-unit chain, which would require 15^4 x 20 minutes, or about 2 years.

7. Geometry optimization is not acceptable because the researcher wants to complete the work in a reasonable amount of time!
Calculation of integrals is usually a bottleneck in ab initio calculations.
Some alternatives:

conventional integral evaluation
Many ab initio programs use hard disk space to store numbers that are computed once and used several times during the course of the calculation. (These are the integrals that describe the overlap between various basis functions.)

direct integral evaluation
The numbers are recomputed as needed. (This uses less disk space at the expense of requiring more CPU time to do the calculation.)

in-core algorithm
All the integrals are stored in RAM, thus saving on disk space at the expense of requiring a computer with a very large amount of memory.

semidirect algorithm
Uses some disk space and a bit more CPU time to obtain the optimal balance of both.
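To get a feeling for why storage is an issue, the number of symmetry-unique two-electron integrals can be counted and converted to a rough size. The sketch below assumes real basis functions (eight-fold permutational symmetry), 8 bytes per integral, no integral screening and no space for index labels; the function names are hypothetical.

```python
def unique_two_electron_integrals(n_basis):
    """Count symmetry-unique integrals (ij|kl) for real basis functions.

    There are n(n+1)/2 unique index pairs ij, and the integral is also
    symmetric under exchange of the pair ij with the pair kl.
    """
    pairs = n_basis * (n_basis + 1) // 2
    return pairs * (pairs + 1) // 2


def storage_gib(n_basis, bytes_per_integral=8):
    """Rough storage estimate in GiB (assumes one double per integral)."""
    return unique_two_electron_integrals(n_basis) * bytes_per_integral / 2**30


for n in (50, 200, 1000):
    count = unique_two_electron_integrals(n)
    print(f"N = {n:5d}: {count:.3e} integrals, about {storage_gib(n):.3g} GiB")
```

The count grows roughly as N^4/8, so a tenfold larger basis needs about ten thousand times more disk or memory, which is the trade-off the conventional, direct, in-core and semidirect strategies try to manage.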
TIME COMPLEXITY
(how the use of computer resources (CPU time, memory, etc.) changes as the size of the problem changes)

The total amount of CPU time required to do a HF calculation scales as

N^4 + N^3 + N + C

C = time for work that has to be done regardless of the size of the calculation, such as initializing variables and allocating memory.

The standard matrix inversion algorithm for HF requires N^3 operations.

Computing the two-electron Coulomb and exchange integrals for a HF calculation takes N^4 operations.

At the end of the calculation, the orbital energies must be added. Since there are N orbitals, there will be N addition operations.

But if N is large, then the N^4 term dominates.
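How quickly the N^4 term takes over can be seen with a small numerical sketch. The unit prefactors and the value of the constant C are arbitrary assumptions chosen only for illustration; real programs have different prefactors for each term.

```python
def hf_cost_terms(n, constant=1000.0):
    """Return the individual terms of N^4 + N^3 + N + C.

    All prefactors are set to 1 and `constant` is an arbitrary stand-in
    for the size-independent overhead C (both are assumptions).
    """
    n = float(n)
    return {"N^4": n**4, "N^3": n**3, "N": n, "C": constant}


def dominant_fraction(n, constant=1000.0):
    """Fraction of the total cost that comes from the N^4 term."""
    terms = hf_cost_terms(n, constant)
    return terms["N^4"] / sum(terms.values())


for n in (5, 10, 100, 1000):
    print(f"N = {n:5d}: N^4 term is {100 * dominant_fraction(n):6.2f}% of the total")
```

With these assumed prefactors the N^4 term is already about 99% of the total at N = 100, so for realistic basis sizes the lower-order terms can be ignored when estimating run times.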


M = the number of atoms
L = the length of one side of the box containing the molecules in a calculation using periodic boundary conditions
A = the number of active space orbitals
N = the number of orbitals in the calculation


Geometry optimization takes longer because…

•many calculations must be done as the geometry is changed

•each iteration takes longer in order to compute energy gradients.

The amount of CPU time required for a geometry optimization, T_opt, depends on the number of degrees of freedom, denoted as D.
(Degrees of freedom are the geometric variables being optimized, such as bond lengths, angles, and the like.)

As a general rule of thumb, the amount of time for a geometry optimization can be estimated from the single-point energy CPU time, T_single, with the equation

T_opt ≈ 5 x D^2 x T_single
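The rule of thumb is easy to apply in code. The sketch below assumes a full optimization of an isolated molecule with no constraints or symmetry, so that D is 3M - 6 for a nonlinear molecule of M atoms (3M - 5 if linear); the function names and the 2-minute single-point time are hypothetical.

```python
def internal_degrees_of_freedom(n_atoms, linear=False):
    """Number of internal coordinates of an isolated molecule.

    Assumes no constraints and no use of symmetry: 3M - 6 for a
    nonlinear molecule, 3M - 5 for a linear one.
    """
    minimum = 2 if linear else 3
    if n_atoms < minimum:
        raise ValueError("too few atoms for this kind of molecule")
    return 3 * n_atoms - (5 if linear else 6)


def estimate_opt_time(t_single, dof):
    """Rule of thumb: T_opt is about 5 * D^2 * T_single (same time unit)."""
    return 5 * dof**2 * t_single


# Hypothetical example: a water molecule with a 2-minute single point.
d_water = internal_degrees_of_freedom(3)
print(f"D = {d_water}, estimated optimization time = "
      f"{estimate_opt_time(2.0, d_water):.0f} minutes")
```

Because D enters squared, the estimate grows quickly with molecular size even before the N^4 growth of each single point is taken into account.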
Example

1. An ab initio method with a moderate-size basis set and minimal correlation may be used for optimization.

2. Then a single point calculation with more correlation and a larger basis can be used for the final energy computation.

This would be denoted with a notation like CCSD(T)/cc-pVTZ//MP2/6-31G*, where the level used for the final energy is written before the double slash and the level used for the geometry after it.
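A compound label of this kind can be taken apart mechanically. The sketch below assumes the common convention in which the level used for the final energy stands before the double slash and the level used for the geometry after it; the names `Level`, `parse_level` and `parse_compound_notation` are hypothetical helpers, not part of any package.

```python
from typing import NamedTuple, Optional


class Level(NamedTuple):
    method: str
    basis: Optional[str]  # None for methods without a basis set label


def parse_level(text):
    """Split 'method/basis' into its two parts."""
    method, _, basis = text.strip().partition("/")
    return Level(method, basis or None)


def parse_compound_notation(label):
    """Return (energy_level, geometry_level) for 'energy//geometry'.

    A label without '//' means the energy and the geometry were
    computed at the same level.
    """
    energy_part, separator, geometry_part = label.partition("//")
    energy = parse_level(energy_part)
    geometry = parse_level(geometry_part) if separator else energy
    return energy, geometry


energy, geometry = parse_compound_notation("CCSD(T)/cc-pVTZ//MP2/6-31G*")
print("final energy:", energy)
print("geometry:    ", geometry)
```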

Molecular mechanics or semiempirical calculations may be used to determine a geometry for an ab initio calculation.

Molecular mechanics is nearly always used for conformation searching.

But be careful…
vibrational frequencies must be computed with the same level of theory used to optimize the geometry.
1. The researcher examines the literature and determines that an ab initio method with a moderately large basis set will give the desired accuracy of results.

2. Single point for the monomer: 2 minutes.

3. Geometry optimization for the monomer: 20 minutes.

4. The calculation scales as N^4.

5. Geometry optimization for the trimer: 3^4 x 20 minutes, or about 27 h.

6. She would like to model up to a 15-unit chain, which would require 15^4 x 20 minutes, or about 2 years.

7. Geometry optimization is not acceptable.

8. She wisely decides to stop at the 10-unit chain and use geometries optimized with molecular mechanics methods, which takes under an hour for the optimization.

9. She obtains the desired results with single point ab initio calculations, which take 10^4 x 2 minutes, or about 2 weeks.

10. This is feasible since she has her own workstation with an uninterruptible power supply.
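The arithmetic behind these estimates can be checked with a short script. It assumes, as the steps above do, that the cost grows as the fourth power of the number of monomer units; the function name is hypothetical.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * 60


def scaled_minutes(monomer_minutes, n_units, exponent=4):
    """Extrapolate a monomer timing to an n-unit chain, assuming the
    cost grows as n_units ** exponent."""
    return monomer_minutes * n_units**exponent


trimer_opt = scaled_minutes(20, 3)     # geometry optimization, trimer
chain15_opt = scaled_minutes(20, 15)   # geometry optimization, 15 units
chain10_sp = scaled_minutes(2, 10)     # single point, 10 units

print(f"trimer optimization:  {trimer_opt / MINUTES_PER_HOUR:.0f} hours")
print(f"15-unit optimization: {chain15_opt / MINUTES_PER_DAY / 365:.1f} years")
print(f"10-unit single point: {chain10_sp / MINUTES_PER_DAY:.1f} days")
```

Running such a back-of-the-envelope estimate before submitting any job is what lets the researcher drop the ab initio optimization early instead of discovering the problem after weeks of CPU time.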
Labor cost
(labor necessary on the part of the user)

It is easier to learn to use more complicated software than to purchase a supercomputer to solve a problem that could be done by a workstation with different software.
PARALLEL COMPUTERS

Ideally, a calculation that takes an hour on a single CPU would take half an hour on two CPUs.
This is called linear speed-up.

In practice, this is not possible because the two-CPU calculation must do extra work to divide the workload between the two processors and combine results to obtain the final answer.

However, a few types of algorithms give nearly perfectly linear scaling because of the nature of the algorithm and the amount of work that the developer did to parallelize the code.

•Many Monte Carlo algorithms can be parallelized very efficiently.

•There are a few programs for which our hypothetical one-hour calculation would take 1.5 hours on a two-CPU machine!

•Some of the correlated ab initio algorithms are very difficult to parallelize efficiently.
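Speed-up and parallel efficiency are simple ratios of measured wall-clock times. The sketch below applies the usual definitions to the two cases mentioned above (times in minutes); the function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Speed-up = time on one CPU divided by time on several CPUs."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cpus):
    """Parallel efficiency = speed-up per CPU (1.0 means linear speed-up)."""
    return speedup(t_serial, t_parallel) / n_cpus


# Ideal case: one hour becomes half an hour on two CPUs.
print(f"ideal: speed-up {speedup(60, 30):.2f}, "
      f"efficiency {efficiency(60, 30, 2):.2f}")

# Bad case: the same job takes 1.5 hours on two CPUs.
print(f"bad:   speed-up {speedup(60, 90):.2f}, "
      f"efficiency {efficiency(60, 90, 2):.2f}")
```

A speed-up below 1 means the parallel run is slower than the serial one, so timing a short test job on one and on several CPUs is worthwhile before committing to a long parallel calculation.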
PARALLEL COMPUTERS

Problem: Software written for single-processor computers will not automatically use multiple CPUs.

And automatic parallelization by compilers is usually inefficient for sophisticated computer programs.

Some software packages can be run on a networked cluster of workstations as though they were a multiple-processor machine.

However, the speed of data transfer across a network is not as fast as the speed of data transfer between the CPUs of a parallel computer. (This penalizes fine-grained algorithms, which need frequent communication between processors.)

large-grained algorithms
Some algorithms break down the work to be done into very large chunks with a minimal amount of communication between processors, and they work as well on a cluster of workstations as on a parallel computer.
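A Monte Carlo estimate is a typical large-grained workload: each chunk runs independently and only one number per chunk has to be sent back. The sketch below estimates pi as a stand-in for a real simulation; the function names, the seeding scheme and the `map_fn` parameter are assumptions made for this illustration. Passing the `map` method of a process pool or of a cluster scheduler instead of the built-in `map` would distribute the chunks without changing the rest of the code.

```python
import random


def count_hits(seed, n_samples):
    """One independent chunk of work: count random points that fall
    inside the unit quarter circle. Needs no communication while running."""
    rng = random.Random(seed)  # private generator, so chunks do not interact
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_chunks, samples_per_chunk, map_fn=map):
    """Split the job into large chunks and combine one integer per chunk.

    `map_fn` is the built-in map by default (serial execution); a
    pool-style map with the same calling convention may be substituted.
    """
    seeds = range(n_chunks)
    sizes = [samples_per_chunk] * n_chunks
    hits = map_fn(count_hits, seeds, sizes)
    return 4.0 * sum(hits) / (n_chunks * samples_per_chunk)


pi_estimate = estimate_pi(n_chunks=4, samples_per_chunk=20000)
print(f"pi is approximately {pi_estimate:.3f}")
```

Fixing the seed of each chunk also makes the Monte Carlo result reproducible from run to run, which is otherwise the one case where repeating a calculation does not give the same answer.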
WHAT APPROXIMATIONS ARE BEING MADE?
WHICH ARE SIGNIFICANT?

Mistake: e.g., investigating vibrational motions that are very anharmonic with a calculation that uses a harmonic oscillator approximation.
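The size of this error can be illustrated with the standard term-value expressions for a harmonic oscillator and for an oscillator with a first anharmonic correction. The constants below are round, made-up values in cm^-1, not data for any real molecule, and the function names are hypothetical.

```python
def harmonic_level(v, omega_e):
    """Harmonic oscillator term value: omega_e * (v + 1/2)."""
    return omega_e * (v + 0.5)


def anharmonic_level(v, omega_e, omega_e_x_e):
    """Term value with the first anharmonic correction:
    omega_e * (v + 1/2) - omega_e_x_e * (v + 1/2)**2."""
    return omega_e * (v + 0.5) - omega_e_x_e * (v + 0.5) ** 2


def transition_from_ground(level_fn, v_upper, *constants):
    """Wavenumber of the transition from v = 0 to v = v_upper."""
    return level_fn(v_upper, *constants) - level_fn(0, *constants)


OMEGA_E = 3000.0      # illustrative harmonic wavenumber, cm^-1 (made up)
OMEGA_E_X_E = 50.0    # illustrative anharmonicity constant, cm^-1 (made up)

for v in (1, 2, 3):
    harm = transition_from_ground(harmonic_level, v, OMEGA_E)
    anharm = transition_from_ground(anharmonic_level, v, OMEGA_E, OMEGA_E_X_E)
    print(f"0 -> {v}: harmonic {harm:.0f}, anharmonic {anharm:.0f}, "
          f"difference {harm - anharm:.0f} cm^-1")
```

With these assumed constants the harmonic model overestimates the fundamental by 100 cm^-1 and the error grows for each overtone, which is why a strongly anharmonic motion should not be studied with a purely harmonic calculation.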

To avoid such mistakes, it is important to:

•understand the method's underlying theory

•determine what software is available, what it costs, and how to use it properly.

•Note that two programs of the same type (e.g., ab initio) may calculate different properties, so the user must make sure the program does exactly what is needed.

•When learning how to use a program, dozens of calculations may fail because the input was constructed incorrectly.

Do not use the project molecule to do this.

Make mistakes with something inconsequential, like a water molecule.
List of Computationally Sick Species
http://srdata.nist.gov/sicklist/
(unfortunately, as of Sep. 07 the site was temporarily down)

The types of problems in the Sicklist database include wrong geometries (see C2H), unreasonable vibrational frequencies (see NS and CH3), and bad energetics (see Be2).

We are not interested in code-dependent problems, such as SCF convergence problems with GAMESS or MOLPRO for a particular molecule.
The computational chemistry list (CCL) consists of a list server and web site: http://server.ccl.net/

•The web site contains information about computational chemistry and the archives from the discussion list.

•Subscribing to the list results in receiving about twenty messages per day.

•This is a good way to watch discussions of current issues.

•The etiquette on the list is that you attempt to find an answer to your question in the library and the web archives before asking a question.

•Once you have asked a question, please post a summary of the responses received.
Theoretical work is often reviewed in:
•Advances in Chemical Physics
•Advances in Molecular Electronic Structure
•Advances in Molecular Modelling
•Advances in Quantum Chemistry
•Annual Review of Physical Chemistry
•Recent Trends in Computational Chemistry
•Reviews in Computational Chemistry

Reviews of computational work are found in:
•Chemical Reviews
•Chemical Society Reviews
•Structure and Bonding
How to Conduct a Computational
Research Project …
Chapter 16
Computational Chemistry: A Practical Guide for Applying
Techniques to Real-World Problems.
David C. Young
2001 John Wiley & Sons, Inc.

http://www3.interscience.wiley.com/cgi-bin/booktoc/93517240

and how to publish it (according to the IUPAC)

Guidelines for Presentation of Methodological Choices in the Publication of Computational Results
A. Ab Initio Electronic Structure Calculations
(IUPAC Recommendations 1998)
http://www.iupac.org/reports/1998/7004boggs/guidelinesa4.pdf
