

New Solvation Radii for the CPCM Solvation Model: Addition of Nitrogen

by

Ruveid Rizvic

A Research Paper
Submitted in Partial Fulfillment of the
Requirements for the
Bachelor of Science Degree
in
Biochemistry

Under the direction of


Dr. Adam Moser

_____________________________ _____________________________
Dr. Adam Moser Dr. Andrew Kehr

Loras College
May 2022

Division of Molecular, Life, and Health Sciences


Chemistry/Biochemistry Program
Loras College
Dubuque, IA

Author: Ruveid Rizvic


Title: New Solvation Radii for the CPCM Solvation Model: Addition of
Nitrogen
Degree/ Major: BS Biochemistry
Research Adviser: Dr. Adam Moser
Month/Year: March 2023
Number of Pages: 33
Abstract

Solutions are fundamental to chemical and biological processes. In solutions, solutes
dissolve in solvent through multiple processes that are difficult to understand physically. These
processes can be better understood through computational solvation modeling. Implicit solvation
is one computational model, and it requires a defined cavity. The Conductor-like Polarizable
Continuum Model (CPCM) models the solute inside a cavity in the solvent. The solvent is represented
as a continuous dielectric medium and carries no information about its molecular structure. The
solvation radii of the solute's atoms set the extent of the cavity, and a set of such radii is called a
radii definition. Existing radii definitions (UFF, Bondi, Pauling) were not made for solvation
modeling, as they use atomic radii instead of solvation radii. These radii definitions could not
replicate experimental Gibbs solvation energies for both neutral and charged molecules, a challenge for
solvation models. New solvation radii definitions were previously made using molecules containing
carbon, hydrogen, and oxygen to improve accuracy compared to these earlier radii definitions. In this
research, nitrogen-containing solutes were added, for a total of 565 solutes from the
Loras Solvation Database, and tested in 420 solvation radii combinations. These combinations were
modeled using the CPCM solvation model. This resulted in 9 new solvation radii definitions that
were within reasonable error ranges and represented a realistic radii combination for each unique
atom. In addition, 21 solvation radii combinations for neutrals, 3 combinations for cations, and 5
combinations for anions were within their own acceptable ranges for error analysis. Results varied
between charged and neutral solute subgroups based on the size of oxygen's solvation radii.
Bigger radii combinations performed best for neutral solutes, and the opposite held for the anion and
cation solutes.

Acknowledgments

I would like to thank the Loras College chemistry and biology departments throughout

these past few years for helping me get to this point in my education. I am thankful to my peers,

the class of 2023, and the Loras Computational Chemistry Lab, for their extensive help and

support. A big thank you to Nicholas Haskin and Emma Hoefer for helping with data

organization and getting me started with this research. I would also like to thank my friends

outside my Loras Community for their continued encouragement in my endeavors. Lastly, my

sincerest gratitude to my research mentor, Dr. Adam Moser. You have helped me immensely

throughout my chemistry education, research, and thesis preparation. I would not be where I am

without your support.



Table of Contents

Abstract

List of Figures

List of Tables

Introduction

Methods

Results and Discussion

Conclusion

References

Appendix

List of Figures

Figure 1. Visualization of the solute in solvent

Figure 2. Explicit Solvation vs. Implicit Solvation

Figure 3. Model of the molecule methoxide presenting individual solvation radii spheres

Figure 4. Comparison of Gibbs experimental vs. calculated solvation energy for solutes in radii
combination for carbon 1.7, hydrogen 1.1, oxygen 1.4, and nitrogen 1.6

Figure 5. Comparison of Gibbs experimental vs. calculated solvation energy for solutes in radii
combination for carbon 1.9, hydrogen 1.1, oxygen 1.4, and nitrogen 1.6

List of Tables

Table 1. Atomic radii for existing radii definitions

Table 2. Neutral, Cation, and Anion Errors for the three existing solvation radii definitions

Table 3. Haskin's results within the 11 new solvation radii definitions

Table 4. Solvation radii ranges for each unique atom

Table 5. The resulting 9 radii definitions for all solutes including nitrogen

Table 6. Solutes that consistently failed within the 9 new radii definitions

Table 7. Resulting 3 radii definitions for cation solutes

Table 8. Resulting 5 radii definitions for anion solutes

Table 9. Resulting 21 radii combinations for neutral solutes



Introduction

Solutions, where a solute is dissolved in a solvent, are crucial in the world of chemistry

because most important chemical and biochemical processes occur in solution. As an example,

the human body conducts many important chemical reactions in water. Cells in the human body

contain 70% of the body’s water, with the rest of the water mostly being contained around the

cells. Important biochemical reactions, such as protein interactions and ion channels, occur in

and around these cells. Reactions on the solute can sometimes only be observed or discovered in

these solutions. A gaseous molecule can be difficult

to conduct reactions on because the molecules are

not likely to interact with other molecules, but with

an aqueous solute interacting with an aqueous

solvent, the reactions can be easily performed as

ions are free to move around more easily. The

solvent is typically a liquid, with the most common solvent being water. Water is an important
solvent due to its polar nature and ability to not only form London Dispersion Forces, but other
stronger interactions such as hydrogen bonding and dipole-induced dipole forces. When the solvent
dissolves the solute, this process is known as solvation. Many things occur and change the solute
up to this point, which helps better understand the process of solvation.

Figure 1. Visualization of the solute (multicolored) in a solvent (blue). The white space is the cavity
formed after the solute broke through the surface tension of the solvent.

Solvation is a process with multiple steps. The first step is the breaking of the solvent’s

surface tension as it is entering the solution. This process is energetically unfavorable due to the

energy needed to break the intermolecular forces between solvent molecules. Once the solute

manages to intrude into the solvent, a cavity is formed in the solvent for the solute to reside in.

The gain and loss of energy are the result of the making and breaking of intermolecular forces

between the solvent molecules. 2 Following this, repulsion and dispersion forces occur between

the solvent and the solute, along with electrostatic contributions and polarization.

These four processes (breaking the solvent’s surface tension, formation of the cavity,

dispersion, and repulsion) all affect the intermolecular forces as the solute enters and travels in

the solvent. It is known that the solute in a solution is affected geometrically and electronically.

Once within the cavity, the solvent causes the solute’s electronic and geometric structure to

change as a response. Geometric responses happen in solution phase reactions between the solute

and the solvent, compared to the reactions in the gas phase with no solvent. However, when the

solute transitions from the gas phase to the solution phase, both responses will occur. When the

solute enters the solvent, there is a change in polarization between atoms in the solute and

between the solvent. The solutes in the solution phase also undergo a change in geometry

resulting from the change in polarization. Additionally, the solvent responds to the solute by

arranging itself around the solute, depending on if they both have polar or nonpolar properties.

The four steps mentioned above result in a favorable or unfavorable solvation process

depending on the strength of these interactions for various solutes. The energy needed to break

the surface tension and form the cavity in the solvent is costly for the solute, making it more

stable. When inside the cavity, dispersion and repulsion is typically a gain in energy for the

solute. The sum of energy gained or lost during the process of solvation is known as the Gibbs

energy of the solute in the aqueous phase (ΔGaq). Using measurements of other changes such as

enthalpy and entropy, this term can be determined experimentally. However, it is difficult to

measure the change in energy for large solutes, such as proteins, when there are many physical

processes occurring. Therefore, to better understand how the solvent interacts with the solute,

these interactions can be modeled in computational calculations. The use of quantum chemistry

is needed to represent the solute.

Quantum chemistry applies quantum mechanics in chemical systems such as solutions to

study the physical and chemical properties of molecules and their reactions. 1 These physical and

chemical properties are determined by the electronic structure. The Schrödinger equation, once

solved, can allow the prediction of a molecule’s properties and behavior. This allows for the

modeling of physical processes that can be done through physical experimentation. Quantum

mechanics accurately represents the solute. It is one of the most accurate methods used to predict

chemistry in the gas phase. However, there are major limitations to using quantum chemistry. It

requires a significant amount of computational power and time to perform calculations for

systems with large molecules. It requires parameters such as the theory and basis set, which

makes calculations complex depending on the choices inside these parameters. Also, quantum

chemistry on its own is most useful for chemistry in the gas phase, whereas most complex chemical

processes occur in the solution phase, and quantum chemistry alone does not give insight into the

loss or gain of energy in the solution phase.3 Nevertheless, quantum mechanical methods use a range of theories

to investigate how atoms and molecules behave. With the method of solving the properties and

behavior of the solute figured out, the next step was to determine how to represent the solvent.

Solvation models model how solvents behave and affect the solute in a solution. They are

designed to explain how the solvent and the solute interact. Some computational models are

involved in simplifying calculations made and minimizing the terms used to perform the

computation, like the implicit solvation model which will be discussed shortly. Although the

models used are not as accurate as the models used in quantum computations because they don’t

focus greatly on electronic and geometric structures, they are more reasonable to produce given

the shorter computational time required. Since this research will be focusing on a large range of

molecules, big and small, this research will use solvation models as it is more reasonable and

could allow progress in developing a more effective method of representing the solvent

compared to quantum chemistry.

There are two main types of solvation models used to represent the solvent. The first

type, explicit solvation, represents the solvent molecules around the solute (Figure 2). 4,9 This

model allows for locations where the quantum mechanical solvent is interacting with the solute.

As previously explained, the ability to accurately model the interactions between the solvent and

the solute can allow for more accurate calculations. However, the computational time needed is
extensive and therefore not effective for this research.9

Figure 2. Explicit Solvation vs. Implicit Solvation. The Explicit Solvation Model contains explicit solvent molecules
(blue) surrounding the solute molecule (red and white). The placement of the individual solvent molecules does
not visually leave a defined cavity in which the solute resides. The Implicit Solvation Model replaces the solvent
with a dielectric constant (blue stripe). The effect of the dielectric constant and the impact it has on all points of
the solute, gives the solvent cavity a defined boundary (solid black).

Implicit solvation models replace the explicit solvent molecules with a continuous

dielectric medium (Figure 2). This means that the system is only as big as the solute. 5 The

solvent now has no information regarding its structure. The construction of the implicit solvation

model differs in many ways from that of an explicit

solvation model due to a need for a cavity, which will be

discussed shortly. While the implicit solvation model may

not accurately depict the interaction between the solvent and

solute as well as the explicit solvation models, it is

significantly better in terms of computational cost compared

to the explicit solvation model. Therefore, in this research,

the implicit solvation model will be used.

Figure 3. Model of the molecule methoxide with its atoms corresponding with color presenting individual
solvation radii spheres (dashed). Between the dashed circle and the atom corresponding to that circle is
known as the cavity.

A crucial variable to define is where the solvent ceases, also known as the cavity, as seen
in the black outline around the solute in Figure 3. The cavity is the space between the solute and
the solvent. There are many options to consider when constructing a solvent model

with a cavity. There is the construction of the surface of the boundary in which the cavity ceases.

The most common construction of the surface is the van der Waals surface, which has a spherical

shape corresponding to its atom. Then, there is the surface type, which defines the shape of the

cavity depending on which atoms have it. The cavity shape corresponds to the molecule;

however, it should not be mistaken for the solvation radii of each atom. The size and shape of the

cavity represent the solvation radii definition. There are different radii definitions that are used to

define the shape of the cavity, one being the all-atom definition (AA). This definition allows for

the cavity to be defined by all atoms in a solute, whereas other definitions do not account for all.

Being able to change the size of the cavity could lead to stronger or weaker interactions between

the solvent and solute.10 The solvation radii can be seen as a sphere around each atom as seen in

Figure 3. Changing, or scaling, the size of solvation radii is known as alpha scaling.

If the overlapping lines of each atom’s solvation radii were to be removed, one could

visualize it to be the shape of the cavity. The size of the cavity affects the solvation energy. For

example, if the size of the cavity becomes smaller allowing the solvent to interact more closely

with the solute, the intermolecular forces would become stronger, therefore, leading to a greater

change in free energy between the solution phase and the gas phase. Then, there are ions and
neutrals to consider when modeling solvation. Neutrals are simpler compared to ions and

therefore more predictable to adjust the size of the cavity. Ions on the other hand require more

variables and are therefore less predictable. Changing the solvation radii size does not have to

apply to all the atoms defining the solute, as a combination of radii changes can be made to a

selective atom. This could allow the reproduction of experimental solvation values through radii

combinations in a model.

With the cavity included alongside the terms discussed (electrostatics, repulsion, and

dispersion), the Gibbs energy in the aqueous phase can be calculated with Equation 1.

ΔGaq = ΔGele + ΔGrep + ΔGdis + ΔGcav/st (Equation 1)

The Gibbs solvation energy is an experimental value, which can then be used as a benchmark to
compare calculated values from computational methods. The Gibbs energy of solvation can be
calculated as the difference between the solution phase energy (∆aqG) and the gas phase energy
(∆gasG), as shown in Equation 2.

∆solvG = ∆aqG - ∆gasG (Equation 2)

To obtain the Gibbs solvation energy error of the solutes in these radii definitions, the difference
between the experimental and calculated solvation energy was calculated, as shown in Equation 3.

Error = ∆solvG (exp) - ∆solvG (calc) (Equation 3)
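
Equations 2 and 3 amount to two subtractions. A minimal sketch in Python follows; the function names are hypothetical and the energies (in kcal/mol) are made up for illustration, not taken from the Loras Solvation Database.

```python
def solvation_energy(g_aq: float, g_gas: float) -> float:
    """Equation 2: solution-phase energy minus gas-phase energy."""
    return g_aq - g_gas


def solvation_error(dg_exp: float, dg_calc: float) -> float:
    """Equation 3: experimental minus calculated solvation energy."""
    return dg_exp - dg_calc


# Hypothetical energies in kcal/mol, chosen only to illustrate the sign convention.
g_gas = -100.0
g_aq = -106.5
dg_calc = solvation_energy(g_aq, g_gas)
error = solvation_error(-5.0, dg_calc)
print(dg_calc, error)
```

With this sign convention, a positive error means the calculated solvation energy is more negative than the experimental one, that is, the solute is over-solvated by the model.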



Default Radii (angstroms)
Atom    UFF       Pauling   Bondi
C       1.9255    1.5       1.7
H       1.443     1.2       1.2
O       1.75      1.4       1.52
N       1.83      1.5       1.55
Table 1. Atomic radii for existing radii definitions (UFF, Pauling, Bondi). Atom radii
correspond to the four unique atoms represented in this research.

There are three existing solvation radii definitions (Table 1) which account for hydrogen’s
solvation radii along with all other unique atoms, creating a more defined cavity. Neuzil attempted to
successfully use a CPCM model using these radii definitions, specifically UFF, Bondi, and Pauling
definitions which include the radii of hydrogen,

to replicate the experimental solvation energy of various solutes. 3 The CPCM model is the

implicit solvation model with the solute in the cavity that also contains the options listed

previously for a solvent model. She tested the performance of each boundary definition when

turning non-electrostatics both off and on.

        Neutrals                   Cations                    Anions
        UFF     Bondi   Pauling    UFF     Bondi   Pauling    UFF     Bondi   Pauling
MSE     6.83    0.93    0.56       18.88   5.77    4.31       15.88   5.20    1.54
MUE     6.83    2.02    2.60       18.88   5.98    4.80       15.88   6.39    4.25
RMSD    7.57    2.49    3.29       19.40   7.29    6.26       17.42   7.81    5.27
Table 2. Neutral, Cation, and Anion Errors for the three default solvation radii definitions with non-electrostatics
on. The Mean Signed Errors (MSE), Mean Unsigned Errors (MUE), and Root Mean Squared Deviations (RMSD)
are listed under each radii definition.3

The existing boundary definitions used in Neuzil’s work did not bring promising results,
indicating that these boundary definitions were not designed for solvation modeling. 3 As shown in
Table 2, the average solvation errors for both cations and anions were too large in magnitude for the
Bondi and UFF definitions in the CPCM solvation model. The errors for both cations and anions were
smaller for Pauling; however, they still were not acceptable. For the neutrals, Bondi and Pauling
obtained mean signed error (MSE) values within ±1 kcal/mol. Overall, the UFF definition did not
perform well for any molecule type, while the Pauling and Bondi definitions performed well only for
neutrals.

It is difficult to generate an acceptable radii combination that performs well for all
molecules. As shown in Neuzil’s research, charged molecules are a challenge to represent using
these existing radii definitions. The leading reason is that these three radii definitions were
based on atomic radii instead of solvation radii. This led to the subsequent research of Nick
Haskin. His goal was to improve the agreement between calculated and experimental Gibbs solvation
energies for the CPCM solvation model using new solvation radii definitions. To achieve this,
Haskin’s work consisted of choosing solvation radii ranges based on the existing radii definitions,
calculating the solvation energy through computational modeling, and comparing the calculated
solvation energy with the experimental solvation energy using the average error and absolute
average error. In total, Haskin searched 281 new radii combinations for 354 molecules containing
carbon, hydrogen, and oxygen. 12 Looking only at mean signed errors for radii definitions can be
deceiving because a set of molecules can be equally over-solvated and under-solvated, making the
average error seem small while the magnitude and spread of the error are large. It was therefore
crucial to look for radii definitions where all solutes in that specific radii combination are
within acceptable error ranges. A molecule “fails” when its solvation error exceeds a threshold.
For neutrals the threshold is ±5 kcal/mol, because their solvation energies are relatively simple
to replicate within this range, so any error beyond it is unacceptable. Ions have a harder time
replicating experimental solvation energies because their solvation energies are large in
magnitude, which allows larger fluctuations in the calculated values, so their threshold is ±10
kcal/mol. Going back to Haskin’s findings, some radii definitions had most neutral solutes mostly
within an error range of ±5 kcal/mol, but almost half of the ions being fails, or greater than the
acceptable error range of ±10 kcal/mol. For other radii definitions, the opposite could be said with
more neutrals being considered fails and fewer for the charged solutes. A total of 11 possible radii
definitions were found for all solutes within acceptable statistical error ranges (Table 3). As
shown, all 11 radii definitions have average error values within the acceptable MSE range. It is
unknown currently which radii definitions were optimal for specific solute subgroups like cations
and anions, however with 281 possible solvation radii combinations, there could be combinations
that accurately represent them.

C      O      H      MSE      MUE     RMSD
1.7    1.5    1.2     0.58    2.60    3.32
1.7    1.4    1.2    -0.15    2.70    3.42
1.8    1.5    1.2     0.41    2.23    2.97
1.8    1.4    1.2    -0.31    2.33    3.07
1.8    1.7    1.1    -0.68    2.55    3.74
1.9    1.4    1.2    -0.14    2.14    2.90
2.0    1.6    1.1    -0.64    2.20    3.17
2.1    1.6    1.1     0.59    1.89    3.28
2.1    1.5    1.1    -0.20    2.04    3.00
2.1    1.4    1.1    -0.85    2.31    3.10
2.1    1.6    1.0    -0.65    2.44    3.47
Table 3. Haskin's results within the 11 new solvation radii definitions.

In this research, nitrogen-containing solutes were added in new radii combinations. This

gave more charged molecules to be analyzed compared to Haskin’s set of molecules. There were

two main goals to obtain from this research. The first goal was to search for new solvation radii

combinations containing all molecule subgroups (neutrals, cations, and anions) with the addition

of nitrogen, as well as finding which radii combinations worked best for each subgroup

individually. These radii combinations were then to be compared to Neuzil’s radii used from the

three existing radii definitions. The second goal was to suggest a new subset of radii for future

searches. Solvation radii ranges from Haskin’s research were taken into consideration to find any

similarities and differences of his radii combinations to these newer radii combinations. Once the

calculated Gibbs solvation energy is determined, it can be compared to the experimental value to get

the solvation energy error as described in Equation 3. This research

intends to replicate experimental solvation values by constructing a new set of solvation radii,

with nitrogen-containing solutes added, designed for the CPCM solvation model.

Methods

A total of 565 solutes consisting of carbon, hydrogen, oxygen, and nitrogen were

obtained from the Loras Solvation Database. These were all derived from the Minnesota and

Mobley Molecular Databases.6-7 Within these 565 solutes, 475 were neutrals, 48 were cations,

and 42 were anions. The first step to obtaining the Gibbs solvation energy was to obtain a

coordinate map for every atom in all 565 solutes, giving the geometry or location of the atoms in

the molecule. An example of the cavitation, repulsion, and dispersion variables used in these

coordinate maps is shown in Appendix A. The molecular geometries, as well as experimental

Gibbs aqueous and gaseous energies, were already stored through the Loras Solvation Database.

To represent the solute in the solvation model, quantum mechanical methods were used. The

geometry of the solutes was represented in the gas phase because this takes less time to compute

compared to solutes in the aqueous phase. Hartree-Fock theory and the 6-31G(d) basis set were used

to calculate the solute due to their simplicity and their use in previous research.

Solvent cavities were created using Gaussian16 software. The solvent was also in the gas

phase geometry, utilizing the van der Waals surface construction for creating the boundary or

shape of the cavity. To generate radii definitions, the all-atom surface type was applied to every

atom in the solute, giving each atom a solvation radii sphere. In implicit solvation methods, alpha
scaling is a crucial factor to consider in calculations, as it is the multiplier applied to the
solvation radii size. The alpha value was set to the default (1.1), matching Sloan Neuzil’s alpha
value. Because Haskin’s radii definitions were scaled with an alpha value of 1.0, it would be
inaccurate to compare how well these new radii definitions replicated solvation energies with his
without knowing precisely what the new radii would be when downscaled.

The solvation radii range for each atom was similar to Haskin’s 12; however, the addition
of nitrogen contributes to smaller radii ranges. In this research, carbon’s solvation radii size
ranged from 1.7 to 2.1 angstroms, hydrogen ranged from 1.0 to 1.2 angstroms, oxygen ranged
from 1.4 to 1.7 angstroms, and nitrogen ranged from 1.4 to 2.0 angstroms, with a 0.1-angstrom
increment change (Table 4). These ranges give 5, 3, 4, and 7 values for the four elements, so
every molecule was calculated in a total of 5 × 3 × 4 × 7 = 420 radii combinations.

Atom                           C            H            O            N
Solvation Radii (angstroms)    1.7 - 2.1    1.0 - 1.2    1.4 - 1.7    1.4 - 2.0
Table 4. Solvation radii ranges for each unique atom (in angstroms), with 0.1 angstrom increment changes within
the range.
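
The grid of radii combinations implied by Table 4 can be enumerated directly. The following is a minimal sketch in Python; the helper `radii_range` and the dictionary layout are assumptions made for illustration and are not the scripts used in this research.

```python
from itertools import product


def radii_range(start: float, stop: float, step: float = 0.1) -> list[float]:
    """Return radii from start to stop inclusive, rounded to one decimal place."""
    count = round((stop - start) / step) + 1
    return [round(start + i * step, 1) for i in range(count)]


# Ranges from Table 4, in angstroms.
RANGES = {
    "C": radii_range(1.7, 2.1),
    "H": radii_range(1.0, 1.2),
    "O": radii_range(1.4, 1.7),
    "N": radii_range(1.4, 2.0),
}

# Every combination of one radius per element.
combinations = [dict(zip(RANGES, values)) for values in product(*RANGES.values())]
print(len(combinations))
```

Counting by element gives 5, 3, 4, and 7 values, so the list holds 5 × 3 × 4 × 7 = 420 entries.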

Each radii combination for a solute displayed a calculated solvation energy which can be

compared to the experimental energy. Calculating the difference between the experimental and

the calculated Gibbs solvation energy values gives the error between the two.

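For statistical analysis, the Mean Signed Error (MSE),

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)$$ (Equation 4)

Mean Unsigned Error (MUE),

$$\mathrm{MUE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \hat{x}_i \right|$$ (Equation 5)

and Root Mean Squared Deviation (RMSD),

$$\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2}$$ (Equation 6)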

across all the calculated Gibbs solvation energy error values were determined. The mean
signed and unsigned errors show the average error for signed and unsigned values,
respectively. The root mean squared deviation shows the spread of errors. It is possible for
the MSE of a radii definition to look acceptable; however, since errors can be scattered widely in
both negative and positive values, they can cancel and hide the magnitude of the error by making it
look as if the average error is close to zero. The MUE and RMSD values show whether the errors are
truly close to zero.
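
The cancellation described above is easy to reproduce. The sketch below, in Python, computes the three statistics for a made-up list of errors; the function names and the error values are illustrative assumptions, not data from this research.

```python
import math


def mse(errors: list[float]) -> float:
    """Mean signed error: positive and negative errors can cancel."""
    return sum(errors) / len(errors)


def mue(errors: list[float]) -> float:
    """Mean unsigned error: average magnitude of the errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmsd(errors: list[float]) -> float:
    """Root mean squared deviation: weights large errors more heavily."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical errors in kcal/mol: two over-solvated, two under-solvated solutes.
errors = [4.0, -4.0, 3.0, -3.0]
print(mse(errors), mue(errors), rmsd(errors))
```

Here the MSE is exactly zero even though every solute is off by at least 3 kcal/mol, which the MUE (3.5) and RMSD (about 3.54) make visible.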

When choosing which radii combinations have small errors between experimental and
calculated solvation energy values for molecules, there are error boundaries for neutral and
ionic molecules beyond which a result is not accepted; such results are known as fails. After
looking at the errors and at decisions from previous research 3,12, the failure thresholds were
set at an error magnitude of ≥5 kcal/mol for neutrals and ≥10 kcal/mol for ions. The larger
threshold for ions is reasonable because of the greater ∆solvG magnitudes of the ionic molecules.
These calculated values were then plotted with the experimental values to see how close the
calculated values were to the experimental values.
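
As a minimal sketch of the failure rule, assuming the threshold is applied to the magnitude of the error and that the boundary value itself counts as a fail (as the "≥" above suggests), the classification could look like this in Python. The function name and the use of an integer charge are assumptions made for illustration.

```python
NEUTRAL_THRESHOLD = 5.0   # kcal/mol
ION_THRESHOLD = 10.0      # kcal/mol


def is_fail(error: float, charge: int) -> bool:
    """Return True if the solvation error exceeds the limit for the solute's charge type."""
    threshold = NEUTRAL_THRESHOLD if charge == 0 else ION_THRESHOLD
    return abs(error) >= threshold


# Hypothetical (error, charge) pairs.
examples = [(4.9, 0), (-5.2, 0), (-7.5, 1), (10.3, -1)]
print([is_fail(error, charge) for error, charge in examples])
```

Counting the solutes for which `is_fail` is true, split by neutral, cation, and anion, gives the kind of failure tally used to compare radii combinations.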



Results and Discussion

A goal of this research was to narrow down the range of solvation radii for each atom in

the database so that more unique atoms do not have to be searched in too large of a solvation

radii range, as well as generating more radii definitions within that range by lowering the

increment change. To do this, acceptable radii definitions, evaluated over all the molecules
analyzed, must have an MSE value within ±1 kcal/mol. This shrank the search to 82 of the 420
radii definitions. Furthermore, radii definitions with MUE values greater than 3 kcal/mol were
excluded. Finally, radii definitions with an RMSD value greater than 4 kcal/mol were
excluded. These decisions are based on previous research from Haskin, allowing solvation radii
trends to be seen more easily using the same error ranges. This left 36 out of the 82 radii
definitions that were acceptable.
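
The three statistical cutoffs can be combined into one check. The following Python sketch assumes the cutoffs are inclusive at the boundary, which the text does not specify; the function name is hypothetical. The first example uses the statistics reported in Table 5 for C 1.9, H 1.1, O 1.4, N 1.6, and the second uses the UFF neutral statistics from Table 2 purely as an example of values that do not pass.

```python
def passes_screening(mse: float, mue: float, rmsd: float) -> bool:
    """Apply the MSE, MUE, and RMSD cutoffs (kcal/mol) used to narrow the search."""
    return abs(mse) <= 1.0 and mue <= 3.0 and rmsd <= 4.0


print(passes_screening(0.09, 1.98, 3.01))   # statistics from Table 5
print(passes_screening(6.83, 6.83, 7.57))   # UFF neutral statistics from Table 2
```

Applying the checks in sequence rather than all at once reproduces the staged narrowing described above, where the MSE cutoff is applied first and the MUE and RMSD cutoffs follow.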

However, it must be accounted for that, compared to the previous three default radii

definitions, radii definitions must represent the physical reality of atoms in molecules. Atoms

with a bigger atomic radius would have a bigger solvation radius compared to atoms with a

smaller atomic radius to better replicate these phenomena. It is also known that charged

molecules need a smaller cavity to be able to interact with the solvent. This means that atoms

like oxygen and nitrogen cannot be close in size to the size of a hydrogen atom, and oxygen

cannot be greater in size than nitrogen or carbon. In order, carbon will have the biggest solvation

radii. Following that would be either an equal size between nitrogen and oxygen, or nitrogen

would be slightly bigger than oxygen in terms of atomic radii. Lastly, hydrogen has the smallest

atomic radii, also having the smallest solvation radii. Therefore, any radii definitions that did not

Table 5. The resulting 9 radii definitions for all solutes. These radii definitions listed are the result of
excluding radii definitions that were not within ± 1 MSE, ± 3 MUE, and ±4 RMSD, as well as excluding
radii definitions that did not follow the rules of radii size corresponding to atoms.

Radii Total
Errors Failures
Combinations Failures
C H O N MSE MUE RMSD Neutral Cation Anion /565
1.7 1.1 1.4 1.5 -0.90 2.33 3.26 36 4 1 41
1.7 1.1 1.4 1.6 -0.66 2.22 3.10 34 4 1 39
1.8 1.1 1.4 1.5 -0.68 2.11 3.10 31 5 1 37
1.8 1.1 1.4 1.6 -0.47 2.04 2.98 29 4 1 34
1.9 1.1 1.4 1.5 -0.10 1.99 3.09 20 6 2 28
1.9 1.1 1.4 1.6 0.09 1.98 3.01 19 6 2 27
2.0 1.0 1.4 1.5 -0.80 2.39 3.44 28 7 5 40
2.0 1.0 1.4 1.6 -0.58 2.30 3.28 27 6 5 38
2.0 1.1 1.4 1.5 0.83 2.22 3.41 24 7 7 38
represent this reality were excluded from the analysis. This resulted in 9 radii combinations that

were acceptable out of the 36 radii definitions (Table 5).
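
The size-ordering rule described above can be written as a simple check. The sketch below is only one possible reading of that rule: carbon largest, nitrogen equal to or slightly larger than oxygen, hydrogen smallest. The function name and the `max_n_o_gap` parameter are hypothetical, and the 0.2 angstrom default for "slightly larger" is an assumption rather than a stated cutoff. The requirement that oxygen and nitrogen not be "close" to hydrogen is not quantified in the text, so only a strict inequality is checked.

```python
def follows_size_ordering(c, h, o, n, max_n_o_gap=0.2):
    """Check that solvation radii (angstroms) follow C > N >= O > H.

    Radii are converted to integer hundredths of an angstrom so that
    comparisons are not affected by floating-point rounding.
    """
    c, h, o, n = (round(x * 100) for x in (c, h, o, n))
    gap_limit = round(max_n_o_gap * 100)  # assumed meaning of "slightly bigger"
    return c > n >= o > h and (n - o) <= gap_limit
```

Under these assumptions, `follows_size_ordering(1.7, 1.1, 1.4, 1.5)` is accepted, while a definition with oxygen larger than nitrogen, such as `(1.7, 1.1, 1.6, 1.5)`, is rejected.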

Error analysis alone is not enough to determine which radii combinations are optimal for the
565 molecules. Some radii combinations favor neutral molecules while others favor charged
molecules, so a trend of over-solvated or under-solvated molecules, or failures, can be seen as
the atomic solvation radii change. Following previous research, a neutral molecule with an
absolute error greater than 5 kcal/mol is considered a failure, while cations and anions with
an absolute error greater than 10 kcal/mol are considered failures. Among the trends that were
seen, neutral molecules had fewer failures when carbon's solvation radius increased in size
(>1.7 angstroms) and oxygen's increased in size (>1.4 angstroms). Cations had fewer failures
when carbon was greater in size (>1.8 angstroms) and oxygen was also greater in size (up to
1.6 angstroms). Anions had the fewest failures when oxygen's radius was at 1.4 angstroms
(Table 5). When comparing these solvation radii to the existing radii definitions that Neuzil
tested, we see that carbon's solvation radius is, on average, closest to the radius of UFF. No
existing radii definition matched the most common solvation radius shown for hydrogen
(1.1 angstroms). For oxygen's most common solvation radius, Pauling was directly comparable,
with an atomic radius of 1.4 angstroms. Finally, both Pauling's and Bondi's radii, 1.5 and
1.55 angstroms respectively, were comparable to the trends shown for nitrogen.
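
The failure criteria above (more than 5 kcal/mol in magnitude for neutrals, more than 10 kcal/mol for ions) can be sketched as a small classifier. The function names and the `(charge, error)` record layout are hypothetical and chosen only for illustration.

```python
def is_failure(charge, error, neutral_limit=5.0, ion_limit=10.0):
    """Return True if a solute's error (kcal/mol) counts as a failure."""
    limit = neutral_limit if charge == 0 else ion_limit
    return abs(error) > limit


def count_failures(records):
    """Count failures per subgroup from an iterable of (charge, error) pairs."""
    counts = {"neutral": 0, "cation": 0, "anion": 0}
    for charge, error in records:
        if is_failure(charge, error):
            if charge == 0:
                counts["neutral"] += 1
            elif charge > 0:
                counts["cation"] += 1
            else:
                counts["anion"] += 1
    return counts
```

Applied to the errors of all 565 solutes for one radii definition, `count_failures` would give the three per-subgroup failure counts of the kind reported in Table 5.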



Table 6. Solutes that consistently failed within the 9 new solvation radii definitions. Solutes are
sorted by charge (neutrals, cations, and then anions) and listed with their experimental solvation
energy. Each solute displays its average solvation energy error over the combinations in which it failed.

Solutes                                   Charge   Gsolv exp     Amount   Average Error
                                                   (kcal/mol)    Failed   (kcal/mol)
[2-benzhydryloxyethyl]-dimethylamine       0        -9.34        9          8.33
1,2-dinitroxypropane                       0        -5.00        9         -6.76
1,3-bis-[nitrooxy]butane                   0        -4.29        9         -7.44
1,4,5,8-tetraminoanthraquinone             0        -8.90        8         -9.49
1-acetoxyethylacetate                      0        -4.97        3         -5.89
1-amino-4-hydroxy-9,10-anthracenedione     0        -9.53        8         -9.87
1-methyl-3-nitrobenzene                    0        -3.45        3         -5.28
1-methylthymine                            0       -10.40        4         -5.94
1-nitrobutane                              0        -3.08        3         -5.15
1-nitropropane                             0        -3.34        6         -5.25
2-methoxyphenol                            0        -5.94        4         -5.44
2-nitrophenol                              0        -4.58        9        -10.50
3-nitrooxypropyl nitrate                   0        -4.80        9         -7.99
Amitriptyline                              0        -7.43        9          8.72
Cyanuric acid                              0       -18.06        4         -5.81
Dicyandiamide                              0       -10.95        9        -11.66
Dinitrogen tetroxide                       0        -2.14        9        -11.47
Dinoseb                                    0        -6.20        9         -9.65
Fenbufen                                   0       -12.75        4         -6.02
Glycerol triacetate                        0        -8.84        4         -5.53
Isobutyl formate                           0        -2.22        8         -6.29
Isopropyl formate                          0        -2.02        8         -7.04
N,N-dimethylpiperazine                     0        -7.58        5         -7.04
Nitroethane                                0        -3.71        5         -5.76
Nitroglycol                                0        -5.70        9         -6.57
Nitromethane                               0        -3.95        7         -5.91
Nitroxyacetone                             0        -6.00        3         -5.74
Peracetic acid                             0        -5.88        8         -7.11
Propyl formate                             0        -2.48        8         -6.66
Trimethoxymethane                          0        -4.42        4         -5.49
Urea                                       0       -13.80        2         -5.45
Water                                      0        -6.31        2         -5.55
4-nitroaniline                             1       -75.90        5        -11.76
Diethyl ether                              1       -71.50        5         11.27
Dimethyl ether                             1       -79.50        7         12.02
Ethanol (cation)                           1       -88.40        9         12.22
Hydronium                                  1      -110.30        9         13.27
Methanol (cation)                          1       -93.00        9         12.12
Benzyl alcohol                            -1       -85.10        9         12.24
Ethanol (anion)                           -1       -90.70        3         11.63
Methanol (anion)                          -1       -95.00        5         11.94

When looking at which molecules failed consistently within the 9 new solvation radii
combinations, most were neutral solutes. Most of these neutrals were over-solvated, showing
average errors more negative than -5 kcal/mol. The calculated solvation energies were too
negative, and therefore the difference between the calculated and experimental values was
negative. Meanwhile, most cations and all anions that failed consistently in these 9 new
solvation radii combinations had average errors that were too positive. This means that these
cations and anions had calculated values greater than the experimental values, giving a positive
error, meaning they were under-solvated. Between the solutes that failed in more than half of
the radii combinations and the solutes that failed in fewer than half, trends can be difficult
to spot. If the solute has a complex structure, such as one containing both nitrogen and oxygen
atoms, the molecule seems to fail more often in these 9 radii combinations. There does not seem
to be a trend with how big the solute is compared to others, or with whether it contains
aromatic rings. For example, [2-benzhydryloxyethyl]-dimethylamine is a large solute containing
two aryl groups. Compared with Fenbufen, which contains two aryl groups and a hydroxyl group
while being just as large, it has more than twice as many failures (9 versus 4). However, the
solutes that fail more often tend to contain more than one oxygen or nitrogen atom, with
hydroxyl or amino groups at the end. These solutes also tend to have double or even triple bonds
involving oxygen and nitrogen atoms. This makes sense, as the radii definition has to account
for both atoms, which tend to have difficulty reaching a small enough solvation radius.
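
The "Amount Failed" and "Average Error" columns of Table 6 can be obtained per solute with a summary like the one below. This is a hedged sketch with a hypothetical function name: it assumes that each solute has one error value per radii combination and that the average is taken only over the combinations in which the solute failed.

```python
from statistics import mean


def summarize_failures(errors_by_combination, charge):
    """Return (number of failing combinations, average error over those failures).

    errors_by_combination: one error (kcal/mol) per radii combination.
    The failure limit is 5 kcal/mol for neutrals and 10 kcal/mol for ions.
    """
    limit = 5.0 if charge == 0 else 10.0
    failed = [e for e in errors_by_combination if abs(e) > limit]
    average = mean(failed) if failed else None
    return len(failed), average
```

A neutral solute with hypothetical errors of -6.0, -4.0, and -8.0 kcal/mol across three combinations would be reported as failing twice with an average error of -7.0 kcal/mol.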

Increasing the solvation radius of oxygen drastically increases the number of failures that
anions produce. The solvation radius of oxygen must be 1.4 angstroms or lower to keep anion
failures to a minimum. However, oxygen's solvation radius was only analyzed between 1.4 and
1.7 angstroms. Cations and anions cannot perform well together at the same radii definition, as
increasing oxygen's and carbon's solvation radii works well for cations but drastically
increases the number of failures for anions. However, with 48 cations and 42 anions, a subtle
increase in cation failures in exchange for fewer anion failures is acceptable. It is not
acceptable to increase failures for cations and anions in order to decrease the number of
failures for neutral molecules, as there are over 5 times as many neutral molecules as charged
molecules, which heavily biases the error values towards neutrals.

Looking at specific radii combinations using a 45-degree plot, we can see the trends of how
under-solvated or over-solvated a solute is in this combination depending on how far the dots
are from the expected value, or the trendline (Figure 4). A more positive calculated solvation
energy compared to the experimental means that the solute is under-solvated at this combination
and vice versa. For this specific combination, neutrals are seen to have over-solvated solvation
energies, meaning they are too negative compared to the experimental value. This makes sense as
neutrals need to have bigger solvation radii due to having no charge. On the other hand, cation
solutes seem to be better at replicating experimental solvation energies at this definition than
neutral solutes. Anions, however, are overall too under-solvated at this combination.

[Figure 4: scatter plot titled "Carbon 1.7, Hydrogen 1.0, Oxygen 1.4, Nitrogen 1.6". x-axis:
Experimental Solvation Energy (kcal/mol); y-axis: Calculated Solvation Energy (kcal/mol); both
axes run from -120 to 20.]

Figure 4. Comparison of experimental vs. calculated Gibbs solvation energy for solutes in radii
combination for carbon 1.7, hydrogen 1.1, oxygen 1.4, and nitrogen 1.6. Neutrals (blue) are
overall over-solvated (too negative), cations (grey) are a little over-solvated, and anions
(yellow) are overall under-solvated (too positive) compared to experimental values.
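
The reading of the 45-degree plot stated above, where a calculated energy more positive than the experimental one means under-solvated and the reverse means over-solvated, can be expressed as a short helper. The function name and the optional `tolerance` parameter are hypothetical.

```python
def solvation_bias(calculated, experimental, tolerance=0.0):
    """Label a solute by which side of the 45-degree line it falls on.

    Energies are in kcal/mol. A point above the line (calculated more
    positive than experimental) is under-solvated; below it, over-solvated.
    """
    diff = calculated - experimental
    if diff > tolerance:
        return "under-solvated"
    if diff < -tolerance:
        return "over-solvated"
    return "on the line"
```

For instance, a hypothetical anion with an experimental value of -90.7 kcal/mol and a calculated value of -80.0 kcal/mol lies above the line and is labeled under-solvated.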



Now looking at a different radii combination, larger than the previous combination, many trends
can be found (Figure 5). Looking at each subgroup of solutes, neutrals have shifted towards the
under-solvated side of the trendline. This makes sense as the previous solvation radii that were
analyzed were too small for neutrals and therefore gave much closer interaction with the
solvent, leading to over-solvated effects. This radii combination is now about in the right
spot, being neither too over-solvated nor too under-solvated. This is similarly seen with the
cation solutes as well. Anion solutes, on the other hand, had shifted to being more
over-solvated compared to the previous radii combination, also being in the spot where
calculated solvation energies are similar to experimental solvation energies.

[Figure 5: scatter plot titled "Carbon 1.9, Hydrogen 1.1, Oxygen 1.4, Nitrogen 1.6". x-axis:
Experimental Solvation Energy (kcal/mol); y-axis: Calculated Solvation Energy (kcal/mol); both
axes run from -120 to 20.]

Figure 5. Comparison of experimental vs. calculated Gibbs solvation energy for solutes in radii
combination for carbon 1.9, hydrogen 1.1, oxygen 1.4, and nitrogen 1.6. Neutrals (blue) are
overall in between the over- and under-solvated sides, cations (grey) are overall under-solvated,
and anions (yellow) are mostly under-solvated.

With 9 acceptable radii definitions for all 565 solutes known at these solvation radii ranges,
it is also important to look at which radii definitions work best for each specific molecule
type. To fulfill the goal of finding radii combinations for each subgroup of molecules, cations
and anions need to be analyzed individually. With cation and anion solutes, there were no radii
combinations that both fell within the statistical error standards and avoided nonideal atomic
radii relative to the neighboring unique atoms. However, with RMSD values slightly higher than
4 kcal/mol, there were a few radii definitions for cation solutes (Table 7). Looking at which
radii definitions were most optimal for cations, hydrogen's solvation radius must be
1.1 angstroms to come close to the statistical error range. Something that was not seen in the
radii definitions representing all solutes is oxygen's radius at 1.5 angstroms. Finally, we see
radii definitions in common with the definitions for all 565 solutes, with carbon at
1.7 angstroms, hydrogen at 1.1 angstroms, oxygen at 1.4 angstroms, and nitrogen at 1.5 and
1.6 angstroms. In these definitions, we see that only 4 out of the 48 cations had failed.

Table 7. Resulting 3 radii definitions for cation solutes. Definitions found within ±1 MSE,
3 MUE, and 5 RMSD (kcal/mol).

Radii Combinations (angstroms)   Errors (kcal/mol)
C     H     O     N              MSE     MUE    RMSD
1.7   1.1   1.4   1.5            0.15    2.96   4.59
1.7   1.1   1.4   1.6            0.77    2.55   4.34
1.7   1.1   1.5   1.6            0.91    2.62   4.54

For anion solutes, it was considerably more difficult to find definitions within acceptable
error ranges than for cation solutes. Looking at which definitions were most optimal for anions,
MSE values had to be within ±2 kcal/mol, MUE within 4 kcal/mol, and RMSD within 5 kcal/mol
(Table 8). Carbon's radii were only seen on the smaller side for these definitions. An
interesting find is that hydrogen's radius was mostly at 1.0 angstroms, but that is to be
expected for charged molecules, as they need smaller radii to be able to interact with the
solvent more strongly. Oxygen's radius lay at the typical size of 1.4 angstroms, a common trend
in the radii definitions concerning all 565 solutes. Out of the 9 radii definitions found for
all solutes, only one radii definition for anion solutes matched, with carbon at 1.7 angstroms,
hydrogen at 1.1 angstroms, oxygen at 1.4 angstroms, and nitrogen at 1.5 angstroms.

Table 8. Resulting 5 radii definitions for anion solutes. No definitions found within ±1 MSE,
3 MUE, and 4 RMSD. However, definitions were found within ±2 MSE, 4 MUE, and 5 RMSD (kcal/mol).

Radii Combinations (angstroms)   Errors (kcal/mol)
C     H     O     N              MSE     MUE    RMSD
1.7   1.0   1.4   1.5            0.84    3.64   4.26
1.7   1.0   1.4   1.6            1.11    3.46   4.13
1.7   1.1   1.4   1.5            1.91    3.86   4.71
1.8   1.0   1.4   1.5            1.60    3.74   4.43
1.8   1.0   1.4   1.6            1.86    3.58   4.37



Finally, for neutral solutes, there were 21 radii definitions that were within the acceptable
error range and represented an ideal radii size combination (Table 9). Looking at which
definitions were most optimal for neutrals, carbon's solvation radius was seen at all sizes
within its range (1.7 - 2.1 angstroms). While more than half of the radii combinations had
hydrogen's solvation radius at 1.1 angstroms, a considerable number (7 combinations) had a
radius of 1.0 angstroms. However, as seen in the table, most of these coincide with a drastic
increase in oxygen's and nitrogen's solvation radii. When oxygen was at 1.4 angstroms, most of
the definitions had hydrogen's solvation radius at 1.1 angstroms. There are not many
combinations with oxygen's radius at 1.4 angstroms, and those occur mainly when hydrogen is at
1.1 angstroms. There is a visible trend across the table where the smallest errors are seen in
the biggest radii combinations. This supports the claim that neutrals tend to need bigger
solvation radii than ions due to charge.

Table 9. Resulting 21 radii combinations for neutral solutes. All combinations found were within
±1 MSE, 3 MUE, and 4 RMSD (kcal/mol).

Radii Combinations (angstroms)   Errors (kcal/mol)
C     H     O     N              MSE     MUE    RMSD
1.7   1.1   1.5   1.6            -0.43   1.70   2.32
1.8   1.1   1.4   1.6            -0.98   1.80   2.48
1.8   1.1   1.5   1.6            -0.37   1.44   2.06
1.8   1.1   1.5   1.7            -0.19   1.42   2.00
1.8   1.1   1.6   1.7             0.46   1.36   1.90
1.9   1.0   1.7   1.8            -0.32   1.22   1.65
1.9   1.1   1.4   1.5            -0.71   1.61   2.36
1.9   1.1   1.4   1.6            -0.54   1.57   2.23
1.9   1.1   1.5   1.6             0.02   1.34   1.94
1.9   1.1   1.5   1.7             0.20   1.36   1.92
1.9   1.1   1.6   1.7             0.82   1.43   1.96
2.0   1.0   1.5   1.6            -0.64   1.51   2.11
2.0   1.0   1.5   1.7            -0.45   1.47   2.02
2.0   1.0   1.6   1.7             0.16   1.26   1.80
2.0   1.0   1.6   1.8             0.37   1.32   1.85
2.0   1.1   1.4   1.5             0.11   1.65   2.33
2.0   1.1   1.4   1.6             0.27   1.64   2.25
2.0   1.1   1.5   1.6             0.80   1.62   2.20
2.0   1.1   1.5   1.7             0.96   1.66   2.23

Comparing these combinations to those from the other subgroups reveals several patterns.
Neutrals have several times more combinations with oxygen greater than 1.4 angstroms in
solvation radius than cations and anions do. This also leads to more combinations where
nitrogen's solvation radius is bigger than 1.6 angstroms. There is a wider range of carbon
solvation radii in neutrals compared to cations and anions. Finally, bigger radii combinations
tend to perform better for neutrals and for cations (the latter only when nitrogen's solvation
radius increases), while the opposite can be said for anions.

Conclusion

Solutions are crucial for biochemical processes to function, and it is therefore important to
improve the understanding of solvation through computational modeling. The purpose of this
research was to generate new solvation radii for the CPCM solvation model. In this research,
nitrogen-containing molecules were additionally explored in radii definitions. This research
analyzed 565 solutes containing carbon, hydrogen, oxygen, and nitrogen: 475 neutral molecules,
48 cations, and 42 anions. These new solvation radii definitions considered previous radii
ranges as well as the relative sizes of the atoms.

There were 420 radii definitions to be analyzed for their performance in replicating
experimental solvation energies. Some radii definitions were optimal for neutral molecules while
others were optimal for ions. Ultimately, 9 new solvation radii definitions were found to be
acceptable for all molecules analyzed in this research. All 9 radii definitions had MSE values
within ±1 kcal/mol, MUE values within 3 kcal/mol, and RMSD values within 4 kcal/mol. Results
show that when oxygen's radius was increased, neutral molecules failed less. Decreasing oxygen's
radius did the contrary. There were enough anions and cations analyzed in this research to see
trends in molecular failures. Setting hydrogen's radius to 1.0 angstroms while also increasing
oxygen's radius showed better performance for cations. Decreasing oxygen's radius greatly
improved the performance of anions.

From what was gathered in the results, searching at a smaller minimum solvation radius for
oxygen can be predicted to significantly decrease anion failures and improve the overall error
statistics. It is not worth changing the range of carbon or hydrogen, as doing so greatly
affects the number of neutral failures while not providing much improvement for ions. Carbon's
radii range is wide enough to analyze the effect of decreasing oxygen's radius to 1.3 angstroms.
Raising nitrogen's radius above 1.6 angstroms would require oxygen's radius to be at least
1.5 angstroms to stay within 0.2 angstroms of nitrogen's radius. Since a radius of 1.5 angstroms
or higher for oxygen makes anions perform worse, searching higher radii values for nitrogen is
not worth considering.

Further research in this field could add one additional atom at a time, such as phosphorus or
sulfur, as these are also common in biological molecules and contribute to biochemical
processes. Adding one atom at a time could help in understanding the solvation trends as
molecules become bigger or more complex. This also allows for replicating solvation energies
for molecules that would take too long to study experimentally or are dangerous to analyze. To
expand on the research for these new radii definitions using these suggestions, it could also
prove beneficial to change each atom's solvation radius in smaller increments (such as
0.05 angstroms) while generating radii combinations, as the default radii definitions assign a
specific solvation radius to each atom. Using smaller increments can drastically increase the
number of combinations generated, so shortening the solvation radii range for each atom is
crucial.

To narrow down the next search for solutes containing nitrogen, carbon's solvation radii range
should stay the same (1.7 – 2.1 angstroms), as should hydrogen's (1.0 – 1.2 angstroms), while
oxygen's minimum and maximum solvation radii should be decreased (1.3 – 1.4 angstroms) and
nitrogen's maximum radius should be decreased (1.4 – 1.6 angstroms) (Table 5).

Atom                          C           H           O           N
Solvation Radii (angstroms)   1.7 – 2.1   1.0 – 1.2   1.3 – 1.4   1.4 – 1.6

Carbon needs a safe solvation radii range, as it is shown to perform well for solutes at almost
every size, so not changing the solvation radii range for carbon would be ideal. Hydrogen's
radii range will remain the same to make sure that decreasing oxygen's radius does not
significantly increase neutral failures. If oxygen's radius were at 1.3 angstroms, we may see
greater values for carbon's and hydrogen's solvation radii, so it is important not to exclude
sizes that might be present in radii definitions. While most cation solutes were introduced by
adding nitrogen to the set of atoms, changing the solvation radius of nitrogen did not affect
the number of failures for these ions, and since most radii definitions had nitrogen at 1.5 to
1.6 angstroms, that will be the recommended range.
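
To illustrate how the increment affects the size of the next search, the sketch below counts the grid points for the recommended ranges (C 1.7 – 2.1, H 1.0 – 1.2, O 1.3 – 1.4, N 1.4 – 1.6 angstroms) at 0.1 and 0.05 angstrom increments. The names are hypothetical, and the counts are taken before any error or size-ordering filter is applied. Radii are stored as integer hundredths of an angstrom so that the endpoints are not lost to floating-point rounding.

```python
from itertools import product

# Recommended search ranges in hundredths of an angstrom: (minimum, maximum).
RANGES_HUNDREDTHS = {
    "C": (170, 210),
    "H": (100, 120),
    "O": (130, 140),
    "N": (140, 160),
}


def grid(step):
    """Return every (C, H, O, N) combination in angstroms for a given step.

    step is the increment in hundredths of an angstrom (10 -> 0.1, 5 -> 0.05).
    """
    axes = [range(lo, hi + 1, step) for lo, hi in RANGES_HUNDREDTHS.values()]
    return [tuple(v / 100 for v in combo) for combo in product(*axes)]


coarse = grid(10)  # 5 * 3 * 2 * 3 combinations
fine = grid(5)     # 9 * 5 * 3 * 5 combinations
print(len(coarse), len(fine))  # 90 675
```

Halving the increment raises the number of combinations from 90 to 675 for these ranges, which shows why keeping each atom's range short matters when smaller increments are used.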

References

(1) Cramer, C. J.; Truhlar, D. G. Implicit Solvation Models: Equilibria, Structure,
Spectra, and Dynamics. Chem. Rev. 1999, 99 (8), 2161–2200.
https://doi.org/10.1021/cr960149m.

(2) Adam Moser. Charge-Dependent Radii for Improved Treatment of Solvation Effects
in Quantum Mechanical Calculations, University of Minnesota, 2004.

(3) Sloan Neuzil. Benchmarking Boundary Definitions of the CPCM Implicit Solvation
Model, Loras College, Dubuque, IA, 2019.

(4) Kelly, C. P.; Cramer, C. J.; Truhlar, D. G. SM6: A Density Functional Theory
Continuum Solvation Model for Calculating Aqueous Solvation Free Energies of
Neutrals, Ions, and Solute−Water Clusters. J. Chem. Theory Comput. 2005, 1 (6),
1133–1152. https://doi.org/10.1021/ct050164b.

(5) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on
Solute Electron Density and on a Continuum Model of the Solvent Defined by the
Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113
(18), 6378–6396. https://doi.org/10.1021/jp810292n.

(6) Mobley, D. L.; Guthrie, J. P. FreeSolv: A Database of Experimental and Calculated
Hydration Free Energies, with Input Files. J Comput Aided Mol Des 2014, 28 (7),
711–720. https://doi.org/10.1007/s10822-014-9747-x.

(7) Rolf Sander. Henry’s Law Constants. In NIST Chemistry WebBook, NIST Standard
Reference Database Number 69, Eds. P.J. Linstrom and W.G. Mallard; National
Institute of Standards and Technology: Gaithersburg, MD, 20899.

(8) Sander, R. Compilation of Henry’s Law Constants (Version 4.0) for Water as
Solvent. Atmos. Chem. Phys. 2015, 15 (8), 4399–4981. https://doi.org/10.5194/acp-
15-4399-2015.

(9) Gupta, M.; da Silva, E. F.; Svendsen, H. F. Explicit Solvation Shell Model and
Continuum Solvation Models for Solvation Energy and pKa Determination of Amino
Acids. J. Chem. Theory Comput. 2013, 9 (11), 5021–5037.
https://doi.org/10.1021/ct400459y.

(10) Zhang, B. W.; Matubayasi, N.; Levy, R. M. Cavity Particle in Aqueous Solution with
a Hydrophobic Solute: Structure, Energetics, and Functionals. J Phys Chem B 2020,
124 (25), 5220–5237. https://doi.org/10.1021/acs.jpcb.0c02721.

(11) Takano, Y.; Houk, K. N. Benchmarking the Conductor-like Polarizable Continuum
Model (CPCM) for Aqueous Solvation Free Energies of Neutral and Ionic Organic
Molecules. J. Chem. Theory Comput. 2005, 1 (1), 70–77.
https://doi.org/10.1021/ct049977a.

(12) Haskin, Nicholas. Research Thesis. 2023.

(13) Hoefer, Emma. Research Thesis. 2023.



Appendixes

Appendix A. Example of coordinate map with variables chosen for aqueous energy calculations of
Urea.