UNIVERSITY OF SOUTHAMPTON

Simulation Studies of the Structure and Energetics of a Host-Guest System

Richard Humfry Henchman

A dissertation submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at the University of Southampton.

Department of Chemistry

December 1999

UNIVERSITY OF SOUTHAMPTON

ABSTRACT

FACULTY OF SCIENCE

Doctor of Philosophy

SIMULATION STUDIES OF THE STRUCTURE AND ENERGETICS OF A HOST-GUEST SYSTEM

by Richard Humfry Henchman

Computer simulations are used to understand the binding behaviour of a number of amino acid derivatives in macrobicycle 12 in chloroform. Previous experimental work on this system indicated that macrobicycle 12, as well as being enantioselective for L amino acid derivatives, was able to stabilise the amide bond of the amino acid derivatives in the cis conformation. Monte Carlo (MC) simulations and free energy perturbation (FEP) calculations were able to successfully reproduce the observed behaviour. A detailed analysis was performed to rationalise the selectivity. In the course of this work, a methodology was developed that made feasible the simulations on the macrobicycle 12 system. The development of a novel MC sampling procedure and the replacement of explicit solvent by the GB/SA continuum model were found necessary to carry out realistic simulations. A new charge derivation method called REPD was developed to produce OPLS-like charges by fitting to the molecular electrostatic potential. Free energies of hydration were calculated to test both REPD charges and the relative performance of the FEP and the linear interaction free energy methods.


Acknowledgments
The first person I am most deeply indebted to is my supervisor Jonathan Essex for his guidance, advice, friendliness, encouragement, tact, availability – all the qualities one would want in a supervisor. Next I must express my gratitude to the Commonwealth Scholarship Commission for funding my Ph.D. and that long list of people at the British Council who always used to send me random amounts of free money once in a while. I am lucky to have a long list of other people to thank. First of all is Lewis who was very helpful to me, particularly at the start when I was settling down to live in a strange new land. Then there’s my Ph.D. “brother” Ian, with whom I shared all the excitement of each stage of the degree. Discussions with Ian were always very useful, and I must thank him especially for doing the statistical analysis on my free energy data using the LIE method. Andrew “Jif” Lemon’s help was invaluable to me in coming to terms with Unix, good old awk and the programs that wrote programs, without which this thesis could not physically have been completed. When Jif left it was Steve who principally helped me out in this area and moved me onto perl. Rich T’s help with the GB/SA model was particularly useful, especially given that most of it was done over the telephone. I should also mention Christopher’s help in running the PB calculations and for spellchecking his own name in the thesis. I must also thank Oz for maintaining the computers, Ian for helping me run simulations on the University machines, and Ed and Oliver for discussions about ab initio work. Being in the small room towards the end, Tim and I had a number of insightful conversations, some related to work. Julen also deserves a mention for some organic nomenclature. The help of Rob, Adrian Hickford, and others previously mentioned in proofreading was also of great assistance when my ability to pick up mistakes was strongly diminishing.
I would also like to thank all my other friends and group members, too many to mention here, who made sure I enjoyed myself even on the (rare) occasions when work did not. Finally, I am most grateful to my family for allowing me to spend three years away from home to do this work, and for giving me all their support.

Contents

1 Introduction
1.1 Molecular Association
1.2 The Macrobicycle 12 Host-Guest System
1.3 Importance of the Host Binding Properties
1.4 Aim of This Work

2 Simulation and Free Energy Methods
2.1 Simulation Methods
2.1.1 The Role of Computer Simulations
2.1.2 Representation of the System
2.1.3 The OPLS Force Field
2.1.4 The System to be Modelled and Other Approximations
2.1.5 Molecular Dynamics Simulations
2.1.6 Monte Carlo Simulations
2.1.7 Molecular Dynamics Versus Monte Carlo
2.2 Free Energy Methods
2.2.1 The Problem of Calculating Free Energies
2.2.2 Free Energy Perturbation
2.2.3 Thermodynamic Integration
2.2.4 Difficulties With Free Energy Methods
2.2.5 Fast Free Energy Methods
2.2.6 Choice of Free Energy Method
2.3 Applications of Free Energy Methods
2.3.1 Problems Calculating Free Energies of Binding

3 Setup of the Host-Guest System

4 Partial Charge Methods

5 Testing of REPD Charges by FEP and LIE

6 Methods to Improve Monte Carlo Sampling

7 Free Energy Calculations for Macrobicycle 12

8 Analysis of the Macrobicycle 12 System

9 Conclusion

A Charges

Bibliography

Chapter 1

Introduction

1.1 Molecular Association

The study of molecular association is a problem of great interest in many areas of science, particularly so in chemistry and biology. Molecular association, or simply, binding, is a key mechanistic step for a wide range of processes including chemical reactions as well as their catalysis and inhibition, structural change, chemosensors and pollutant removal. Any systematic, rational advance in these areas must be accompanied by an understanding of the factors that control binding and its consequences.

Computer simulation studies are able to provide much information about binding that is inaccessible to both experiment and analytical theory. However, limitations in computational power currently restrict their application and degree of complexity to rather small systems, which can adversely affect the degree of realism desired. Nevertheless, reasonably accurate computer simulation studies of moderately sized host-guest systems have recently become of great use in the study of molecular binding [1–5].

Host-guest systems typically comprise a large “host” molecule containing a specific binding site to which a “guest” molecule can bind. They are studied for a number of reasons. Firstly, host-guest systems may be of intrinsic interest in themselves. Secondly, they may serve as simpler prototypes of more complex systems that are more amenable to computational study, both for practical reasons and due to the reduced level of complexity, making it easier to deduce cause and effect. Thirdly, they can provide insight into particular types of molecular interaction. Finally, they can also serve as rigorous test cases for method development. The particular system that is the subject of this work satisfies all four of these motivations for studying host-guest systems.

1.2 The Macrobicycle 12 Host-Guest System

The host is called macrobicycle 12 [6] and it binds small amino acid derivative guests in chloroform. The host and three of these amino acid derivatives are illustrated in Figure 1.1. Macrobicycle 12 is a cup-shaped molecule made up of two rings that binds the amino acid derivatives by virtue of strong hydrogen bonds between thiourea and amide units of the host and the carboxylate group of the guest. While the host was designed to bind the guest inside the cavity, it is actually able to bind the guest in two possible ways, either inside or outside the cavity.

Figure 1.1: The macrobicycle 12 host molecule and three guest amino acid derivatives (N-Ac-glycine, N-Ac-L-alanine and N-Ac-L-phenylalanine).

This host-guest system has been found to possess a number of intriguing binding properties. The first of these is the remarkable ability to stabilise the amide bond of the guest amino acid derivative in the cis conformation when the guest is bound inside the cavity. Figure 1.2 shows the guest, N-Ac-L-cis-alanine, bound to a surface representation of the host, macrobicycle 12. The second binding property is that the L enantiomer is bound preferentially inside the host, while the D form prefers to bind outside. Thus the cis stabilisation occurs principally for L enantiomers. However, the molecule does not possess marked selectivity for different amino acids.

Figure 1.2: N-Ac-L-cis-alanine (ball and stick) bound to macrobicycle 12 (surface representation).

1.3 Importance of the Host Binding Properties

These binding properties are significant for a number of reasons. The amide bond connecting amino acids is of fundamental importance in determining protein structure. The conformation of this bond can exist in either the cis or trans forms, as shown in Figure 1.3 for N-methylacetamide, the archetypal molecule containing the amide bond. However, in natural proteins there is almost a complete predominance of the trans conformation with only around 0.05% cis [7].

Figure 1.3: The cis and trans structures for N-methylacetamide.

The commonly observed secondary structure of proteins is largely a result of this trans predominance. This difference in abundance is attributed to the lower energy of the trans conformation due to steric and electronic effects. An experimental measurement of the energy difference for N-methylacetamide was 2.6 ± 0.4 kcal mol⁻¹ [8]. On the rare occasions that a cis amide bond does occur in a protein, it is usually found adjacent to a proline residue. The presence of the cis conformation, due to its rarity, can have a significant effect on overall protein conformation [9]. trans to cis isomerisation is believed to be a significant rate-determining step in protein folding. This is due to the large free energy barrier of at least 14 kcal mol⁻¹ separating the two conformers [8, 10, 11]. cis–trans isomerases that catalyse this conversion can provide potential targets for pharmaceutical drugs. Indeed, the immunosuppressant agents used in organ transplants, cyclosporin A, FK506 and rapamycin, are inhibitors of such proteins [12].

As well as determining protein structure, cis amide bonds are also widely found in small cyclic peptides since the cis conformation is conducive to ring structures [13]. The study of cis amide bonds in such large and complex systems is of tremendous interest, yet computer simulation studies of such systems are problematic. The ability to stabilise molecules in a particular conformation can lead to the formation of different products in chemical reactions than would otherwise be obtained [14].

Besides conformational stabilisation, enantioselectivity is another binding property that is the subject of much study. In both biological systems and synthetic chemistry, it is critical in determining molecular shape, reactivity and the possible products from chemical reactions. While there exist many naturally occurring enantioselective molecules, much effort in producing synthetic versions has focused on the design of macrocyclic molecules, of which there are numerous examples [15]. Such molecules possess the properties considered essential for enantioselectivity. These properties include good binding capability, strong chiral centres, rigidity, structural complementarity and some degree of symmetry. An understanding of the factors that control this would greatly assist the design of host molecules in order to bind and manipulate small molecules, and vice versa.

Finally, there is the all important feature of selective binding for different molecules. This is particularly so for the binding of small peptides. An understanding of the influence of the amino acid side groups is critical for molecular discrimination and protein structure determination. Even though the macrobicycle 12 system does not appear to differentiate markedly between guests in binding strength, there may still be differences in binding motifs that can be exploited in future work to enhance binding strengths.

1.4 Aim of This Work

Some computational modelling on the structure of the macrobicycle 12 system had shed some light on the reasons for cis stabilisation [6]. The aim of the current work is to perform a complete systematic study to understand the behaviour of the system. This would involve calculating accurate relative binding free energies for different guests with differing stereochemistry and amide bond conformation. This, combined with a structural and energetic analysis of the resulting equilibrium structures, would provide an understanding of exactly what is causing the selectivities in binding.

As well as containing the interesting binding properties, macrobicycle 12 is an ideal system for computational study. The desire to keep molecules smaller than the usually larger, naturally occurring systems for the sake of simplicity shares a common aim with the practicalities of computer simulations. Its small size makes possible the application of the highest quality free energy methods. The small number of interactions reduces the ambiguity common in larger protein-ligand systems regarding the origin of various effects. Yet it is complex enough to provide highly interesting behaviour that cannot currently be rationalised by experiment.

Indeed, the system is sufficiently complex that conventional simulation methods proved inadequate for its study, and so a number of methodological developments and implementations became necessary. New parameters were derived for a number of dihedrals not included in the OPLS force field [16]. A range of optimisation procedures were included in the simulation code to improve speed. A new method was developed for deriving OPLS-like charges [17] for functionalities not covered by the current OPLS force field [16]. The ability of these charges to reproduce experimental free energies of hydration was tested using two free energy methods and their accuracy validated [18]. This work also led to a reappraisal of the applicability of the linear interaction method for calculating free energies. Novel schemes were used to construct the host to ensure good sampling [19]. More sophisticated Monte Carlo moves were included to further improve sampling. A new parameterisation was developed for the Generalised Born/Surface Area model for the OPLS-AA force field in chloroform. Finally, a number of improvements were suggested to the free energy calculation protocol. All of these ingredients combined to produce a working methodology to study the macrobicycle 12 system.

Chapter 2

Simulation and Free Energy Methods

This chapter seeks to explain the practical questions of how computer simulations work and how they may be used to look at real physical behaviour. This review is not exhaustive but focuses on the problem in this thesis.

2.1 Simulation Methods

2.1.1 The Role of Computer Simulations

It is not at all surprising that classical observation of real physical phenomena provides a means to understand such phenomena. Nor is it surprising that certain rules and theories may be deduced from this. However, it is conceptually quite remarkable that computers can be used to approximately reproduce real physical behaviour. Computer simulations are now an invaluable tool in examining and understanding phenomena in all scientific disciplines, especially so in chemical and biological systems [20–23]. While still quite limited in their application to real systems, particularly for large complex systems, they are able to address many deficiencies associated with experimental and other theoretical methods, the other main tools in research.

With regards to experiment, computer simulations offer the following advantages: they can provide mechanistic and structural information on an atomic level; they provide means of obtaining quantities that are unmeasurable by experiment; the degree of control and flexibility inherent in simulations is generally greater; they make possible the study of rarely found or undesirable phenomena; while they must ultimately obey the physical laws of nature, their implementation can include tricks that transcend these laws; they are only limited by the imagination; there are no messy chemicals; and finally, just like real experiment, they can produce new, quite unexpected behaviour. With regards to analytical theory, in many cases they preclude the need for various assumptions, thus often giving more accurate answers; they can tackle many more types of problems which are often intractable to theory; and, again, they can provide more information and means of calculating quantities.

On the other hand, computer simulations are nothing without theory and experiment since the models they use must be based on some foundation. Experimentally and theoretically derived input are frequently essential in many simulation models. However, while experiment and analytical theory may be used to test simulations, simulations may also act in the reverse role, testing theories and experiments, and frequently interfacing directly between theory and experiment. Simulation data suggest new possible theories and experiments, and vice versa. The interplay between all three can therefore cultivate the development of each field.

Since simulations almost exclusively run on computers, their applicability is limited by computer power. To be useful, a continual balance must be maintained between the restrictions of system size, timescale and the level of realism so that systems are studied that are both of interest and simultaneously practicable. This in turn can lead to very long, large scale simulations to calculate some property that may be obtained in a fraction of the time by experiment or analytical theory. Nevertheless, increasing computer power will greatly enhance the predictive ability of simulations in the future.

This work on macrobicycle 12 is an example of this interplay between simulation and experiment. The work was originally motivated by experimental studies. The idea was that computer simulations could be performed on the system, firstly, to test the ability of the simulations to do so by correctly reproducing the experimental trends, secondly, to provide insights into the system’s behaviour that are unavailable in experimental studies, and thirdly to then use simulations predictively and suggest sensible experiments. Since the simulation techniques initially applied were not able to achieve the first objective, new methods had to be derived, modified and adopted to achieve this goal. What follows next is a discussion of various simulation techniques and a rationalisation for the selection of each technique.

2.1.2 Representation of the System

The first question to be decided in this study was how the system was to be physically modelled. There are many scales on which to perform simulations, ranging from atomistic to mesoscopic through to macroscopic. However, in order to study processes on a molecular level and calculate accurate free energies, simulations must be performed on an atomistic scale.

The evaluation of physical properties using statistical mechanics requires a formulation to calculate the energy for each structure. Conventional methods used to calculate the energy range from molecular mechanics (MM) methods with empirical, classical force fields to far more accurate but expensive quantum mechanical (QM) methods. While it would be preferable to use the most accurate method in every case, its suitability and expense must be borne in mind and balanced with the objectives of the study in order to produce useful results over accessible timescales. The relatively large system size and the need for multiple system configurations ruled out full quantum mechanical methods. Hybrid QM/MM techniques modelling the area of interest by the more accurate QM and the rest by MM would have been feasible. However, since there was no chemical rearrangement of the bonds, the increased complexity was not deemed necessary to successfully model the system.

MM force fields provide a fast, approximate way of calculating the energy and forces of a system. They typically consist of bond, angle and dihedral terms for atoms covalently linked together, and non-bonded interaction terms as given by Eq. 2.1.

$$E = E_{\mathrm{bnd}} + E_{\mathrm{ang}} + E_{\mathrm{dih}} + E_{\mathrm{nb}} \qquad (2.1)$$
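As a rough illustration of the additive form of Eq. 2.1, the following Python sketch evaluates the four contributions from lists of pre-computed internal coordinates. The functional forms chosen for each term are those given for the OPLS force field in Section 2.1.3. The function names, the data layout, the units (kcal mol⁻¹, Å, radians) and the Coulomb conversion factor are assumptions made purely for illustration; none of this is taken from the simulation code used in this work.

```python
import math

# Assumed conversion factor for 1/(4*pi*eps0) in kcal mol^-1 Angstrom e^-2.
COULOMB_CONSTANT = 332.06


def harmonic_energy(terms):
    """Sum K * (x - x_eq)**2 over (K, x, x_eq) tuples (bond or angle terms)."""
    return sum(k * (x - x_eq) ** 2 for k, x, x_eq in terms)


def dihedral_energy(dihedrals):
    """Three-term Fourier series; entries are (phi, (V1, V2, V3), (f1, f2, f3))."""
    total = 0.0
    for phi, (v1, v2, v3), (f1, f2, f3) in dihedrals:
        total += 0.5 * v1 * (1.0 + math.cos(phi + f1))
        total += 0.5 * v2 * (1.0 - math.cos(2.0 * phi + f2))
        total += 0.5 * v3 * (1.0 + math.cos(3.0 * phi + f3))
    return total


def pair_scaling(bonds_apart):
    """Scaling factor f_ij; None means the atoms are not connected by bonds."""
    if bonds_apart is None or bonds_apart > 3:
        return 1.0
    if bonds_apart == 3:
        return 0.5
    return 0.0


def nonbonded_energy(pairs):
    """Coulomb plus Lennard-Jones energy over distinct atom pairs.

    Each pair is (q_i, q_j, (sigma_i, eps_i), (sigma_j, eps_j), r_ij, bonds_apart).
    Sigma and epsilon are combined with the geometric mean.
    """
    total = 0.0
    for q_i, q_j, (sig_i, eps_i), (sig_j, eps_j), r, bonds_apart in pairs:
        sigma = math.sqrt(sig_i * sig_j)
        epsilon = math.sqrt(eps_i * eps_j)
        coulomb = COULOMB_CONSTANT * q_i * q_j / r
        sr6 = (sigma / r) ** 6
        lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
        total += pair_scaling(bonds_apart) * (coulomb + lennard_jones)
    return total


def total_energy(bonds, angles, dihedrals, pairs):
    """E = E_bnd + E_ang + E_dih + E_nb, as in Eq. 2.1."""
    return (harmonic_energy(bonds) + harmonic_energy(angles)
            + dihedral_energy(dihedrals) + nonbonded_energy(pairs))
```

Keeping each contribution in its own function mirrors the structure of Eq. 2.1 and makes it straightforward to inspect the individual energy components of a configuration separately.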

There are a wide variety of MM force fields to choose from. They are all comparable in computational expense. However, there are differences in the complexity and form of the force field, the derivation and availability of parameters, and their design philosophy. Such variation occurs since they are approximate and cannot generally reproduce all possible experimental data simultaneously. There are more complicated force fields that attempt to reproduce experimental or quantum mechanical data for small to medium sized molecules to a high level of accuracy, especially for intramolecular energetics. An example is MM3 [24]. However, these force fields usually contain more complex energy functions and cross terms. There are also force fields designed for modelling larger systems such as proteins. They are both simpler in functional form and parameter types and are designed more for reproducing non-bonded energetics. Such are the attributes required in the macrobicycle 12 system. Widely used force fields of this type include OPLS [25], AMBER [26], GROMOS [27] and CHARMM [28–30].

2.1.3 The OPLS Force Field

In this work the OPLS-AA force field was adopted. OPLS is an acronym for Optimised Potentials for Liquid Simulations. This acronym summarises the main ideal of this force field, namely to reproduce experimental properties. AA is an acronym for all-atom, which means that all atoms are explicitly modelled. The alternative united atom approach is still in widespread use. United atoms are created by combining hydrogens with the atom to which they are attached in order to reduce the total number of atoms. However, this approximation was not necessary for the small system to be studied here.

The functional form of each of the components given in Eq. 2.1 for this force field is as follows. The bond and angle bending contributions are given by

$$E_{\mathrm{bnd}} = \sum_i K_i (r_i - r_{\mathrm{eq},i})^2 \qquad (2.2)$$

$$E_{\mathrm{ang}} = \sum_i K_i (\theta_i - \theta_{\mathrm{eq},i})^2 \qquad (2.3)$$

where the summations are over all bonds and angles respectively, with $K_i$ the force constants and $r_{\mathrm{eq},i}$ and $\theta_{\mathrm{eq},i}$ the respective reference values. The dihedral energy is given by a three term Fourier series

$$E_{\mathrm{dih}} = \sum_i \left\{ \frac{V_{1,i}}{2}\left[1+\cos(\phi_i + f_{i,1})\right] + \frac{V_{2,i}}{2}\left[1-\cos(2\phi_i + f_{i,2})\right] + \frac{V_{3,i}}{2}\left[1+\cos(3\phi_i + f_{i,3})\right] \right\} \qquad (2.4)$$

where $V_{j,i}$ are the coefficients, $f_{i,j}$ are the phase angles and the sum over $i$ runs over all 1–4 dihedral atom pairs attached to the atoms forming the central bond. Finally, the non-bonded energy consists of a Lennard-Jones term and an electrostatic term given as

$$E_{\mathrm{nb}} = \sum_i \sum_{j>i} \left\{ \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}} + 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \right\} f_{ij} \qquad (2.5)$$

where the double sum is over all distinct atom pairs, $q_i$ are the partial atomic charges, $r_{ij}$ are the distances between the atoms, $\epsilon_{ij}$ and $\sigma_{ij}$ are the Lennard-Jones well-depth energy and collision diameter parameters, and $f_{ij}$ is given by

$$f_{ij} = \begin{cases} 0 & : i, j \text{ separated by less than 3 bonds} \\ 0.5 & : i, j \text{ separated by 3 bonds} \\ 1 & : \text{otherwise} \end{cases} \qquad (2.6)$$

For heteronuclear atom pairs, $\sigma$ and $\epsilon$ are combined using the geometric mean.

The decision to use OPLS-AA was made for two reasons. Firstly, the OPLS force field is designed to reproduce experimental properties, and secondly, it was incorporated in the simulation packages, BOSS [31] and MCPRO [32], to be used in this work. The AA version was used to achieve more realism and because the system was small enough to afford this additional expense. The one difficulty with this choice was that OPLS lacked certain parameters for this system and these had to be derived (Section 3.1).

One further issue related to the force field is how the solvent is modelled. Ideally, the more realistic means to model chloroform would be to model the molecules explicitly as for the solute. However, principally due to sampling problems discussed in Section 6.4 caused by the explicit solvent representation, the generalised Born/surface area continuum (GB/SA) solvent model [33] was implemented. The parameterisation of this method required the use of yet another continuum method, the Poisson-Boltzmann (PB) method [34].

2.1.4 The System to be Modelled and Other Approximations

In reality, the solute is surrounded by the solvent chloroform which ideally would extend indefinitely. The possible approaches depend on how the solvent is modelled. If it is modelled explicitly, the first necessary approximation is periodic boundary conditions (PBC). A box even of thousands of chloroform molecules by itself would experience very strong edge effects. Therefore the box is surrounded by periodic images of itself in all three dimensions to remove all surfaces. When a particle leaves the simulation box, it is replaced by a new particle with the same properties coming in at the opposite side of the box. This is illustrated in Figure 2.1. There is a choice of several box shapes, but a cubic box was chosen, being the simplest. The problem introduced by PBC is that atoms now have the ability to see themselves, inducing a possible crystallinity artefact into the system. This problem, however, is removed by the next approximation.

Another common approximation is to use a cutoff radius for non-bonded interactions. This is primarily made to reduce the number of energy calculations between an atom and its neighbours to save on computational expense. It is justified on the basis that the energy of interaction with atoms beyond a certain distance is negligible and so can be ignored or approximated by a simple analytical function. This also removes the previously mentioned self-interaction problem as long as the simulation box is
A balance must be struck between having the solute in a realistic environment and having a sufficiently small system that is feasible to simulate. The next issue was system size and what further approximations were necessary to simplify the simulation.

1: Periodic conditions for a solute in solvent. The GB/SA model already assumes that the dielectric contin- . The dashed line represents the cutoff radius.35 the related faster Particle Mesh Ewald method.37 and the Fast Multipole Method. One clear difference between explicit and continuum solvent simulations is that no cutoff approximations are required in implicit solvent. these add a significant cost to computations.36 the Reaction Field method.38 However. Such an approximation can also be made in solute-solute interactions but this was not necessary given the small size of the solutes in this system. made at least twice as large as the cutoff radius. which decay a lot more slowly.CHAPTER 2. especially for ions. While the inclusion of one of them would be desirable to examine its effect. Four techniques that treat long range electrostatic interactions without using a cutoff are Ewald summation. The cutoff radius approximation is reasonable for quickly decaying dispersion interactions but becomes questionable for electrostatic interactions. the normal cutoff technique was retained on the grounds that there was only one ion in the system with no other ions with which to interact. SIMULATION AND FREE ENERGY METHODS 13 Figure 2. Interacting solvent molecules within the cutoff radius are shaded darker.

measured along this trajectory. Xav . while the PB model models the solute in a box of polarisable dielectric surrounded by an infinite unpolarised dielectric. F the force. is taken. To solve these equations. A summary of each method follows. although either of these structures typically require some further equilibration to ensure that representative structures are being sampled. The assumption is that if a long enough time. SIMULATION AND FREE ENERGY METHODS 14 uum extends infinitely. Two conventional methods to produce such ensembles are molecular dynamics (MD) and Monte Carlo (MC). a number of approximations must be made.8) Eq. This idea is expressed in the equation Xav = lim 1 τ →∞ τ τ X(t)dt 0 (2. an initial starting structure is needed. m the mass of the particle and a the acceleration.7) U is the position derivative of the potential energy. This leads to N first order partial differential equations for the positions and N for the velocities. will give the value of the desired property. The solutions are written as carefully chosen combinations of truncated .CHAPTER 2. − U = F = ma where (2. Having chosen a force field. generating a trajectory of the system through time. usually by a finite difference method. to calculate free energy changes. Such equilibration is usually performed using the same method as that used to generate an ensemble of configurations from which equilibrium properties may be derived. 2. This is typically an experimental or an energy minimised structure. τ . each as a function of time and of the positions of all other atoms within the cutoff radius. then property X.1. MD attempts to simulate the real dynamics of a system by integrating Newton’s Laws of motion. Such coupled equations must be solved numerically at discrete timesteps.7 can be written down for every particle in the system.5 Molecular Dynamics Simulations. 2. will become representative of properties of the real system and so its time average.

6 Monte Carlo Simulations. if N configurations were to be taken randomly from a uniform distribution in configuration space. Over this time period. and velocities. This can be done in a number of possible ways. r(t + δt). A typical value is of the order 1 fs. Otherwise the force applied drifts from the real force and energy is no longer conserved. The equations must be solved at every time step. Every configuration is randomly generated in a prescribed way. Therefore.CHAPTER 2.10) For the Verlet algorithm. velocities and accelerations.9) (2. δt. SIMULATION AND FREE ENERGY METHODS 15 Taylor series expansions about δt as a function of the current positions. MC generates configurations by a different procedure. 2. is given by N N i=1 Xi exp(−Ui /kB T ) N i=1 exp(−Ui /kB T ) Xav = i=1 Xi P i = (2. X. the velocities are not actually needed to propagate the trajectory of the system but may still be needed to calculate properties such as the kinetic energy. gives the new positions. Pi . then the average value of X. that is. The configuration produced at each time step in this way generates the required ensemble of configurations from which to calculate molecular properties. t + δt.1. For example. δt. leap-frog or velocity-Verlet algorithms. that each occurs. for example. and weighted by the probability. the force acting on the particles is assumed to be constant. This leads to new expressions for positions and velocities at a later time. The Verlet algorithm. the choice of δt is a balance between maximising the length of the simulation and keeping errors due to this approximation small. such as the Verlet. How these configurations are generated critically influences the practicality of calculating various properties. which itself must be chosen to be small enough to keep the constant force approximation valid. to measure a property. by their Boltzmann factors.11) . 
v(t + δt) at time t + δt as r(t + δt) = 2r(t) − r(t − δt) + δt2 a(t) v(t + δt) = [r(t + δt) − r(t − δt)] 2δt (2.

Such an average would be very slow to converge since almost all terms would be negligibly small due to significant overlap and thus very high energy between atoms. Thus each configuration generated now contributes equally to the average. new configurations are usually generated by small moves from the original configuration and so typically have energies very similar to the old ones. Metropolis MC is the sampling method used in this work. πmn ρm = πnm ρn (2.14) . and let πmn be the probability that this trial move is actually accepted. Let ρ m be the probability that the system is in state m. The way such an ensemble of configurations is generated without ever actually calculating a state probability is as follows. that is.13) then the ratio of transition probabilities will be Boltzmann weighted. SIMULATION AND FREE ENERGY METHODS 16 where Xi is the value of the property and Ui the energy of a particular configuration i. as desired. With such a sampling scheme. Furthermore. reducing the likelihood that a high energy non-contributing state is attempted. To produce such a biased distribution. Eq. one must now reject certain configurations if they do not meet certain criteria.11 simply becomes 1 = N N Xav Xi i=1 (2. improving the acceptance rate. A more sensible sampling scheme is the commonly used Metropolis sampling 39 which produces configurations both randomly and weighted according to the Boltzmann factor. let αmn be the probability that a trial move from state m to state n is selected.CHAPTER 2. can be constructed so as to satisfy microscopic reversibility. 2. πmn .12) Now. given by ρn πmn = = exp[−(Un − Um )/kB T ] πnm ρm (2. If the matrix π consisting of the elements. the significant contribution of each configuration leads to much faster convergence of the averages.
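As a small numerical illustration of Eqs. 2.13 and 2.14, the following sketch builds a transition matrix for a toy system of three discrete states and checks that it is microscopically reversible. It is not taken from this work: the state energies are hypothetical, the trial-move matrix is assumed to be uniform and symmetric, and the acceptance rule used is the standard Metropolis choice (Eq. 2.15).

```python
import math

def boltzmann_probabilities(energies, kT=1.0):
    """Normalised Boltzmann probabilities rho_m for a set of discrete states."""
    weights = [math.exp(-u / kT) for u in energies]
    total = sum(weights)
    return [w / total for w in weights]

def metropolis_matrix(energies, kT=1.0):
    """Transition matrix pi for a uniform, symmetric trial-move matrix alpha."""
    n = len(energies)
    alpha = 1.0 / (n - 1)  # assumption: every other state is proposed equally often
    pi = [[0.0] * n for _ in range(n)]
    for m in range(n):
        for k in range(n):
            if k != m:
                # Downhill moves are always accepted, uphill ones with rho_k/rho_m.
                accept = min(1.0, math.exp(-(energies[k] - energies[m]) / kT))
                pi[m][k] = alpha * accept
        pi[m][m] = 1.0 - sum(pi[m])  # rejected moves leave the system in state m
    return pi

def max_reversibility_error(pi, rho):
    """Largest violation of pi_mn * rho_m = pi_nm * rho_n (Eq. 2.13)."""
    n = len(rho)
    return max(abs(pi[m][k] * rho[m] - pi[k][m] * rho[k])
               for m in range(n) for k in range(n))

def propagate(pi, start, steps):
    """Apply the transition matrix repeatedly to a starting distribution."""
    dist = list(start)
    n = len(dist)
    for _ in range(steps):
        dist = [sum(dist[m] * pi[m][k] for m in range(n)) for k in range(n)]
    return dist

energies = [0.0, 1.0, 2.5]  # hypothetical state energies in units of k_B*T
rho = boltzmann_probabilities(energies)
pi = metropolis_matrix(energies)
final = propagate(pi, [1.0, 0.0, 0.0], 200)

print("largest reversibility error:", max_reversibility_error(pi, rho))
print("distribution after 200 steps:", [round(p, 4) for p in final])
print("Boltzmann distribution:      ", [round(p, 4) for p in rho])
```

Starting with all probability in the lowest-energy state, repeated application of the matrix relaxes to the Boltzmann distribution without any state probability being evaluated directly; only energy differences enter the acceptance test.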

In the Metropolis scheme, the $\pi_{mn}$ that achieves this is given by

$$\pi_{mn} = \begin{cases} \alpha_{mn} & \rho_n \ge \rho_m,\; m \ne n \\ \alpha_{mn}\,(\rho_n/\rho_m) & \rho_n < \rho_m,\; m \ne n \\ 1 - \sum_{n' \ne m} \pi_{mn'} & m = n \end{cases} \tag{2.15}$$

where the $\alpha$ matrix is assumed to be symmetrical to ensure microscopic reversibility. In practice, a move is first selected from the $\alpha$ matrix. The energy change, $(U_n - U_m)$, is then calculated. If it is negative, then $\rho_n > \rho_m$ and the move is accepted. If it is positive, then it is accepted with probability $\rho_n/\rho_m$. This is done by comparing the quantity $\exp[-(U_n - U_m)/k_B T]$ to a random number in the interval [0, 1]. If the quantity is greater than the random number, then the attempted configuration is accepted. Otherwise, the original configuration ($m = n$) is retained and counted again.

The $\alpha$ matrix determines the move type. There is a vast number of ways to choose it. An example of a typical move is the translation of an atom from position $x_m$ in the $x$ direction by a random displacement in the range $[-\Delta x, \Delta x]$, chosen to be symmetric about the original position. $\alpha_{mn}$, now an infinitesimal probability density, is then given by

$$\delta\alpha_{mn} = \begin{cases} \delta x/(2\Delta x) & x_n \in [x_m - \Delta x,\; x_m + \Delta x] \\ 0 & x_n \notin [x_m - \Delta x,\; x_m + \Delta x] \end{cases} \tag{2.16}$$

This equation says that the probability of moving into a region at $x_n$ is the width of this region, $\delta x$, divided by twice the maximum displacement if $x_n$ is within $\Delta x$ of $x_m$, and zero otherwise. A similar principle operates for altering any other coordinate such as a dihedral angle. Much more complex moves are possible. These are discussed in Section 6.1.

2.1.7 Molecular Dynamics Versus Monte Carlo

While MD and MC should in theory give identical answers by the ergodic hypothesis, in practice, a number of differences between them favour one method over the other depending on the system.

Firstly, MD naturally samples the NVE ensemble, while MC samples the NVT ensemble. Both can be made to sample the NPT ensemble, the one in which experiments are most commonly performed. However, this is achieved more simply in MC, for which a volume move brings about the necessary volume fluctuations. MD requires more complex techniques involving extended Lagrangian formulations to allow the energy and volume to fluctuate and the temperature and pressure to be constrained.

Secondly, MD simulates real time behaviour and so allows dynamical properties such as diffusion to be studied since the momenta are explicitly defined. This is not possible in MC since all properties are derived solely from coordinates.

Thirdly, since the forces are included in MD, it is easier to perform motions that are more cooperative and larger than in MC, in which moves are more disjoint and generally have to be small to provide a reasonable acceptance probability.

Fourthly, MD naturally allows all degrees of freedom to change, whereas MC only samples the degrees of freedom that comprise move attempts. Thus it is easy in MC to restrict sampling to those degrees of freedom considered important to the problem at hand, whereas MD requires more complex methods such as SHAKE [40] to constrain bond lengths.

This is related to the fifth difference, that of relative computational expense in configurational space exploration. MD updates the whole system each configuration, while MC only updates a small section. Thus new MC configurations are usually faster to generate, but more must be generated to move the whole system. Which method is faster is very much dependent on the system.

Sixthly, this freedom of choice in MC move selection can lead to more efficient sampling of configurational space, since larger moves can be attempted that are not restricted by the small time step of MD. Furthermore, moves can be designed to jump over energy barriers over which MD has to incrementally climb.

Seventhly, MD can suffer from numerical problems due to continual approximations in solving Newton's equations.

Given the generally simpler nature of MC, its greater ability to traverse configurational space, and the absence of a need for dynamical information, MC was adopted as the protocol for the generation of configurations.

Most properties of interest which depend on the derivative of the partition function, such as the energy, may be calculated using Eq. 2.12. However, measuring free energies is more complex and is discussed in the next section.

2.2 Free Energy Methods

2.2.1 The Problem of Calculating Free Energies

The free energy, or its counterpart, the partition function, is the most important quantity in statistical mechanics. Knowing the free energy of a system reveals its stability and allows a prediction of the correct state of a system under a given set of conditions. Furthermore, from it all other thermodynamic quantities can be derived. Much experimental data comes in the form of an equilibrium constant between two states, which is effectively a free energy difference. Thus calculation of free energies is an important way of linking computer simulations and experiment [41–45].

In the NPT ensemble the pertinent free energy function is the Gibbs free energy, $G$. It may be calculated from the ensemble average

$$G = -k_B T \ln Q_{NPT} = G_{\mathrm{ideal}} + k_B T \ln \left\langle \exp\left(\frac{+U}{k_B T}\right) \right\rangle_{NPT} \tag{2.17}$$

where $G_{\mathrm{ideal}}$ is the ideal gas part and may be calculated separately if $U$, the potential energy of the system, is independent of particle momenta. What is problematic here is that the exponential depends on $(+U)$, a large positive number. Therefore, all terms, especially those with the highest energy, will make a significant contribution to the average. Thus not only is Metropolis sampling, which favours low energy states, now inappropriate but also vastly more configurations must now be sampled. Thus there is no hope of expecting this average to converge and hence absolute free energies are inaccessible in this way. However, such a problem can be sidestepped by calculating free energy differences, which only depend on the ratio of partition functions.

2.2.2 Free Energy Perturbation

To evaluate this quantity there are two commonly used well-proven techniques. The first of these is free energy perturbation (FEP) [46]. The free energy difference between two different states A and B is given by

$$\Delta G = -k_B T \ln \frac{Q_B}{Q_A} = -k_B T \ln \left\langle \exp\left(\frac{-\Delta U_{AB}}{k_B T}\right) \right\rangle_A \tag{2.18}$$

where $\Delta U_{AB} = U_B - U_A$, and the averaging is performed over configurations for state A. Such an expression is exact, but it only converges if the two states A and B are similar to each other to keep $\Delta U_{AB}$ small. This is because the end points commonly sample quite different regions of configuration space. What may be the low energy states for one Hamiltonian may become very high energy states for another. Therefore it is frequently necessary to define several arbitrary non-physical intermediate states between A and B to increase the similarity between successive states and thus overlap to a greater extent the configurations that they sample. This arbitrary partitioning is allowed because free energy is a state function independent of the path taken between two points.

There are two common ways to perturb molecules between states A and B. The conventional way is the so-called single topology method. State A gradually changes into state B. Dummy atoms, atoms with zero charge and Lennard-Jones parameters, are used to grow or remove atoms. The less commonly used alternative is the dual topology method [47]. In this method both A and B are present in the system but they never interact with each other. The perturbation proceeds by gradually turning off the interactions for A and turning on those for B. The single topology method was used in this work.

Conventionally each state is defined by the variable, $\lambda$, which ranges from 0 to 1. The parameters in the Hamiltonian that differ between A and B are altered according to the value of $\lambda$. Typically this dependence is made to be linear. For example, if A and B differ by a certain bond length, then the intermediate bond length, $r_\lambda$, is given by

$$r_\lambda = \lambda r_A + (1 - \lambda)\, r_B \tag{2.19}$$
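The behaviour of Eq. 2.18, and the benefit of intermediate windows, can be tried on a toy problem. The sketch below is an illustration only, not the protocol used in this work: states A and B are assumed to be one-dimensional harmonic wells with hypothetical force constants, for which the exact answer is $\Delta G = \tfrac{1}{2} k_B T \ln(k_B^{\mathrm{force}}/k_A^{\mathrm{force}})$, and the reference ensemble is sampled exactly from a Gaussian rather than by Metropolis MC. The helper `mix` copies the linear form of Eq. 2.19 as printed, so that $\lambda = 1$ corresponds to A and $\lambda = 0$ to B.

```python
import math
import random

KT = 1.0  # energies in units of k_B*T (assumption for this sketch)

def harmonic(k):
    """Return the potential U(x) = 0.5*k*x**2 for force constant k."""
    return lambda x: 0.5 * k * x * x

def sample_harmonic(k, n, rng):
    """Draw n exact Boltzmann samples for U = 0.5*k*x**2 (a Gaussian)."""
    sigma = math.sqrt(KT / k)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def fep(u_ref, u_pert, samples):
    """Eq. 2.18: dG = -kT ln < exp(-(U_pert - U_ref)/kT) > over the reference."""
    total = sum(math.exp(-(u_pert(x) - u_ref(x)) / KT) for x in samples)
    return -KT * math.log(total / len(samples))

def mix(lam, p_a, p_b):
    """Linear parameter mixing in the form of Eq. 2.19 (lam = 1 gives A)."""
    return lam * p_a + (1.0 - lam) * p_b

def staged_fep(k_a, k_b, lambdas, n, rng):
    """Sum Eq. 2.18 over successive windows running from A to B."""
    ks = [mix(lam, k_a, k_b) for lam in lambdas]
    dg = 0.0
    for k_ref, k_pert in zip(ks[:-1], ks[1:]):
        samples = sample_harmonic(k_ref, n, rng)
        dg += fep(harmonic(k_ref), harmonic(k_pert), samples)
    return dg

rng = random.Random(1999)          # fixed seed so the run is repeatable
K_A, K_B = 1.0, 4.0                # hypothetical force constants for A and B
exact = 0.5 * KT * math.log(K_B / K_A)
single = fep(harmonic(K_A), harmonic(K_B), sample_harmonic(K_A, 20000, rng))
staged = staged_fep(K_A, K_B, [1.0, 0.75, 0.5, 0.25, 0.0], 20000, rng)
print(f"exact {exact:.4f}  single step {single:.4f}  four windows {staged:.4f}")
```

Here only one parameter, the force constant, is mixed between the end points, in the manner of Eq. 2.19.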

Alternatively, the whole Hamiltonian itself can be scaled in this way. The total free energy is then the sum of the component free energies for each section, obtained using an analogous formula to Eq. 2.18 except with different end points.

One point to note is that FEP does not require configurations to be sampled from state B. If this were done, then the reverse free energy could also be calculated. The value of this free energy should be the negative of the former. Therefore, calculating free energies in both directions (double ended sampling) provides a reliable check that successive windows are sampling similar configurations, thus giving a converged average. This can be exploited to produce a faster method termed the double wide sampling technique in which only every second window is sampled [48], since the free energy forward and backward can be calculated from one simulation. However, it has been observed in this work and elsewhere [49, 50] that perturbations in the direction of molecules increasing in size have better convergence. This is probably because the reference state, being smaller, is able to sample the perturbed configurations much better than in the reverse case.

2.2.3 Thermodynamic Integration

The second free energy difference technique is thermodynamic integration (TI) [51]. This method gives the free energy difference by

$$\Delta G = \int_0^1 \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_\lambda d\lambda \tag{2.20}$$

where $H$ is the Hamiltonian. In practice this integral must be broken up into a number of discrete parts along the $\lambda$ coordinate in the same way as for FEP. In this case numerical integration does introduce an approximation, although with enough windows this is usually insignificant compared to the error due to inadequately converged averages. The usual approach (multiconfiguration TI) is to use of the order of 10 $\lambda$ windows with adequate equilibration and converged data collection [52]. Another approach, called slow growth, changes the Hamiltonian minutely for every configuration (single configuration TI) [53]. However, such an approach is dubious as the question arises as to whether the new Hamiltonian at each $\lambda$ is being properly sampled and whether the conformations of the system in general lag behind those appropriate for the value of $\lambda$.

2.2.4 Difficulties With Free Energy Methods

As a means of calculating free energy changes, FEP and TI are both capable of giving accurate results in many cases. However, they are also not without their problems. Firstly, like most simulation techniques, they are limited by computational expense, especially when large mutations are performed for which many intermediate states are required. Secondly, the accuracy of their results is limited by the force field and approximations made in energy calculations such as cutoffs. Hence it is quite common to test force fields themselves by comparing free energy results obtained by this method to experiment, as is done in this work in Chapter 5. Thirdly, they require complete sampling of all relevant configurational space, both to overcome large energy barriers and to traverse narrow regions requiring highly cooperative motions. Thus it is also important to assess what length of simulation is necessary for converged results [54–56]. In many systems with complex energy landscapes special sampling techniques must be used. Such an approach was found to be necessary in the macrobicycle 12 system. The addressing of this problem is described in Chapter 6. Fourthly, there are considerable difficulties converting between molecules with different charges since long range interactions have to be carefully treated. Fifthly, other corrections are necessary due to the standard state and changes in symmetry [3].

2.2.5 Fast Free Energy Methods

There are many other free energy calculation techniques that seek to speed up this rather slow calculation. The objectives of such methods are generally to obtain free energy information for many molecules from only one or two simulations. These methods can be categorised into three approaches.

The first class simulates a particular reference state, often an intermediate between the two end points. Free energies are then obtained using the FEP formula by mutating to each real molecule. The method of Smith et al. [57] seeks to calculate free energy changes by writing the free energy change as a standard Taylor series expansion and calculating all the necessary terms from a single simulation of a reference state. Free energy differences may be calculated between many similar molecules from an ensemble of an intermediate non-physical molecule modelled with a soft core Lennard-Jones potential given in Eq. 2.21 [58],

$$E_{\mathrm{nb}} = 4 \sum_i \sum_{j>i} \epsilon_{ij} \left[ \frac{\sigma_{ij}^{12}}{\left(\alpha\sigma_{ij}^{6} + r_{ij}^{6}\right)^{2}} - \frac{\sigma_{ij}^{6}}{\alpha\sigma_{ij}^{6} + r_{ij}^{6}} \right] \tag{2.21}$$

where $\alpha$ determines the softness of the atoms. The difficulty these methods suffer from is the standard problem of very different, non-overlapping states.

The second class of methods allows $\lambda$ itself to vary so that different hybrids of the two or more end points appear in the same simulation. Tidor used MD simulations in real space coupled with MC moves in $\lambda$ space [59]. The $\lambda$-dynamics method [60] simulates many ligands together, each with its own $\lambda$ variable which itself is able to fluctuate as a dynamic variable. Similar information can be obtained from the simulated annealing technique of Jarque and Tidor [61] and the chemical MC/MD technique [62, 63], which simulates all molecules together with all but one treated as ghost molecules, and makes MC moves between ligands such that one is always real. The extreme case of simulating only the end points together is possible using the Jumping Between Wells method of Senderowitz et al. [64], which is able to perform MC moves from one molecule directly to the other. All of the approaches can be of much use in ranking relative binding, but their ability to calculate numerical free energy differences is reduced when these differences grow so large that the least stable molecules are not sampled properly.

The final class of methods obtains free energies only from quantities calculated at the end points of a perturbation. Thus this approach requires two simulations. The original linear interaction energy method of Åqvist et al. [65] calculates free energies according to the equation

$$\Delta G_{\mathrm{bind}} = \alpha \langle \Delta U_{\mathrm{vdw}} \rangle + 0.5 \langle \Delta U_{\mathrm{elec}} \rangle \tag{2.22}$$

where $\langle \Delta U_{\mathrm{vdw}} \rangle$ and $\langle \Delta U_{\mathrm{elec}} \rangle$ are the average differences in van der Waals and electrostatic energy between the two end points, and $\alpha$ is an arbitrary parameter usually fitted to experiment. Exactly which terms are included on the right-hand side of Eq. 2.22 has recently been the subject of much debate and is studied further in Chapter 5. Recently, another linear interaction energy method has been proposed, termed the generalised linear response method because it combines both the van der Waals and electrostatic energies [66]. The free energy of hydration is given by

$$\Delta G_{\mathrm{hyd}} = 1.49\, n k_B T + \langle V \rangle_{H_{0.5}} \tag{2.23}$$

where the first term is a cavity term from scaled particle theory, $n$ is the number of atoms, and $\langle V \rangle_{H_{0.5}}$ is the average solute-solvent energy averaged over an ensemble where the solute is halfway between itself and a point singularity. It has an advantage in that it requires no empirically derived parameters.

Finally, there are a number of other methods that make use of empirical free energy functions [67, 68]. In particular, a fairly fast theoretical approach has been developed by Kolossvary that has been used to calculate conformational free energies [69] and d–l isomerism [70]. The testing of this approximate but faster method on the macrobicycle 12 system would be of interest.

2.2.6 Choice of Free Energy Method

Since the calculation of accurate free energies was desired, the number of ligands was small, and the macrobicycle 12 system was small enough, the choice of method was between FEP and TI. Both methods produce similar accuracy for similar computational expense. The inclusion of more $\lambda$ windows is simpler in TI than FEP since the former requires only one additional simulation while the latter requires three. For TI the free energy components are also broken up into terms according to the Hamiltonian, allowing an analysis of the contributions, although whether such contributions are meaningful is debatable [71]. Despite these small advantages of TI, the free energy method used in this work was chosen to be FEP, primarily because this was the method included in the simulation packages BOSS and MCPRO.

2.3 Applications of Free Energy Methods

2.3.1 Problems Calculating Free Energies of Binding

The aim of this work was to study the binding of amino acid derivatives of various conformations and stereochemistries to macrobicycle 12. The key thermodynamic quantity of binding is not simply the difference in energies but the free energy difference. Computational chemistry is now able to provide such quantities as well as much other useful information [1–5]. However, calculation of absolute free energies of binding is a non-trivial exercise. Formally, this is the free energy difference between the host and guest at infinite separation, and the host and guest bound. If the end points for the mutation are taken as the free guest and host, and the bound complex, then $\lambda$ would define a reaction coordinate, $r$, in between. The free energy would be obtained by numerically integrating the potential of mean force, $w(r)$, calculated as the guest approaches from a large separation distance, $r_{\max}$, towards the host [46]. The free energy is then given by

$$\Delta G_{\mathrm{bind}} = -k_B T \ln \left[ 4\pi \int_0^{r_{\max}} r^2 \exp(-w(r)/k_B T)\, dr \right] \tag{2.24}$$

Not only would many windows be required, but each one would require a tremendous amount of sampling of all the possible orientations and internal degrees of freedom, making this method a large computational effort. There are two ways around this problem.

2.3.2 Relative Free Energies of Binding

The simpler method is to only attempt to calculate relative free energies of binding by making use of thermodynamic cycles [72].
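The bookkeeping behind such a cycle can be sketched in a few lines. Because free energy is a state function, the changes around a closed cycle sum to zero, so the difference in binding free energy between two guests A and B equals the difference between the free energies of mutating A into B in the host and free in solution (the relation stated formally as Eq. 2.25). The function names and all numerical values below are hypothetical placeholders in kcal/mol, not results from this work; the conversion to a ratio of binding constants assumes the usual relation $\Delta G = -RT \ln K$.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def relative_binding(dg_mut_bound, dg_mut_free):
    """ddG = dG_bind(B) - dG_bind(A) from the two A -> B mutations."""
    return dg_mut_bound - dg_mut_free

def cycle_closure(dg_bind_a, dg_bind_b, dg_mut_bound, dg_mut_free):
    """Sum of changes around H+A -> HA -> HB -> H+B -> H+A (should be zero)."""
    return dg_bind_a + dg_mut_bound - dg_bind_b - dg_mut_free

def binding_constant_ratio(ddg, temperature=298.15):
    """K_B / K_A implied by ddG, assuming dG = -RT ln K."""
    return math.exp(-ddg / (R_KCAL * temperature))

# Hypothetical placeholder values (kcal/mol), for illustration only.
DG_MUT_BOUND = -3.1   # A -> B while bound in the host
DG_MUT_FREE = -1.9    # A -> B free in solution
DG_BIND_A = -4.0      # binding free energy of A

ddg = relative_binding(DG_MUT_BOUND, DG_MUT_FREE)
dg_bind_b = DG_BIND_A + ddg
ratio = binding_constant_ratio(ddg)
print(f"ddG = {ddg:.2f} kcal/mol, K_B/K_A = {ratio:.1f}")
print("cycle closure:", cycle_closure(DG_BIND_A, dg_bind_b, DG_MUT_BOUND, DG_MUT_FREE))
```

A negative relative free energy means that B binds more strongly than A; only the two mutation legs need to be simulated to obtain it.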

It makes use of Eq. This is the type of calculation used in studying the binding of amino acid derivatives to macrobicycle 12 as described in Chapter 7.3 Absolute Free Energies of Binding. is much more practical computationally as it only requires the now much smaller perturbation of A to B.25.3. another more complex approach that can be used to obtain absolute free energies of binding. There is. while unphysical. 2. It is calculated by making use of the following thermodynamic cycle.CHAPTER 2.73 It is a special case of the previous situation except with B now replaced by a molecule completely non-interacting with the rest of the system.26 derived from the following thermodynamic cycle. either by calculating binding free energies for A and B. H+A ∆Gmut AB c ∆Gbind A E HA ∆Gmut AB(H) H+B ∆Gbind B E c HB ∆∆GAB = ∆Gbind − ∆Gbind = ∆Gmut − ∆Gmut B A AB(H) AB (2. or evaluating free energy changes going from A to B when free and when bound in the host. . The host does not need to be simulated separately since its properties are constant in isolation. however. 2.25) It can be seen that the relative free energy may be calculated in two ways. Corrections are also included to obtain a proper free energy in the standard state. This second kind of perturbation. 2. It is called double decoupling3 and is based on the double annihilation technique that was originally proposed to calculate such a quantity. SIMULATION AND FREE ENERGY METHODS 26 The relative free energy of binding gives the preference of one solute to bind over another and is given by Eq.

Such mutations are performed by gradually turning off the interactions of the molecule with the rest of the system. SIMULATION AND FREE ENERGY METHODS 27 A(sol) + H(sol) ∆Gbind A rr rr rr E AH(sol) ∆Gdecoup A(H) c ∆Gdecoup A rr j r A(gas) + H(sol) ∆Gbind = ∆Gdecoup − ∆Gdecoup A A A(H) (2.75 Fourthly.49 softcore. There has been much work on the selectivity of macrocycles for single atom ions differing only in their radii.4 Previous Studies on Host-Guest Systems.26) The guest is decoupled from the system twice. the molecule when it is barely interacting with the system is able to access more space and may require a restraint. The earliest studies concentrated on molecular species different in only minor ways but now increasingly different molecules are being studied. necessitating the use of other functional forms for the potential such as separation-shifted scaling. Two studies on particular halide ion binders demonstrated the preference for smaller sized ions principally because larger ions were too large to fit.3 2.74 or alternatively a hard sphere which can then be accounted for theoretically. problems can also arise due to the presence of a singularity at the end point. especially the one in the host.3. 77 Another .76. A correction must be therefore used. Such simulations require more care. complications arise in the calculation of ∆Gdecoup due to the definitions of A(H) the standard state at each end point. once in solution and once when bound to the host. Secondly. This is firstly because the perturbation is usually much larger and requires many windows.CHAPTER 2. Free energy calculations on host-guest systems has been in common use for fifteen years now. It is not strictly an annihilation because the molecule has only been removed from the system and still possesses the degrees of freedom of a gas molecule. Thirdly.

calixspherand host showed selectivity towards K$^{+}$ over both Na$^{+}$ and Rb$^{+}$.$^{78}$

The solvent can play an important role in determining binding. In a study by Cho and Kollman$^{79}$ on the binding of alkali metal ions to a rigid starand host, they found that the larger the ion's radius, the stronger it bound to the starand host. This was despite the reverse preference in gas phase. The reason for the reversal of order is that the desolvation penalty for larger ions in water was found to be smaller. Mordasini Denti et al. confirmed the preferential binding of pyrene to cyclophane in chloroform over that in water,$^{80}$ attributing the difference to solvent cavitation rather than different host-guest interactions.

Different host-guest interactions have been studied as well. Duffy and Jorgensen reproduced the relative binding of quinoxaline, pyrazine and pyridine to Rebek's acridine diacid in chloroform$^{81}$ and rationalised the difference as due to additional hydrogen bonds and host flexibility. Kirchhoff et al.$^{82}$ studied the binding of neopentane and tetramethylammonium ion to cryptophane in water for a range of reasons. They examined the different modes of binding for the two guests and their influence on host flexibility. They also studied how these properties varied when the explicit solvent water was replaced by the Poisson continuum solvent model.$^{83}$ They observed greater conformational freedom for both guest and host in the continuum solvent than in explicit.

Some studies have also focused on enantioselective binding. Costante-Crassous et al.$^{84}$ were able to reproduce the correct enantioselective complexation of bromochlorofluoromethane to a chiral cryptophane in chloroform and subsequently assign the correct optical activities. Burger et al.$^{85}$ calculated the relative free energies of binding to a podand ionophore for L and D α-amino acid-derived substrates. They then predicted a more enantioselective guest and this was subsequently verified by experiment.

The appropriateness of various methods can also be studied. Senderowitz et al.$^{64}$ used the same podand ionophore system to test the "Jumping Between Wells" fast free energy method mentioned in Subsection 2.2.5. Eriksson et al.$^{86}$ found that

the use of Particle Mesh Ewald to treat long range electrostatic effects was essential to model the binding of iminium and guanidinium organic cations to a negatively charged cyclophane host. Mark et al.$^{87}$ used free energy studies on the binding of para-substituted phenols to α-cyclodextrin complexes to examine the influence of force field and sampling.$^{82,83}$ Pitera and Kollman$^{62}$ studied a range of guests in Rebek's "tennis ball" host to predict which one bound the most strongly and to test their fast CMC/MD method against the slower TI method.

2.3.5 Free Energies of Solvation.

Free energy methods can also be used to calculate a molecule's free energy of solvation.$^{88-91}$ This quantity is intrinsically related to a molecule's solubility. When calculated in water, this quantity is commonly referred to as the free energy of hydration. Unlike the calculation of absolute free energies of binding, absolute free energies of solvation require only a single perturbation. This perturbation is the decoupling of the molecule from the solvent environment. The resultant free energy of solvation, $\Delta G^{\mathrm{solv}}$, is given by $-\Delta G^{\mathrm{decoup}}$. Relative free energies of hydration, $\Delta\Delta G_{AB}$, between two different molecules, A and B, may also be calculated. This quantity requires two simulations, one in solvent and one in gas phase, and is given by Eq. 2.27 using the following thermodynamic cycle.

\[
\begin{array}{lccc}
\text{Gas Phase:} & \mathrm{A} & \xrightarrow{\;\Delta G^{\mathrm{mut}}_{AB(\mathrm{gas})}\;} & \mathrm{B} \\[4pt]
 & \Big\downarrow {\scriptstyle \Delta G^{\mathrm{solv}}_{A}} & & \Big\downarrow {\scriptstyle \Delta G^{\mathrm{solv}}_{B}} \\[4pt]
\text{Solvent:} & \mathrm{A} & \xrightarrow{\;\Delta G^{\mathrm{mut}}_{AB(\mathrm{sol})}\;} & \mathrm{B}
\end{array}
\]

\[
\Delta\Delta G_{AB} = \Delta G^{\mathrm{solv}}_{B} - \Delta G^{\mathrm{solv}}_{A} = \Delta G^{\mathrm{mut}}_{AB(\mathrm{sol})} - \Delta G^{\mathrm{mut}}_{AB(\mathrm{gas})} \tag{2.27}
\]

Free energies have to be calculated only for the smaller A to B mutations. If the rigid molecule assumption is made, then even $\Delta G^{\mathrm{mut}}_{AB(\mathrm{gas})}$ does not need to be calculated.

The reason for this is that the free energy change calculated in solvent is now not strictly $\Delta G^{\mathrm{mut}}_{AB(\mathrm{sol})}$ but $\Delta G^{\mathrm{mut}}_{AB(\mathrm{sol})}$ with the intramolecular free energy change going from A to B removed. This intramolecular free energy change is the same in gas phase and solvent since the molecule is rigid. In other words, it is equal to $\Delta G^{\mathrm{mut}}_{AB(\mathrm{gas})}$. Therefore $\Delta\Delta G_{AB}$ simply equals the free energy change in solvent.

Free energies of solvation are commonly used as tests for force field parameterisations and methods.$^{92-95}$ The wide availability of experimental data on free energies of solvation further enhances the power of this approach. Indeed, in this work, a new charge derivation method, REPD, is proposed (Chapter 4). Free energies of hydration are calculated using such methods to test how well these charges reproduce experimental behaviour. This work is described in Chapter 5.

2.3.6 Partition Coefficients.

Another quantity of interest that can be addressed by free energy calculations is the partition coefficient. This quantity gives the ratio of concentrations of a solute equilibrated between two solvents. It is possible to obtain this quantity directly using a similar idea to calculating free energies of hydration. The free energy is calculated not for the real physical process of transfer from one solvent to another but rather by decoupling the molecule from each solvent. The following thermodynamic cycle shows that the solvent X/solvent Y partition coefficient, $P^{XY}_{A}$, for molecule A is given by the difference in free energies of solvation using the expression

\[
\begin{array}{lccc}
\text{Solvent X:} & \mathrm{A} & \xrightarrow{\;\Delta G^{X}_{A}\;} & \text{nothing} \\[4pt]
 & \Big\downarrow {\scriptstyle \Delta G^{XY}_{A}} & & \Big\downarrow {\scriptstyle \Delta G = 0} \\[4pt]
\text{Solvent Y:} & \mathrm{A} & \xrightarrow{\;\Delta G^{Y}_{A}\;} & \text{nothing}
\end{array}
\]

\[
-2.3RT \log(P^{XY}_{A}) = \Delta G^{XY}_{A} = \Delta G^{X}_{A} - \Delta G^{Y}_{A} \tag{2.28}
\]

A negative value of $\log(P^{XY}_{A})$ indicates that most of the solute is in solvent Y, with the right hand side of Eq. 2.28 being positive. Partition coefficients provide another route to verification of simulation methodology since there is an abundance of experimental data. However, this comparison is not usually made through the use of absolute partition coefficients, since the calculation of these requires two expensive absolute free energy of solvation calculations. The comparison is made through the calculation of relative partition coefficients.$^{96-98}$ The relative partition coefficient is defined as $P_{AB} = P_{A}/P_{B}$. Using the following thermodynamic cycle, the relative solvent X/solvent Y partition coefficient, $\Delta\log(P^{XY}_{AB})$, between A and B is given by

\[
\begin{array}{lccc}
\text{Solvent X:} & \mathrm{A} & \xrightarrow{\;\Delta G^{X}_{AB}\;} & \mathrm{B} \\[4pt]
 & \Big\downarrow {\scriptstyle \Delta G^{XY}_{A}} & & \Big\downarrow {\scriptstyle \Delta G^{XY}_{B}} \\[4pt]
\text{Solvent Y:} & \mathrm{A} & \xrightarrow{\;\Delta G^{Y}_{AB}\;} & \mathrm{B}
\end{array}
\]

\[
-2.3RT\,\Delta\log(P^{XY}_{AB}) = \Delta G^{XY}_{A} - \Delta G^{XY}_{B} = \Delta G^{X}_{AB} - \Delta G^{Y}_{AB} \tag{2.29}
\]

The two terms on the right hand side of Eq. 2.29 are nothing more than relative free energies and, by comparison to absolute values, are quite easy to calculate using normal free energy methods. In the absence of any experimental free energies of solvation, verification of simulation methodology by experiment must resort to reproducing the relative partition coefficient. This was the case for a new set of thiourea charges used later in this work (Subsection 3.2.2). This method is not as rigorous as comparing to free energies of solvation since it involves differences but it still can prove useful.

2.4 Conclusion.

The rationale for using computer simulations in this work and how they may be used to study particular systems has been presented. The necessary steps outlined in

this chapter are now taken in this work. Firstly, the system is set up and the force field parameterised. Since part of this involves a new parameterisation technique, additional computer simulations are undertaken to test the new parameter set. The MC simulation protocol is expanded to be capable of simulating the macrobicycle 12 system. Finally, the binding free energy protocol is established and calculations on the macrobicycle 12 system are performed and rationalised.
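The thermodynamic cycles of Section 2.3 reduce each quantity of interest to a difference between two simulated legs. As a minimal sketch of that bookkeeping for Eqs. 2.25 to 2.27, the Python below closes each cycle. The function names and the numerical inputs are hypothetical, chosen only to illustrate the signs, and the standard state corrections mentioned for double decoupling are deliberately left out.

```python
"""Cycle closure for Eqs. 2.25-2.27 (illustrative sketch, hypothetical names)."""


def relative_binding(dg_mut_bound: float, dg_mut_free: float) -> float:
    """Eq. 2.25: ddG_AB = dG_mut_AB(H) - dG_mut_AB.

    dg_mut_bound is the A -> B mutation in the host, dg_mut_free the same
    mutation for the free guest. A negative result means B binds more
    strongly than A.
    """
    return dg_mut_bound - dg_mut_free


def absolute_binding(dg_decoup_free: float, dg_decoup_bound: float) -> float:
    """Eq. 2.26: dG_bind_A = dG_decoup_A - dG_decoup_A(H).

    No standard state correction is applied here (an omission of this sketch).
    """
    return dg_decoup_free - dg_decoup_bound


def relative_solvation(dg_mut_sol: float, dg_mut_gas: float) -> float:
    """Eq. 2.27: ddG_AB = dG_mut_AB(sol) - dG_mut_AB(gas)."""
    return dg_mut_sol - dg_mut_gas


if __name__ == "__main__":
    # Hypothetical free energies in kcal/mol, not results from this work.
    print(relative_binding(dg_mut_bound=3.0, dg_mut_free=4.5))
    print(absolute_binding(dg_decoup_free=10.0, dg_decoup_bound=16.5))
    print(relative_solvation(dg_mut_sol=2.0, dg_mut_gas=0.5))
```

Keeping each cycle as its own small function makes the sign convention explicit at the point of use, which is where errors in this kind of calculation most often arise.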

Chapter 3

Setup of the Host-Guest System

This chapter is devoted to all the steps necessary to set up the macrobicycle 12 system for free energy calculations. It first deals with the derivation of parameters missing from the OPLS force field, particularly the charges and dihedrals. It continues by discussing starting structures and how the architecture of the system is constructed. Finally, a number of measures to customise and optimise the MCPRO simulation package for the system are described.

3.1 Force Field

3.1.1 Missing Parameters from the OPLS Force Field.

The force field used in this work to model macrobicycle 12 is the OPLS-AA force field$^{25}$ (see Subsection 2.1.3 for the functional form of the components). The OPLS parameters come from a range of sources. The bond and angle parameters largely come from AMBER,$^{99}$ the dihedral parameters are chosen to best reproduce the conformational energy profile generated from structures optimised from HF/6-31G* calculations. Finally, the non-bonded parameters are optimised to best reproduce experimental liquid phase properties for a variety of molecules. Such properties include enthalpies of vaporisation, densities, heat capacities and compressibilities. The OPLS force field covers a wide range of functionalities. However, some parameters were missing for macrobicycle 12 and thus had to be obtained by other means. These include bond, angle, dihedral and non-bonded parameters. The assignment of

500 0.250 5 N -0.272 0.0 2.CHAPTER 3. The location of the new parameter types in macrobicycle 12 are shown in Figure 3.706 0.0 D7 N–C–C–C 0.1 Dihedral Parameters Dihedral V1 /kcal V2 /kcal V3 /kcal mol−1 mol−1 mol−1 D1 C–N–C–S 0. N–Ac–phenylalanine.500 0.0 D3 H–N–C–S 0.750 0.0 D5 C–N–C–C 1.3).000 0.930 -1.0 a D10 C–N–C–C -7.385 3. Bond B1 B2 C–S C–N Bond Parameters K/kcal req /˚ A −1 ˚−2 mol A 1.0 Non-bonded Parameters Atom q σ/˚ A /kcal mol−1 1 C 0.974 D11 O–C–C–N 0.271 D6 C–C–C–C 0.2.500 0.170 6 H 0.0 D2 C–N–C–N -1. bond.140 3.166 6.907 0.066 2 C 0.066 a b The last atom is the carboxylate carbon.7 114.579 0.142 -1.000 7 C -0.051 3.0 D4 H–N–C–N 0.907 0. and non-bonded parameters is fairly straightforward if new functionalities are similar to pre-existing functionalities due to the transferable nature of the OPLS force field.335 Angle Parameters Angle K/kcal mol−1 rad−2 A1 S –C–N A2 N–C–N A3 C–N–H A4 C–N–C A5 C–C–N 63 θeq / degree 122. However. angle. The derivation of these parameters is described later in Section 3.500 0.110 3.0 D8 C–N–C–C -0. are much more sensitive and must be derived separately for any new dihedrals (Section 3. c N–Ac–alanine.500 0. SETUP OF THE HOST-GUEST SYSTEM 34 Table 3.389 2.083 2.649 0. The thiourea group is moderately similar to the urea functionality which is included in the OPLS force field.089 0.040 3.9 110.030 9b C -0.020 3.191 3.188 -1.500 0. Table 3.0 0.666 1.066 3 C 0.290 3. Dihedral parameters.1: Parameters Derived or Assigned for Macrobicycle 12 Not Supplied by the OPLS Force Field.250 0.0 0. N–Ac–glycine.245 0.907 0.1.8 121.2 contains the location of the new .6 119.550 0.066 8 H 0.0 2.105 4 S -0. on the other hand.0 D9 C–C–N–C 2.340 5.0 2.1 contains a summary of all these required parameters and the values derived for them in this work. it was felt that the non-bonded charge parameters important in host-guest binding should be derived specifically for thiourea.066 10c C 0.0 2.907 0.258 0. Figure 3.

parameter types in the guest.

Figure 3.1: The location of the new parameter types in macrobicycle 12. Only a cross-section of the host is shown.

3.1.2 Transferable Parameters.

New atom labels were created for the sulfur and the centre carbon of the thiourea unit. This meant defining new parameters for all bonds and angles containing these atoms. The parameters are listed in Table 3.1. New atom labels could have been assigned to other atoms of the thiourea moiety but this would have required the definition of many more new bond and angle types very similar to pre-existing types for urea. Therefore, reference bond lengths were required only for the C–S and C–N bonds. These values were taken from a MP2/6–31+G* optimised structure of N,N′-dimethylthiourea in the Z,Z conformation. This geometry was chosen for reasons discussed later in this chapter. Bond force constants were not necessary since these degrees of freedom were not sampled in free energy calculations for reasons discussed later in Chapter 6.

Figure 3.2: The location of the new parameter types in the amino acid derivatives. The parameters for the highlighted carbon depend on the side group, X.

Reference angle values for thiourea were also required. The reference angles for S–C–N and N–C–N were obtained from the same MP2/6–31+G* structure of N,N′-dimethylthiourea, while the remaining A3 and A4 angles with nitrogen at their apex were taken as the equivalent urea OPLS parameters. However, the A5 C–C–N parameters elsewhere in macrobicycle 12 were taken to be the same as another chemically very similar angle already defined in the force field. No new angle bending force constants were required for the thiourea moiety, again due to the angles not being sampled.

A number of new atom types had to be defined, each requiring non-bonded parameters. Some new types were necessary since they bridge two different functionalities. In these cases the same Lennard-Jones parameters were used as for the atom in the unbridged case and the charges were assumed to be additive in order to preserve overall charge. This was the case for atoms 1, 2, 9 and 10 in Table 3.1. For example, atom 1 is the carbon connecting the two aromatic rings, the one present in the host. It was assigned a charge totalling two toluene carbon atoms and four hydrogens. Non-bonded Lennard-Jones parameters were also needed for atoms 3–8 of the thiourea group. For atoms 3–6, the same Lennard-Jones parameters were taken as for urea, with the exception of

the sulfur, for which the OPLS sulfide parameters were used. For atoms 7 and 8, standard hydrocarbon parameters were adopted. The parameters for the amino acid derivative carbon are either those for atom 9 or 10 depending on the side group.

3.2 Charge Derivation.

3.2.1 REPD Charges.

The principal motivation for the REPD method is the need to produce charges for this part of macrobicycle 12. REPD charges are designed to replicate OPLS charges while at the same time be easily derivable. The REPD/6–31+G* charges,$^{17}$ discussed in Chapter 4, were used for thiourea. It was decided that new charges should be derived as far as the first carbon away from the thiourea. Therefore, the charges were derived on the molecule N,N′-diethylthiourea pictured in Figure 3.3. Reference bond and angle values listed in Table 3.1 were used together with the minimum energy conformation most commonly observed in macrobicycle 12. By parameterising to the predominant conformation, the conformational dependence of charges discussed later in Chapter 4 is minimised. The end methyl groups had their charges constrained to OPLS values to further reduce any remaining discontinuity between the charge sets.

Figure 3.3: The N,N′-diethylthiourea molecule used to parameterise REPD charges for macrobicycle 12. Charges were derived only for the atoms enclosed in the box.

The charges derived in this way for the thiourea moiety in macrobicycle 12 are presented in Table 3.1 as atoms 3–8.

3.2.2 Relative Partition Coefficients.

Considerable effort was spent in trying to reproduce some experimental behaviour for these charges to validate their use. The only useful experimental property that could be found was the chloroform-water partition coefficient.$^{100}$ To avoid full free energy of solvation calculations, the relative partition coefficient was calculated$^{96-98}$ with a similar molecule, acetamide. A full description of the use of relative partition coefficients is given in Subsection 2.3.6. Respectively, let T and A represent thiourea and acetamide, and C and W represent chloroform and water. This allows a relative chloroform/water partition coefficient, $\Delta\log(P^{CW}_{TA})$, to be calculated as follows.

\[
\begin{array}{lccc}
\text{Chloroform:} & \text{Thiourea} & \xrightarrow{\;\Delta G^{C}_{TA}\;} & \text{Acetamide} \\[4pt]
 & \Big\downarrow {\scriptstyle \Delta G^{CW}_{T}} & & \Big\downarrow {\scriptstyle \Delta G^{CW}_{A}} \\[4pt]
\text{Water:} & \text{Thiourea} & \xrightarrow{\;\Delta G^{W}_{TA}\;} & \text{Acetamide}
\end{array}
\]

\[
-2.3RT\,\Delta\log(P^{CW}_{TA}) = \Delta G^{CW}_{T} - \Delta G^{CW}_{A} = \Delta G^{C}_{TA} - \Delta G^{W}_{TA} \tag{3.1}
\]

3.2.3 Free Energy Protocol.

Full details for the FEP protocol can be found in Chapter 5. The protocol described here contains only essential points relating to the mutations in this system. These points relate to the solvent box, number of configurations, MC moves and spacing of windows. The mutations in water were performed in a cubic box of side 25 Å containing 505 TIP4P$^{101}$ water molecules, while the chloroform box was of side 33 Å and contained 264 chloroform molecules.$^{97}$ Equilibrium configurations were generated in the NPT ensemble at 25 °C and 1 atm using the MC Metropolis algorithm. There were 3 million (M) configurations of equilibration and 5 M of data collection per window. Maximum move sizes for solute translations and rotations were selected to be 0.15 Å and 15°. The maximum volume move sizes were set to 250 Å$^{3}$ in water and 390 Å$^{3}$ in chloroform. Mutations were performed in both directions using 11 windows

spaced at $\Delta\lambda = 0.1$ intervals. Figure 3.4 illustrates the mutation performed.

Figure 3.4: The change in geometry for mutating thiourea to acetamide.

3.2.4 Effect of Basis Set, Ab Initio Method and Geometry.

The initial calculations were performed using EPD/6–31G* charges and 6–31G* optimised geometries. The EPD/6–31G* charges used are given in Table 3.2. The free energy results were rather disappointing. The individual free energies obtained, $\Delta G^{w}_{TA} = 5.92$ kcal mol$^{-1}$ and $\Delta G^{c}_{TA} = 2.30$ kcal mol$^{-1}$, gave $\Delta\log(P_{TA}) = -2.66$. Experimentally, $P_{A} = 0.01$ and $P_{T} = 0.00079$,$^{100}$ which gives $\Delta\log(P_{TA}) = -1.10$. Such a large negative value was attributed to the increase in dipole moment and polar hydrogen charges for thiourea and was the spur for developing the REPD method.

With the development of the REPD method, the calculation was repeated with the REPD/6–31+G* charges and geometry. The result was improved at −2.04 but still too negative, as shown in Table 3.3. Much of the problem seemed to lie in the large dipole moment. The system may be over-polarised, since a solution phase (dioxane) experimental thiourea value is only 4.89 D.$^{102}$ A range of other charge sets and geometries for thiourea were therefore studied.

An assumption that had always been made up to this point was that the HF/6–31+G* geometry was adequate for all compounds. However, such an assumption may be breaking down for thiourea with its large row 3 sulfur atom. Many combinations of larger basis sets and better methods were therefore tested, but the main improvement was obtained using a MP2/6–31+G* geometry. The principal difference was that this shortened the C–S bond length from 1.683 to 1.661 Å, lowering

thiourea EPD/6–31G* REPD/6–31+ G*b REPD/6–31+ G*c REPD/6–31+ G*d a C 0. c MP2/6–31+G* planar geometry.447 0.77 -0.331 HF/6–31G* geometry. Another possible variation in geometry lay in the treatment of the NH2 groups.666 -0. d MP2/6–31+G* transoid geometry.66 -2.312 H(anti) 0.312 0. thiourea EPD/6–31G* REPD/6–31+ G*b REPD/6–31+ G*c REPD/6–31+ G*d a b acetamide a EPD/6–31G* REPD/6–31+ G*b REPD/6–31+ G*b REPD/6–31+ G*b a ∆GwA T 5.5. SETUP OF THE HOST-GUEST SYSTEM 40 Table 3.458 -0. c MP2/6–31+G* planar geometry.452 0.418 -0. All three geometries are pictured in Figure 3.102 0. . For urea it has been observed that three possible geometries exist since the NH2 groups are not strictly planar and pyramidalise slightly. Compared to the planar geometry.528 -0.029 4. while planar ignores this effect and averages the hydrogen geometries.01 D.92 3.011 0.290 0. HF/6–31+ G* geometry. cissoid has both NH2 groups facing the same way.404 -0.559 0.97 0.301 µ/D 6. However. This led to another improvement in the ∆ log(PT A ). a transoid thiourea geometry did exist.08 ∆ log(PT A ) -2.34 to 6.45 ∆Gc A T 2.79 2.431 -0.406 0.12 HC µ/D 0.26 acetamide EPD/6–31G* REPD/6–31+ G*b a b a C O N C H(syn) 0.39 6.349 H(anti) 0.099 -0.616 H(syn) 0.366 0.34 6. transoid urea has one facing up and one facing down.746 -0.CHAPTER 3.01 0.150 3. now giving −1. it had a slightly lower energy and gave a Table 3. The cissoid geometry did not appear to exist for thiourea.502 -0.04 -1.284 0.30 1.2: Charges and Dipole Moments for Thiourea and Acetamide Using a Range of Basis Sets and Geometries.432 0.61.326 S -0.39 HF/6–31G* geometry.570 -0.92 0. HF/6–31+ G* geometry. the dipole moment from 6.3: ∆GT A in Water and Chloroform Using Charge Sets Derived with Various Basis Sets and Geometries.649 -1.319 0.61 -0.01 5. as is assumed for standard force fields.978 -0.075 0.416 N -0. d MP2/6–31+G* transoid geometry.

much lower dipole moment of 5.12 D. $\Delta\log(P_{TA})$ became −0.39, more positive now than the experimental value. Therefore the geometry that would give the "best" result would in theory lie between planar and transoid. Given that $\Delta\log(P_{TA})$ was so sensitive to such a small change in geometry and that two plausible geometries bounded the experimental result, it was decided that the best geometry would be a planar MP2/6–31+G*, consistent with all other force field assumptions. The need to use MP2 geometries was then questioned, since the high polarity could be solely due to the use of a planar geometry. However, a HF/6–31+G* transoid calculation on thiourea gave a dipole moment of 6.28 D, still far too large. Hence the MP2 geometry is still necessary. Hence this was the geometry used to derive charges for the N,N′-dimethylthiourea moiety as given in Table 3.1.

Figure 3.5: The transoid, cissoid and planar geometries for urea.

3.3 Dihedral Parameterisation.

As mentioned in Subsection 3.1.1, dihedral parameters had to be derived for some of the dihedrals in the host and guests as listed in Table 3.1. These dihedrals were unusual, either because they lay at junctions between different functional geometries or involved thiourea. There was an OPLS parameter for D9 but it was felt that this important dihedral determining the cis–trans energy difference should be reparameterised explicitly. The prototype amino acid derivative used here was N–Ac–alanine.

3.3.1 Calculation of Ab Initio Energy Profile.

In the parameterisation process a neutral fragment molecule was selected that was large enough to enclose not only the dihedral in question but all local functionality to minimise truncation effects. The ends of this molecule were typically capped by hydrogens or methyl groups. The guest was small enough to be treated as a whole. The five molecules used for this parameterisation are pictured in Figure 3.6.

Figure 3.6: The five molecules used for parameterising the dihedral parameters in macrobicycle 12 and the amino acid derivative. Panels: N,N′-dimethylthiourea, N,N′-ethylmethylthiourea, diaryl methane, N–benzylacetamide and N–Ac–alanine.

A conformational energy profile for each dihedral was constructed by calculating the energy of an ab initio optimised structure at a few important values of dihedral angle.$^{26}$ These points correspond to maxima and minima. The ab initio calculations were done using Gaussian 94.$^{103}$ The energy minima were calculated by optimising the molecule at the HF/6-31G* level, while the maxima were obtained by finding the transition states. Where possible, symmetry was used to reduce the number of ab

initio calculations. The transition state calculation was often problematic and usually had to be performed in two stages, by first optimising with the dihedral constrained to an intuitive value near the transition state, and then using the program's transition state search option. The presence of minima and transition states was confirmed by normal mode analyses at these geometries to check the local potential energy curvature. The test is that frequencies of all normal modes for minima should be real, while for transition states, one should be imaginary. While the most accurate results are not expected for this level of theory and basis set, they are practical for calculations on up to 25 atoms, and in any case, the error will be substantially reduced since only energy differences are required. However, even energy differences have been shown to change with ab initio method. For example, previous ab initio calculations on N,N′-dimethylthiourea$^{104}$ reveal a 0.8 kcal mol$^{-1}$ change in the energy difference between the two minima on going from HF/6–31G* to higher level MP2/6–31G*.

3.3.2 Fitting a Fourier Series to the Energy Profile.

A force field calculation, called dihedral driving, was then performed for the dihedral. This was done using BOSS.$^{105}$ At a range of dihedral angles from 0 to 360° the energy due to the rest of the force field was calculated by optimising the whole structure with the dihedral constrained to the desired value. This energy is made up of changes in non-bonded, known dihedral, angle, and bond stretching contributions. This creates a second energy profile. The difference between these profiles should be reproduced by the dihedral parameters to be derived. The three term Fourier series given in Eq. 2.4 was fitted to this difference to produce new Fourier coefficients, $V_{ij}$. While including three terms would always give the best fit, in all but two cases it was clear from the energy profile that only one or two terms were adequate.

The resulting Fourier series is due to all unknown 1–4 dihedral atom pairs about the central bond. Hence the Fourier coefficients had to be partitioned into components due to each pair. This was done according to the following rules based on common OPLS practice. Firstly, any terms involving hydrogen except those in amide

or sulfonamide bonds were assumed to be small enough to be set to zero. This assumption significantly reduced the number of terms in most cases, leaving only D3 and D4 as hydrogen-containing dihedrals. Bearing in mind that known dihedrals do not contribute to the fitted coefficients, D5, D6, D9 and D10 could be assigned the whole values of the coefficients. For the remainder, if a non-zero $V_{i1}$ coefficient was present, it was placed entirely in the main chain term. This can be seen for D2, D7 and D11, while D1, D3 and D4 only contain a $V_{i2}$ coefficient. Otherwise, the coefficients were split evenly between all components. This was the case with the $V_{i2}$ terms for D1, D2, D3 and D4. The resulting fits to ab initio are shown in Figure 3.7.

Figure 3.7: The conformation energy profiles for the parameterised dihedrals. Note the different energy scales. Panels: D1,D2; D3,D4; D5; D6; D7; D8; D9; D10; D11. Axes: energy / kcal mol$^{-1}$ against dihedral angle / degrees.

3.3.3 Parameterisation Complications.

There were a number of additional complications. Firstly, all molecules in this study contained not one but two or more unknown dihedrals. Consequently, iterative parameterisation was necessary, fitting one and then applying to the other, and vice versa, until they all gave consistent reproduction of their individual energy profiles with the same parameters.

The second point concerned a difference in the treatment of degrees of freedom by ab initio calculations and the force field. The ab initio calculations were performed using fully flexible geometries to produce realistic energies. However, for the force field dihedral driving to be consistent with the force field used in the simulations, the geometry was constrained, such as forcing aromatic rings to be rigid. Dihedrals derived in this way would ensure that the force field would reproduce ab initio energy barriers correctly. The necessity for using fully relaxed ab initio geometries was no more evident than for preliminary calculations on thiourea. The barrier to rotation of a NH$_2$ group was 10.5 kcal mol$^{-1}$ if a fully flexible geometry was used, but rose massively to 18.8 kcal mol$^{-1}$ if a planar, force field-type geometry was enforced on the NH$_2$ group.

A third complication was the degree to which the Fourier series could be made to fit the ab initio points. The priorities of the dihedral parameters were that they reproduce well the ab initio energy minima, followed next by the energy barriers. Therefore a number of approximations were necessary to achieve this. For example, if the ab initio points differed by a few degrees from a critical value for a particular Fourier term, such as 0° or 120°, then the points were moved to the nearest critical angle, but never to a different energy. This ensured that the energy barrier remained the same. For symmetric profiles, for example, it was common to replace a closely spaced double minimum by one minimum at the centre. If the fit was still very bad, then the points that were considered the least important were discarded altogether from the fit. This was often the case for highly irregular energy profiles such as D10. Priority for retaining points was given to minima over transition state values as can be seen for D6.

A fourth complication arose due to multiple minima due to conformational freedom in other dihedrals in the molecule. Where possible the remainder of the molecule was placed in the minimum energy conformation. However, for the D7 dihedral, two closely spaced energy profiles existed. One had the D8 dihedral at ∼90° and the other at ∼270°. The different ab initio points are evident

in Figure 3.7. Which of these two profiles was lower depended on the value of D7. In this case, the overall profile was taken as the minimum of both profiles, producing cusps in the energy profiles, and the fit was made at the average value of the two angles from each profile.

3.4 Structural Setup.

3.4.1 The Z-matrix.

The original geometry for macrobicycle 12 was taken from the pdb file used in the previous modelling of this molecule.$^{6}$ Since that earlier work used a united atom model, hydrogen atoms were added to make the model all-atom. Finally, the coordinates were converted into OPLS Z-matrix format. All parameter types and geometries were assigned values either from OPLS–AA or from the derived parameters in Table 3.1.

Z-matrix construction concerns not simply the initial structure at the start of a simulation. The file contains information about the molecule's geometry, parameter types to be used, how residues are assigned, degrees of freedom to be sampled and other information particular to the molecule. Therefore, the way the Z-matrix is defined is vital to the success of the whole simulation, particularly with regards to residue definitions and designation of degrees of freedom, both of which play a major role in the efficiency of MC sampling and computations. An example Z-matrix is illustrated in Figure 3.8. The fundamental principle of the Z-matrix is that the coordinates of all but the first three atoms are defined by a bond, angle and dihedral to three other atoms already defined in the Z-matrix.

Number  Atom  Bond to  Length/Å  Angle to  Angle/°  Dihedral to  Dihedral/°
1       C
2       O     1        1.364
3       H     2        0.945     1         108.5
4       H     1        1.090     2         109.5    3            180.0
5       H     1        1.090     2         109.5    4            120.0
6       H     1        1.090     2         109.5    4            -120.0

Figure 3.8: Structure and Z-matrix for methanol.
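The principle that each atom after the third is fixed by a bond, an angle and a dihedral to atoms already defined can be made concrete by converting a Z-matrix to cartesian coordinates. The Python below is a minimal sketch of that conversion for the methanol example of Figure 3.8, not the routine used by MCPRO or BOSS. The function names and the tuple layout are hypothetical, the placement of the first three atoms (origin, x axis, xy plane) is an arbitrary choice, and the assignment of the +120° and −120° dihedrals to atoms 5 and 6 is an assumption made when reading the figure.

```python
import math


def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def unit(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def place(j, k, l, r, theta, phi):
    """Position of a new atom bonded to j at distance r, with angle
    new-j-k = theta and dihedral new-j-k-l = phi (both in radians)."""
    bc = unit(sub(j, k))              # axis pointing from k to j
    n = unit(cross(sub(k, l), bc))    # normal to the plane of l, k and j
    m = cross(n, bc)                  # in-plane axis perpendicular to bc
    x = -r * math.cos(theta)
    y = r * math.sin(theta) * math.cos(phi)
    z = r * math.sin(theta) * math.sin(phi)
    return add(j, add(scale(bc, x), add(scale(m, y), scale(n, z))))


def zmatrix_to_cartesian(zmat):
    """Rows are (bond_to, length, angle_to, angle, dihedral_to, dihedral),
    with 1-based atom references, degrees, and None where undefined."""
    xyz = []
    for i, (jb, r, ka, theta, ld, phi) in enumerate(zmat):
        if i == 0:                    # first atom: origin
            pos = (0.0, 0.0, 0.0)
        elif i == 1:                  # second atom: bond only, along x
            pos = (r, 0.0, 0.0)
        elif i == 2:                  # third atom: bond and angle, in xy plane
            j, k = xyz[jb - 1], xyz[ka - 1]
            t = math.radians(theta)
            bc = unit(sub(j, k))
            pos = add(j, add(scale(bc, -r * math.cos(t)),
                             (0.0, r * math.sin(t), 0.0)))
        else:                         # all others: bond, angle and dihedral
            pos = place(xyz[jb - 1], xyz[ka - 1], xyz[ld - 1],
                        r, math.radians(theta), math.radians(phi))
        xyz.append(pos)
    return xyz


def distance(a, b):
    d = sub(a, b)
    return math.sqrt(dot(d, d))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(dot(cross(n1, n2), unit(b2)), dot(n1, n2)))


# Methanol as read from Figure 3.8 (atom order C, O, H, H, H, H).
METHANOL = [
    (None, None, None, None, None, None),
    (1, 1.364, None, None, None, None),
    (2, 0.945, 1, 108.5, None, None),
    (1, 1.090, 2, 109.5, 3, 180.0),
    (1, 1.090, 2, 109.5, 4, 120.0),
    (1, 1.090, 2, 109.5, 4, -120.0),
]

if __name__ == "__main__":
    for number, pos in enumerate(zmatrix_to_cartesian(METHANOL), start=1):
        print(number, " ".join(f"{c:8.4f}" for c in pos))
```

Changing a single entry, such as the dihedral of atom 4, and rebuilding the coordinates moves that atom and every atom defined relative to it while leaving all other internal coordinates untouched.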

Note that the second atom only contains a bond, while the third contains a bond and an angle. Since all other atoms depend on the first three atoms, these three atoms may be used to define whole molecule rotation and translation moves. This approach is quite different to cartesian coordinates, for which every atom is defined independently by x, y and z coordinates. Thus the Z-matrix coordinates defined in this way are ideal candidates for MC moves since they allow the direct alteration of a bond length, angle or dihedral while minimising the distortion in other internal coordinates. Thus large moves can be made for a low energy cost.

There are many possible ways to define a Z-matrix. The choice made determines which particular bonds, angles and dihedrals are sampled in MC moves. The main criterion used to determine whether bonds, angles and dihedrals are explicitly defined for sampling in the Z-matrix is the likelihood that they will vary much during the simulation with minimal increase in the energy. It is usually beneficial to define explicitly in the Z-matrix those coordinates that can vary the most. On the other hand, it may be just as worthwhile defining explicitly those that are not expected to vary. This allows degrees of freedom to be constrained, particularly useful for bonds.

The strength and weakness of the Z-matrix is that all other atoms defined with respect to a moving atom will move with it. Thus parts of a molecule can move together over quite large displacements, a more efficient process than each one moving individually. However, it can lead to quite large displacements for atoms distantly connected to the moving atom. In condensed phases or ring systems, large displacements typically lead to large energies and subsequent move rejection. For larger molecules, the means of achieving a compromise between these competing effects is to use residues.

3.4.2 Residue Definitions

The partitioning of a molecule into residues for MC simulations is traditionally done for proteins to allow only small parts of the protein to be moved at a time. It involves effectively breaking up a large molecule into smaller molecules, even if they are still connected by covalent bonds. In a Z-matrix, this is done by defining atoms in a

residue independently of atoms in all other residues. For example, consider the molecule octane with carbons numbered starting from one end. If all the dihedrals are altered, the other end is likely to move a long way from its original position in cartesian space and will most probably collide with another molecule, dramatically reducing the chances that that move would be accepted. On the other hand, if the molecule is broken up into two butane units, this problem would be reduced. The downside to this is that the connecting bond between the butane units can become strained and the sampling of the dihedral and angles about this bond can be reduced since they are no longer defined explicitly in the Z-matrix.

3.4.3 Residues and Their Application to MC Moves

There are two primary advantages of the residue partitioning. Firstly, since residues are smaller than the whole parent molecule, the tradeoff between larger moves and high MC acceptance moves is more favourable. If a move involving the whole molecule were to be attempted, it would either have to be tiny or else never be accepted, and tiny moves are ineffective in sampling configuration space quickly and extensively. In other words, moves of significant size for small residues have a better chance of being accepted than large moves of the whole molecule. MC moves now only adjust the variable degrees of freedom in one residue at a time, reducing the scope of the move. This is particularly important when taking into account computational considerations since moves of only a small number of atoms are much cheaper to perform. The second benefit of residues is that they ensure that only a small number of atoms are defined with respect to any other atom and that these atoms are spatially close. Thus if any atom in the residue were moved, there would only be a minimal change in the coordinates of other dependent atoms. This further improves the acceptance probability for a move.

This same philosophy was adopted for macrobicycle 12. While there are no natural dividing points such as the amide bonds for proteins, it was possible to break up the host into nine roughly equally sized residues. These can be categorised as one N,N′-dimethylthiourea (thiourea, for short), two amide, two benzamide, two hydrocarbon, and two toluene residues. This partitioning is shown in Figure 3.9. The guest was taken as a single residue.

Figure 3.9: The nine residues of macrobicycle 12. Identical residues are shown in the same colour.

So that all residues could be defined separately, three dummy atoms were placed at the centre of the host as the first three atoms in the Z-matrix. The atoms in each residue were then defined exclusively with respect to these three dummy atoms. The first atom of each residue, the residue anchor, was taken as the one closest to the centre of the residue to minimise large amplitude displacements of distant atoms. The bonds and angles relative to the three centre dummy atoms were assigned a force constant of zero. Atoms within each residue were defined in the Z-matrix according to the criteria described earlier. It was decided that all bonds would be fixed at their reference values, with the exception of the bonds connecting the centre three dummy atoms to the residue anchors, and the residue-connecting bonds which get sampled indirectly. All aromatic rings would be held rigid and planar. The thiourea unit was also made to be rigid and planar for reasons that will be discussed in Section 6.3. A dummy atom was placed at the intersection of the two C–N bonds of thiourea as will be explained in Section 6.3. All

other angles and dihedrals were allowed to vary and so were defined in the Z-matrix so as to maximise the chance that they could change significantly. This is particularly important for dihedrals. If the dihedral for a particular atom changes, all attached atoms should be defined to follow it. This allows methylene, aryl, and planar groups to move as complete units without distortion.

Having defined the residues, it is now important to make clear exactly what the MC moves are. A host residue move involves the random selection of a residue, followed by the random change of all variable degrees of freedom in that residue. A regular solute move involves random changes in all variable degrees of freedom in the guest as well as some random translation and rotation of the whole molecule.

3.5 Simulation Code Customisation and Optimisation

The simulation package used in this work was MCPRO [32]. The original intention had been to use BOSS [105], since this program was supposedly more suited for small molecule simulations while MCPRO was designed specifically for protein systems. However, MCPRO was chosen for two reasons. It allowed the use of residues, the benefits of which have just been described. It also contained an energy updating routine that was more efficient for large systems than the complete energy calculation used in BOSS. Additional features were required for the macrobicycle 12 system that were not implemented in MCPRO. Therefore, a number of alterations were made to the MCPRO code. These included modifications to the solute-solvent energy calculation, the energy calculation for perturbed solutes, the use of a “greater residue”, implementation of new MC moves, and the inclusion of the GB/SA solvent continuum method.

The first change that required a considerable coding effort was to the non-bonded cutoff protocol for the solvent-solute energy calculation (see Subsection 2.1.4 for details). MCPRO uses a less stringent residue-solvent cutoff radius rather than a full

molecule-solvent cutoff. Only those solvent molecules within the cutoff range of the residue see that residue. This procedure makes simulations for large proteins practicable by reducing the total number of non-bonded interactions. The difficulty with this is that if residues are charged, then large discontinuities in energy may be expected when solvent molecules move across the cutoff radius boundary. For proteins, given the large number of charged residues in the neighbourhood of each solvent molecule, this effect is assumed to largely cancel. However, this effect may be more significant for a smaller molecule with fewer residues. Some of the residues in macrobicycle 12 are indeed charged. Therefore, a full molecule-solvent cutoff radius was implemented for the host. It should be pointed out that the guest solute has a charge of -1, so this energy problem of solvent molecules suddenly seeing a charge still occurs to some extent. Owing to the computational expense of more sophisticated electrostatic treatments or larger cutoff radii, the current procedure was retained. The feathering of the potential that is commonly used at the cutoff radius to soften the discontinuity in energy was turned off due to various complications that arose from the new cutoff procedure.

One area that was pinpointed for optimisation was the calculation of energies for perturbed molecules. MCPRO and BOSS calculate energies for perturbed molecules in exactly the same way as for the reference molecule energy calculation. However, only the part of the molecule that actually perturbs requires a new energy calculation. The energy for the rest of the molecule remains the same. By reusing these constant energies, most energy terms did not have to be recalculated for the perturbed molecules.

The use of residues can lead to problems for the atoms at residue boundaries. If a strict residue boundary is used, these boundary atoms will not be defined in the Z-matrix with any relationship to atoms in adjacent residues. Thus relative movement of the two residues can lead to distortions for all bonds, angles and dihedrals that lie across the residue boundary. This problem can be partly eliminated by allowing certain atoms at the boundary such as hydrogens to be defined with respect to both

residues. Figure 3.10 shows, for example, with respect to which atoms the hydrogens at the residue boundary between the thiourea and hydrocarbon residues are defined to achieve this. If either adjacent residue moves, they will also move. Hence they must be added to the moving residue to form a “greater residue” for which all energies are reevaluated.

Z-matrix:
Atom  bond  angle  dihedral
H5    2     1      3
H6    2     1      3
H7    3     4      2
H8    3     4      2

Figure 3.10: The definition of the hydrogens in the Z-matrix at the boundary between the thiourea and hydrocarbon residues (delineated by the boxes).

A number of changes were made to the MC move maximum amplitudes for bonds, angles and dihedrals, primarily as a consequence of the use of the residues. Maximum amplitudes for each dihedral were individually optimised to give 40 % acceptance probability. This ensured that dihedrals in the Z-matrix with the greatest ability to vary were given sufficient opportunity to do so. The maximum amplitudes themselves for each move were scaled down according to how many degrees of freedom were involved in the move. This scaling is necessary because the more degrees of freedom that change, the smaller should be their maximum move size so that the subsequent energy change is not too large.

Finally, a large number of modifications were made to improve the sampling of the system. These included the conrot [106], flip and large dihedral MC moves for the

host, special combinations of translation and rotation moves for the guest, and the GB/SA continuum solvent model [33] for the solvent. These methods are described in Chapter 6.

3.6 Conclusion

Most of the preparations necessary for calculating the free energies on the macrobicycle 12 system have been described. These include parameterisation, structural setup and simulation code customisation and optimisation. However, some of the issues involved in parameterisation and MC sampling require further discussion. The REPD charge derivation method and its testing are described in the next two chapters, while the final issues addressing MC sampling are elaborated on in Chapter 6 before commencing on the free energy calculations themselves.

Chapter 4

Partial Charge Methods

The initial motivation for this work was the need for partial charges for the thiourea unit in macrobicycle 12, since they were unavailable in the OPLS force field (see Subsection 3.4.1). Following convention, charges were to be fitted to the molecular electrostatic potential (MEP) calculated at the 6–31G* level, as described in Subsection 3.1.2 [107–112]. In order to validate the use of such charges, thiourea's water/chloroform relative partition coefficient with acetamide would be compared with experiment. However, the comparison fared poorly and the charges produced were of much greater magnitude than typical OPLS charges. While geometry and basis set were two factors that had a strong influence on the partition coefficient, variation of these alone proved insufficient to obtain a reasonable comparison. Therefore a method was sought that produced charges that could reproduce the relative partition coefficients and be more OPLS-like [17]. This necessitated a study of the use of charges in force fields and various methods to derive them.

4.1 The Use of Charges and Methods to Derive Them

4.1.1 Partial Charges in Force Fields

The simplest, most widespread method of representing the electrostatic interaction between molecules in a computer simulation is to approximate it with a Coulombic potential between sets of atom-centred charges (see the OPLS force field in Subsection 2.1.3, for example). The deficiencies associated with this method include the monopole approximation itself [113–115], the absence of explicit polarisation in their parameterisation [116], and the dependence of charges on the geometry of the molecule from which they were obtained, and in particular, its conformation [117–123]. Although extensions of the basic method to overcome these problems are available, the use of charges is extremely common in the simulation of molecular systems, being almost ubiquitous for biological molecules. This is in part historical, but is also largely due to the computational expense resulting from the increased complexity, implementation and reparameterisation required to incorporate the extensions. Even if these extensions cannot be directly introduced, including an implicit treatment of them in the charge derivation method is desirable.

The use of charges in a computer simulation is an approximate method for modelling electrostatic interactions, and consequently there is neither a single set of parameters capable of reproducing every piece of data, nor is there a unique, correct way to derive these parameters. Indeed, there are a wide variety of methods to parameterise charges. Charges can be derived from either experimental or theoretical data. Of the methods using experimental information, charges may be obtained from the electron densities derived from X-ray diffraction studies [124]. Alternatively, charges may be optimised to reproduce structural and thermodynamic experimental data for pure liquids or aqueous solutions of a molecule. This procedure is used in the popular OPLS force field [25]. Hybrid techniques also exist that use both experimental data and quantum mechanical calculations such as the charge equilibration method [125], while the atomic polar tensors method can use either [126–128]. There is a large collection of purely quantum mechanical methods that partition the electron population of each atom according to the calculated molecular orbitals [129–132], as in, for example, the Mulliken population analysis [129]. However, for all of these, the partitioning is somewhat arbitrary, and the calculated charges are basis set dependent and generally give a poor reproduction of the MEP [133]. An alternative approach is to optimise charges

to reproduce various quantum mechanical quantities such as the calculated interaction energy of a collection of molecular complexes [98], or, alternatively, the MEP [107–112], the molecule's electric field [134] or a distributed multipole potential [135–137].

4.1.2 Advantages and Disadvantages of OPLS Charges

Since the OPLS method optimises charges to reproduce condensed phase properties, they are arguably the parameters of choice for condensed phase simulations. In addition, the OPLS method is best able to address the problems previously discussed by including approximate, implicit treatments of polarisation and conformational averaging. However, there are a number of limitations in the OPLS method. Firstly, for each chemical functionality, a simple molecule containing that functionality is required for which there is experimental data either of the liquid phase or an aqueous solution. Secondly, extensive computer simulations are needed. Thirdly, the charges for a given functional group may not always be transferable, particularly for two adjacent polar groups. Therefore, because OPLS charges are not available for all types and combinations of functional groups, another reliable and practical method must be used.

4.1.3 Advantages and Disadvantages of EPD Charges

Since charges are used to model intermolecular interactions, they should be optimised to reproduce molecular data in regions lying outside a molecule's van der Waals volume. Using the MEP to calculate charges is therefore a possibility, whereas use of the electron density is not. Furthermore, the monopole approximation becomes less severe at these longer ranges. Thus, the widely used MEP procedure seems the next best, simplest alternative. The calculation of these Electrostatic Potential Derived (EPD) charges is described in more detail in Subsection 4.2.1, but in essence the method works by calculating the charges that produce an electrostatic potential that most closely matches the MEP at a number of points spaced evenly around the molecule. The procedure is easy to implement, is practical for even quite large

molecules and, most importantly, can be applied to any functionality. Nevertheless, a number of difficulties still exist with the use of EPD charges. Firstly, some degree of polarisation is required to model that induced in the condensed phase. Typically, this is introduced in an ad hoc fashion through the use of a low-quality basis set such as 6–31G* [138], although it is possible to calculate charges in a polarisable continuum [139]. Secondly, the MEP generally does not provide enough information to determine statistically valid charges for all atoms in a molecule, as demonstrated by a singular value decomposition analysis by Francl et al. [111]. This essentially implies that the under-determined atoms may adopt charges over a wide range of values without significantly affecting the charge electrostatic potential (CEP) and thus the quality of the fit to the MEP. This is particularly the case for atoms with three or four neighbours, commonly termed “buried atoms”. The buried atoms are illustrated in Figure 4.1 for acetamide. The charges of buried atoms contribute less to the CEP in regions outside the molecule than surface atom charges. Buried atoms also have a greater number of neighbours. Both of these facts allow the buried atom charges to vary without significantly changing the CEP nor the charges of its neighbours.

Figure 4.1: The highlighted carbons and nitrogen are the atoms with buried charges in acetamide.

Thirdly, EPD charges often do not accord with chemical intuition. This manifests itself in a number of ways. EPD charges can be large in magnitude [140], larger than chemical intuition would suggest, and also larger than the equivalent OPLS parameters. This raises compatibility problems if some combination of the two force fields is desired. EPD charges can also vary significantly for chemically similar atoms in a given homologous series, raising difficulties if a transferable parameter set is desired. Furthermore, EPD charges often have quite significant conformational dependence [117–123], a problem exacerbated by their large magnitude. Fourthly, the polarity of EPD charges does not always correlate with electronegativities, as is often found for C–H bonds. However, this should not be unexpected given that EPD charges are derived from the MEP and not the charge density.

Given these difficulties with the EPD charges and the unavailability of OPLS charges for some functionalities, ideally a method needs to be developed with the speed and flexibility of the MEP procedure that generates electrostatic parameters of comparable magnitude to the OPLS force field. The method developed in this work seeks to do just that. It is called the Restrained Electrostatic Potential Derived (REPD) charge method [17]. It is based on the RESP (Restrained ElectroStatic Potential) procedure of Bayly et al. [140] and uses to our advantage the statistical indeterminacy of buried charges.

4.2 Development of the REPD Charge Method

What follows is a description of the EPD method, an analysis of its attributes, and the subsequent development of the REPD method.

4.2.1 The EPD Charge Method

The EPD charge method produces charges in the following way. The MEP may be calculated at any arbitrary point in space, $\mathbf{r}_i$, using the equation

\[ V_i = \sum_j \frac{Z_j}{r_{ij}} - \int \frac{\rho(\mathbf{r})}{|\mathbf{r}_i - \mathbf{r}|}\, d\mathbf{r} \tag{4.1} \]

where $r_{ij}$ is the distance between $\mathbf{r}_i$ and atom $j$ with charge $Z_j$, and $\rho(\mathbf{r})$ is electron density at each point, $\mathbf{r}$, in space, obtained from a quantum mechanical calculation. Charges, $q_j$, are calculated by performing a least squares fit between the CEP, $\hat{V}_i$, given by

\[ \hat{V}_i = \sum_{j=1}^{M} \frac{q_j}{r_{ij}} \tag{4.2} \]

CHAPTER 4. rik (4. the vector of charges.4) which can be easily solved in matrix form for q. To find charges that minimise χ2 in Eq. setting the derivative of χ2 with respect to each charge to zero produces M equations linear in qj of the form ∂χ2 = −2 ∂qk N i=1 1 rik M Vi − j=1 qj rij =0 (4. 4. PARTIAL CHARGE METHODS 59 and the MEP at N points spaced evenly around the molecule. Aq = b where the elements of A are given by N (4.6) and the elements of b by N bk = i=1 Vi .3) is to be minimised. That is.3. N χ2 = i=1 (Vi − Vi )2 (4. M is the number of atoms in the molecule. defined by N 1/2 RRMS = χ/ i=1 2 Vi2 (4. the following χ2 function.5) Ajk = i=1 1 rij rik (4.7) The quality of the fit can be assessed by the relative root mean square (RRMS) of χ2 .8) .

Table 4.1: Urea Charges for Various Basis Sets, Methods and Geometries. Columns: Method/Basis Set (a), C, O, N, H (syn), H (anti), µ/D. Rows: HF/6–31G, HF/6–31G*, HF/6–31G**, HF/6–31+G*, HF/6–31+G**, MP2/6–31G*, experimental (b).
(a) In each case, charges and geometry are calculated using the same basis set and method, and urea was forced to be planar.
(b) Geometry from diffraction data [142].

4.2.2 Basis Set, Ab Initio Method and Geometry

The first feature examined was the basis set and ab initio method used to calculate the MEP. Charges were generated for a range of basis sets, adding polarisation and diffuse functions and the HF method versus MP2. The same basis set and level of theory were used for both the optimisation and the MEP calculation. The calculations were performed using GAMESS [141] with the Connolly [109] point selection scheme. The results, together with dipole moments, µ, are given in Table 4.1 for the molecule urea. It can be seen that there is little variation in charge beyond the 6–31G* level. Diffuse functions increase the charge magnitude and dipole moment marginally, extra polarisation functions for hydrogens make negligible difference, while MP2 charges are almost identical to HF ones. The charges produced using the experimental geometry are similar to the ab initio ones, but the dipole moment obtained is the highest of all of the methods.

The choice between basis set and geometry was made primarily for practical reasons. HF/6–31G* charges and geometries have been found to agree well with experiment [110, 138], are easily calculated for small molecules, and, basis set permitting, are readily obtainable for any common molecular functionality, unlike experimental geometries. The almost identical HF/6–31+G* geometries have similar properties and are only moderately more computationally demanding to calculate. Later free energy work (see Section 5.1) found that the 6–31+G* basis set with its slightly larger dipole

moments reproduced experimental results better. Furthermore, inclusion of diffuse functions is recommended for large atoms such as sulfur. Therefore, the ab initio charges and optimised geometries using both HF/6–31G* and HF/6–31+G* were considered.

4.2.3 Fitting Point Method

The selection of points was another candidate for influencing charges. This affects what parts of the MEP one considers are important for the EPD charges to reproduce. Typically, points are placed in regions considered important for molecular interactions, namely, beyond some distance from the van der Waals volume of the molecule but not too distant either. Figure 4.2 shows the point spacing around acetamide.

Figure 4.2: Fitting points (Connolly's method) around acetamide at which the electrostatic potential is calculated.

There are four common methods used to select points. The CHELP method [110] places points sparsely around a number of concentric spheres centred on each atom. CHELPG [122] places points on a cubic grid. Connolly's method [109] places points much more uniformly on spheres. The geodesic method [143] places points even more uniformly on spheres according to the tessellation of an icosahedron. The CHELP method was

shown to produce charges that varied widely with how the points were spaced [122] and was not further considered. However, the charges produced by the CHELPG method itself have also been shown to depend on the orientation of the grid with respect to the molecule [143]. The charges produced by these three methods for urea are given in Table 4.2. Point densities were chosen for each method to give roughly equal numbers of fitting points, approximately 500. From this table little difference can be discerned between the charges produced by any of the methods. The Connolly method was adopted for this work since it is most widely used.

Table 4.2: Urea Charges for CHELPG, Connolly and Geodesic Point Selection Schemes.

Point Selection Method   C       O        N        H (syn)   H (anti)   µ/D
CHELPG                   1.152   -0.705   -1.101   0.432     0.446      4.67
Connolly                 1.161   -0.695   -1.153   0.455     0.465      4.62
Geodesic                 1.163   -0.700   -1.146   0.453     0.462      4.64

A number of aspects of the Connolly method were varied to determine their effect on charges. These included the point density on each sphere, the number and spacing of spheres, and the distance of the closest and furthest spheres. All of these made negligible difference to the charges produced as long as the number of fitting points was not too low nor the points too close to the van der Waals surface. On the question of point density, Figure 4.3 shows the variation of the EPD/6–31+G* charges for acetamide with the point density used to calculate the charges. At densities above 1 point Å⁻², the density used in this work, the charge variation is insignificant for the purposes of reproducing the MEP. While it has been suggested that thousands of points are necessary to obtain well-defined charges [122], the SVD analysis of Francl et al. [111] suggests that the MEP does not contain sufficient information to produce such charges regardless of the number of points. However, the quality of the fit to the MEP is still adequate at point densities of 1 point Å⁻². The RRMS at this density is calculated to be 0.109 and only improves to 0.108 at the much higher density of 100 points Å⁻².

A number of other schemes that sampled points at different distances according

to atom type were tested on the basis that the MEP around some atoms was more important. Another scheme weighted points in the least-squares fitting using various functions according to the distance from the molecule [107]. Yet another used least absolute value fitting. However, again, no significant difference was observed, and so the conventional Connolly method was retained.

Figure 4.3: Variation of the charges with point density for REPD/6–31+G* acetamide. (Axes: q/e against log (point density / point Å⁻²); curves for the CH, O, N, CO, HN and HC atoms.)

A final note about point selection is that care must be taken not to include points closer than approximately 1.4 times the molecule's van der Waals surface. Firstly, it is unnecessary to include such close points because the atoms of other molecules are unable to approach to this distance and secondly, it may affect the values of charges obtained. To illustrate this point, charges are calculated using a single sphere of fitting points of different radii. Figure 4.4 reveals how the carbon charge of 6–31+G* benzene is severely influenced by the distance at which the sphere of fitting points is chosen. The carbon charge becomes much smaller in magnitude at shorter distances. This variation was found later to have a profound effect on the free energy of hydration for benzene (see Subsection 5.1.6). This effect is probably due to a severe worsening of the point charge approximation and such charges, dependent on the radius chosen, are questionable, as well as being unsuitable for modelling electrostatic interactions at longer ranges. The CHELPG method [122] has been frequently implemented (GAMESS [141], Gaussian 94 [103]) including these close points.

Figure 4.4: Influence of the radius of the fitting point sphere (determined by the van der Waals scale factor) on the carbon charge calculated for EPD/6–31+G* benzene. (Axes: q/e against van der Waals scale factor.)

4.2.4 Multipolar Constraints

A number of multipolar constraints can be included in the least-squares fit to reproduce ab initio values. These include charge, dipole and quadrupole constraints. The results for urea are presented in Table 4.3. Evidently, the unconstrained charges capture well the electrostatic moments of the MEP and so multipolar constraints proved unnecessary. However, the charge constraint, implemented using Lagrange multipliers [110], was employed to ensure that the absolute molecular charge is exactly correct. In addition, the rounding of charges to three decimal places necessitates manual readjustment to ensure that the total molecular charge is correct. In this

Table 4.3: Urea Charges Using Various Multipole Constraints.

Constraint     C       O        N        H (syn)   H (anti)   µ/D
no restraint   1.161   -0.695   -1.153   0.455     0.465      4.62
charge         1.160   -0.694   -1.153   0.455     0.465      4.62
dipole         1.157   -0.693   -1.150   0.455     0.464      4.60
quadrupole     1.152   -0.688   -1.165   0.462     0.470      4.60

adjustment, buried atom charges were given preference since they have less effect on the CEP.

4.2.5 Charge Restraining

Since the magnitudes of the charges appeared to be largely independent of parameters used in the EPD method, a simple alternative method that efficiently reduces charge magnitudes is linear scaling. This method is commonly used to modify charges fitted to semi-empirical MEP's [144]. Such a method is, however, too crude since the dipole moment, for example, scales by exactly the same factor. However, a method was found in the literature that aims to reduce the magnitude of charges by using a restraint. It is called the RESP method [140]. In the RESP method, charges $q_j$ are fitted to the MEP while simultaneously being minimised. This is accomplished using a restraining function of the following form in the least-squares fit

\[ \chi^2_{\mathrm{rest}} = a \sum_{j=1}^{M} \left( (q_j^2 + b^2)^{1/2} - b \right) \tag{4.9} \]

where $M$ is now the number of non-hydrogen atoms. This function is added to Eq. 4.3 which is then minimised, simultaneously reducing the magnitude of the charges. The parameter $a$ determines the strength of the restraint and takes two values, one for the initial restraint and one in the averaging of conformationally equivalent atoms (see Subsection 4.2.9). These values are 0.0005 and 0.001, respectively. The parameter $b$ determines the tightness of the hyperbola and is given a value of 0.1. The hyperbolic restraining function restrains all charges with approximately the same force except for those with a magnitude comparable to $b$, for which the restraining force is smaller. The restraint exploits the statistical indeterminacy of buried charges in order to bring down their magnitude without significantly affecting the quality of the fit. In this way the large size of EPD charges noted previously is reduced.

When the RESP method was applied to urea, though, it was found to have negligible effect as seen in Table 4.4. On closer inspection, a number of deficiencies in

the RESP method were apparent. Firstly, parameters a and b are somewhat arbitrary; secondly, the charges do not vary uniformly with a; thirdly, hydrogen atoms are selectively excluded from the restraint; and finally, the effect of the restraint is dependent on the selection of points used in the fitting procedure. Therefore, a restraining method was sought that eliminates these problems, preferably using as few parameters as possible.

Table 4.4: Urea Charges by EPD and RESP Methods.[140]
Method   C       O        N        H (syn)   H (anti)   µ/D
EPD      1.161   -0.695   -1.153   0.455     0.465      4.62
RESP     1.005   -0.653   -1.035   0.426     0.433      4.58

4.2.6 A New Restraining Function.
The aim of the REPD method is to restrain the charges to become as close as possible to OPLS values, even if in a rather ad hoc fashion. In order to gain a proper understanding of how the restraint affected a large number of different molecules, 29 molecules were included in the study. A whole host of methods were implemented and tested. These included the choice of restraining function, using different fitting equations to the MEP, restraining different atoms selectively, arbitrarily assigning charges, or even simply halving charges of buried atoms, which surprisingly made charges more OPLS-like. The general aim in all cases was to observe the properties of each method and to use these to make charges OPLS-like.

Three restraining functions were of particular interest. One was linear, one hyperbolic, and one quadratic in charge, the hyperbolic function being similar to that of Bayly et al. These functions are, respectively,

\[ \chi^2_{\mathrm{rest}} = a \sum_{j=1}^{M} \sum_{i=1}^{N} \frac{|q_j|}{r_{ij}^2} \tag{4.10} \]

\[ \chi^2_{\mathrm{rest}} = a \sum_{j=1}^{M} \sum_{i=1}^{N} \frac{(q_j^2 + b^2)^{1/2} - b}{r_{ij}^2} \tag{4.11} \]

\[ \chi^2_{\mathrm{rest}} = a \sum_{j=1}^{M} \sum_{i=1}^{N} \left( \frac{q_j}{r_{ij}} \right)^2 \tag{4.12} \]

Respectively, the diagonal elements, A_kk, of the A matrix in Eq. 4.5 after the addition of the restraint now become

\[ A_{kk} = \sum_{i=1}^{N} \frac{1}{r_{ik}^2} \left( 1 + \frac{a}{2|q_k|} \right) \tag{4.13} \]

\[ A_{kk} = \sum_{i=1}^{N} \frac{1}{r_{ik}^2} \left( 1 + \frac{a}{2} (q_k^2 + b^2)^{-1/2} \right) \tag{4.14} \]

\[ A_{kk} = \sum_{i=1}^{N} \frac{(1 + a)}{r_{ik}^2} \tag{4.15} \]

while the off-diagonal elements remain the same as in Eq. 4.5. The value of b for the hyperbolic restraint is taken as 0.1, the same as for RESP.

Figure 4.5: Effect of the restraint, a, on the 6–31+G* charges for acetamide for the linear, hyperbolic and quadratic restraining functions. (Three panels, labelled linear, hyperbolic and quadratic, each plotting q/e against a for the CO, HN, HC, CH, O and N atoms.)

Figure 4.5 shows how the charges for acetamide vary with a for each of the three restraining functions, linear (Eq. 4.10), hyperbolic (Eq. 4.11) and quadratic (Eq. 4.12). The one feature that all restraints have in common is that they restrain buried atoms the most. The linear restraint applies a constant force to each charge. When one charge reaches zero, all other charges suffer discontinuities in their slopes. This makes the linear restraint unsatisfactory because some charges reach zero before other

charges are even mildly affected. Setting buried charges to zero or some other small value may be a promising way of developing transferable force fields, but since the overall goal is to make charges reproduce OPLS parameters, this restraint was not pursued. Using a hyperbolic restraint produced similar results but with these gradient discontinuities rounded off and obscured. This, together with the need for the extra parameter, b, ruled out the use of a hyperbolic function.

The quadratic restraint applies a restraining force proportional to the charge, thus restraining larger charges more. Large charges are generally found for polar atoms or buried atoms. Bayly et al. decided against using a quadratic restraint because larger, polar charges experience a stronger restraint than smaller, non-polar charges. However, it is rather the overall proximity of the atom to the fitting points that determines the sensitivity of its charge to a restraint rather than its size. As Figure 4.5 shows, the charges of the buried atoms in acetamide, the carbons and nitrogen, are rapidly reduced in magnitude while the well-exposed, polar hydrogens and oxygen charges change much more slowly. The break-through for the quadratic restraining function proposed here was that only one single parameter is found necessary to make charges considerably more OPLS-like.

One further advantage of using a quadratic restraint is that Eq. 4.5 may be solved in one step. This solving process is more complicated for the methods using linear and hyperbolic restraints since their A_kk values depend on q_k, as seen in Eqs. 4.13 and 4.14. Therefore, for them, Eq. 4.5 must be solved iteratively. From this point on, any reference to the REPD (Restrained Electrostatic Potential Derived) method involves a use of the quadratic restraint.

The effect of the quadratic restraint on the fit to the MEP is only minor. Figure 4.6 illustrates how the dipole moment, quadrupole moment and RRMS vary with the quadratic restraint. The dipole moment and diagonal elements of the quadrupole moment are hardly affected by the restraint, while the off-diagonal elements of the quadrupole moment decrease in magnitude to some extent. A moderate but unavoidable increase in the RRMS accompanies the restraint, but the restrained charges still adequately reproduce the MEP. By comparison, simple linear scaling of 6–31+G*

charges by 0.761 gives the best reproduction of the corresponding OPLS values for acetamide. However, the RRMS, originally 0.052, is now 0.245, compared to 0.109 for REPD.

Figure 4.6: Effect of the quadratic restraint parameter, a, on the dipole moment, quadrupole moment and the RRMS for 6–31+G* acetamide. (Panels plot µ/D, the quadrupole elements Q_xx, Q_yy, Q_zz, Q_xy, Q_xz, Q_yz in DÅ, and the RRMS against a.)

4.2.7 Choice of Atoms to Restrain.
Finally, the restraint in this work is applied to the charges of all atoms without exception. Bayly et al. chose not to restrain hydrogen atom charges because they are already well-defined and therefore do not require restraining. However, it was felt that this exclusion is inconsistent and unnecessary, for on the one hand it could be argued that other well-defined atoms such as carbonyl oxygens might also not require restraining while on the other, hydrogen atoms in large, folded molecules might be sufficiently buried as to require restraining. Atom type does not necessarily correlate with the need for a restraint. In either case, whether or not hydrogens are restrained makes little difference to the charges for two reasons. Firstly, there is strong coupling between adjacent atoms, especially significant for atoms with only one neighbour such as hydrogens. Thus any change in the charge of an atom neighbouring a hydrogen will affect the hydrogen charge indirectly. Secondly, the charges of all well-defined

surface atoms, usually including hydrogens, such as the hydrogens of formaldehyde, are relatively insensitive to the restraint as is evident in Figure 4.5. To verify this, charges calculated as a function of the restraint with and without the hydrogens restrained were found to differ negligibly. Consequently, for the reasons described above, the charges of all atoms are chosen to be restrained.

4.2.8 Independence of the New Restraint On Point Selection.
The benefit of the quadratic restraining function proposed is that it acts independently of point selection. Since the parameter a can be factored into the sum over points used in the fit, the effect of a scales proportionally with A_kk as N is varied. This is not only convenient if different point densities are desired, but is essential if the restraining function is to act in a similar way on different molecules with different numbers of points. This is not the case for the restraint of Bayly et al. (Eq. 4.9) which has A_kk elements given by

\[ A_{kk} = \sum_{i=1}^{N} \frac{1}{r_{ik}^2} + \frac{a}{(q_k^2 + b^2)^{1/2}} \tag{4.16} \]

The restraint term, being independent of N, becomes insignificant for very large N. Therefore, the effect of their restraint is dependent on N. Figure 4.7 shows how RESP/6–31+G* and REPD/6–31+G* charges for acetamide vary with point density. Clearly, the effect of the additive RESP restraining function is diminished at higher point densities while the multiplicative REPD restraining function acts on all charges similarly for all point densities.

Figure 4.7: Comparison between RESP/6–31+G* (dashed line) and REPD/6–31+G* (solid line) charges with a=0.000252 for acetamide as the point density is varied. (Axes: q/e against point density in points Å⁻².)

4.2.9 Charge Averaging.
One other issue that was addressed was the averaging of geometry and charges. Atoms that are related by symmetry are constrained to have the same charge. However, atoms that are not symmetry related but conformationally equivalent should have their geometry and charges constrained to be identical if conformational flexibility is allowed in a simulation. An example is methanol which has three conformationally equivalent hydrogens on the carbon as shown in Figure 4.8. This issue of averaging conformationally equivalent atoms is merely a special case of general conformational dependence (see Subsection 4.3.4) in which the new conformation is identical to the old.

Figure 4.8: The three non-equivalent hydrogens highlighted in methanol.

Geometry averaging can be introduced as a constraint in the ab initio optimisation stage. Charge averaging is effectively a constraint on the fitting procedure and consequently can reduce the quality of the fit. Averaging for atoms that were equivalent by symmetry was found to have a negligible effect on the quality of fit. However, this was not so for conformationally equivalent atoms. The objective is to achieve averaging together with the best fit possible. The

were considered and tested on methanol. Then. Thirdly charge averaging may be performed directly in the fit. Secondly. This freezing in the second stage prevents the averaging from detrimentally affecting the charges of the rest of the molecule. Fifthly.708 HO H (trans) 0. In this method.027 H (gauche) -0.025 0.129 1. An increase in dipole . as in the first method.130 1.358 O -0.431 -0. Firstly. However.431 -0.024 0. Such averaging is seen to cause a quite a large increase in the RRMS and dipole moment. Table 4.5 shows the results for each method.304 0.428 0. Charge averaging is achieved by adding together the rows for these atoms of the matrix equation (Eq.688 -0.431 0.84 0.026 -0. Geometry averaging was considered essential for usefulness in force fields and consistency if any subsequent charge averaging was to be used and so was retained for testing the next three methods.381 -0.027 µ/D RRMS 1.211 five alternatives similar to the work of Bayly et al. Method no averaging geometry only charge during charge after charge two stage C 0. as seen in the “during” row in Table 4.708 -0.024 -0. a charge fit is first performed without averaging. Dipole Moment and RRMS for Different Geometry and Charge Averaging Methods. while freezing the charges of all atoms not requiring averaging nor adjacent to ones that do. a fitting procedure with no constraints was performed. an averaged geometry may be used. particularly the oxygen and polar hydrogen.CHAPTER 4. Averaging the geometry is seen to have a moderate effect on charges but little effect on dipole moment or RRMS.5.5: REPD/6–31G* Methanol Charges. this method was discarded on the grounds that the final charges are then not optimised to reproduce the MEP.211 2.5).16 0. PARTIAL CHARGE METHODS 72 Table 4.640 -0. the two-stage averaging procedure similar to that of Bayly et al.181 2. followed by averaging of conformationally equivalent charges. no averaging of geometry or charges can be made.348 0.048 -0. 
was tested that captures the advantages of the previous two methods.708 -0.85 0. 4.140 This charge averaging approach was found to adversely affect all charges.98 0. reducing the dimensionality of the matrix.015 0.15 0.348 0.271 0. a second fit is performed with conformationally equivalent atoms constrained to be the same. Fourthly.041 0.015 -0.

moment and RRMS is still however unavoidable. The fifth approach was the one adopted as it seemed to be the best compromise between producing averaged charges while still fitting to the MEP. However, unlike the approach of Bayly et al., the same restraint a was applied in both stages. One exception to the averaging rule was that amide hydrogens were excluded from the conformational averaging since their barrier to rotation is sufficiently high to keep them distinguishable in a simulation.

4.3 The REPD Charge Method.

4.3.1 Summary of the Method.
A method has been derived that addresses the deficiencies of the RESP method and achieves the objective of OPLS-like charges. The new restraint is independent of the number of points used in the fitting procedure, is quadratic in charge, is applied uniformly to all atoms, and contains one adjustable parameter that is optimised to yield charges in close agreement with the equivalent OPLS parameters for a wide range of molecules.

The full REPD method will now be summarised. Ab initio-optimised geometries are calculated at the Hartree-Fock (HF) level with 6–31G* and 6–31+G* basis sets using Gaussian 94.[103] Points are selected according to the method of Kollman and Singh,[109] in which points are spaced using Connolly's algorithm[145] on four spheres centred on each atom with radii 1.4, 1.6, 1.8 and 2.0 times the atom's van der Waals radius. Points at a distance less than 1.4 times the van der Waals radius of any atom are excluded. The point density on each sphere is 1 point Å⁻², a sufficient density to obtain converged charges. GAMESS[141] is used to generate a table containing the MEP at each point using the same ab initio method and basis set as the geometry optimisation method. All charge fitting is performed using a modified version of the RESP module in the AMBER software package,[146] which takes the MEP table from GAMESS as input. Distances r are in units of Bohr and charges q are in electronic charge units.
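The fitting step summarised above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the modified AMBER RESP module used in this work: it assumes that Eq. 4.5 has the usual normal-equation form A q = B with A_jk = Σ_i 1/(r_ij r_ik) and B_k = Σ_i V_i/r_ik, applies the quadratic restraint through the diagonal elements of Eq. 4.15, and fixes the total charge with a Lagrange multiplier. The function name fit_charges, its arguments and the three-atom test system are hypothetical.

```python
import numpy as np

def fit_charges(coords, points, mep, a=0.0, total_charge=0.0):
    """Fit atomic charges to MEP values with a quadratic restraint of strength a.

    coords: (M, 3) atom positions in Bohr; points: (N, 3) fitting points in Bohr;
    mep: (N,) potential at each point in atomic units.
    """
    # inv_r[i, j] = 1 / r_ij for fitting point i and atom j
    inv_r = 1.0 / np.linalg.norm(points[:, None, :] - coords[None, :, :], axis=2)
    A = inv_r.T @ inv_r                       # A_jk = sum_i 1 / (r_ij r_ik)
    B = inv_r.T @ mep                         # B_k  = sum_i V_i / r_ik
    A[np.diag_indices_from(A)] *= (1.0 + a)   # A_kk = sum_i (1 + a) / r_ik^2
    # Extra row and column: Lagrange multiplier enforcing the total charge.
    M = len(coords)
    K = np.zeros((M + 1, M + 1))
    K[:M, :M] = A
    K[:M, M] = 1.0
    K[M, :M] = 1.0
    rhs = np.append(B, total_charge)
    return np.linalg.solve(K, rhs)[:M]        # one linear solve, no iteration

# Hypothetical test system: three point charges and 200 points on a shell.
rng = np.random.default_rng(0)
atoms = np.array([[0.0, 0.0, 0.0], [1.8, 0.0, 0.0], [-0.5, 1.7, 0.0]])
q_true = np.array([-0.8, 0.4, 0.4])
directions = rng.normal(size=(200, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
points = directions * rng.uniform(4.0, 6.0, size=(200, 1))
mep = (1.0 / np.linalg.norm(points[:, None] - atoms[None], axis=2)) @ q_true

q_free = fit_charges(atoms, points, mep)                # unrestrained fit
q_rest = fit_charges(atoms, points, mep, a=0.000252)    # restrained fit
# Doubling every fitting point doubles A and B alike, so the charges are unchanged.
q_dense = fit_charges(atoms, np.tile(points, (2, 1)), np.tile(mep, 2), a=0.000252)
```

Because the restraint multiplies the diagonal rather than adding a term to it, the last call illustrates the independence from the number of fitting points discussed in Subsection 4.2.8.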

The values of a for the quadratic restraint with the HF/6–31G* and HF/6–31+G* protocols are taken as 0.000184 and 0.000252, respectively. Each parameter a was calculated according to the following procedure. Firstly, the above method was used to calculate charges for a large diverse group of 29 molecules for which there are OPLS charges. The parameter a was then varied until the slope of a plot of all unique charges versus their corresponding OPLS values equalled unity. In this way, charges are obtained that both reproduce the MEP and are comparable in size to OPLS.

4.3.2 Comparison with EPD and OPLS Charges.
Four sets of charges have been calculated. These are EPD/6–31G*, REPD/6–31G*, EPD/6–31+G* and REPD/6–31+G*. They differ in their basis set, 6–31G* or 6–31+G*, and whether a restraining function is used. The charge sets are given in Table A.1 in Appendix A, together with the OPLS charges.[25] Figure 4.9 shows correlation plots of the set of unique charges calculated for each basis set versus their OPLS counterparts. For both sets of REPD charges, the slopes of the lines are of course 1.00 by definition and there is a good correlation with OPLS charges, with identical correlation coefficients of 0.97. The correlation is poorer for unrestrained charges, with a correlation coefficient of 0.93 for both EPD/6–31G* and EPD/6–31+G* charges and slopes of 1.24 and 1.31, respectively. In addition, a correlation plot of one restrained charge set against the other (not shown) gives a correlation coefficient of 1.00 and the expected slope of unity. For comparison, the RESP charges also give a better reproduction of OPLS charges than EPD, even though they were not designed to do this, with a correlation coefficient of 0.96. Nevertheless, the slope is still 1.13 and there exist the previously mentioned problems.

4.3.3 Influence of Molecule Set on the Parameterisation.
To determine the dependence of the parameter a on the molecules used in the parameterisation, a cross-validation analysis was used. In this procedure, the 29 molecules

were divided up randomly into 5 groups, each containing 5 or 6 molecules. For each group, the a parameter was derived from the remaining 4 groups and then applied to the original group to predict charges for the selection of molecules in that group. A correlation coefficient between these charges and their OPLS counterparts was then calculated. This process was repeated 100 times, giving 500 different selections of the 29 molecules. The average correlation coefficients for the predicted charges with OPLS charges were found to be 0.97 for both REPD/6–31G* and REPD/6–31+G*, the same as the overall correlation coefficient.

Figure 4.9: Correlation plots of REPD/6–31G*, REPD/6–31+G*, EPD/6–31G* and EPD/6–31+G* charges versus the corresponding OPLS parameters for the 29 molecules listed in Table A.1. (Panel annotations: REPD/6–31G*, m=1.00, r=0.97; REPD/6–31+G*, m=1.00, r=0.97; EPD/6–31G*, m=1.24, r=0.93; EPD/6–31+G*, m=1.31, r=0.93. Axes: q/e against q_OPLS/e.)

To provide further insight into how a depends on the choice of molecules, Figure 4.10 shows how the absolute deviation of

a from its universal value varies with the number of molecules used to derive a for the 6–31+G* basis set. This graph has been obtained by averaging over 300 random molecule sequences, and by definition, reaches zero at 29 molecules. While there is a clear decreasing dependence of a on molecule selection as more molecules are used, this dependence is still non-negligible for twenty molecules, with a variation of around ±0.000020. This corresponds to an average charge variation of ±0.01.

Figure 4.10: Variation of the absolute error in a from its universal value with number of molecules used to derive a, averaged over 300 random molecule sequences. (Axes: <|Δa|> against number of molecules.)

In order to test how the a parameter performs for each chemical functionality, the molecules were divided up into groups chosen according to functionality, as shown in Table 4.6. Some aromatic molecules are put into two groups, since they contain two of the listed functionalities. Firstly, the universal parameter was applied to each functionality, giving a slope and correlation coefficient. Secondly, a specific a parameter was calculated using only the molecules in each group. The closeness of the correlation coefficient and the gradient to unity, and the individual a values compared to the universal a value, indicate the appropriateness of the general restraint for that functionality. Table 4.6 reveals that the ethers, sulfides and amides are over-restrained by the universal a value. The zero value of a for the ethers and sulfides arises because their unrestrained charges are actually smaller than OPLS charges. On the other hand, the carbonyl compounds are not sufficiently restrained. Clearly, no single a parameter is perfectly suited to all molecule types. While it is conceivable

98 0. acetaldehyde.99 0. EPD charges are larger in magnitude than both REPD charges and OPLS charges. chlorobenzene.019 1. especially for buried atoms. EPD/6–31 + G* charges are generally larger in magnitude than EPD/6–31G* charges. dimethyl sulfide.957 0.000144 0.000200 0. chloroethane.000304 0.940 0.99 0. methanethiol.091 0.CHAPTER 4. b that different a parameters could be calculated for different functional groups.000212 0. diethyl sulfide. PARTIAL CHARGE METHODS 77 Table 4.97 0.000456 0. Thus the method only remains practicable if the universal a value is used. benzoic acid. phenol.1 for all 29 molecules. There are a number of features to note. i Ammonia. Derived by fitting the charges of the molecules in each group to their OPLS counterparts to give a slope of unity.749 0. aniline.000000 0. ethanethiol.000174 0. g Formaldehyde. h Acetic acid. functionality alcohol/thiolc ether/thioetherd amidee aromaticf aldehyde/ketoneg carboxylic acid/esterh aminei otherj a m 0. ethene. f Benzene.125 a 6–31G* ra ab 0.777 0.000000 0.021 1. as mentioned above. benzoic acid. diethyl ether. c Water.97 0. while the charges of surface atoms such as in hydroxy groups decrease only a small amount on restraint.000234 0. r. Thus.066 a 6–31+ G* ra ab 0. a. but it would then be impossible to calculate a values for functionalities not covered by the OPLS force field.000070 0.000354 m 0.95 0. as indeed they are designed to be. Secondly.073 1.6: Slopes.94 0.98 0. aniline. e Formamide.147 1.97 0.905 1.94 0. ethanol. methyl acetate.97 0.98 0.99 0. trans-N-methyl acetamide.000408 0. benzonitrile.000138 0.939 0. acetamide.957 1. phenol.97 0. methylamine. j Methane. acetone.000392 0. The charges obtained using all four protocols together with the OPLS charges25 are given in Table A. and Restraint Parameters.857 0. Correlation Coefficients. methanol. a larger value of a is needed to restrain the larger EPD/6–31+ G* charges to . 
while REPD/6– 31+ G* and REPD/6–31G* charges are comparable in magnitude with respect to each other and to OPLS charges.974 1. for Each Functional Group. not only would more parameters be being fitted to fewer molecules. As expected. m.98 0.93 0.000450 Derived by applying the universal a value to each group. ethylamine. d Dimethyl ether.000284 0.

their corresponding OPLS values. In Subsection 5.1.5, further comparisons are made between all four charge sets in relation to their dipole moments and free energies of hydration.

4.3.4 Conformational Dependence.
One final aspect of REPD charges to be discussed is their conformational dependence.[120, 121] Ideally, this variation should be negligible. 1-propylamine is used here as a test case. 1-propylamine contains two significant dihedrals as shown in Figure 4.11. The first conformation is about the C–N bond.

Figure 4.11: The two main dihedrals in 1-propylamine. (Structure shown: g–g+ 1-propylamine.)

The conformational dependence of charges may be assessed by observing how the dipole moment and RRMS of 1-propylamine vary when the charges derived from one conformation are applied to the other conformations. The results are shown in Table 4.7. Along the top of this table are the charge sets derived from each conformation, both EPD and REPD. Along the side are the conformations to which these charge sets are applied. The diagonal elements indicate that the charge set is being applied to the conformation from which it was derived. In the particular case where charges are applied to their own conformation, the relative performance of EPD and REPD charges is very similar. For REPD charges compared to EPD charges, however, the dipole moments generally deviate less from ab initio and the increase in RRMS is smaller. It can therefore be concluded that the REPD charges are less conformationally dependent and better able to reproduce the MEP for other conformations.
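The transferability test described above, in which each charge set is applied to every conformation, can be sketched as follows. This is an illustrative outline rather than the analysis code used for Table 4.7. The RRMS is assumed here to be the relative root-mean-square deviation, sqrt(Σ(V_i − V_i^calc)² / Σ V_i²), which may differ in detail from the definition used elsewhere in this work, and the function names and input layout are hypothetical. Coordinates are taken to be in Bohr and charges in electronic charge units.

```python
import numpy as np

AU_TO_DEBYE = 2.541746  # one atomic unit of dipole moment (e * Bohr) in Debye

def dipole_moment(charges, coords):
    """Magnitude of the point-charge dipole moment in Debye."""
    return AU_TO_DEBYE * np.linalg.norm(charges @ coords)

def rrms(charges, coords, points, mep):
    """Relative RMS deviation of the point-charge potential from the MEP (assumed form)."""
    inv_r = 1.0 / np.linalg.norm(points[:, None, :] - coords[None, :, :], axis=2)
    return np.sqrt(np.sum((mep - inv_r @ charges) ** 2) / np.sum(mep ** 2))

def transfer_table(charge_sets, conformations):
    """Apply every charge set to every conformation.

    charge_sets: {set_name: charges}, one entry per source conformation.
    conformations: {test_name: (coords, points, mep)}.
    Returns {(test_name, set_name): (dipole in Debye, RRMS)}.
    """
    table = {}
    for test_name, (coords, points, mep) in conformations.items():
        for set_name, charges in charge_sets.items():
            table[(test_name, set_name)] = (
                dipole_moment(charges, coords),
                rrms(charges, coords, points, mep),
            )
    return table
```

Entries whose two names coincide correspond to the diagonal elements of the table, where a charge set is tested on the conformation from which it was derived.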

41 2.61 1.52 1.76 1. respectively. These observations regarding dipole moment and RRMS are mainly due to the fact that restrained charges are smaller in magnitude and so the variation of charges with . suggesting that this charge set is the most suitable for flexible 1-propylamine simulations.63 g +g + EPD REPD EPD REPD EPD REPD EPD REPD 1.CHAPTER 4.41 0.25 1.66 1.26 0.43 0.54 2.60 1.94 test conab a formation initio EPD REPD aa 1.21b 0.36 0.40 1.44 1.36 0.82 1. b Bold numbers indicate that charges are tested on the same conformation from which they were derived.27 0.80 1.14 1.60 1.35 ga 1.37 0.54 1.30 0.74 1. Coincidentally.33 0.42 0.31 0.42 0.42 0.47 2.05 2.06 sum of errorsc 2.10 g +g + 1.24 1.88 1.11 1. The g–g + REPD charge sets requires particular attention.79 sum of errorsc 2.23 0.80 1. Anti and gauche conformations are denoted a and g.89 1.52 1.60 1.34 0.57 1.80 1.05 1.77 1.39 0.09 2.35 0.59 1.68 1.05 1.09 1.45 0.46 2.38 0.70 1.55 1.30 0.7: Comparison of the Conformational Dependence between EPD and REPD Charges in Relation to the Dipole Moments and RRMS of the Fit for 1-Propylamine. d Only the errors and total RRMS are shown.09 6–31G* Charge setd 2.50 2.50 2.56 0.81 1.75 1.08 1.32 0.53 1.51 1.38 0.33 0.36 0.53 2.29 0.90 1.50 1.86 1.31 0.41 0.22 0.67 1.43 0.69 1. PARTIAL CHARGE METHODS 79 Table 4.31 0.00 1.28 0.28 1.62 1.45b 1. Effect on the Dipole Moment 6–31+ G* Charge set aa ag ga g–g + 1.14 2.42 1.29 0.60 test conformationa aa ag ga g–g + g +g + total RRMS aa 0.94 1.06 1.31 0.14 1.64 1.87 1.39 a Conformations described by (lp)-N-C-C and N-C-C-C torsions.76 g–g + 1.27 2.16 1.59 1.24 2.26 1.31 0.38 0.84 2.45 2.43 total RRMS 1.99 1.92 1.74 1.70 0.37 2. c The error is taken relative to the ab initio dipole moment.49 2.33 0.34 0.37 0. The full table is similar to the 6–31+G* results.30 0.16 1.44 1.54 1.54 0.44 Effect on the RRMS 6–31+ G* Charge set ag ga g–g + 1.34 0.28 0.88 1.46 0.32 0.54 2.58 6–31G* Charge setd 1. 
this is also the charge set with the smallest charge on the nitrogen.25 0.22 0.01 1.75 g +g + EPD REPD EPD REPD EPD REPD EPD REPD EPD REPD 0.39 ag 1.26 0. since the errors in dipole moment and RRMS are particularly small for all conformations.46 0.

conformation will be correspondingly smaller. The main problem remains that there is generally no unique charge set able to reproduce well the MEP for all the conformations of a molecule. One method of dealing with conformational dependence is to produce an averaged set of charges over all conformations. Reynolds et al.[120] perform this averaging over different conformations, each conformation Boltzmann-weighted according to its energy. However, the method is more computationally demanding, only conformational energy minima are considered, and the charges produced are temperature-dependent. Charges that actually vary with conformation may be implemented in the force field,[125] although as yet this solution has not proved popular.

4.4 Conclusion.
The derivation, formulation and properties of the REPD charge method have been presented. The method is able to achieve its objective of producing OPLS-like charges by a much more accessible route than the OPLS method. This is done simply by fitting to the MEP using a carefully chosen restraint. Charges derived by this method have been used for parameterising the thiourea unit in macrobicycle 12 (see Section 3.2). The usefulness of REPD charges may be further validated by testing their properties with experiment. Therefore the comparison of the free energies of hydration for molecules parameterised with REPD is made in the next chapter.
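The Boltzmann-weighted averaging over conformations mentioned in Subsection 4.3.4 can be illustrated with a short sketch. It shows only the weighting idea and is not the implementation of Reynolds et al.; the function name, the use of kJ mol⁻¹ for the conformer energies and the default temperature of 298.15 K are assumptions made for the example.

```python
import numpy as np

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def boltzmann_average(charge_sets, energies, temperature=298.15):
    """Average per-conformer charge sets with Boltzmann weights.

    charge_sets: (n_conformers, n_atoms) charges fitted to each energy minimum.
    energies: (n_conformers,) relative conformer energies in kJ/mol.
    Returns the averaged charges and the normalised weights.
    """
    charge_sets = np.asarray(charge_sets, dtype=float)
    energies = np.asarray(energies, dtype=float)
    # Subtract the minimum energy before exponentiating for numerical stability.
    weights = np.exp(-(energies - energies.min()) / (R_KJ * temperature))
    weights /= weights.sum()
    return weights @ charge_sets, weights
```

The explicit temperature argument makes visible why charges produced in this way are temperature-dependent.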

Chapter 5

Testing of REPD Charges by FEP and LIE

Since REPD charges are intended for use in condensed phase computer simulations, it is imperative that the charges are able to reproduce experimental condensed phase data. Therefore, to examine their performance, the commonly used technique of calculating the free energies of hydration for a range of small molecules and comparing with experiment was employed here.[65, 88–91] This study allows the effect of the restraint and basis set on MEP charges to be examined. It is already well known that the free energies of hydration of molecules using EPD/6–31G* charges compare reasonably well with experiment with an average error of around 4.5 kJ mol⁻¹.[18] Owing to the computationally intensive nature of the FEP method, an alternative procedure was also tested, that of the linear interaction energy (LIE) method.[147, 148] Owing to potential overfitting problems observed in the LIE parameterisation, a detailed statistical analysis of the method was subsequently performed to determine the best LIE equation.

5.1 FEP Free Energies of Hydration.

5.1.1 The Molecule Test Set.
The free energies of hydration were calculated for the 22 molecules listed in Table 5.1. Of the original 29 molecules used for the REPD parameterisation, this list excludes

As noted earlier in Subsection 2. REPD/6–31G*. phenol and aniline. TESTING OF REPD CHARGES BY FEP AND LIE 82 7 of the molecules used in the original charge parameterisation for the following reasons.2. Of those molecules with conformational degrees of freedom. and REPD/6–31+ G*. 2. the FEP equation (Eq. EPD/6–31+ G*.2. To calculate the free energy of hydration for a rigid molecule.5. The FEP method was used in this work to calculate this free energy change. This approximation was made to simplify the free energy calculations and reduce other sources of error such as sampling or dihedral angle parameterisation that may complicate the assessment of the performance of each charge set. Standard OPLS Lennard-Jones parameters were used. namely EPD/6–31G*. were included to ensure that all chemical functionalities were represented. The suitability of the rigid molecule approximation was the main criterion for choosing these molecules. the mutation to nothing was divided up into a number of smaller stages. as shown in the mutation ”tree” in Figure 5.2 Selection of Mutations. the molecule is mutated from itself to a non-interacting particle in aqueous solution as described in Subsection 2. Where possible. acetamide and trans-N-methyl acetamide were used as the barrier to rotation about the amide C–N bond is large enough to justify rigidity. Firstly. perturbation pathways were chosen to minimise .25 The atomic charges used were those developed in the previous chapter.3.1. Thus most of the molecules included contain no significant conformational degrees of freedom. apart from those associated with hydrogens on methyl groups.CHAPTER 5. molecules were mutated to their next simplest most similar molecule. Benzonitrile was excluded because it lacked an experimental free energy of hydration. With this in mind. while formamide and formaldehyde were retained due to their usefulness as intermediates in the free energy calculations.1. The simulations were performed in TIP4P water. 
acetic acid. The 6–31G* and 6–31+ G* optimised geometries matched with the particular basis set used to derive the charges were adopted for the simulations.101 5. methyl acetate. In addition.18) only converges in practice if the two states are very similar.

changes in the number and position of atoms. The typical change in structure involved in a perturbation may be seen for the mutation from methyl acetate to acetone as given in Figure 5.2. In this work, all molecules were eventually mutated to OPLS methane, OPLS benzene or TIP4P water, for which the decoupling free energies have already been determined.[149–151] Therefore, only the easier relative free energy calculations had to be performed. By calculating the free energy changes for all the stages in the tree, the absolute free energy for any molecule was then calculated by summing the components.

Figure 5.1: Free energy tree showing the mutations performed to calculate the free energies of hydration. Mutations for the grey lines were performed elsewhere.[149–151] (Nodes shown: methane, acetaldehyde, formaldehyde, acetone, methyl acetate, ethene, dimethyl ether, dimethyl sulfide, OPLS methane, ammonia, TIP4P water, TIP3P water, water, methanethiol, acetic acid, chloromethane, nothing, formamide, methanol, trans-N-methyl acetamide, acetamide, aniline, methylamine, benzene, OPLS benzene, chlorobenzene, phenol.)

Figure 5.2: The change in geometry mutating from methyl acetate to acetone.

To further increase the similarity between end states of the

No internal solute moves were attempted due to the rigid molecule A approximation. λ. each of these mutations between molecules was subdivided further into a number of windows defined by the coupling parameter. The free energy change was taken as the average of the forward and reverse free energies. This allowed simulations of all windows to be run simultaneously. TESTING OF REPD CHARGES BY FEP AND LIE 84 mutation. Any non-zero difference between the forward and reverse free energies provided by double-ended sampling allows the convergence to .3 ˚ and 10–30◦ . 2.3 Simulation Protocol. The simulations run at each value of λ used exactly the same starting geometry since it was assumed that the equilibration was adequate. some of the mutations involving the more polar molecules were extended to 10 M configurations of data collection.CHAPTER 5. depending on the solute size. A To ensure faster equilibration. The maximum volume move sizes were set to 320 ˚3 . At each window the free energy change was calculated for mutating to both the next and previous windows using Eq. Free energy simulations were performed using BOSS 3. Maximum move sizes for solute translations and rotations were selected to be between 0. respectively. to give A an acceptance probability of approximately 40%.04 % of all attempted configurations. However. dramatically increasing the length of time for a single calculation. 5.6. At A each λ window the system was equilibrated for 3 million (M) configurations followed by 5 M configurations of data collection.1. Solute moves comprised 1 % and volume moves 0. As a test for the convergence of the free energies.5 ˚.1–0. Equilibrium configurations were generated in the NPT ensemble at 25 ◦ C and 1 atm using the MC Metropolis algorithm with quadratic feathering to zero over the last 0. which varies from 0 to 1. 
7–10 water molecules with the highest interaction energy with the solute were discarded.31 Each solute molecule was placed in a cubic box with side 27 ˚ containing 648 equilibrated water molecules. Propagating the coordinates from the end of one window to the start of the next can be used to save on equilibration costs.18. it does mean that all windows must run in sequence rather than parallel.

5. The mutations for trans-N-methyl acetamide and methanol were broken up into two completely separate simulations for which first the geometry and then the non-bonded parameters were perturbed.1 for all 22 molecules for each of the four charge sets. Errors for each λ window were calculated as half the hysteresis between the forward and reverse free energy changes. A 0. Such a difference is commonly referred to as the hysteresis. additional windows were used in other cases in which significant hysteresis of 1 kJ mol−1 was evident for a given window. 0. six λ windows were used. 0.0. The free energies for the molecules parameterised with the remaining charge sets were calculated by mutating their charges to their respective REPD/6–31G* charges in three λ windows of 0.6. To further clarify the comparison with experiment. Table 5.92–95 The average unsigned error for 20 molecules with respect to experiment given at the bottom of the table indicates the overall performance for that charge set.8 and 1. Geometries and non-bonded parameters were scaled linearly with λ between the initial and final molecules. If no atoms were destroyed. particularly where methyl groups were mutated to hydrogens and where polar groups were adjusted. 0. The average signed error is given at the bottom of this table and indicates whether the free energies are on average too positive or too negative compared with experiment.4 Results. Atoms that disappeared were mutated to dummy atoms and had their bond lengths reduced to 0.0.2 contains the errors with respect to experiment for each molecule and charge set.0. the calculated free energies from these longer runs .4. The longer simulations consisting of 10 M configurations of data collection gave smaller random errors as expected. Also. The free energies of hydration calculated by FEP are presented with simulation errors in Table 5.5 and 1. However.2.0. TESTING OF REPD CHARGES BY FEP AND LIE 85 be monitored.1. together with the experimental values. 
Accumulation of the errors gave the total error for the overall perturbation.1 were carried out using the REPD/6–31G* charge set. spaced at 0.CHAPTER 5.2 ˚. The mutations between molecules as given in Figure 5. Such mutations typically converged very quickly. 0.

8±0. were identical to those obtained using 5 M configurations of data collection to within error.4 1.6 -15.4 1.7±1.3 water -26.2 -22.4 -1.2±1.3 -36.8±3.7 -25.8±1.5 -39.1 acetic acid -28.1±0.0 -10.2±1.3±1.0 -20.3 -40.4 methanethiol -5.6±1.2 2.7 -35.8 -30.3±1.8 9.9±1.9±3.7±2.0 -19.8 -43. and the acetic acid to methanol mutation was unchanged at 11.3±1.7 -27.0 -24. the methanol to methane mutation changed from 24.3 -26.6 -38.0±1.6 phenol -27.3±1.4±1.2±1.8±3.7 -18.2±1.0 -22.2 -7.2 -12.4 kJ mol−1 .6 trans-N-methyl acetamide -42.8 -15.7 4.7 -12.2 -2.8 6.1 methyl acetate -13.2 -10.1±2.2 -16.4±1.9 -13.7±1.9±1.7 -9.9 -12.8±1.4 ethene 5.5±1.8 to 25.0 -30.7 -23.2 -1.7 -11.1: Calculated Free Energies of Hydration (kJ mol−1 ) Using FEP versus Experiment92–95 for All Four Charge Sets.4 -16.7±1.6±1.2 -7.7 -18.9±1.6±1.2 1.2 -14.CHAPTER 5.4±3.0±0.8 kJ mol−1 .1 -12.8±1. For example.8 -3.5±1.5±1.9 -2.0±0.8±2. Molecule ∆Gexpt methane 8.8±2.4±1. .0±2.0±2.8±3.8 -19.4±1.0 3.7±1.6±2.6±3.6±1.6 -19.0±1.9 ammonia -18.6 -35.9 -11.9 -46.5 kJ mol−1 .2±1.5±1.6±0.4 chloroethane -2. the error decreasing from ±2.9 -27.0 methylamine -19.8 -9.2±2.3±3.5 -49.8 -31.7±1.1±0.1±2.6±1.9 With respect to experiment.3 kJ mol−1 .3 -40.3 -14.1±3.4 -43.0 -19.1 acetamide -40.8±1.3±1.5±1.3±1.7 1.9 -9.7 aniline -20.7 1.7 1.7±2.0±1.6±1.8 5.5 chlorobenzene -4.7±2.7 acetone -16. TESTING OF REPD CHARGES BY FEP AND LIE 86 Table 5.4 -4.9±2.7 -18.8±2.6 -16.5±3.8 6.4 1.2 -2. the acetamide to methanol mutation changed from 18.8 -4.3 -18.3 -34.7 formaldehyde formamide Average unsigned errora a ∆GEPD 6–31G* 9.6 ∆GREPDG* 6–31+ 9.3 -19.2±2.0 4.7 -14.5 ∆GREPD ∆GEPD+ G* 6–31G* 6–31 9.3±1.0 3.0±2.7 -36.0±1.9±1.0 -26.4±1.9±1.5±1.7±1.7±1.6 benzene -3.7 -17.7 1.0±1.3 -38.9 dimethyl sulfide -6.5±1.8 -9.7±1.9 -22.4 to ±1.6 -17.4±2.3 to 20.8±1.7±2.4 methanol -21.4±2.7±1.5±1.9±1.2 acetaldehyde -14.6±1.6±1.2 dimethyl ether -7. 
Thus 5 M configurations of data collection was sufficient for the performance of all charge sets to be determined.2±1.3 -22.9 -10.

7 4.5 1.8 ErrorREPD 6–31G* 0.5 Effect of Restraint.5 kJ mol−1 .4 0.1 3. application of the restraint makes the free energies of hydration less negative while changing from the 6–31G* to the 6–31+ G* basis set makes them more negative.9 -8. The commonly used EPD/6–31G* results are the next most reliable with an average absolute error of 3.2 0.9 8. Molecule methane ethene water methanol methanethiol acetaldehyde acetone acetic acid methyl acetate ammonia methylamine acetamide trans-N-methyl acetamide dimethyl ether dimethyl sulfide chloroethane benzene phenol aniline chlorobenzene Average Signed Error ErrorEPD 6–31G* 0.2 7.0 3.0 8.3 gives a clear indication of the good correlation between REPD/6–31+ G* and experimental free energies of hydration. From Tables 5.2 it can be seen that the REPD/6–31+ G* charges perform the best with an average absolute error of only 2.6 0. The REPD/6– 31G* and EPD/6–31+ G* charge sets perform less satisfactorily.7 4.0 -1.1 -4.3 0.3 -2.7 1.7 1.4 7.0 4.9 4.8 5.3 -7.8 -5.0 7.0 6.7 -5.9 1.9 -3. the former being too hydrophobic and the latter too hydrophilic. The slope for the line of best fit is 0.5 1.0 0.0 -2.7 0.6 3.3 -0.5 1.4 -15.7 3.3 -4.4 3.6 3.8 4.1 -7.9 -0.3 -4.1 -4.6 ErrorEPD+ G* 6–31 0.1.9 -2.7 4.6 -3.6 -2.7 0.7 -5.7 -3. The slope and correlation data for the other three charge sets may be found in Table 5.9 -0.4 4.2 2.6 -0.4 -5.1 5.2: Errors in ∆Ghyd (kJ mol−1 ) with Respect to Experiment for All 20 Molecules using All Four Charge Sets.3 2.0 -4.3 2.7 -3.0 5.2 1. For the REPD/6–31+ G* charge .8 3.1 and 5. Evidently.9 -2. Basis Set and Geometry. Figure 5.9 kJ mol−1 .8 2.0 4.98 and the correlation coefficient is 0.97.3 -7.3 0.5 2.3 1.5 3.4 ErrorREPDG* 6–31+ 0.CHAPTER 5.3. TESTING OF REPD CHARGES BY FEP AND LIE 87 Table 5.7 1.1 5.9 3.4 2.1 -9.0 -1.8 5.

The restraint slightly decreases the dipole moment of 6–31G* charges by on average 1. Charge Set EPD/6–31G* REPD/6–31G* EPD/6–31+ G* REPD/6–31+ G* m 1.2 indicates. these effects largely cancel.97 . these strongly influence the water structure surrounding the solute molecule. but as Table 5.4 contains the dipole moments for all 22 molecules with all four charge sets. r.3: Slopes.02 0. TESTING OF REPD CHARGES BY FEP AND LIE 20 88 ∆Ghyd(REPD/6−31 G*) /kJ mol 0 + −1 −20 m=0.CHAPTER 5. This would be expected to have only a small effect on ∆Ghyd .3%.98 r 0.95 0.91 1.97 −40 −40 −20 0 −1 ∆Ghyd(Experimental) /kJ mol 20 Figure 5.98 r =0. Much of the observed free energy behaviour may be understood in terms of the molcules’ dipole moments. Table 5.3: FEP REPD/6–31+ G* free energies of hydration versus experiment for the molecules listed in Table 5. m.1. the average free Table 5. and Correlation Coefficients. set.15 0. The dashed line is the line of best fit while the solid line has unit slope.96 0.97 0. The general trends are as follows. producing free energies with a similar average signed error to EPD/6–31G* but with an improved average absolute error. for ∆Ghyd of All Four Charges Sets versus Experiment.

27 2.53 0.48 0.8 kJ mol−1 .04 2.17 2.58 2.4: Dipole Moments (D) for All 22 Molecules from All Four Charge Sets.00 4.87 4.35 2.00 2.71 1.72 1. the average error using the 9 common molecules was 3.00 0.23 2.72 4.64 2.12 3.99 2.05 3.22 2.00 1.20 2.96 1.00 1.07 1.88 1.CHAPTER 5.0% for EPD charges and by a similar value for REPD charges. In that study.00 2. In this work.72 2.47 0.82 4.65 1.7 kJ mol−1 .15 2.58 2.00 2.85 2.00 0.02 3.03 µEPD+ G* 6–31 0.09 1.10 3.28 µREPDG* 6–31+ 0.84 1.07 3.60 2. To allow a more valid comparison.86 2.30 3.42 1.30 2. By combining the 6–31+ G* basis set with the restraint.00 0. these two effects cancel to some extent giving slightly more polarised molecules with OPLS-like charges which reproduce well experimental free energies of hydration.00 1.25 2.66 4.04 2. TESTING OF REPD CHARGES BY FEP AND LIE 89 Table 5.26 3.04 4.86 4. the 9 molecules used in both this study and the work of Carlson et al.06 3.30 4.53 0.09 1.87 4. EPD/6–31G* charges were used with OPLS rather than ab initio geometries.29 1.2 kJ mol−1 for EPD/6–31G* and 2.68 4. Switching to the 6–31+ G* basis sets increases the dipole moment by an average of 4.72 4.22 energy of hydration become more positive by 1.01 1.85 1.00 2.16 1.95 1.4 kJ mol−1 for the 13 molecules used in their calculations. were examined.41 1.21 2.95 1.94 1.89 who calculated an average error of 4.10 2.00 2.00 0.72 2.17 1.57 2.90 1.12 1.26 4.00 2.08 µREPD 6–31G* 0.30 1. The results presented are an improvement on those of Carlson et al.97 1. Molecule methane ethene water methanol methanethiol acetaldehyde acetone acetic acid methyl acetate ammonia methylamine acetamide trans-N-methyl acetamide dimethyl ether dimethyl sulfide chloroethane benzene phenol aniline chlorobenzene formaldehyde formamide µEPD 6–31G* 0.24 2.

for REPD/6–31+G*, an improvement on the Carlson et al. result of 4.9 kJ mol−1. While the performance of REPD charges would be expected to be different, there are a number of possible reasons for this difference in EPD charges. These include the use of different geometries, point selection schemes and FEP simulation protocols. As an example of the importance of geometry, 6–31G* optimised acetamide with EPD/6–31G* charges was mutated to the OPLS geometry with EPD/6–31G* charges rederived for that geometry. The free energy for this mutation was calculated to be 6 kJ mol−1. Part of this may be attributed to the increase in dipole moment from 4.04 D to 4.38 D for the OPLS geometry. However, this analysis has only been performed using a small number of molecules. More molecules would need to be considered and a more systematic investigation undertaken to ascertain whether the difference between these two studies is significant.

5.1.6 Particular Discrepancies with Experiment.

For a number of molecules studied here, the calculated free energies of hydration are rather inaccurate, irrespective of charge set. The detrimental effect of a few poorly-reproduced molecules can strongly affect the overall average error. REPD/6–31+G* molecules with an error greater than 4 kJ mol−1 are methyl acetate, dimethyl sulfide, and chloroethane which are too hydrophobic, and ammonia, benzene and chlorobenzene which are too hydrophilic. The rigid molecule approximation may be responsible for the discrepancy for a number of larger molecules such as methyl acetate. The charge parameterisation seems to be wanting for third row atoms such as sulfur and chlorine. All of the charges listed in the previous chapter for sulfur and chlorine deviate significantly from OPLS values but there is no consistent trend between ∆Ghyd and charge. However, this may be partially put down to 6–31+G* geometries being inadequate, as discussed in Section 3.2. This does raise concerns with the application of such charges to the macrobicycle 12 system.

Benzene is an especially interesting case. Owing to symmetry, only one parameter is necessary to describe its electrostatic properties within the point charge approximation, making this number particularly

016 in charge between EPD/6–31G* and EPD/6–31+ G* benzene results in a large change in free energy of 3. The EPD/6–31G* value in Table 5. A trend is apparent by comparing the benzene free energies with charges. methylamine is found to be more hydrophilic than ammonia.133. for molecules with a lower symmetry than benzene such as aniline and phenol.1. respectively. Nevertheless. significantly better than the corresponding error of 2.3. it may conceivably be related to geometry. although this result is probably aided by the previously discussed problems for ammonia. A difference of 0. .89 are. the OPLS value150 and the value of Carlson et al. -3.CHAPTER 5. The above-mentioned discrepancies may also be due to the Lennard-Jones parameters and even the form of the force field itself.115 and -0.88. polar molecules. for the REPD/6–31+ G* charge set. it is clear that benzene’s free energy of hydration scales rather sensitively with its carbon charge. It has been noted by a number of workers that free energy calculations fail to reproduce the increase in hydrophilicity observed experimentally for acetamide and ammonia when a hydrogen on the nitrogen is replaced by a methyl group. -0. In contrast to previous work. Assumptions and approximations inhererent in all force fields will inevitably work better for some molecules than others. This sensitivity is especially important given the strong dependence of benzene’s EPD charges on point selection as discussed in Subsection 4.0±2.103.8 kJ mol−1 . the accuracy of which is especially important for small. However. the agreement with experiment improves. Fortunately.8 kJ mol−1 .7 kJ mol−1 .8 kJ mol−1 and -1.2. If these problem molecules are removed. TESTING OF REPD CHARGES BY FEP AND LIE 91 critical.6 kJ mol−1 for EPD/6–31G*. -7. the results of this work are no different. The disagreement for ammonia is perplexing given the success for water. 
the average error for REPD/6–31+ G* molecules reduces to only 1.7 kJ mol−1 .7 kJ mol−1 while the corresponding charges are -0. an improvement on previous studies and arguably within the limits of simulation error. the main objective of having a general method that works well for as many molecules as possible has been achieved. the relative free energy difference was calculated to be incorrect by only 3. 90 For acetamide and trans-N-methyl acetamide.
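The double-ended sampling analysis described in Subsection 5.1.3 can be illustrated with a short sketch. This is not the BOSS implementation. The function names `zwanzig` and `combine_windows`, the layout of the input lists, and the choice to accumulate the window errors by simple summation are assumptions made for illustration. The exponential average is the standard perturbation formula, which is assumed here to be what Eq. 2.18 expresses.

```python
import math

# kT in kJ/mol at 25 degrees C (gas constant in kJ/mol/K times 298.15 K)
KT = 8.314462618e-3 * 298.15


def zwanzig(delta_u, kt=KT):
    """Exponential-average free energy change.

    delta_u holds samples of U(target) - U(reference), collected while
    sampling the reference state. The minimum is factored out before
    exponentiating to keep the sum numerically stable.
    """
    shift = min(delta_u)
    avg = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(avg)


def combine_windows(forward_du, backward_du):
    """Combine forward and reverse estimates for each lambda interval.

    forward_du[i]:  samples of U(lambda_{i+1}) - U(lambda_i) from window i
    backward_du[i]: samples of U(lambda_i) - U(lambda_{i+1}) from window i+1
    Returns (total free energy change, accumulated error).
    """
    total = 0.0
    total_error = 0.0
    for fwd, bwd in zip(forward_du, backward_du):
        dg_forward = zwanzig(fwd)
        dg_reverse = -zwanzig(bwd)  # reverse perturbation, sign flipped
        hysteresis = abs(dg_forward - dg_reverse)
        total += 0.5 * (dg_forward + dg_reverse)  # average of both directions
        total_error += 0.5 * hysteresis           # half the hysteresis
    return total, total_error
```

Calling `combine_windows` with one list of energy differences per λ interval returns the summed free energy change and the accumulated half-hysteresis error. If the errors were instead combined in quadrature, only the last line of the loop would change.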

5.2 LIE Free Energies of Hydration.

5.2.1 Form of the LIE Equation.

The Linear Interaction Energy (LIE) method is a fast free energy method that was originally proposed by Åqvist et al.65 for the calculation of binding free energies for protein ligand systems. This method has the advantage of only requiring simulations at the end points of a mutation but suffers from the disadvantage of being an approximate, empirical method requiring parameterisation to known free energy data, obtained either from experiment or simulation. In the original formulation,65 binding free energies were assumed to depend on two terms. These were the differences in van der Waals and electrostatic ligand-surrounding energies between simulations of a free ligand in solution and a ligand bound to a solvated protein. Each of these was obtained by averaging the individual values over all equilibrium configurations. The dependence was assumed to be linear, with α and β the respective coefficients, as in the equation

\[ \Delta G_{\mathrm{bind}} = \alpha \langle \Delta U_{\mathrm{vdw}} \rangle + \beta \langle \Delta U_{\mathrm{elec}} \rangle \tag{5.1} \]

β was set to 1/2 in that work, drawing on the Born model for the free energy of hydration of ions. Hence only one parameter, α, was adjusted in fitting to experimental data. Its value for their system was 0.161.

The methodology has subsequently been applied to the calculation of free energies of solvation.147,148 In its application to such calculations, Carlson and Jorgensen147 firstly found that the model would reproduce experiment much better if β was allowed to vary. Furthermore, they studied three additional possible contributions to ∆Gbind. The inclusion of an additional term was deemed necessary because both existing terms were negative and a positive contribution was necessary to account for molecules like methane with a positive free energy of hydration. It was physically rationalised as a water cavitation term. Possible candidates they considered were the molecular surface area, molecular volume and solvent accessible surface area (SASA). SASA is

The LIE equation in their work became ∆Ghyd = α Uvdw + β Uelec + γ (SASA) (5. summed over all molecules in the fitting set. .CHAPTER 5.2. SASA’s were calculated using Macromodel. Since the solutes were held rigid in these calculations. The simulation protocol was identical to that applied in the FEP calculations. between the solute and water. Eq. saying that a value of 0. and 0. 5.023 fitting to FEP free energies. SASA was constant for each solute over the simulation. 5.2 was the form of the LIE equation tested in this work.154 Formaldehyde and formamide were excluded from the fits to experiment since the experimental data were not available. Uelec and Uvdw were calculated from the same 5 M configurations used for the first window of that molecule’s FEP simulation.4 ˚. A combined fit using all molecules with all charges sets was also tested. 0.020 fitting to experiment. ˚qvist and A Hannson clarified their work. β and γ were optimised using a subplex algorithm153 to minimise the absolute difference between Eq. Subsequent to this.5 for β was valid for ionic systems but a value of 0.2) where Uelec and Uvdw are the absolute solute-solvent electrostatic and van der Waals energies. The parameters α. In water. TESTING OF REPD CHARGES BY FEP AND LIE 93 the combined area of all atoms with their radii augmented by a solvent probe radius.2 LIE Protocol.348.4 was more appropriate for neutral dipolar molecules. α.489. All the necessary data came from the end windows of the previous FEP computer simulations for 22 small molecules.152 5. this is conventionally set to 1. Separate fitting was performed for the molecules in each of the four charge sets. but were included in the fit to FEP. 0. The inclusion of each of these additional A terms was able to improve the reproduction of experiment similarly.421 and 0.444 and 0. yet they elected to adopt SASA as the third fitting term since its performance was marginally better. respectively.2 and either FEP or experimental free energies. 
β and γ parameters Carlson and Jorgensen derived were 0.
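The fitting of Eq. 5.2 described in the protocol above can be sketched as follows. Two things here are assumptions rather than the method of this work. First, the sketch uses an ordinary least-squares fit solved through the normal equations, whereas the thesis minimises the absolute difference with a subplex algorithm. Second, the energies, surface areas and generating parameters are invented purely so that the example runs; they are not data from Table 5.5 or parameters from Table 5.6. The function names are likewise hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_lie(u_vdw, u_elec, sasa, dg_ref):
    """Least-squares alpha, beta, gamma for
    dG = alpha*U_vdw + beta*U_elec + gamma*SASA (form of Eq. 5.2)."""
    rows = list(zip(u_vdw, u_elec, sasa))
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * g for r, g in zip(rows, dg_ref)) for i in range(3)]
    return solve_linear(ata, atb)


def predict(params, u_vdw, u_elec, sasa):
    """Evaluate the three-term LIE expression for every molecule."""
    alpha, beta, gamma = params
    return [alpha * v + beta * e + gamma * s
            for v, e, s in zip(u_vdw, u_elec, sasa)]


def mean_unsigned_error(predicted, reference):
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical inputs (kJ/mol and square angstroms), for illustration only.
U_VDW = [-14.0, -20.0, -25.0, -30.0, -18.0]
U_ELEC = [-0.5, -40.0, -60.0, -10.0, -85.0]
SASA = [140.0, 160.0, 190.0, 240.0, 130.0]
TRUE_PARAMS = (0.5, 0.45, 0.1)  # invented generating parameters
DG_REF = predict(TRUE_PARAMS, U_VDW, U_ELEC, SASA)

fitted = fit_lie(U_VDW, U_ELEC, SASA, DG_REF)
error = mean_unsigned_error(predict(fitted, U_VDW, U_ELEC, SASA), DG_REF)
```

Because the reference values are generated from the invented parameters without noise, the fit recovers them. With real data, the least-squares and absolute-difference criteria would generally give somewhat different parameters.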

6 -19.3 -69.3 -17.2 -27.8 10.1 -17. Table 5.3 -59.3 -23.6 contains the α.0 -0.4 -55.4 -34.5 -96.6 -14.8 -15.3 -20.1 -111.9 -93. β and γ parameters obtained from fitting the energy components and SASA to either experimental or FEP free energies of hydration us- .1 -30.5 -16.8 -101.8 -17.7 -30.2 -38.3 -26.7 -12.9 -85.7 -61.1 -12.7 -62. The energy and SASA data required for the LIE parameterisation have already been calculated for each molecule in water in the FEP Monte Carlo simulations and are presented in Table 5.8 -48.2 -107.2 -9.3 -18.9 -97.3 Derivation of the LIE Parameters.0 -61.5 M configurations and vary from zero to approximately 2 kJ mol−1 and 6 kJ mol−1 .9 -38.7 -40.6 1.2 -24.1 -16.9 -30.5 -17.1 -95.5 -58.4 -5.2 -35.2 -10.4 144 170 114 160 182 189 219 197 243 132 172 204 240 197 214 178 244 256 265 243 153 171 5.4 -5.4 -49.3 -111.1 143 170 114 160 182 189 219 198 234 132 172 204 240 197 214 178 245 256 265 243 153 171 -13.4 -6.6 -12.5 -59. The statistical errors on Uvdw and Uelec are standard errors calculated over batch averages of 0.5 -74.9 -66.5 -28.9 -8.7 -30.CHAPTER 5.8 -28.6 -17.2 11. there is no discernable trend between the two charge sets and the SASAs are almost identical. Comparing the energy data obtained for each molecule.8 -41.9 -18. respectively.3 -59.4 -32.3 -17. Electrostatic Solute-Solvent Energies (kJ mol−1 ) and Solvent Accessible Surface Areas (˚2 ) for EPD/6–31G* and REPD/6–31+ G* molecules. The FEP results suggested that these charge sets were the most useful.2 -69.1 -30.0 -16.7 -32.9 -45.2 -28.7 -38.6 2.6 -7.7 -34.4 -72.5: van der Waals.3 -0.7 -36.0 -16.6 -50.5 for the EPD/6–31G* and REPD/6–31+ G* charge sets.0 -60.1 -67.0 -17. TESTING OF REPD CHARGES BY FEP AND LIE 94 Table 5.4 -87.2. 
A Molecule methane ethene water methanol methanethiol acetaldehyde acetone acetic acid methyl acetate ammonia methylamine acetamide trans-N-methyl acetamide dimethyl ether dimethyl sulfide chloroethane benzene phenol aniline chlorobenzene formaldehyde formamide EPD/6–31G* Uvdw Uelec SASA REPD/6–31+ G* Uvdw Uelec SASA -14.3 -29.

526 γ 0. This may either be a real physical effect or purely statistical.65 while there is more variation for α and γ. but the SASA term and its associated parameters strictly should not vary if its inclusion is physically the result of solvent cavitation. the correlation coefficient between α and γ is found to be 0. 5.551 0.5 as predicted by theory.582 +G* EPD/6-31 0.519 0.382 + G* EPD/6-31 0.12. Indeed.471 0.131 0. regardless of how the charges are derived.CHAPTER 5. in theory the LIE free energies are assumed to depend strictly on Uelec .137 0.093 0.092 0.078 0. That the γ parameter does vary is indicative of some correlation between SASA and the other terms. Uvdw and SASA.678 REPD/6-31G* 0.594 Charge Set β 0. Hence the LIE equation being used may be overfitting to the data. 5. ∆G Origin Experiment FEP α EPD/6-31G* 0.120 0.085 0. particularly when fitting to FEP free energies of hydration. However.2. 152 In particular. the parameters do vary somewhat between charge sets. Some coupling may be expected between the energy terms for different charge sets.467 0.506 +G* REPD/6–31 0.086 0. β and γ for Each Charge Set Fitted Using Eq. α. 148. TESTING OF REPD CHARGES BY FEP AND LIE 95 Table 5.525 0.506 0. This discrepancy is discussed in the next subsection. The parameters derived by fitting to all molecules of all charge sets lie in the range spanned by those from each individual charge set.444 0.6 and 0.350 EPD/6-31G* 0. Overall.08 while FEP-fitted values are 0.532 All Molecules 0.35 and 0.2 to Experiment and FEP. The values of LIE parameters obtained from the fitting procedure are consistent with those listed earlier for free energies of hydration.96.115 0. . respectively.491 0.370 All Molecules 0. the β values are all found to lie in the vicinity of 0. though.129 ing Eq.502 0.147.6: The LIE Parameters.303 REPD/6-31G* 0.366 + G* REPD/6-31 0. Experimentally fitted values for α and γ lie around 0.

1 4.96 Charge Set r Average Error CV Error 0. Correlation Coefficients.8 kJ mol −1 and 1. As FEP is formally an exact method.4 for EPD/6–31G* charges. respectively.2 0.90 +G* EPD/6-31 0.7 0. the ability of LIE to reproduce FEP results is a critical test of the LIE free energy methodology.9 0.89 + G* EPD/6-31 0. Although the utility of the LIE method involves fitting to experimental free energies of hydration.9 kJ mol−1 . Table 5.CHAPTER 5.9 2.95 +G* REPD/6–31 1.98 1.96 3. with mean unsigned errors of 1.98 1. in assessing the performance of the LIE method itself.7 gives the gradients and correlation coefficients.92 3.98 2.01 All Molecules 0.9 0. TESTING OF REPD CHARGES BY FEP AND LIE 96 Table 5.4 0.7 that all the LIE free energies agree well with their equivalent FEP values to within approximately 2 kJ mol−1 . REPD/6–31+ G* and EPD/6–31G* are in closest agreement.94 All Molecules 0.2 0.5 3.93 FEP EPD/6-31G* 0. fitting to three of the groups and applying the resulting LIE parameters to the fourth to give an average error.4 Performance of LIE Free Energies. ∆G Origin m Experiment EPD/6-31G* 0.94 3. Each of the four groups was examined in turn and the complete procedure repeated 100 times. The results are also presented in graphical form in Figure 5. 5.4 0.98 2.98 REPD/6-31G* 0. m. r. the results of fitting to FEP values and comparing with experiment should also be examined.3 2.2.7: The Slopes. Average Errors and CV Errors for Each Charge Set Fitted Using Eq.4 5. The CV error was calculated by dividing each set of molecules into four groups at random.8 2.3 2.97 2.0 3.2 2.97 2. The predicted and experimental free energies of hydration together with the errors between them for EPD/6–31G* and REPD/6–31+ G* are given in Table 5. The ability of LIE to reproduce experimental free .4 0.94 3.7 0.2 3.2 to Experiment and FEP. It is apparent from Table 5.8.97 REPD/6-31G* 0.95 + G* REPD/6-31 0. 
average unsigned errors and average cross validation (CV) errors of the predicted LIE free energies versus experiment for all four charge sets.6 4.

6 -3.7 -3.3 -26.7 -20. In this regard.2 0.7 -18.6 -5.6 -22. This suggests that the LIE parameterisation is able to reduce the errors in predicted free energies arising from inadequacies in the force field.4 -6.0 1.5 -4.0 - energies of hydration examines the reliability not only of the free energy methodology but also the force field.5 0.3 -27. Only the FEP REPD/6–31+ G* results are comparable with the LIE results. Molecule methane ethene water methanol methanethiol acetaldehyde acetone acetic acid methyl acetate ammonia methylamine acetamide trans-N-methyl acetamide dimethyl ether dimethyl sulfide chloroethane benzene phenol aniline chlorobenzene formaldehyde formamide ∆Gexpt 8.2 kJ mol−1 .0 2.1 -40.0 -2.0 kJ mol−1 .8 Error 0.6 4.9 -1.8 -16. the EPD/6–31G* charge set is the most reliable.1.3 -3.9 -14.1 -24.0 -22.2 -7.2 -16.6 -27.8 7.0 -3.5 8.4 3.5 -19.1 -0.0 1.8: Predicted LIE Free Energies of Hydration (kJ mol−1 ) Fitted to Experiment Using Eq.0 -5.5 -0.3 -5.5 3.7 ∆GEPD 6–31G* 6.6 0. 5. .3 -1. these errors compared to experiment are larger than those obtained on fitting to FEP.7 -13.5 -23.1 3. as would be expected since the reliability of both the LIE relationship and the force field affect the fit. For each charge set.9 Error 1.9 -0.0 -0.2 -31.0 -10.2 -14.7 -3. This indeed appears to be the case for molecules such as methanethiol. TESTING OF REPD CHARGES BY FEP AND LIE 97 Table 5.5 -37.5 -32.0 -19.7 5.1 -12.7 -33.3 -24.2 -2.4 -2.0 -0.0 2. The most noteworthy comparison is that the mean unsigned errors for the LIE method fitted to experiment are at worst the same as the equivalent errors for the FEP calculations reported in Table 5.6 5.CHAPTER 5.2 -0.3 -36.3 -0.1 -2.4 5.6 -42.0 -4.7 -18.6 -21.8 5.6 -20.9 -3.1 -16.2 for the EPD/6–31G* and REPD/6–31+ G* Charge Sets.3 -7. with a mean unsigned error of 2.0 -14.7 0.7 ∆GREPDG* 6–31+ 8.5 -4.4 -21.6 -5.3 -0.9 -18.1 -0.8 3.9 -6.7 -40. 
followed by both REPD charge sets with mean unsigned errors of approximately 3.0 -40.4 6.6 -40.2 -2.1 -13.7 -16.2 -7.8 -19.1 -0.1 -28.1 -4.9 -38.4 -5.6 2.

and chlorobenzene.6 are larger than the average error by between 0. this is probably as a result of fitting rather than any intrinsic property of the method.CHAPTER 5.3 and 1.97 −40 −40 −20 0 −1 ∆Ghyd(Experimental) /kJ mol 20 Figure 5.6 give an indication of the sensitivity of the LIE parameters to the molecules used in the fit. The dashed line is the line of best fit while the solid line has unit slope. TESTING OF REPD CHARGES BY FEP AND LIE 20 98 ∆Ghyd(EPD/6−31G*) /kJ mol −1 0 −20 m=0.3 kcal mol−1 . acetic acid. For the individual charge sets the CV errors given in Table 5. The CV errors presented in Table 5. These CV errors suggest that more molecules should be included in the fitting procedure. to reduce the dependence on the molecules used in the parameterisation.1 kJ mol−1 . .2 versus experiment for the molecules listed in Table 5. 5. In this way. However. the parameterisations were repeated using all the molecules from every charge set. especially when fitting to experiment.98 r =0. methyl acetate and phenol.2 and 2. This sensitivity is revealed by the degree to which the CV error is worse than the average unsigned error. ammonia. since for other molecules LIE fares worse than FEP such as acetone.4 kcal mol−1 for experiment or FEP is now only marginally worse than the average values of 3. dimethyl ether.4 or 2. Consequently. the influence of a few poorly reproduced molecules would be reduced and indeed the resulting CV error of 3.8.4: LIE EPD/6–31G* free energies of hydration fitted using Eq.

respectively. However, the average error is now worsened for EPD/6–31G* and REPD/6–31+G* charge sets so it is advisable to retain the parameters derived from each charge set. The REPD/6–31G* and EPD/6–31+G* charge sets, due to their overall worse performance, are no longer considered for the remainder of this section.

5.2.5 Overfitting to the Data.

As already indicated, the values of α and γ obtained from the fitting procedure vary significantly between charge sets. This result is undesirable and raises the possibility that Eq. 5.2 may in fact be overfitting to the data. In this regard, the strong correlation between α and γ observed in Table 5.6 and noted elsewhere147 is of particular concern since it suggests that Uvdw and SASA contain similar information regarding free energies. Fitting one parameter to each one is therefore arguably overfitting.

To gauge the importance of the three molecular quantities, Uvdw, Uelec and SASA, used in Eq. 5.2, a procedure involving the randomisation of each of the molecular quantities was employed. In this method, the values of a given quantity were reassigned at random to each molecule, while the remaining two terms were unaltered. A fit was then performed using Eq. 5.2 and the average error and correlation coefficient calculated. This procedure was repeated 100 times and the overall average error and correlation coefficient obtained. If the calculated free energies of hydration are strongly dependent on one of these quantities, then the fit would be expected to become extremely poor. On the other hand, if the fit is largely unaffected, it would suggest that use of that particular quantity in the fitting equation is unjustified.

The results for fitting EPD/6–31G* and REPD/6–31+G* energy data with FEP free energies of hydration using Eq. 5.2 are presented in Table 5.9. It is evident that when Uelec is randomised, the average error increases dramatically and the correlation of predicted free energy versus FEP is severely degraded. Nevertheless, when Uvdw and SASA are randomised, the average error and correlation coefficient change much less significantly. These results do not suggest that the energy terms and SASA used in

A range of other LIE functions including various combinations of the Uelec .0 0.337 β Uelec + γ(SASA) 0. when a smaller number of molecules or shorter simulations are used. A similar analysis has been performed by McDonald et al. a number of these relationships perform almost as Table 5.6 0.8 4. Uvdw and SASA.7 2.9 4.10: The LIE Parameters. TESTING OF REPD CHARGES BY FEP AND LIE 100 Table 5. 5.6 3.0 0.9: Average Errors and Correlation Coefficients Obtained when Fitting to FEP Using Eq.424 α Uvdw + β Uelec + δ 0. 5.94 8.8 0.62 2.10 indicate that reducing the number of parameters in the LIE relationship does indeed lower the fitting ability of the function.10 error 2. Uvdw and SASA terms were examined for fitting EPD/6–31G* energy data to FEP free energies of hydration.678 0. This would be particularly true when there is more noise in the fit.525 β Uelec + γ(SASA) + δ 1. but they provide clear indications that electrostatic term appears to be always by far the most significant.8 4.7 CV error 2.148 The results given in Table 5.3 .97 REPD/6–31+ G* Average Error r 3.426 β Uelec + δ 0.2 with Randomised Uelec . However. 5. Function α β α Uvdw + β Uelec + γ(SASA) 0.137 0.1 4.2 have no influence on the free energies.449 β Uelec 0.96 Eq.58 3.6 3. in chloroform. Variable Uvdw Uelec SASA EPD/6–31G* Average Error r 3.6 0. The contribution to the total free energy from both Uvdw and SASA may at best not be properly resolved.039 9. at worst.9 2.471 0. Alternatively.1 5.94 9.1 5.2 3. these two terms have little if any systematic effect on total free energies of hydration.202 0. as would be expected.51 14.9 0.6 Alternative LIE Functions. Average Errors and CV errors Fitting to FEP using Each Function for the EPD/6–31G* Charge Set.448 α Uvdw + β Uelec -0. for example.5 0.042 0.331 γ δ 0.1 3.2. as is the case.CHAPTER 5.030 1.

well as Eq. 5.2, but with either fewer parameters, energy terms, or indeed, both. Two equations of particular interest are the following:

∆G = α Uvdw + β Uelec + δ (5.3)

∆G = β Uelec + δ (5.4)

In Eq. 5.3, the SASA term has been replaced by a single parameter, δ, and the resulting fit is actually 0.1 kJ mol−1 better than that observed for Eq. 5.2. Given that the statistical errors on Uvdw and Uelec (data not shown) are generally larger than this, the question must be asked as to whether the inclusion of a SASA term is statistically significant. The SASA term, usually a positive term, was originally included to allow the possibility of positive free energies of hydration.147 Eqs. 5.3 and 5.4 also allow for this possibility by virtue of δ being positive in each case.

In Eq. 5.4, an interesting two parameter equation is proposed involving only the variable, Uelec. Even though fewer parameters are used, the increase in error of 0.6 kJ mol−1 is of the same order of magnitude as the statistical errors in Uelec. A physical interpretation of Eq. 5.4 is that the δ term corresponds to some averaged free energy of hydration for a solute with charge parameters of zero. Indeed, the free energies of hydration of methane, ethane and propane are all approximately 8 kJ mol−1,155 close to the values of δ obtained on fitting to FEP.

Finally, fitting Eq. 5.4 to FEP for all four charge sets in turn yields values of β that range from 0.418 to 0.449 and δ that range from 7.4 to 9.1 kJ mol−1. Since fitting to FEP free energies of hydration should remove any systematic influence arising from the different nonbonded parameters, one would expect the LIE parameters obtained for each charge set to be virtually identical. Although the variation is less than that observed in the parameters derived using Eq. 5.2, there is still some variation in them with charge set. Despite these preliminary results, a detailed statistical analysis of the LIE method was clearly necessary and this work is presented in the next section.

5.3 Analysis of the LIE Method.

5.3.1 Motivation for the Analysis.

The work in this section assessing the validity of the terms in the LIE equation applied to free energies of hydration was performed by Wall and Essex, who were also applying the LIE method to the prediction of free energies of binding for inhibitors of the enzyme neuraminidase and undertook a similar analysis for their system.156 In their work, due to the previously discussed problems with the LIE method, an investigation was subsequently carried out concerning the most appropriate equation for the calculation of free energies of binding to neuraminidase. This study revealed several important factors that must be considered when carrying out such an analysis. Firstly, it highlighted the fact that the widely used Multiple Linear Regression (MLR) method157 for assessing the significance of variables was not appropriate for the data set used due to intrinsic cross correlations within the data.157 To overcome these correlations, generalised biased regression methods based on orthogonalised variables were carried out by implementing the continuum regression (CR) method.158–160 Secondly, since the aim of LIE equations was ultimately to predict unknown energies, identification of the most appropriate model should be based on predictive ability and not the quality of fit to the current data set.

Therefore the purpose of this current study became twofold, firstly to investigate whether the same issues were important for free energy of solvation calculations, and secondly, to identify the most important variables and valid fitting equation for such calculations.

5.3.2 Correlation Analysis.

By having a range of charge sets to examine, possible agreement between different sets would reinforce any subsequent conclusion, and likewise disagreement would indicate possible problems. In subsequent discussions the charge sets will be referred to using the following notation: U denotes the unrestrained EPD method and R REPD, P indicates polarisation functions in the basis set (6–31+G*), and otherwise the 6–31G* basis set is used. E and S represent fits to experiment and FEP simulation, respectively.

Both these correlations are important.157 In this method one compares a quantity called the t-statistic to standard tabulated values for each variable and this tells whether the variable contributes significantly to the dependent variable. or less than −0.918 -0. For example. Conversely. RE. ∆G Uvdw Uelec SASA ∆G 1 -0. Similar values for these correlation coefficients were obtained for the other seven data sets. then there is a 95% chance that the two variables are genuinely correlated.CHAPTER 5.017 0. The data also suggest that there is a minor correlation between ∆G and Uvdw . The number of molecules in the charge sets is 20–22. RPS is the set where charges obtained by the REPD method at the 6–31+ G* level are fitted to free energies obtained by FEP simulations. Therefore. respectively. Table 5.42. One important assumption which underpins the MLR procedure is that the variables are independent. while the second suggests that Uvdw and SASA contain similar information and that one may be redundant to describe ∆G.11: Correlation Analysis of Energy Components for Data Set RE. The high correlation between Uvdw and SASA violates this assumption.42. but for different reasons. UE is the data set where the charges are obtained by applying the EPD method at the 6–31G* level and the fit is to experimental free energies of hydration. One possible method considered to determine the correct fitting equation was MLR. and shows two significant correlations. and little correlation at all between SASA and ∆G.183 1 and S represent fits to experiment and FEP simulation. the first between ∆G and Uelec and the second between Uvdw and SASA. The first step in the investigation was to perform a correlation analysis to detect any correlations in the energy and SASA data. The first supports the idea that ∆G is strongly dependent on Uelec .926 Uelec SASA 1 -0. ∆G.112 Uvdw 1 -0. if the correlation coefficient is greater than 0. 
indicating that the use of MLR is .11 gives the correlation matrix for a typical data set.313 -0. TESTING OF REPD CHARGES BY FEP AND LIE 103 Table 5.

inappropriate.

5.3.3 Biased Regression Methods.

Having established that MLR is not valid for this data set, biased regression methods had to be applied. Partial Least Squares (PLS) and Principal Components Regression (PCR) are two well known biased regression methods which use different component construction criteria. These methods construct a series of orthogonal components from linear combinations of the original variables. The regression analysis is then carried out on these orthogonalised components, thereby alleviating the problems associated with correlated variables. However, a more recent generalised procedure called Continuum Regression (CR)158–160 encompasses both these methods as well as Ordinary Least Squares (OLS). The Portsmouth formulation of CR159 implemented using the PARAGON drug design software161 uses a parameter, αCR, to determine the component construction criteria; αCR takes values between 0 and 1.5 where αCR = 0 and αCR = 1.5 correspond to OLS, αCR = 0.5 is PLS and αCR = 1 corresponds to PCR. Intermediate values of αCR represent hybrids of these methods.

The CR procedure involves systematic variation of the αCR parameter from 0 to 1.5 to obtain a set of models each based on different component constructions. The number of components used is decided according to standard significance tests. Selection of the best model is based on the “Leave One Out” cross validated correlation coefficient (q2). For each data set, the most predictive model was taken as the one with the highest q2 value. αCR was taken as the value that gave this particular q2 value. This best model is then transformed back to the original data space.

CR was applied exhaustively to each data set using every possible combination of the descriptor variables, Uvdw, Uelec and SASA. Therefore, not only was the full LIE equation considered (Eq. 5.2) but so were all equations involving subsets of these descriptor variables. It should be noted that each equation examined also includes a constant term that arises from the back transformation to the original data space.
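The “Leave One Out” criterion used above to rank models can be illustrated with a short sketch. It assumes the simplest possible model, a one-descriptor ordinary least-squares line with a constant, and the common definition q2 = 1 − PRESS/SS, where PRESS is the sum of squared leave-one-out prediction errors and SS is the sum of squares of the observed values about their mean. The function names are hypothetical, and the sketch does not reproduce the CR component construction of PARAGON.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = beta * x + delta."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_x


def loo_q2(x, y):
    """Leave-one-out cross validated q2 = 1 - PRESS / SS for the line model."""
    press = 0.0
    for i in range(len(x)):
        # Refit with molecule i removed, then predict the omitted value.
        beta, delta = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (beta * x[i] + delta)) ** 2
    mean_y = sum(y) / len(y)
    ss = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - press / ss
```

Because each molecule is predicted by a model that never saw it, q2 falls below the fitted r2 whenever parameters are absorbing noise, which is why it is the preferred selection criterion here.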

5.3.4 The Most Predictive Model.

Table 5.12 lists the most predictive model found for each data set. The number of components was found to range from 1 up to the number of variables. The most common formulation of the most predictive equation was that containing just Uelec and Uvdw, in five of the eight cases. Another point of interest is that for five of the eight models, the optimum value of αCR is 0, corresponding to simple OLS.

Table 5.12: Best Model for Each Charge Set Showing the Variables Included, Number of Components, q2 and the Corresponding Value of αCR.

Data Set   Uvdw   Uelec   SASA   Components   q2      αCR
UE         *      *              1            0.935   0
RE         *      *              1            0.862   0
UPE        *      *              1            0.817   1.3
RPE        *      *              1            0.886   0
US         *      *              1            0.960   0
RS         *      *       *      2            0.913   0.2
UPS        *      *       *      2            0.949   0.1
RPS        *      *       *      1            0.955   0

However, many other models were constructed with different component constructions that were almost as predictive as the best model. This is seen in the almost complete independence of q2 with αCR, the largest variation in q2 being 0.038 for data set UPE. Figure 5.5 shows an example of the dependence of q2 versus α for the RE charge set. Furthermore, many models containing different descriptor variables produced results almost as good. Indeed, any model including Uelec gave similar good values for q2. Between the charge sets themselves, again it can be seen that RP and U charge sets are more predictive than the others, and that S sets are more predictive than E sets. Despite the very similar performance of many models, the best models were adopted for further analysis.

5.3.5 MLR Versus CR.

Since it was observed that OLS gave the best predictions in the majority of cases, for a given set of variables it was decided to carry out the model construction process

using the build up MLR analysis, despite the correlations shown earlier, to examine performance. In this method, all descriptor variables were tested for significance and the most significant one, if it existed, became the first term in the model. This was repeated until no more significant variables could be added. Table 5.13 shows the results of the MLR procedure compared with the best model identified by CR. MLR fails to identify the best model in three cases, and when the optimum model is identified, the q2 is sometimes less than that obtained by CR, since MLR optimises r2 whereas CR optimises q2. It should be noted that the differences in q2 are only very subtle, and for this data set the MLR procedure generates highly predictive models.

Figure 5.5: Plot of cross validated correlation coefficient (q2) versus CR parameter αCR for model constructed from Uvdw and Uelec for charge set RE.

Table 5.13: Comparison of Best Model Obtained by CR for Each Charge Set with that Obtained by MLR. Data Set UE RE UPE RPE US RS UPS RPS Uvdw * * * * * * * * CR Uelec SASA * * * * * * * * * * * q2 0.935 0.862 0.817 0.886 0.960 0.913 0.949 0.955 Uvdw * MLR Uelec SASA * * * * * * * * * * * * q2 0.933 0.862 0.803 0.883 0.960 0.911 0.943 0.955

* * * *

Table 5.14: Table of Coefficients, Standard Error (SE) and Significance for Each Variable for the Best Model for Each Charge Set.

Data   Uvdw                    Uelec                   SASA                    constant
Set    α        SE      Sig.   β       SE      Sig.    γ       SE      Sig.    δ      SE     Sig.
UE     0.073    0.062   n      0.414   0.024   y       -       -       -       8.0    1.8    y
RE     0.179    0.099   n      0.467   0.043   y       -       -       -       11.4   2.8    y
UPE    -0.062   0.091   n      0.334   0.053   y       -       -       -       3.3    3.6    n
RPE    0.103    0.082   n      0.442   0.037   y       -       -       -       10.4   2.6    y
US     0.211    0.053   y      0.454   0.018   y       -       -       -       13.6   1.5    y
RS     0.642    0.391   n      0.513   0.060   y       0.111   0.114   n       3.8    12.0   n
UPS    0.626    0.381   n      0.545   0.063   y       0.162   0.110   n       -5.1   11.5   n
RPS    0.585    0.209   y      0.540   0.036   y       0.135   0.066   y       -0.4   7.4    n

However, CR remains the method of choice, since if applied carefully it will always generate the most predictive model. This is not true of MLR, as was found in the statistical analysis on the neuraminidase system.156

5.3.6 The Significance of The Electrostatic Term.

Once the optimum model had been identified for each data set, a bootstrapping procedure162 was carried out on that model to estimate the standard errors on the coefficients and hence establish the significance of each variable. Table 5.14 shows the estimate of each coefficient, its standard error and whether or not the associated variable is significant at the 5% level. Bootstrapping suggests that Uvdw is only significant for two models and SASA for one. This suggested that the LIE equation involving only the constant term and Uelec was justified. Therefore a model containing just Uelec was studied. Table 5.15 presents the q2 for the best model and for the model containing only the constant and Uelec term. The reduction of the model to the one variable equation causes a negligible reduction in q2 for the prediction of experimental results and only a small reduction of at most 0.08 in q2 for the fits to FEP results. It is important to note that when the bootstrapping is carried out again on the model containing Uelec and a constant, both terms this time were found to be significant in all cases. Since q2 is an indicator, not a definitive measure of predictive ability, the size of this difference suggests that a model containing only Uelec and a constant (Eq. 5.4) is appropriate for LIE calculations of free energies of hydration, supporting the earlier suspicions.

Table 5.15: Comparison of Best Model With the Model Containing Just the Electrostatic Variable for Each Charge Set.

           Best Model                      Electrostatic Only
Data Set   Uvdw   Uelec   SASA   q2        Uelec   q2
UE         *      *              0.935     *       0.933
RE         *      *              0.862     *       0.833
UPE        *      *              0.817     *       0.803
RPE        *      *              0.886     *       0.883
US         *      *              0.960     *       0.925
RS         *      *       *      0.913     *       0.833
UPS        *      *       *      0.949     *       0.930
RPS        *      *       *      0.955     *       0.914

The final coefficients and errors for Eq. 5.4 fitted using CR are given in Table 5.16. It can be seen that the UE error at 2.6 kcal mol−1 is the smallest of all of them, even smaller than the US error, and smaller than all the FEP errors. Whether or not there is an element of luck is hard to say, but this is nevertheless a remarkable result given that only two parameters are being used and the error is only 0.4 kcal mol−1 worse than the error for the original Eq. 5.2. The β parameter at 0.405 is also close to the value predicted by Åqvist for dipolar molecules.152

Table 5.16: Coefficients for the β Uelec + δ Model with Errors for Each Charge Set.

Data Set   β       δ     Error
UE         0.405   6.0   2.6
RE         0.437   6.1   3.6
UPE        0.342   4.9   4.0
RPE        0.429   7.5   3.5
US         0.427   7.8   3.0
RS         0.418   7.4   3.5
UPS        0.436   8.0   3.0
RPS        0.449   9.1   2.7
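The bootstrapping step of this subsection can be sketched for the two-parameter model of Eq. 5.4. This is only an illustration under stated assumptions: molecules are resampled with replacement, the line is refitted by ordinary least squares rather than CR, the standard error is taken as the standard deviation of the resampled coefficients, and a coefficient is called significant at roughly the 5% level when its magnitude exceeds 1.96 standard errors. The function names and these choices are hypothetical and are not taken from reference 162.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = beta * x + delta."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_x


def bootstrap_standard_errors(x, y, n_boot=1000, seed=0):
    """Standard errors of beta and delta from resampling molecules with replacement."""
    rng = random.Random(seed)
    n = len(x)
    betas, deltas = [], []
    while len(betas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # degenerate resample: a line cannot be fitted
        beta, delta = fit_line(xs, [y[i] for i in idx])
        betas.append(beta)
        deltas.append(delta)
    return statistics.stdev(betas), statistics.stdev(deltas)


def is_significant(coefficient, standard_error, z=1.96):
    """Assumed criterion: |coefficient| larger than z standard errors."""
    return abs(coefficient) > z * standard_error
```

With this criterion a term such as the constant for data set UPE in Table 5.14 (3.3 with a standard error of 3.6) would be flagged as not significant, in line with the entry shown there.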

5.4 Conclusion.

The free energies of hydration for REPD charges have been calculated using FEP. It was found that REPD/6–31+G* charges gave the best reproduction of experiment, marginally better than the commonly used EPD/6–31G* charges. The other charge sets fared significantly worse. This validates the use of REPD charges in the macrobicycle 12 system. The LIE method was found to perform even better, particularly for the EPD/6–31G* charge set, whose predictions were superior to those of FEP whether fitted to experiment or to FEP results. The exact form of the LIE equation initially used was found to be overfitting through the Uvdw and SASA terms, either because their contributions were obscured by noise or because they were not significant. Based on this analysis, a new LIE equation suitable for free energies of hydration was proposed, depending only on the solute-solvent electrostatic energy and a constant.

Chapter 6

Methods to Improve Monte Carlo Sampling

For simulations to be useful, as discussed in Chapter 3, they must be able to explore all areas of configurational space accessible to the system. This is because the real experimental properties observed for a molecule are the average over all these possible configurations and so simulations must do likewise. Completely misleading results will almost certainly be obtained if only one area of configurational space is sampled. The three principal difficulties in achieving good sampling in simulations are firstly, to know exactly what configurational space is available to the molecule, secondly, to make the system explore this space, and thirdly to explore this space quickly. The sampling in the macrobicycle 12 system in explicit chloroform using the host residue and regular solvent moves defined in Subsection 3.3 proved to be negligible. Therefore, new MC moves and a continuum solvent model were introduced. These improvements led to significant improvements in sampling.

6.1 Identification of Sampling Problem.

6.1.1 Generation of Possible Host-Guest Structures.

When testing whether a simulation protocol is achieving good sampling of all low energy structures, it is generally not possible to do this by examining the different structures generated by the standard simulation protocol itself. This will only reveal what conformations the protocol can access, not the ones it cannot. Therefore, another method is required to generate all possible structures that the system can adopt. These can then be compared with the structures generated by the simulation protocol in order to make a valid sampling assessment.

Most structural variation in large molecules is principally due to dihedral angles, while bonds and angles are restrained close to their reference values. There is some potential for structural variation in macrobicycle 12. Although it is bicyclic and does possess some rigid amide, thiourea and aryl groups, it also has a number of reasonably flexible dihedral angles. These are indicated in Figure 6.1. Many of these degrees of freedom can vary over a large range with little change in energy. Guest molecules are fairly rigid as shown by the dihedral profiles in Figure 3.7. Apart from the structurally insignificant methyl group rotations, only the phenylalanine derivative possesses dihedrals that can lead to different structures. These are also indicated in Figure 6.1.

Figure 6.1: The flexible dihedral angles in macrobicycle 12 (cross-section shown) and N-Ac-phenylalanine.

The method used to generate all possible structures is simulated annealing. Simulated annealing involves performing a MC simulation at an elevated temperature in the gas phase. The Boltzmann factor used in the MC acceptance test is now much larger and so moves have a much higher chance of being accepted. This allows the molecule to sample much more quickly a wider range of configurations than at room temperature.

Structures produced at this temperature are unlikely to show much resemblance to the structures of interest at room temperature. Therefore, the temperature is slowly decreased during the simulation not just to room temperature but all the way to absolute zero. The gradual decrease in temperature ensures that the molecule is gradually directed towards a nearby minimum energy structure that is close to if not at the global energy minimum. Too sudden a decrease in temperature can leave the structure trapped in a high energy minimum. The reason why absolute zero rather than room temperature is chosen as the final temperature is that at room temperature, sampling can still produce many different structures, while at absolute zero, structures are fewer in number, more distinct and easier to classify.

In this work, simulated annealing was carried out on the macrobicycle 12 system using MCPRO.32 Host-guest complexes were constructed by placing in the host cavity one of the three guests with a particular conformation and stereochemistry. The exact nature of the guests is described in full in Section 7.1. This gave 10 types of host-guest complex. Simulations of the host alone may lead to distorted structures that may be of no use when comparing with structures produced by the standard protocol in host-guest free energy calculations. The presence of different guests may also lead to different structures.

High temperature simulations were performed at 2000 K for 0.5 M configurations in the gas phase. Regular solute moves made up 10 % of attempted configurations while host residue moves were the remainder. To ensure that the guest remained inside the host, a small restraint, effectively a weak bond, was placed on the guest to keep it inside the host cavity. The temperature was then lowered in 20 steps to 20 K, each consisting of 0.1 M MC moves. The restraint holding the guest was removed at the halfway stage of the temperature reduction. The structure was then minimised to 0 K using the Fletcher-Powell algorithm. The procedure was carried out ∼30 times for each of the 10 possible host-guest complexes, giving 309 structures in total, although more structures were generated for some guests than others.
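The annealing schedule described above can be illustrated on a toy problem. The sketch below is not the MCPRO protocol: it applies Metropolis MC to a single dihedral angle with an assumed threefold torsional potential (the `v1` and `v3` values are invented), lowers the temperature linearly from 2000 K to 20 K in 20 stages, and omits the guest restraint and the final Fletcher-Powell minimisation.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal mol^-1 K^-1


def torsion_energy(phi_deg, v1=1.0, v3=3.0):
    """Toy torsional energy (kcal mol^-1); v1 and v3 are illustrative values."""
    phi = math.radians(phi_deg)
    return 0.5 * v1 * (1.0 + math.cos(phi)) + 0.5 * v3 * (1.0 + math.cos(3.0 * phi))


def anneal(phi_start=0.0, t_high=2000.0, t_low=20.0, n_stages=20,
           moves_per_stage=500, max_step=30.0, seed=0):
    """Metropolis MC at t_high, then n_stages cooling stages ending at t_low."""
    rng = random.Random(seed)
    phi = phi_start
    energy = torsion_energy(phi)
    step = (t_high - t_low) / n_stages
    temperatures = [t_high] + [t_high - step * (i + 1) for i in range(n_stages)]
    for temperature in temperatures:
        for _ in range(moves_per_stage):
            trial = (phi + rng.uniform(-max_step, max_step)) % 360.0
            e_trial = torsion_energy(trial)
            delta = e_trial - energy
            # Metropolis test: downhill moves are always accepted, uphill
            # moves with probability exp(-delta / kT).
            if delta <= 0.0 or rng.random() < math.exp(-delta / (K_B * temperature)):
                phi, energy = trial, e_trial
    return phi, energy
```

Repeating `anneal` with different seeds plays the role of the repeated annealing runs: each run ends close to one of the minima of the toy potential, and the collection of end points maps out the accessible low energy conformations.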

A wide range of quite different structures of varying energies were generated, although much duplication of structures did still occur.

6.1.2 Analysis of Annealed Structures.

All structures had reasonably similar overall shapes, indicating that the host was relatively rigid. There were a number of small differences observed in host structure. Aryl groups were seen to adopt slightly different orientations. Amide groups also moved to some degree but all structures with the exception of a few of the high energy ones had the polar hydrogens of all four amide groups pointing inwards. However, there was one quite significant difference between the structures produced. The conformation of the hydrocarbon chain containing the thiourea unit appeared to vary quite substantially between structures. To classify conformations in this chain, each dihedral angle was assigned to either gauche+ (g+), trans (t) or gauche– (g–). Figure 6.2 shows the distribution of gauche+, trans and gauche– conformations present in the hydrocarbon chain of the generated structures. The middle two dihedrals remain in the trans conformation since the thiourea unit is held rigid. The definition of a unique conformation is that all 14 of the dihedral angles in the hydrocarbon chain consist of a unique combination of these three specifiers.

Figure 6.2: The distribution of g+, t and g– conformations of the hydrocarbon chain from all structures generated from annealing runs. (Axes: number of conformations versus dihedral number.)
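The classification of chain conformations can be sketched in a few lines. The boundaries used here are an assumption: angles are folded into the range 0 to 360 degrees and split at 120 and 240 degrees, so that values near 60 degrees are g+, values near 180 degrees are t and values near 300 degrees are g–. The function names are hypothetical.

```python
from collections import Counter


def classify_dihedral(angle_deg):
    """Assign a dihedral angle in degrees to 'g+', 't' or 'g-'."""
    a = angle_deg % 360.0  # fold into [0, 360)
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"


def chain_conformation(dihedrals_deg):
    """Tuple of specifiers, one per dihedral, identifying a conformation."""
    return tuple(classify_dihedral(a) for a in dihedrals_deg)


def count_conformations(structures):
    """Population of each unique conformation over a list of structures."""
    return Counter(chain_conformation(s) for s in structures)
```

Applied to the 14 chain dihedrals of each annealed structure, `len(count_conformations(structures))` gives the number of unique conformations and `most_common(4)` the most populated ones.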

Of the 309 structures generated, 156 had unique hydrocarbon chain conformations. Table 6.1 shows the four most common conformations observed. About half of the conformations are within 10 kcal mol−1 of the lowest structure for that particular guest, while others are much higher in energy and are likely to be unrepresentative. There are no clear trends about which guests preferred which conformations, if indeed there is any such preference. Only so much can be learned from these conformations. Furthermore, they are all gas phase minimum energy structures and the available conformations may well be quite different in chloroform at room temperature. Nevertheless, what is clear is that there are a lot of different structures and the simulation protocol used for free energy calculations must be able to sample between them.

Table 6.1: Population of the four most common hydrocarbon chain conformations in the annealed structures.

                           Dihedral number
Conformation  Population   1    2    3    4    5    6    7    8    9    10   11   12   13   14
1             28           g+   t    t    g–   g–   g–   t    t    g–   g–   g–   t    t    g+
2             18           g+   g–   t    t    g–   g–   t    t    g–   g–   t    t    g–   g+
3             18           g+   g–   t    t    g–   g–   t    t    g–   g–   g–   t    t    g+
4             12           g+   t    t    g–   g–   g–   t    t    t    g+   t    g+   g–   g+

6.1.3 Sampling From Simulations.

A gas phase simulation was performed using MCPRO.32 The starting structure was an arbitrarily chosen low energy host-guest structure produced by simulated annealing containing the cis-glycine derivative. Its hydrocarbon conformation was the same as conformation 2 in Table 6.1. Regular solute moves comprised 5 % of all attempted moves, the rest being host residue moves. The simulation was run for 5 M configurations at 293 K. An analysis of the resultant sampling showed that the limited motion of the aryl and amide units of the host was adequately reproduced by this level of sampling. However, there was a different story for the sampling of the hydrocarbon chain. Figure 6.3 illustrates the resultant sampling of the 12 flexible dihedrals in the hydrocarbon chain. The dihedrals about the C–N bonds in thiourea are not shown since they were not sampled. The sampling produced does not come close to producing even one conformation different to the starting conformation, with the possible exception of the last dihedral, let alone sample many of the conformations observed in the annealed structures. This poor sampling was identical for the host with other guests, for the host alone, and the host in explicit chloroform. Clearly, the sampling of the hydrocarbon chain using only regular solute and host residue moves is inadequate. These types of MC moves are incapable over a reasonable simulation period of going from one conformation to another no matter how they are defined in the Z-matrix.

Figure 6.3: The dihedral distribution for host containing the cis-glycine derivative sampled only with regular solute and host residue MC moves. (Axes: dihedral distribution in units of 10^5 configurations versus dihedral / degrees.)

To determine whether different host conformations affect free energies, some free energy calculations in chloroform were performed. The full details of these are given in Subsection 7.6. To summarise, the results of these calculations were found to depend significantly on hydrocarbon chain conformation. Therefore, no one conformation could possibly be used for free energy simulations. Such a result made necessary the search for improved sampling schemes.

6.2 Methods to Improve MC Acceptance.

6.2.1 Approaches to Improve Sampling.

There are a number of techniques available to improve sampling by increasing the Boltzmann factor and hence increasing the acceptance. The first of these is simply to run simulations at a higher temperature. However, the annealing runs suggested that very large temperatures of at least 1000 K would be necessary to achieve adequate sampling, leading to problems with solvent heating. The second method is to soften the potential by reducing the strength of various terms in the force field such as dihedral, non-bonded parameters, and the functional form for non-bonded interactions.163–167 The third method is 4-dimensional sampling.168 In this method, distances are defined not in three but in four dimensions. Thus atoms that would normally overlap in three dimensions may lie far apart in the fourth dimension. The benefit of this is that the high energies produced by steric clash are reduced. However, the simulation must firstly be perturbed from three to four dimensions.

While all these methods may improve sampling, they vastly increase the configurational space that must be sampled, leading to longer simulations. Furthermore, while the improved sampling would help in intermediate non-physical states in free energy calculations, each system must still be perturbed back in separate runs to the standard force field at room temperature, a state at which the sampling would remain poor. The only way around this problem would be to do these end point mutations in one step and in one direction using the good sampling system as the reference state.

Yet such an approach is likely to suffer from poor convergence due to the perturbation being very large.

6.2.2 Biased Sampling Methods.

There are techniques that seek to bias the acceptance of MC moves. Such methods must include a correction to the acceptance criterion to maintain reversibility. These techniques include “smart”169 and “force-bias”170 MC that preferentially make moves in the direction of the force. Such moves, though, would be inappropriate for crossing energy barriers. Another type of method is the configurational bias method that regrows parts of molecules so as to minimise clash with the rest of the system. Such a move may be used for protein side chains.22 However, this method requires the molecule to have a free end.

Umbrella sampling171 biases the sampling to a certain region of configurational space by adding a carefully chosen potential function to the energy. For this technique to be useful, the areas that are desired to be sampled should be known and small in number. At the end points, the potential is either slowly turned off or a systematic correction is made to account for the potential. Umbrella sampling may be used to calculate a potential of mean force172 by pushing the system slowly from one state to another. An umbrella potential may be used to gradually force a system from one state to another along a reaction coordinate. It is also possible to perform a free energy perturbation between two states by using the reaction coordinate as the λ coordinate.173

The fluctuating potential method of Liu et al.174 samples between the real potential and a softened potential in which the barriers of torsional profiles are chopped off. The main application of these methods is to improve sampling in free energy calculations by increasing window spacings and thus overall calculation speed rather than to improve sampling for real states.

There are the J-Walking175 and S-Walking176 methods that generate trial moves from configurations generated in another simulation run at higher temperature for which the sampling will be improved. Parallel tempering177 is a similar method that attempts moves to simulations of the system run at a higher temperature.

attempts moves to simulations of the system run at a higher temperature. Again, however, the annealing runs suggested that very large temperatures are necessary to obtain improved sampling. There are two similar MC methods that sample configurations not according to the Boltzmann factor but to some other factor that allows sampling at higher energies. One method is multicanonical sampling178 which samples according to ρ(E)^−1, where ρ(E) is the energy (E) density. The other is entropy-sampling179 which samples according to exp(−S(E)), where S(E) is entropy as a function of the energy. These methods, while promising, suffer from large computational expense since ρ(E) and S(E) must be calculated iteratively. Another technique is to sample configurational space according to the generalised statistical distribution of Tsallis.180 This approach gives better sampling of high energy regions.

6.2.3 More Sophisticated MC Moves.

Finally, more sophisticated types of MC moves may be used. There are “flip” moves that rotate a randomly chosen atom about the axis connecting its two neighbours.181–183 This move is very localised and induces moderate conformational changes. Another possible MC move is the extended continuum configurational bias method, regrowing atoms into low energy positions subject either to geometric rules that enforce closure,184 or probability functions.185 There are other move types such as reptation186 and end-bridging.187 However, these two were not applicable to the macrobicycle 12 system, being more appropriate for polymer systems which have chain ends.22 There is another MC move called the concerted rotation.106 This is a fairly complex move that causes a large variation in a number of consecutive dihedral angles while keeping the ends, bonds and angles fixed. It acts on a small molecule segment and does not require molecule ends. There are also extended concerted rotation moves which alter even more dihedrals. The “jumping between wells” (JBW) method188 first locates a set of conformations that the molecule adopts. It then defines a mapping from each conformation to every other and these mappings comprise the MC moves.

However, this requires a knowledge of the potential energy surface and its minima and thus was not practical for the macrobicycle 12 system since there appeared to be far too many possible conformations, and it is difficult to decide which of these are important. A method that does examine many minima and assesses their contribution to the ensemble is the “mining minima” technique.189 It would be possible to apply this method to the macrobicycle 12 system yet it is rather complex and requires many additional simulations. There are also cluster moves that move atoms together.190 Another sophisticated move is hybrid Monte Carlo.191 Configurations are generated by randomly assigning velocities from a Maxwell distribution to each atom, the system is incremented a small unit of time, then the final configuration is tested for acceptance. This was not considered as it would have required implementing molecular dynamics algorithms.

6.2.4 Adoption of Methods to Improve Sampling.

While different sampling schemes and biasing techniques may improve sampling, it was felt that the introduction of more sophisticated MC moves was essential regardless of what other techniques were used since the current moves did not appear physically capable of reproducing motion that resembled conformational change. Converting between different structures involves the alteration of many degrees of freedom in a cooperative fashion and a means to jump over possibly large energy barriers. A further advantage of only implementing additional MC moves would be that they would require no alterations to the simulation algorithm nor any additional simulations. Of the MC moves available, the concerted rotation was selected since it apparently had all the desired attributes to address the sampling problem. Firstly, it is pseudodynamic and causes a large change in dihedral angles as commonly occurs in conformational change. Secondly, it is localised and leaves other atoms intact. Thirdly, it suffers from no adverse changes in energy due to bonds and angles changing. Only dihedrals and non-bonded interactions contribute to the energy change. Fourthly, despite the numerical complexity it is still a relatively small and cheap move. Fifthly, it has

reasonable acceptance. And sixthly, the source code for the move was available. Its implementation is described in the next section. Three other MC moves were also implemented. The flip move produces moderately large localised changes in dihedrals and was also applied to the hydrocarbon chain of the host. The large dihedral move alters particular dihedrals over large ranges. This move was used to sample the dihedral in the phenylalanine derivative. Finally, the three part solute move was designed to ensure good sampling of the guest in the host cavity. However, despite the inclusion of all these moves, it was found in later free energy calculations (see Subsection 7.2.6) that the explicit representation of the solvent made it impossible for any of these large moves, particularly the conrot move, to be effective in causing conformational change. The sampling observed was as bad as that in Figure 6.3 when only host residue moves were used. Solvent molecules were found to crowd the hydrocarbon chain to such an extent that only small conrot moves were ever accepted, negating the very advantage of the conrot move. This is a common problem for large MC moves.188 Larger conrot moves such as the extended conrot move185 would suffer even worse problems. Small moves are ineffective because they explore conformational space far too slowly and in particular have problems crossing energy barriers. All the previously discussed methods may well improve sampling to some degree. However, none would be expected to overcome the dilemma of performing large moves without producing significant overlap with solvent. Possible exceptions to this might be 4-dimensional sampling168 or potential softening using the soft-core potential.167 However, sampling of the real end points remains a problem for these methods. Ideally what is needed is a type of MC move that involves the solvent moving at the same time, yet a search of the literature revealed no such method. The cluster move of Wu et al.190 may provide a possible alternative but this move has only been developed for application to spheres on a lattice. Searches were made for possible reaction coordinates that mapped from one structure preferred by one guest

to another preferred by another guest. Restraints could be used to drive the structure along this reaction coordinate to the other structure, enabling the calculation of a potential of mean force. Not only could no reaction coordinate be found, but nor could any host structures characteristic of each guest. In the end the simplest solution was to replace the explicit solvent with a continuum representation. With this solvent model, solvation energetics would be obtained together with sampling as good as that in the gas phase.

6.3 Additional MC Moves.

6.3.1 The Conrot Move.

A concerted rotation, or conrot, move106 was designed to provide a way of inducing a significant, coordinated movement of atoms localised in a small section of the molecule. Its primary application is in long chain condensed phase polymer systems and is designed to replicate real polymer motions. It is constructed such that no bonds or angles are changed and that the rest of the molecule remains unaffected. The conrot move is illustrated in Figure 6.4. The actual move involves the alteration of up to seven adjacent dihedral angles in a molecule. The number of dihedrals actually altered is less if the move is made closer to a chain end. Since the move in this work is to be implemented in a cyclic system in which molecule ends are absent, the full move about seven bonds is always used. The reason why seven dihedrals are adjusted is as follows. Assuming that the bonds and angles are constant, the only degrees of freedom are dihedral angles. The question is, what is the minimum number of dihedrals that can be altered such that the following three atoms remain fixed in space relative to the fixed atom in front of the moving atoms? If these three atoms are fixed, then all subsequent atoms will also be so, satisfying the requirement that the move is localised. This restriction is equivalent to six constraints since the three atoms possess nine degrees of freedom minus two bonds and one angle which are already constrained. If one dihedral, termed the

driver dihedral, is arbitrarily changed, then so must six other dihedrals be adjusted to satisfy the six constraints. This gives seven dihedrals in total. Thus four atoms actually move in space while all other atoms remain fixed. An important point that must be made is that the section of macrobicycle 12 in which the conrot move is to be implemented consists of fourteen dihedrals, although two of these are actually fixed. Hence there is indeed space to incorporate such a move.

Figure 6.4: The seven dihedrals of n-decane that change in the conrot move.

The conrot move works as follows. A driver dihedral and a direction down the chain are randomly selected, and the driver is then randomly altered. The essence of the problem is, for a given random displacement of the driver dihedral, what are the values for the other six dihedrals? Furthermore, if there are chain ends nearby, what modifications are necessary in the algorithm? The solution to this problem involves the solving of a complex non-linear function. The details of how this problem is solved are discussed elsewhere.106 For a given new driver value, there may be multiple solutions or there may be no solutions. A typical number of solutions is 4–12 and is always an even number. An example of there being no solutions is the case of a molecule whose dihedrals are all in the trans conformation. If one dihedral is altered slightly, it is impossible for the other dihedrals to adjust and still preserve constant bonds and angles. Recently, an analytical solution has been applied to solve for the other dihedrals, greatly simplifying the problem.13 The conrot move is then formulated such that it samples configuration space

reversibly as a proper Metropolis MC move. Let m and n denote the original and reference states, respectively. A driver dihedral and direction are randomly selected and the driver is changed by a random amount in the range [−∆φmax, ∆φmax]. All Nn final solutions for the remaining φ1–φ6 are then found, if any. One of these solutions is chosen randomly. The reverse problem is then solved. That is, the driver dihedral for the chosen destination solution is moved back to its original value and all possible Nm values for the other φ1–φ6 are again found. One of these must be the original solution, m. A Jacobian determinant is then calculated for both initial and destination states, given by J(m) and J(n), respectively. This is necessary because the solution space for φ1–φ6 is not spanned uniformly due to the constraint that the chain end must remain fixed.106 Finally, the energy change, ∆Vmn, must be calculated. The move is then accepted with probability

\[ P(m \to n) = \min\left[1,\ \frac{J(n)/N_n}{J(m)/N_m}\,\exp\left(-\frac{\Delta V_{mn}}{k_B T}\right)\right] \tag{6.1} \]

6.3.2 Implementation and Testing of the Conrot Move.

The conrot code was generously supplied by Prof. D. N. Theodorou. However, a large amount of coding was necessary to tailor the code to MCPRO and to the macrobicycle 12 system. It principally involved transforming between different coordinate systems in the two codes, generalising the code to allow different bonds and angles, assigning all the residues that move in the move to the “greater residue” (see Section 3.6), defining how and where the moves are made in the host, including the Jacobian in the acceptance test, and allowing the thiourea unit to remain rigid in the hydrocarbon chain. A number of these points are elaborated on later. Once the conrot move was implemented, it was necessary to test that it was able to produce the uniform distribution of dihedral angles as is required for a MC move so that microscopic reversibility is satisfied (see Subsection 2.1.5). The smallest molecule on which to test the full seven dihedral move is united-atom n-decane. Since molecule ends were present, all the other types of conrot move that involve changing

six or fewer dihedral angles were also attempted. The actual number varied depending on the location of the driver angle. It is a purely geometric test and so all bonds and angles were made rigid and no dihedral or non-bonded energetics were included. ∆φmax was set to 180°. The simulation was run for 1 M configurations. The results are presented in Figure 6.5. Uniform sampling is indeed obtained as desired. To make clear the need for the correction to the acceptance test using the Jacobian transformation, the distribution obtained without the correction is also shown in Figure 6.5. This time, non-uniform sampling is obtained. This demonstrates that the correction is essential to obtain the correct dihedral angle distribution. Incidentally, including the correction generally has the effect of rejecting more configurations. The acceptance rate for n-decane was 55 % including the Jacobian and 64 % without.

Figure 6.5: The dihedral angle distribution for n-decane averaged over all dihedrals and 1 M configurations with and without the Jacobian acceptance correction. (Axes: normalised distribution against dihedral / degrees; series: With Jacobian, Without Jacobian.)

The conrot move is designed to produce large scale dihedral sampling. To give an indication of the possible changes in the dihedral angles in a conrot move, Table 6.2 shows the dihedral angles in united atom n-decane before and after the move. Figure 6.4 shows the n-decane chain before and after the move. There is clearly a massive change in most dihedral angles. However, most conrot moves are actually a lot smaller in real systems.
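The uniformity test described above compares a histogram of sampled dihedral angles with a flat distribution. The following is a minimal sketch of how such a normalised distribution might be computed; the function name, the bin count and the input format are illustrative assumptions and are not taken from the MCPRO or conrot code.

```python
def normalised_dihedral_distribution(angles_deg, n_bins=36):
    """Histogram dihedral angles over 0-360 degrees, normalised so that a
    perfectly uniform sample gives 1.0 in every bin.

    angles_deg: iterable of dihedral angles in degrees (any range).
    n_bins: number of equal-width bins (assumed value, not from the text).
    """
    angles = list(angles_deg)
    counts = [0] * n_bins
    width = 360.0 / n_bins
    for angle in angles:
        # Wrap into [0, 360) and guard against rounding up to the last edge.
        index = int((angle % 360.0) / width)
        counts[min(index, n_bins - 1)] += 1
    expected = len(angles) / n_bins  # count per bin for uniform sampling
    return [count / expected for count in counts]


# A perfectly uniform set of angles gives a flat normalised distribution.
flat = normalised_dihedral_distribution(range(360), n_bins=36)
```

Values persistently above or below 1.0 in particular bins would indicate the kind of non-uniform sampling reported without the Jacobian correction.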

Table 6.2: The Typical Change (degrees) in Dihedrals for a Conrot Move for n-Decane (See Figure 6.4 for Illustration).

Dihedral   Before   After   Change
φ1         61       245     176
φ2         270      68      158
φ3         292      137     155
φ4         60       234     174
φ5         158      68      90
φ6         60       90      30
φ7         51       297     114

6.3.3 Application of the Conrot Move to Macrobicycle 12.

An important consideration in applying the conrot move to macrobicycle 12 was where in the molecule the move is made. The hydrocarbon chain consists of twelve bonds from one junction carbon to the other. This suggests that the twelve dihedrals about these bonds are the candidates for the conrot move. However, two modifications to this were made. Since the end dihedrals are typically sampled less well by conrot moves, an additional dihedral at each end about a bond in the amide/aryl ring was included, giving two more in total. This is illustrated in Figure 6.6. These additional dihedrals would not be expected to be very flexible. Thus in this way the poor sampling for the end dihedrals complements well with dihedrals that do not require good sampling. To further improve the sampling at the ends, half the conrot moves attempted were made over the end seven dihedrals at each end. The second modification involves the thiourea unit. The two dihedrals about the C–N bonds are rather rigid (see dihedrals D1–D4 in Figure 3.7).

Figure 6.6: The twelve dihedrals sampled in macrobicycle 12. Note that the two rigid dihedrals are ignored, while two outside the hydrocarbon chain are included. (Structure diagram; labels: included, removed, Du.)

For a given driver

dihedral, there is little control over the values the other six dihedrals will take. Thus there is a high chance these dihedrals will be altered significantly, raising the energy of the new configuration and leading to its rejection. Therefore, these dihedrals were chosen not to be sampled. Having a gap in the chain like this increases the complexity of the conrot move. An implementation for dealing with these rigid units has been developed by Deem and Bader173 who have applied conrot moves to protein systems. Rather than adopting this approach, a much more simple alternative was found. By making the thiourea unit rigid and symmetric, the two C–N bonds would intersect at a single point as shown in Figure 6.6. A dummy atom placed at this point would replace the whole thiourea unit in the hydrocarbon chain for the conrot move. Thiourea is already a fairly rigid unit so this approximation is reasonable. The dihedrals sampled are still exactly those that require sampling in the real system. So in total, twelve dihedrals in the hydrocarbon chain are eligible for the conrot move.

6.3.4 Acceptance Probability of the Conrot Move.

For any MC move, a balance must be struck between the size of the move and its acceptance probability. For example, for the n-decane system discussed earlier, ∆φmax was set to 180°. This led to 36 % of the moves being rejected because there were no solutions to the conrot move. Such a rejection rate is affordable for idealised n-decane systems, but for macrobicycle 12 such large moves had negligible acceptance rates. The large move size and high acceptance observed for the n-decane molecule was a consequence of the flexibility and omission of energetics. For larger molecules with the energetics turned on, ∆φmax has to be considerably reduced in order to produce reasonable acceptance probability. However, even if the driver dihedral angle is small, the other dihedrals may still move significantly. Thus setting ∆φmax lower produced a better acceptance probability by making more moves eligible for the acceptance test. The price paid was that conrot moves are now generally smaller. In macrobicycle 12, setting ∆φmax to only 5° gave the best compromise between acceptance and large move sizes. This typically gave around 10 % acceptance for the

host-guest complex. Such a small value for ∆φmax has to be used because the host-guest system has a number of features that severely reduce both the acceptance probability and the effectiveness of the conrot move. The first feature is that the hydrocarbon chain can clash with other parts of the host, particularly around the tertiary carbons at the junctions. The problem with this is that the first dihedral of a conrot move tends not to be well sampled, leading to poor sampling for dihedrals at the end of the sampled chain. However, the conrot move on the same seven dihedrals starting at the other end does not suffer this restriction, so the reduction in sampling is only partial. This problem is exacerbated by the use of an all-atom force field because the large conrot moves lead to even greater displacements in the hydrogen atoms. This was a possible argument for resorting to the simpler united-atom force field. However, modelling the host as realistically as possible remained a priority not to be compromised unless absolutely necessary. The second feature that affects the conrot move is the presence of the guest, particularly since it lies close to the thiourea unit. The host structure was found to be quite different depending on whether a guest was inside (see Section 6.5 further on). A rather interesting occurrence arose here that demonstrates the point that a good acceptance probability does not necessarily imply good sampling. The conformational sampling of the host alone was found to be superior to the sampling of the host when containing the guest. However, the acceptance of the conrot move was still only around 10 %, the same as that with the guest. The feature that reduced acceptance probability and conrot effectiveness dramatically was the presence of explicit solvent. Explicit solvent reduced the acceptance probability to around 7 % but as discussed earlier in Subsection 6.2.4, the effectiveness of the conrot move was eliminated, with no large scale conformational changes occurring. Nevertheless, in continuum solvent, the conrot move was found to adequately sample the conformation of macrobicycle 12, as demonstrated later in Section 6.5.


6.3.5 Variations of the Conrot Move.

Two variations of the conrot move were tested to try to improve its effectiveness in explicit solvent. The first method was to allow bonds and angles to vary randomly in the conrot move. This increased flexibility may make it easier for the conrot atoms to arrange themselves in a different conformation. However, the results in explicit solvent were unchanged. The second method was to use configuration bias (CB).106 The conrot algorithm generates a number of possible final solutions and from these one is randomly chosen, regardless of the energy of this state. CB, instead of choosing a solution randomly, favours solutions of low energy by biasing the choice according to the energies of each solution. On testing, this method proved to be much slower because of all the extra energy calculations for every structure, but it gave about a three times higher acceptance probability. However, the overall sampling obtained was very similar. This result was not so surprising. To achieve good sampling, getting a higher acceptance is not necessarily the key, since often it is the higher energy moves that involve large conformational change. These are the very moves that CB throws out. CB conrot does not introduce conrot moves that are any smarter. It only eliminates conrot moves that are less likely to be accepted. Such a feature is not going to improve sampling with explicit solvent. A biasing technique in favour of large conformational change rather than energy may prove more useful. In summary, neither of these methods was able to achieve improved sampling in explicit solvent.
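The biased choice among conrot solutions can be sketched as follows. The text states only that the choice is biased according to the energies of the solutions; the Boltzmann weighting used here is an assumption, as are the function name and arguments. The matching correction to the acceptance test that any biased selection requires is not shown.

```python
import math
import random


def choose_solution_biased(energies, kT, rng=random):
    """Pick the index of one candidate solution with probability
    proportional to exp(-E / kT) (assumed form of the bias).

    energies: energies of the candidate solutions, in the same units as kT.
    Returns (chosen_index, selection_probabilities).
    """
    e_min = min(energies)  # shift energies to avoid overflow in exp()
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(weights)
    probabilities = [w / total for w in weights]

    # Roulette-wheel selection over the cumulative probabilities.
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index, probabilities
    return len(probabilities) - 1, probabilities


# Hypothetical energies (kcal/mol) for four solutions at kT = 0.6 kcal/mol.
index, probs = choose_solution_biased([0.0, 1.0, 3.0, 8.0], kT=0.6,
                                      rng=random.Random(1))
```

With these hypothetical energies the high energy solutions are almost never selected, which is consistent with the observation above that CB discards the very moves that would produce large conformational change.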

6.3.6 The Flip Move.

The flip move181–183 is a simple move that alters the dihedral angles and angles around a particular atom. It works by randomly choosing an atom and rotating it by a random amount about the axis connecting its two neighbours, as shown in Figure 6.7. This leads to a change in two angles, four dihedrals and no bond lengths. Quite large maximum displacements are possible: 50° gives 30 % acceptance for the host. If the moves are made too large, the angle distortions start to become significant. In applying this move to the host, it was applied only to the hydrocarbon chain, as for conrot, since this was the part requiring an improvement in dihedral sampling. However, it was not applied to the dummy atom nor to the two carbons adjacent to it in the thiourea unit since this would have changed the angles in the thiourea unit which are supposed to remain rigid. The maximum displacement used for macrobicycle 12 was set at 50°. By itself, this move was not able to produce conformational change in the gas phase. However, it does produce moderate dihedral sampling intermediate to the conrot and host residue moves.

Figure 6.7: The mechanics of the flip move.
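The geometry of a flip move can be illustrated with a rotation of one atom about the axis joining its two bonded neighbours. The sketch below uses Rodrigues' rotation formula; the function name and the coordinate representation are illustrative assumptions rather than the implementation used in this work.

```python
import math


def flip_atom(a, b, c, angle_deg):
    """Rotate atom b by angle_deg about the axis from neighbour a to
    neighbour c, leaving a and c in place. Positions are (x, y, z)."""
    axis = [c[i] - a[i] for i in range(3)]
    length = math.sqrt(sum(x * x for x in axis))
    k = [x / length for x in axis]            # unit vector along the axis
    v = [b[i] - a[i] for i in range(3)]       # b relative to a point on the axis

    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))

    # Rodrigues' formula: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
    rotated = [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
               for i in range(3)]
    return [a[i] + rotated[i] for i in range(3)]


# The distances from b to both neighbours are unchanged by the rotation.
new_b = flip_atom((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0), 90.0)
```

Because the rotation axis passes through both neighbours, the two bond lengths to the moved atom and the angle at that atom are preserved, while the angles at the neighbouring atoms and the dihedrals through them change.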

6.3.7 The Large Dihedral Move.

The other special MC move was the so-called large dihedral move. This is simply a Z-matrix coordinate dihedral move with a large maximum displacement of 180°. Its motivation was in achieving good sampling for the phenylalanine derivative. The preferred conformation for the C–C–C–N aryl “swing” dihedral of this molecule was found to depend on whether the amide bond was in the cis or trans conformation (see Subsection 8.6.4 for more details). However, a large energy barrier of the order of 4 kcal mol−1 separated these two conformations. Normal Z-matrix coordinate moves were unable to cross this barrier. Thus for a cis→trans mutation, a MC move was required that could. The large dihedral move is such a move and is shown in Figure 6.8 for the swing dihedral of N–Ac–phenylalanine. Its acceptance probability is smaller than that of the usual moves, ranging from 10 % in the gas phase down to around 3 % in explicit chloroform. In the gas phase, all possible conformations were successfully sampled. However, in explicit solvent the sampling problems were similar to those for the conrot move. Most of the time, the large move simply led to the aryl ring crashing into solvent molecules. This was a further reason to use continuum solvent.

Figure 6.8: The large dihedral move acting on the swing dihedral of N–Ac–phenylalanine.
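As a rough illustration of why ordinary small dihedral moves rarely cross a barrier of this size, the Boltzmann factor at the top of a 4 kcal mol−1 barrier can be estimated. The temperature of 298.15 K is an assumption made for this sketch, since the simulation temperature is not stated in this section, and the function name is illustrative.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal mol^-1 K^-1


def boltzmann_factor(barrier_kcal_mol, temperature_k=298.15):
    """Relative Boltzmann weight exp(-E / kT) of a state lying
    barrier_kcal_mol above the minimum (temperature is an assumed value)."""
    return math.exp(-barrier_kcal_mol / (K_B * temperature_k))


factor = boltzmann_factor(4.0)
```

A walk made of small steps has to pass through these improbable states near the top of the barrier, whereas a single large dihedral move can jump directly from one well to the other.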

6.3.8 Three Part Solute Move.

Since one of the main concerns of the whole project involves how the host and guest interact, it is important that the guest is able to move around significantly within the host. Given that the guest is relatively rigid, a greater emphasis was placed on the ability of the guest to move around within the host cavity rather than the guest’s ability to change geometry internally. Therefore a more sophisticated move was designed for the guest. It is called the three part solute move. It consists of three types of move, the type chosen randomly. The first move is predominantly a translation, with a small rotation. The second move is predominantly a rotation with a small translation. The third is a regular solute move which has only small changes in translation and rotation. The maximum amplitudes are 0.4 Å for the translation and 30° for the rotation. The acceptance for the first two types typically lies around only 10 %, while the third has better acceptance of 40 %.
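The selection logic of the three part solute move might be sketched as below. Only the large amplitudes (0.4 Å and 30°) come from the text. The small amplitudes, the equal probability of the three types, and all names are assumptions introduced for illustration, and the choice of rotation axis is omitted.

```python
import random

MAX_TRANSLATION = 0.4    # Angstrom, large translation amplitude (from the text)
MAX_ROTATION = 30.0      # degrees, large rotation amplitude (from the text)
SMALL_TRANSLATION = 0.1  # Angstrom, assumed value for the "small" translation
SMALL_ROTATION = 5.0     # degrees, assumed value for the "small" rotation


def three_part_solute_move(rng=random):
    """Pick one of the three move types at random (equal weights assumed)
    and draw a trial displacement within that type's amplitudes.

    Returns (move_type, translation_vector, rotation_angle_deg)."""
    move_type = rng.choice(("translation", "rotation", "regular"))
    if move_type == "translation":
        t_max, r_max = MAX_TRANSLATION, SMALL_ROTATION
    elif move_type == "rotation":
        t_max, r_max = SMALL_TRANSLATION, MAX_ROTATION
    else:
        t_max, r_max = SMALL_TRANSLATION, SMALL_ROTATION

    translation = [rng.uniform(-t_max, t_max) for _ in range(3)]
    rotation = rng.uniform(-r_max, r_max)
    return move_type, translation, rotation


move_type, translation, rotation = three_part_solute_move(random.Random(0))
```

The trial displacement would then be applied to the guest and passed to the usual Metropolis acceptance test.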

Figure 6.9: Acetamide in a cavity (ε = 1) embedded in a dielectric continuum (ε = 4.81) representing the solvent chloroform.

6.4 Parameterisation and Implementation of the GB/SA Continuum Model.

6.4.1 The GB/SA Continuum Model.

The generalised Born/surface area (GB/SA) continuum solvent model33 provides an approximate means of studying the behaviour of a solute in solvent without explicitly modelling the solvent. Explicit solvent molecules are replaced by a polarisable dielectric continuum. Figure 6.9 shows acetamide in a cavity of dielectric ε = 1 embedded in chloroform continuum with ε = 4.81. The omission of explicit solvent offers two main advantages. Firstly, it leads to a solute-solvent calculation that is usually much quicker since there are no solute-solvent energies to compute. Secondly, it can lead to increased solute sampling because the solute no longer has the problem of steric clash with solvent molecules. It is this benefit principally that necessitated the use of continuum solvent for the macrobicycle 12 system. The use of continuum solvent effectively increases the speed of sampling of configurational space for the host-guest system because the continuum solvent is always in equilibrium around the solute. Other continuum solvent methods are in common use. These include the polarisable continuum method (PCM),192 and the Poisson-Boltzmann (PB) method.34 However,

GB/SA, being one of the fastest and most widely tested,193 was adopted in this work. The solvation free energy, ∆Gsol, in the GB/SA model is broken up into three terms,

\[ \Delta G_{sol} = \Delta G_{cav} + \Delta G_{vdW} + \Delta G_{pol} \tag{6.2} \]

where ∆Gcav is primarily due to the entropy cost in forming a cavity for the solute in the solvent, ∆GvdW is the solute-solvent van der Waals term, and ∆Gpol the solute-solvent polarisation term. The first two terms are taken to be proportional to the solvent-accessible surface area (SASA) weighted according to atom type by the formula194

\[ \Delta G_{SA} \equiv \Delta G_{cav} + \Delta G_{vdW} = \sum_i \gamma_i\,(\mathrm{SASA})_i \tag{6.3} \]

where (SASA)i and γi are the total area and atomic solvation parameters for atom type i, respectively. (SASA)i is the area of a particular surface around an atom i that defines the closest distance that a solvent molecule may approach. It is formed by augmenting the van der Waals radius of the atom by the solvent probe radius and removing any of this area inside the volume of other solute atoms. The chloroform probe radius was taken to be 2.5 Å.195 The (SASA)i terms are calculated analytically using the method of Richmond for multiple overlapping spheres.196 While exact, this calculation is computationally very expensive. One important note regarding the use of ∆GSA as an entropy term is that entropy contributions to free energies are always temperature-dependent. Therefore, the γi parameters are also temperature dependent and should not be used at different temperatures to that used in the parameterisation, which effectively amounts to the temperature at which the experimental quantities were obtained. The polarisation energy, ∆Gpol, is the energy required to form a cavity of dielectric 1 for a solute of n atoms with charge, qi, each with radius, αi, in a continuum of dielectric constant, ε, where the dielectric boundary is taken as the van der Waals

$\Delta G_{\mathrm{pol}}$ is equal to the difference between the total electrostatic energy in solution and in vacuum and is given by the generalised Born equation[33]

$$\Delta G^{\mathrm{GB}}_{\mathrm{pol}} = -166\left(1 - \frac{1}{\epsilon}\right)\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{q_i q_j}{\sqrt{r_{ij}^2 + \alpha_{ij}^2 e^{-D_{ij}}}} \tag{6.4}$$

where $r_{ij}$ is the distance between atom i and j, $\alpha_{ij} = \sqrt{\alpha_i \alpha_j}$ and $D_{ij} = r_{ij}^2/(2\alpha_{ij})^2$. The equation is called "generalised" because it is derived from the simpler Born equation which gives the energy for charging a single ion in a spherical cavity surrounded by a uniform dielectric. The radius, $\alpha_i$, commonly termed the effective Born radius, corresponds to the radius of that spherical cavity whose electrostatic energy, calculated using the Born equation, is the same as that for the entire molecule with all other atoms in the molecule still displacing solvent and their charges set to zero. It is possible to calculate it exactly by numerically integrating the ratio of the exposed area to the full area of a sphere over all radii centred at the atom i,[33] as given in the formula

$$\alpha_i^{-1} = \int_{\rho_i}^{\infty} \frac{dr}{r^2}\,\frac{A_i(r)}{4\pi r^2} \tag{6.5}$$

where the limit, $\rho_i = 0.5(\sigma_i 2^{1/6})$, is the atomic van der Waals radius, and $A_i(r)$ is the exposed surface area of atom i with radius r. This is a very demanding calculation numerically since $A_i(r)$ must be evaluated at every radius increment. However, it may be calculated in a very quick manner using the approximation of Hawkins et al.,[197] called the pairwise descreening approximation. This simply assumes that all other atoms overlapping with atom i do not overlap with each other. Thus the total eclipsed surface area for atom i can be decomposed into pairwise terms solely with one other atom, for which there is a simple analytical formula. Of course, this assumption is not generally true and must be accounted for. The effect of this approximation is to overestimate the eclipsed area. Therefore, a compensating scaling factor, S, is introduced to reduce the van der Waals radii of neighbouring atoms.
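As an illustration of Eq. 6.4, the double sum can be evaluated directly once the Born radii are known. The following is a minimal sketch rather than the MCPRO or Tinker implementation; the function name and argument layout are assumptions made for this example, with charges in units of e, lengths in Å and the result in kcal mol−1.

```python
import math

def gb_polarisation_energy(charges, coords, born_radii, epsilon):
    """Evaluate the generalised Born sum of Eq. 6.4 (illustrative helper)."""
    prefactor = -166.0 * (1.0 - 1.0 / epsilon)
    total = 0.0
    for qi, xi, ai in zip(charges, coords, born_radii):
        # The double sum includes j == i, which supplies the Born self-energy.
        for qj, xj, aj in zip(charges, coords, born_radii):
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            alpha_ij = math.sqrt(ai * aj)
            d_ij = r2 / (2.0 * alpha_ij) ** 2
            total += qi * qj / math.sqrt(r2 + alpha_ij ** 2 * math.exp(-d_ij))
    return prefactor * total

# A lone unit charge with a 2 Å Born radius in a dielectric of 4.81:
# the sum collapses to the simple Born expression -166 (1 - 1/eps) q^2 / alpha.
born_ion = gb_polarisation_energy([1.0], [(0.0, 0.0, 0.0)], [2.0], 4.81)
```

For a single atom the expression reduces to the Born equation, and for two well separated charges the cross terms tend to the Coulomb interaction scaled by (1 − 1/ε), which is the limiting behaviour the "generalised" form is built to reproduce.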

It simplifies the Born radii calculation by replacing the integral with the expression

$$\alpha_i^{-1} = \rho_i^{-1} - \sum_{j \neq i} \frac{1}{2}\left[\frac{1}{L_{ij}} - \frac{1}{U_{ij}} + \frac{r_{ij}}{4}\left(\frac{1}{U_{ij}^2} - \frac{1}{L_{ij}^2}\right) + \frac{1}{2r_{ij}}\ln\frac{L_{ij}}{U_{ij}} + \frac{\rho_j^2}{4r_{ij}}\left(\frac{1}{L_{ij}^2} - \frac{1}{U_{ij}^2}\right)\right] \tag{6.6}$$

where

$$L_{ij} = \begin{cases} 1 & : r_{ij} + \rho_j \le \rho_i \\ \rho_i & : r_{ij} - \rho_j \le \rho_i < r_{ij} + \rho_j \\ r_{ij} - \rho_j & : \rho_i \le r_{ij} - \rho_j \end{cases} \qquad U_{ij} = \begin{cases} 1 & : r_{ij} + \rho_j \le \rho_i \\ r_{ij} + \rho_j & : \rho_i < r_{ij} + \rho_j \end{cases}$$

except this time, for neighbouring atom, j, $\rho_j = S\,0.5(\sigma_j 2^{1/6})$ is the atomic van der Waals radius scaled by a screening factor, S. The scaling factor technique is also able to account in the reverse manner for any exposed area that is actually not able to contribute to solvation, such as would be expected for area in narrow gaps between atoms.

6.4.2 Requirements for GB/SA

To apply the GB/SA model, firstly, it had to be coded into MCPRO. This was done by modifying code taken from an earlier implementation in MCPRO coded by Richard Taylor. In that implementation, the required GB/SA code had in turn been taken and modified from the Tinker software package.[198] The surface area and Born calculations were adjusted to be performed for the whole molecule, rather than just on a residue basis. This was done because after a host residue move, there was little benefit gained in updating SASA around only that residue because more than half of SASA for the whole molecule usually changed. It was a similar story for calculating the Born radii.
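The pairwise descreening expression of Eq. 6.6 is short enough to sketch in code. The version below is an illustrative whole-molecule loop written for this text, not the MCPRO or Tinker routine; the function name is hypothetical, `rho` holds unscaled van der Waals radii in Å, and the screening factor S is applied only to the neighbouring atom j, as described above.

```python
import math

def hct_born_radii(coords, rho, S):
    """Effective Born radii from the pairwise descreening sum (Eq. 6.6)."""
    radii = []
    for i, (xi, rho_i) in enumerate(zip(coords, rho)):
        inv_alpha = 1.0 / rho_i
        for j, (xj, rho_j_full) in enumerate(zip(coords, rho)):
            if j == i:
                continue
            r = math.dist(xi, xj)
            rho_j = S * rho_j_full            # scaled radius of the neighbour
            if r + rho_j <= rho_i:
                continue                      # L = U = 1: every term cancels
            U = r + rho_j
            L = rho_i if r - rho_j <= rho_i else r - rho_j
            inv_alpha -= 0.5 * (
                1.0 / L - 1.0 / U
                + (r / 4.0) * (1.0 / U**2 - 1.0 / L**2)
                + math.log(L / U) / (2.0 * r)
                + (rho_j**2 / (4.0 * r)) * (1.0 / L**2 - 1.0 / U**2)
            )
        radii.append(1.0 / inv_alpha)
    return radii

# Two equal spheres 4 Å apart; each descreens the other, so both radii grow.
pair_radii = hct_born_radii([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], [1.5, 1.5], 1.0)
```

An isolated atom simply returns its own van der Waals radius, and each neighbour can only increase the effective radius, since it removes solvent that would otherwise screen the charge.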

A particular bug was discovered in the SASA module for an atom that had two neighbours at exactly the same distance away from it, such as the two hydrogens on a methylene carbon. Since the bug was due to random numerical rounding, the easier option was taken to eliminate it which involved removing the degeneracy by altering one distance in the sixth decimal place. Such a small change would have no effect on the geometry.

Secondly, a number of parameters were required. These are the atomic solvation parameters, $\gamma_i$, and the screening parameter, S. Such parameters are likely to be force field dependent. In the literature, no parameters could be found for OPLS-AA in chloroform. All that could be found in chloroform was parameters using OPLS-UA charges[154] or SM5 charges.[199] In any case, the implementation of the latter was dubious due to the greater complexity in the $G_{\mathrm{cav}}$ term and the excessive use of parameters. Therefore, for this work, the parameters had to be derived. The standard way to derive parameters is to find the values that reproduce a particular property for a varied range of small molecules, numbering at least twenty. The conventional properties used for GB/SA are experimental free energies of solvation to parameterise $\gamma_i$ and electrostatic free energies obtained from accurate computational calculations such as Poisson-Boltzmann to parameterise S.[34]

6.4.3 Parameterisation to Poisson-Boltzmann Free Energies

The method used to calculate the $G_{\mathrm{pol}}$ energies was finite difference Poisson-Boltzmann (PB). Here, only the Poisson equation needs to be solved since the ionic strength is zero. The Poisson equation is given by

$$\nabla \cdot \epsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) = -4\pi\rho(\mathbf{r}) \tag{6.7}$$

and it gives the electrostatic potential, $\phi(\mathbf{r})$, over all space, $\mathbf{r}$. $\epsilon(\mathbf{r})$ is the dielectric constant and $\rho(\mathbf{r})$ is the charge density. This equation must be solved by numerical methods, such as finite difference, for molecules of complex shape.

By performing this calculation once in the continuum solvent to give $\phi_{\mathrm{sol}}(\mathbf{r})$ and once in a dielectric of one to give $\phi_{\epsilon=1}(\mathbf{r})$, $\Delta G^{\mathrm{PB}}_{\mathrm{pol}}$ is given by

$$\Delta G^{\mathrm{PB}}_{\mathrm{pol}} = \frac{1}{2}\sum_i q_i\left(\phi_i^{\mathrm{sol}} - \phi_i^{\epsilon=1}\right) \tag{6.8}$$

where the sum is over all grid points. PB calculations were performed with a modified version of UHBD.[200] A number of modifications to the original version had been coded into UHBD by Christopher Woods. These were the inclusion of the techniques of charge-antialiasing,[201] 15 point harmonic dielectric smoothing,[201] and solute random rotational averaging.

The protocol for the PB calculations was as follows. A 65×65×65 grid of spacing 0.3 Å was used, giving a box size of 19.5 Å. The potential at the box boundary was taken to be the potential calculated as if each of the solute atoms were independent Debye-Hückel spheres. The dielectric constant was taken as 4.81 for the continuum and 1 for the dielectric cavity. For the PB calculations, the boundary between these two regions of different dielectric was taken as the solute's solvent accessible boundary using a chloroform probe radius of 2.5 Å. One important difference with GB was that the GB calculations were done using the van der Waals surface as the dielectric boundary. This difference in treatments of the dielectric is largely historical, practical and cannot be attributed any physical significance. Each method is approximate and has been found to work best with such assumptions. The $\rho_{\mathrm{H}}$ for polar hydrogens was set at 1.2 Å since the OPLS-AA force field assigns such atoms a zero van der Waals radius. Charged atoms with radii of zero lead to severe numerical problems in PB[202] and infinite energies for GB (see Eq. 6.4). $\Delta G^{\mathrm{PB}}_{\mathrm{pol}}$ was calculated and averaged for ten random orientations of each molecule to remove rotational dependence due to the cubic grid.

20 molecules were used to find the S parameter that made the $\Delta G^{\mathrm{GB}}_{\mathrm{pol}}$ give the best reproduction of $\Delta G^{\mathrm{PB}}_{\mathrm{pol}}$. The value obtained for S in this way was 0.56. As expected, it is less than unity and gives an indication of the degree of overlap between atoms. A number of workers have used multiple S parameters, one for each atom type,[197,203] on the grounds that different atoms overlap to different extents.

However, not only could more parameters lead to overfitting, but one parameter was found to be quite sufficient. The resulting energies are given in Table 6.3. A close fit between $\Delta G^{\mathrm{GB}}_{\mathrm{pol}}$ and $\Delta G^{\mathrm{PB}}_{\mathrm{pol}}$ was obtained with an average error of only 0.3 kcal mol−1. Only aniline stands out as significantly different. Some workers prefer to include another parameter called the dielectric offset which moves the position of this boundary to improve the electrostatic term.[33,154] In chloroform, at least, the dielectric offset was found to be unnecessary.

Table 6.3: PB, GB, SA, GB/SA and Experimental[148,199,204] Free Energies for 20 Small Molecules (kcal mol−1).
Columns: Molecule, $\Delta G^{\mathrm{PB}}_{\mathrm{pol}}$, $\Delta G^{\mathrm{GB}}_{\mathrm{pol}}$, SASA/Å², $G_{\mathrm{SA}}$, $\Delta G_{\mathrm{sol}}$, $\Delta G_{\mathrm{expt}}$.
Molecules: water, methanol, ethanol, phenol, benzene, toluene, chlorobenzene, nitrobenzene, aniline, pyridine, methylamine, dimethylamine, acetamide, acetaldehyde, acetone, butanone, acetic acid, propanoic acid, methyl acetate, diethyl sulfide.
Final row: Average error.
[Per-molecule values are not recoverable from the extracted text.]

6.4.4 Parameterisation to Experimental Free Energies

Using the values of $\Delta G^{\mathrm{GB}}_{\mathrm{pol}}$, atomic solvation parameters, $\gamma_i$, were then obtained by fitting $\Delta G_{\mathrm{sol}} = \Delta G^{\mathrm{GB}}_{\mathrm{pol}} + \Delta G_{\mathrm{SA}}$ to experimental free energies.[148,199,204]

Since different atoms are expected to solvate to different extents, it is customary to assign different parameters to different atom types. Different workers have used between one[33] and twenty[199] such parameters; the latter, although fitted to of the order of 100 molecules, is still too many parameters for the size of data set. An inspection of Table 6.3 reveals that at least two $\gamma_i$ parameters are required because some $\Delta G^{\mathrm{GB}}_{\mathrm{pol}}$ are greater than experiment and some are less. Ideally, more molecules would be used, but experimental free energy data could only be found for 20 varied molecules. Using the rule that the number of parameters advisable is $\log_3 N$, where N is the number of molecules, this suggests 2–3 as the number of possible parameters. Erring on the side of caution, the number of parameters chosen was 2. The best results were obtained when these were assigned according to the ability of the atom to hydrogen bond. Nitrogen and oxygen atoms were assigned one parameter, while the remaining atoms were assigned the other. Such an assignment has been used previously in a free energy model by Fraternali and van Gunsteren.[205] The exception to the rule was nitrobenzene, for which all atoms were included in the "other" category on the grounds that the hydrogen bonding ability was being over-represented. The γ parameters obtained were 16 and −9 cal Å−2 for $\gamma_{\mathrm{N}}$/$\gamma_{\mathrm{O}}$ and $\gamma_{\mathrm{rest}}$, respectively. A complete list of all parameters used in GB/SA is given in Table 6.4.

Table 6.4: List of All Parameters Used in GB/SA Calculations.

Parameter              Value
Probe Radius / Å       2.5
ρH / Å                 1.2
γN, γO / cal Å−2       16.0
γrest / cal Å−2        −9.0
S                      0.56

6.4.5 Performance of the Derived Parameters

The free energy of solvation results with this parameterisation are presented in Table 6.3. An average error with experiment of 0.7 kcal mol−1 was obtained. $\Delta G_{\mathrm{sol}}$ is well reproduced for most molecules, with the exception of aniline which is too positive and ethanol and dimethylamine which are too negative.
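The two pieces of arithmetic in this parameterisation, the log3 N rule and the two-parameter surface area term of Eq. 6.3, can be sketched as follows. This is only an illustration: the function names are invented here, the per-atom areas in the example are made-up inputs, and the conversion assumes the γ values are in cal mol−1 Å−2 so that dividing by 1000 gives kcal mol−1.

```python
import math

GAMMA_N_O = 16.0    # cal per Å^2, nitrogen and oxygen atoms (Table 6.4)
GAMMA_REST = -9.0   # cal per Å^2, all remaining atoms (Table 6.4)

def advisable_parameters(n_molecules):
    """Rule of thumb quoted in the text: log base 3 of the data set size."""
    return math.log(n_molecules, 3)

def sa_energy_kcal(areas, is_n_or_o):
    """Eq. 6.3 with two atom classes; areas are per-atom SASA values in Å^2."""
    total_cal = sum(
        (GAMMA_N_O if polar else GAMMA_REST) * area
        for area, polar in zip(areas, is_n_or_o)
    )
    return total_cal / 1000.0

n_advised = advisable_parameters(20)   # between 2 and 3 for 20 molecules
# Hypothetical solute: 10 Å^2 of exposed N/O surface and 100 Å^2 of the rest.
example_sa = sa_energy_kcal([10.0, 100.0], [True, False])
```

With 20 molecules the rule gives a value between 2 and 3, matching the "2–3" quoted above, and the example shows how a large non-polar surface drives the surface area term negative in chloroform.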

This is despite the fact that the most recent OPLS parameters are used for dimethylamine.[206] No adjustment of the GB/SA parameters was able to bring these molecules into line. Going to a three parameter fit with a separate γ for oxygen and nitrogen led to a marginal improvement with an average now of 0.6 kcal mol−1. The new nitrogen parameter was 7 cal Å−2, the rest being the same as before. However, the improvement was not enough to justify the inclusion of the extra parameter.

It is interesting to note the sign of the $\gamma_i$ parameters. Chloroform is a hydrophobic solvent, so polar groups solvate less well in it than non-polar groups, hence the positive $\gamma_{\mathrm{N}}$ and $\gamma_{\mathrm{O}}$, and the negative $\gamma_{\mathrm{rest}}$.

Since the electrostatic energy of solvation in chloroform is expected to be small compared to a solvent like water, one other parameterisation that was considered was to ignore the $\Delta G_{\mathrm{pol}}$ term altogether. The rough correlation between SASA and $\Delta G_{\mathrm{sol}}$ evident in Table 6.3 validates this assumption. $\Delta G_{\mathrm{sol}}$ was then parameterised with two variables according to the equation

$$\Delta G_{\mathrm{sol}} = \gamma \sum_i (\mathrm{SASA})_i + b. \tag{6.9}$$

The parameters obtained were γ = −0.021 and b = 2.7. The average error was 0.7 kcal mol−1, the same as the previous fit which also had 2 parameters. However, one critical outlier by 2.3 kcal mol−1 was acetamide whose Born term is quite significant. While neglecting the Born term leaves an inherently simpler model, since the functionality of acetamide is critical to modelling macrobicycle 12, it was decided that this model was not appropriate.

The application of the model to the macrobicycle 12 system is straightforward, except for dummy atoms. While dummy atoms at one end of the perturbation have zero charge, if their Born radii, $\rho_i$, are also zero, then very near to the ends of the perturbation, dummies have a small but still substantial charge in a very small dielectric cavity. Such a charge has a huge electrostatic energy and the presence of this would lead to energy instabilities. Therefore, the radius of a dummy atom was set to 1.0 Å. Such an atom is safely buried within the real atom it is bonded to and so will not contribute to $\Delta G^{\mathrm{GB}}_{\mathrm{pol}}$ at the mutation end points.
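The surface-area-only model of Eq. 6.9 is a straight-line fit of solvation free energy against total SASA. A minimal least-squares sketch is given below; the function is generic, and the data in the example are synthetic points generated from the quoted γ and b purely to show that the fit recovers them, not values taken from Table 6.3.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic illustration only: total SASA values (Å^2) and energies built
# from gamma = -0.021 and b = 2.7, the parameters reported for Eq. 6.9.
sasa = [230.0, 300.0, 350.0, 400.0, 460.0]
dg_sol = [-0.021 * a + 2.7 for a in sasa]
gamma_fit, b_fit = fit_line(sasa, dg_sol)
```

On real data the residuals of such a fit would identify outliers of the kind reported for acetamide.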

6.5 Sampling of Macrobicycle 12 in Continuum Chloroform

Before presenting the free energy results, it is necessary to demonstrate that the hydrocarbon chain of the host is indeed sampling adequately. Figure 6.10 illustrates the sampling in each dihedral. This was taken from a long run of 30 M configurations using the same protocol as in Subsection 7.3.1 but with no guest.

Figure 6.10: The dihedral distribution of the hydrocarbon chain for the free host run over 30 M configurations. [Panels, one per chain dihedral: dihedral distribution (×10^5 configurations) against dihedral / degrees.]

The trajectory of the dihedrals is also shown in Figure 6.11. The sampling is now seen to be much improved on that obtained using explicit chloroform or with no extra MC moves. The middle six dihedrals are seen to move very frequently and produce the distribution expected for a hydrocarbon chain, with most in the trans conformation and the rest in the gauche. However, towards the ends of the hydrocarbon chain, the restraint of ring closure appears to limit the sampling. The end dihedrals barely change at all. Due to the C2 symmetry of the host, a symmetrical distribution would be expected for each chain. Sampling is indeed observed to be reasonably symmetrical but not exactly so.

Figure 6.11: The trajectory for the dihedrals in the hydrocarbon chain for the host alone run over 30 M configurations. [Panels, one per chain dihedral: dihedral angle / degrees against configurations (×10^6).]

Symmetrical sampling is not helped by the fact that the host when alone adopts a very distorted, asymmetric structure. Although improved, the sampling is not perfect. From the run of 30 M configurations, structures were saved every 0.1 M configurations, giving 300 in total. Of these, 97 had unique hydrocarbon chain conformations. Table 6.5 shows the four most common conformations. It can be seen that these are quite different to the most common conformations produced in the annealed structures. This is primarily due to the influence of the guest in the annealed structures and reinforces the importance of having the guest in the host for the annealing structure generation. The different influences of each guest on sampling of the hydrocarbon chain are discussed later in Chapter 8.

Table 6.5: Population of the four most common hydrocarbon chain conformations in the host.

Conformation          1    2    3    4
Population           29   24   15   12
Dihedral 1            t    t    t    t
Dihedral 2           g–   g–   g+   g–
Dihedral 3            t    t    t    t
Dihedral 4            t    t   g+    t
Dihedral 5           g–   g–   g–   g–
Dihedral 6            t   g–   g–    t
Dihedral 7            t    t    t    t
Dihedral 8            t    t    t    t
Dihedrals 9, 10, 11  g– g– t t g– t g+ t t t g+ t  (twelve entries, in extracted order)
Dihedral 12           t    t   g+    t
Dihedral 13          g–   g–    t    t
Dihedral 14           t    t    t    t

It is interesting to examine the structure of the host without the guest inside the cavity. A simulation was carried out on a structure that had had the guest removed. As the simulation progressed, the thiourea inverted such that the sulfur pointed into the cavity and the polar hydrogens outwards. The cavity then collapsed in on itself with the two aryl walls coming together. The hydrocarbon chain hung off in a loop with the thiourea lying sideways to the chains. Some degree of internal hydrogen bonding occurred between adjacent amide units. A typical host structure is illustrated in Figure 6.12. The host structure with no guest is found to be quite different to the structure found when it is complexed with the guest. Clearly, the guest, when it binds inside the host, must organise the host to a large extent in order to fit inside. This in turn would be expected to reduce the sampling of the host hydrocarbon chain to some extent.
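Counting unique chain conformations among saved structures, as done above for the 300 snapshots, amounts to labelling each dihedral as t, g+ or g– and tallying the resulting strings. The sketch below is one way this could be done; the bin edges (g+ for 0–120°, t for 120–240°, g– for 240–360°) and the sign convention are assumptions of this example, not taken from the thesis.

```python
from collections import Counter

def classify(angle_degrees):
    """Label a dihedral on a 0-360 degree scale (assumed bin edges)."""
    a = angle_degrees % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"

def count_conformations(frames):
    """Tally conformations; each frame is a sequence of chain dihedrals."""
    return Counter(tuple(classify(a) for a in frame) for frame in frames)

# Three hypothetical snapshots of a two-dihedral chain.
populations = count_conformations([[180.0, 300.0], [175.0, 290.0], [60.0, 180.0]])
n_unique = len(populations)
most_common = populations.most_common(1)[0]
```

Applied to the saved structures, `len(populations)` would give the number of unique conformations and `most_common(4)` the entries of a table like Table 6.5.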

Figure 6.12: The macrobicycle 12 structure without the guest.

6.6 Conclusion

Methods to improve sampling of configurational space for the macrobicycle 12 system have been described. MC moves introduced were the conrot, flip, large dihedral and three part solute moves. The GB/SA continuum solvent model for chloroform was implemented to replace the explicit solvent model. These modifications were shown to lead to a vast improvement in sampling. The simulation protocol is now ready for the calculation of free energies in the macrobicycle 12 system.

Chapter 7

Free Energy Calculations for Macrobicycle 12

The simulation protocol is now ready for testing on the macrobicycle 12 system. The aim of this work is to obtain relative free energies of binding of all combinations of enantiomers and conformations for the various amino acid derivatives. These will be compared with experiment[6] and rationalised, ideally leading to predictions of better hosts for binding. What follows is a full description of the experimental system and the data that the simulations are trying to replicate. The free energy protocol and results in explicit solvent are described, together with their resultant sampling problems. These problems led to the implementation of additional MC moves and the continuum solvent model as discussed and implemented in Chapter 6. The host-guest free energies of binding obtained using the more successful continuum solvent protocol are then described and compared to experimental observations and free energies of binding. This in turn allows a comparison of the performance of the two solvation models.

7.1 The Simulation System

7.1.1 The Macrobicycle 12 System

Kilburn et al. have been designing receptors for amino acids and peptides. Macrobicycle 12 was designed to bind the carboxylate forms of these molecules using a thiourea moiety.

Figure 7.1: The general binding mode for N-Ac-l-phenylalanine to macrobicycle 12 (cross-section).[6]

Two biaryl methane units were included to link up the amide groups to rigidify the structure into a double ring. The host was designed so that its amide groups, two of which contain a chiral centre, would lie adjacent to the guest. These would provide hydrogen bonds to enforce not just amino acid specificity and enantioselectivity, but also, as was subsequently discovered, conformational specificity.

The guests themselves are actually tetrabutylammonium amino acid derivative salts. Their amide nitrogen is capped by an acetyl group in order to keep this end of the molecule neutral. The other end is a charged carboxylate group counterbalanced by a tetrabutylammonium (TBA) ion. The experiments were performed in CDCl3 so that 1H NMR spectra could be recorded.

Figure 7.1 illustrates the general binding pattern in a cross-section of the host. The binding is due to two strong hydrogen bonds between the CO2− of the amino acid derivatives and the six polar hydrogens inside the cavity of the host. The centre two thiourea hydrogens are particularly important in this interaction. The hydrogen bonds on the right between the guest carbonyl oxygen and a polar hydrogen of the host are suspected of stabilising the cis conformation.

In the computer simulations, one macrobicycle 12 molecule is modelled with one amino acid. The TBA is not included in the simulations for three reasons. It is not considered important to binding, it is not necessary that the system be neutral in charge, and the inclusion of a second ion would result in considerable sampling and energetic problems.

The solvent used was either explicit or continuum chloroform, not CDCl3. This was because only chloroform force field parameters were available, although the difference is expected to be negligible. For example, the OPLS force field does not even treat the hydrogen explicitly and uses a united atom representation for explicit chloroform with the hydrogen absorbed by the central carbon. The temperature of the system was 20 °C.

Seven different amino acids were studied experimentally. Of these, the glycine (Gly), alanine (Ala) and phenylalanine (Phe) derivatives were chosen as a representative sample for study (pictured in Figure 1.1). These abbreviations for the amino acid derivatives are used in the remainder of the text. This sample of three amino acids provides a reasonable range in size of side chains, ranging from a hydrogen for Gly to a methyl for Ala through to a benzyl for Phe. The other four amino acids available for testing were asparagine, glutamine, histidine and lysine. They were not considered because the –NH2 groups they contain can cause problems with protonation state. The lysine group was also rather flexible, although the conrot moves implemented later in the protocol would have been capable of adequately sampling it. In any case, the aim of the study was not so much to examine amino acid specificity but rather the preferred enantiomer and conformation of each amino acid. Furthermore, most of the experimental information was obtained for Ala and Phe.

7.1.2 Experimental Data

There were two main experimental data with which to compare the simulation. The first of these is binding data. Binding constants were calculated by comparing the partitioning of guests between water and CDCl3, firstly with no host in the CDCl3, and secondly with the host present. The experimental binding data is given in Table 7.1. Cbz-glycylglycine (glycine with a protecting group) was available from the syntheses and so was also tested for completeness. Benzoic acid and hexanoic acid were also studied to gauge the significance of the carboxylate-host interaction.

The similar binding data obtained for all molecules suggested that almost all the binding was due to the carboxylate-host interaction and not to the other differences between each amino acid, hence the lack of specificity observed.

Table 7.1: Association Constants for Macrobicycle 12 with Various Tetrabutylammonium Carboxylates in CDCl3.[6]
Columns: Amino acid, Ka / 10^3 mol−1, ΔG / kcal mol−1.
Rows: N-Ac-glycine, N-Ac-l-alanine, N-Ac-d-alanine, N-Ac-l-phenylalanine, N-Ac-d-phenylalanine, N-Ac-l-asparagine, N-Ac-d-asparagine, N-Ac-l-glutamine, N-Ac-l-histidine, N-Ac-l-lysine, Cbz-glycylalanine, benzoic acid, hexanoic acid.
[Per-row values and uncertainties are not recoverable from the extracted text.]

It was the second set of experimental data, the NMR data, that revealed the interesting binding behaviour of the host. Firstly, 1H spectra revealed the stereoselectivity of the host. While in each case the guest was observed to bind to the thiourea unit, d guests showed a preference to bind on the outside of the host, while l guests preferred to bind on the inside. Secondly, ROESY spectra indicated the presence of a cis amide bond for the l forms of the two amino acids tested, Ala and Phe, and trans for the d forms. 70 % of the l-Phe appeared to be in the cis conformation, suggesting that some was still present in the trans conformation, most likely bound on the outside of the host like the d form.

In addition, some structural modelling work has been done on the system.[6] A range of structures were generated using simulated annealing and molecular dynamics simulations with a united atom representation. An analysis of these structures revealed the hydrogen bonding pattern responsible for the binding and stabilisation of the cis amide bond. The carbonyl oxygen of the guest appeared to be hydrogen bonding to an amide hydrogen in the main ring of the host as shown in Figure 7.1.
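The two columns of Table 7.1 are related through the standard expression ΔG = −RT ln Ka. The sketch below shows the conversion at the experimental temperature of 20 °C; the function name is invented for this example, the association constant used is a representative magnitude rather than a quoted table entry, and Ka is assumed to be expressed relative to a 1 M standard state.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1

def binding_free_energy(ka, temperature_k=293.15):
    """Standard binding free energy, -RT ln Ka, in kcal/mol."""
    return -R_KCAL * temperature_k * math.log(ka)

# An association constant of order 7 x 10^4 corresponds to roughly
# -6.5 kcal/mol; tabulated values are often quoted as magnitudes.
example_dg = binding_free_energy(68.0e3)
```

Because of the logarithm, a tenfold change in Ka shifts ΔG by only about 1.3 kcal mol−1 at this temperature, which is why the similar association constants translate into a narrow spread of free energies.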

However, no energetic studies were undertaken nor rationalisations of why different amino acids bound differently.

7.1.3 The Role of Computer Simulations

It is primarily the results concerning the stabilisation of the cis amide bond that this work aims to rationalise. Computer simulations are ideal for this for a number of reasons. They make possible the study of all binding complexes both strong and weak. The weak complexes are difficult to probe by experiment since they are rarely observed, if at all. They provide another means of calculating binding free energies. Finally, they provide energetic and structural information that gives clues to how the binding occurs. Possible binding structures for the d enantiomers on the outside were also investigated. The rest of this chapter concerns the free energy calculations themselves.

The preference for one particular molecule to bind inside the cavity over another requires only relative free energies of binding. These quantities are calculated using the method described in Subsection 2.3.2. Free energy perturbations are performed once in the host and once in chloroform, and their difference gives the relative binding free energy using Eq. 2.25. The actual free energy mutations performed are between all l and d, cis and trans forms of Gly, Ala and Phe to obtain relative binding free energies. These mutations are shown in Figure 7.2. Perturbations were constructed between molecules that were the most similar in shape so as to keep the mutations small and minimise the computational effort. Gly is mutated to Ala, Ala to Phe, and cis to trans. There are no l to d mutations since these require two large perturbations rather than one. Gly, which has no stereochemistry, serves as the connection between the l and d molecules.

Experiment showed that the guests were able to bind either inside or outside the cavity. Since all the interesting binding behaviour appeared to be occurring for guests bound inside the cavity of the host, mutations were only performed for guests in this position.
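The thermodynamic cycle described above reduces to a subtraction: the free energy of mutating guest A into guest B inside the host, minus the same mutation in chloroform. A small sketch follows. The sign convention (host minus solvent) and the combination of the two uncertainties in quadrature are assumptions of this example rather than statements about Eq. 2.25, and the numbers are invented.

```python
import math

def relative_binding_free_energy(dg_host, err_host, dg_solvent, err_solvent):
    """Relative binding free energy of B versus A from the two perturbation legs.

    A negative result means B binds more strongly than A under this convention.
    """
    ddg = dg_host - dg_solvent
    # Assumed: the two legs are independent, so errors add in quadrature.
    err = math.sqrt(err_host ** 2 + err_solvent ** 2)
    return ddg, err

# Hypothetical legs for one mutation, in kcal/mol.
ddg_example, err_example = relative_binding_free_energy(-3.0, 0.2, -1.0, 0.1)
```

Chaining such differences along Gly to Ala to Phe, and cis to trans, gives the relative binding free energy between any two guests in Figure 7.2 without a direct l to d perturbation.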

Guests bound outside the cavity are more in a solvent-like environment and will be less influenced by the host. Their relative free energies would be expected to more resemble those calculated in pure chloroform. To further understand if the solvent is able to play a role in stabilising any enantiomers or conformation, free energy perturbations were also performed in the gas phase so that relative free energies of solvation could be obtained using Eq. 2.27.

Figure 7.2: The free energy perturbations performed to calculate relative binding free energies. [Scheme linking l-cis-Phe, l-trans-Phe, l-cis-Ala, l-trans-Ala, cis-Gly, trans-Gly, d-cis-Ala, d-trans-Ala, d-cis-Phe and d-trans-Phe, with arrows marking the cis–trans, Gly–Ala and Ala–Phe perturbations.]

7.2 Explicit Solvent Free Energy Calculations

7.2.1 Gas Phase Simulation Protocol

The gas phase free energy simulations were performed using MCPRO.[32] Free energies were calculated using FEP. Configurations were generated at a temperature of 293 K using MC Metropolis sampling. Unlike the free energy of hydration calculations in Chapter 5, the molecules are now flexible so free energies for the gas phase mutations have to be calculated. Each λ window had 3 million (M) configurations of equilibration and 5 M of data collection. For gas phase calculations, this was found to be well in excess of that required to give converged results but their speed was so fast that this was not an issue.

Identical starting geometries were used for each window. With no solvent to crash into, molecules in most cases could perturb quite quickly without introducing hysteresis. Therefore the window spacing could be set at a fairly large spacing, ranging from 2 for Gly–Ala mutations up to 10 for Phe cis–trans mutations. Phe required more windows because it is a lot more flexible than Gly.

In the regular solute moves, all Z-matrix variable angles and dihedrals were sampled while all bonds were kept fixed. The only additional special MC move necessary for these calculations was the large dihedral move (see Subsection 6.3.7) to sample the swing dihedral of Phe. This large dihedral move comprised 5% of all attempted configurations. It is important to note that the amide variable dihedral angle was sampled to a small extent but not so much that it could interconvert between cis and trans during a single simulation. This was achieved by setting its maximum amplitude to 27° which is far too small to allow it to climb over the ∼14 kcal mol−1 barrier separating the two conformations. Mutations of isolated molecules were only performed for the d isomers since it was assumed and verified that l and d isomers would give exactly the same result in the gas phase.

Free energies were calculated using the same method as described in Subsection 5.3.1. The error for each window was taken as the differences between forward and reverse free energies. Combining these errors for all windows gave a total error for the perturbation.

The three types of mutation are illustrated in Figure 7.3. The Gly–Ala mutations were performed by growing a hydrogen and three dummy atoms into a carbon and three hydrogen atoms, respectively, with concomitant increases in bond lengths. Initial dummy atom bond lengths were set to 0.2 Å. For the cis–trans mutations, an oxygen and three dummy atoms were perturbed to a methyl group, and the reverse process applied for the methyl group, effectively swapping the two. Such a perturbation is not only simpler than a direct perturbation around the dihedral angle of the amide bond, but it is also a path with a much lower energy barrier, avoiding the large dihedral term and the steric clash which would occur when the perturbation was performed in the host. For the Ala–Phe mutations, a hydrogen and ten dummy atoms were grown into a phenyl group, a much larger mutation.
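The per-window free energies and the hysteresis-based error described above can be illustrated with the standard exponential-averaging (Zwanzig) estimator. The sketch below is generic and not the MCPRO code: the function names are invented, the forward and reverse estimates are assumed to be expressed for the same direction of the window, and combining window errors in quadrature is an assumption, since the text does not state how they were combined.

```python
import math

KT_293 = 1.987204e-3 * 293.0   # kT in kcal/mol at 293 K

def zwanzig(delta_u, kt=KT_293):
    """Free energy change from sampled energy differences U_target - U_reference."""
    shift = min(delta_u)            # factor out the largest Boltzmann weight
    mean = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(mean)

def window_error(dg_forward, dg_reverse):
    """Hysteresis between two estimates of the same window's free energy change."""
    return abs(dg_forward - dg_reverse)

def total_error(window_errors):
    """Assumed combination rule: quadrature sum over windows."""
    return math.sqrt(sum(e ** 2 for e in window_errors))

# If every sampled energy difference is 1.0 kcal/mol, so is the free energy.
constant_case = zwanzig([1.0, 1.0, 1.0])
```

Because the average is dominated by the lowest energy differences, poorly overlapping windows converge slowly, which is why hysteresis is a useful practical check on window spacing.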

Figure 7.3: The three types of perturbation performed for the amino acid derivatives. [Panels: N-Ac-cis-glycine to N-Ac-trans-glycine; N-Ac-glycine to N-Ac-alanine; N-Ac-alanine to N-Ac-phenylalanine.]

7.2.2 Explicit Chloroform Protocol

The protocol for the perturbation of the guest alone in explicit chloroform had a number of differences to the gas phase protocol. Simulations were performed in a box of side 33 Å containing 265 OPLS chloroform molecules.[97] Configurations were generated in the NPT ensemble at 20 °C and 1 atm. Periodic boundary conditions were used together with a non-bonded molecule-based cutoff radius of 10 Å, although no feathering of the potential was included for these simulations (see Subsection 3.5). Preferential sampling[207] was used to improve the sampling of solvent molecules around the solute. With the exception of the large dihedral move, none of the improved sampling schemes described in Chapter 6 were used in this particular protocol. There were 3 M configurations of equilibration and now 10 M of data collection. Maximum move sizes for solute translations and rotations were selected to be 0.15 Å and 15°.

Regular solute moves were attempted 5% of the time, large dihedral moves 2%, and volume moves 0.7%, with the remainder being solvent moves. The maximum volume move sizes were set to 700 Å3. Between 10–15 windows were included and carefully spaced, since the perturbation had to be done slowly to avoid clashes with the solvent. Again, only the d isomers were considered.

Two protocols for the host in chloroform were tested. The first protocol lacked all of the additional sampling schemes discussed in Chapter 6. The second protocol was more advanced and included all the additional MC moves described in Chapter 6. Compared to guest-only simulations, for both protocols a much larger box was used. The box was of dimension 41×44×45 Å and contained 592 chloroform molecules to enclose the larger host. The maximum size of volume moves was set to 1000 Å3 to reflect the larger box volume. Preferential sampling was used to improve the sampling of solvent molecules around the solute. The regular solute moves contained solute translations and rotations of 0.015 Å and 1.5°. Apart from the differences described here, the remainder of the protocol was the same as for the guest in explicit chloroform.

For the first protocol, two different starting structures were examined. The first was taken from a structure used in the previous modelling work on the system.[6] The second was another structure of similar energy with the hydrocarbon chain in a different conformation. The purpose of this was to examine the effect of starting structure on the free energies obtained. There were 3 M configurations of equilibration and 10 M configurations of data collection per window. The breakdown of move attempts was 0.017 % volume, 3.3 % host residue and 1 % regular solute, and the rest solvent. Perturbations were only run for the thermodynamic cycle containing the cis and trans forms of Gly and l-Ala in Figure 7.2.

For the second protocol, 17 different host starting structures were taken from the lowest energy host-guest structures generated from the simulated annealing runs described in Chapter 6. 5 M configurations of equilibration and 20 M of data collection were used per window. The breakdown of move attempts was 0.017 % volume, 3.3 % host residue, 2.5 % three part solute for the guest, 1 % conrot and 1 % flip, the remainder being solvent moves.

conrot and 1 % flip, the remainder being solvent moves. Only the cis-Gly to trans-Gly perturbation was performed for the second protocol.

7.2.3

Window Spacing.

Before presenting the results, it is worth mentioning a little about window spacing in free energy perturbation calculations. The spacing of windows is important for convergence reasons. The optimum situation is to obtain the most accurate free energy change possible for a given total simulation time. This was not so important an issue for the free energy of hydration calculations in Chapter 5 which were fairly cheap, but for the macrobicycle 12 system, the expensive simulations necessitated some window spacing optimisation. Firstly, a balance had to be struck between the number of windows and length of simulation. For a fixed number of total configurations, more windows means smaller sampling length. Then the windows had to be spaced in the most efficient manner. While there was no hard and fast protocol, the main assessment criterion in this work for evenly spaced windows was reducing the hysteresis between forwards and reverse windows to around 0.1 kcal mol−1 for all windows. Generally, the window spacing was the more critical factor than simulation length in achieving this.

There have been a number of recommendations in the literature as to the optimum way to space windows. One approach is to spread the free energy change evenly between each window,208 an approach that requires iteration. Another was to use the statistical error as the guide, placing the additional windows in between two existing windows so as to equalise209 or minimise52 the error in each direction. Yet another was to equalise the entropy differences for each window,210 but this does require a separate simulation to determine the spacing. A number of these methods involving partitioning according to the errors and free energies were tested. While some success was obtained, such approaches are generally flawed in that they require prior simulation. This is because the large energy differences generated for the perturbed states are usually due to Lennard-Jones

contributions rather than electrostatics.155 Hence the free energy, which depends on both terms, does not serve as a suitable guide. Statistical errors can also be misleading since they can arise from other causes and do not necessarily indicate bad window placement. On the one hand, two consecutive windows can give very well converged but quite different results, while on the other hand, the size of the error can scale with the free energy change rather than reflect any underlying statistical uncertainty.

[Figure 7.4: The change in SASA and free energy with λ for the cis-Ala to cis-Phe perturbation in explicit chloroform. Axes: ∆SASA /Å2 and ∆G /kcal mol−1 against λ from 0 to 1.]

The approach adopted here was to assume that the windows should be spaced to minimise large changes in Lennard-Jones energies. Rather than calculating Lennard-Jones energies, though, it was assumed that the change in the molecule’s solvent accessible surface area (SASA) would correlate well with these energies, so that an even spacing of windows with respect to SASA gives an approximately even change in Lennard-Jones energy. At each λ value, SASA could be quickly calculated. Extensive sampling might still be necessary to obtain an average value of SASA. However, due to the approximate nature of the approach, such a refinement was not considered to be necessary and a single configuration was used. This hypothesis was indeed found to hold and subsequently proved of great use in spacing windows to obtain the optimum distribution. Figure 7.4 shows how SASA and the free energy vary with λ for the cis-Ala to cis-Phe perturbation in explicit chloroform. The small change in area for small λ allows a large spacing between windows. This leads to a large but well converged free energy change. For high λ, the area changes rapidly, necessitating

closely spaced windows. Despite this close spacing, there is still a moderate error in free energy changes. These errors would have been even worse had the windows been more distantly spaced. As for the number of windows, a certain number was required to obtain the desired hysteresis goal. However, if too many windows were added, the accumulated error from such a large number of windows outweighed the gain from closer window spacing. Such an imbalance could only be rectified by longer sampling. Hence 10–15 windows were used as a compromise.

[Figure 7.5: The gas phase relative free energy perturbation results (kcal mol−1). The square box gives the closure for each cycle.
cis to trans: Gly −3.19; d-Ala −3.67; d-Phe −4.15.
Gly to d-Ala: cis 3.84; trans 3.35.
d-Ala to d-Phe: cis 3.60; trans 3.11.
Closure: Gly/d-Ala cycle 0.01; d-Ala/d-Phe cycle 0.01.]
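The SASA-guided window spacing described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually coded for this work: the function name, the coarse SASA samples and the assumption that SASA increases monotonically with λ are all hypothetical.

```python
from bisect import bisect_left


def space_windows_by_sasa(lambdas, sasa, n_windows):
    """Return n_windows lambda values spaced evenly in SASA.

    lambdas, sasa: coarse samples of SASA(lambda). The sasa values are
    assumed (hypothetically) to increase strictly with lambda, so the
    curve can be inverted by piecewise-linear interpolation.
    """
    if n_windows < 2:
        raise ValueError("need at least two windows")
    if len(lambdas) != len(sasa) or len(lambdas) < 2:
        raise ValueError("lambdas and sasa must have equal length >= 2")
    if any(b <= a for a, b in zip(sasa, sasa[1:])):
        raise ValueError("sasa must be strictly increasing")

    step = (sasa[-1] - sasa[0]) / (n_windows - 1)
    windows = []
    for k in range(n_windows):
        target = sasa[0] + k * step
        # Locate the sampled interval that contains the target area.
        i = min(max(bisect_left(sasa, target), 1), len(sasa) - 1)
        frac = (target - sasa[i - 1]) / (sasa[i] - sasa[i - 1])
        windows.append(lambdas[i - 1] + frac * (lambdas[i] - lambdas[i - 1]))
    return windows


# Hypothetical SASA profile: little change at small lambda, rapid change later.
example = space_windows_by_sasa([0.0, 0.5, 1.0], [0.0, 10.0, 100.0], 6)
```

With a profile of this shape the returned windows are sparse at small λ and crowd together at high λ, which is the qualitative behaviour reported for the cis-Ala to cis-Phe perturbation.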

7.2.4

Guest Free Energies in the Gas Phase.

The gas phase free energy results are given in Figure 7.5. The numbers in boxes in the middle of each cycle represent the closure of the thermodynamic cycle. Ideally, this value would be 0. It can be seen that the gas phase runs are very precise with a closure of 0.01 kcal mol−1 for each cycle. The statistical errors are less than 0.01 kcal mol−1 and are not included since they are insignificant compared to the errors obtained for the chloroform and host simulations. There are a number of interesting points to note from the results. Firstly, the trans form is more stable than the cis by 3–4 kcal mol−1 for all three amino acid derivatives, as would be expected. This is in reasonable accord with previous experimental8 and theoretical211 results which both gave 2.6±0.4 kcal mol−1 for N-methyl acetamide in water. Importantly, the cis to trans energy difference

appears to be largely independent of side group. The second point is the difference in energy between different amino acid derivatives. There is an increase in free energy observed going from Gly to Ala. This is probably due to a rise in internal energy of ∼3 kcal mol−1 that accompanied this mutation. However, the increase in free energy going from Ala to Phe is accompanied by an internal energy rise of only ∼2 kcal mol−1. The difference between the internal energy and free energy probably arises from a loss of entropy due to the presence of a hindering phenyl group.

[Figure 7.6: The relative free energies of solvation of the amino acid derivatives in explicit chloroform (kcal mol−1). The square box gives the closure for each cycle.
cis to trans: Gly −0.40 ± 0.15; Ala −0.04 ± 0.23; Phe 0.88 ± 0.18.
Gly to Ala: cis −0.61 ± 0.11; trans −0.34 ± 0.15.
Ala to Phe: cis −2.88 ± 0.37; trans −2.18 ± 0.36.
Closure: Gly/Ala cycle 0.09; Ala/Phe cycle 0.22.]
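The closure quoted for the gas phase cycles can be checked by summing the four legs of each cycle with consistent signs. The sketch below is illustrative only: the function name is hypothetical, and the assignment of each Figure 7.5 value to a particular leg is an assumption about how the figure is laid out.

```python
def cycle_closure(cis_to_trans_a, a_to_b_trans, cis_to_trans_b, a_to_b_cis):
    """Signed sum around the cycle cis-A -> trans-A -> trans-B -> cis-B -> cis-A.

    The last two legs are traversed against the direction in which they
    were simulated, so they enter with a minus sign.
    """
    return cis_to_trans_a + a_to_b_trans - cis_to_trans_b - a_to_b_cis


# Gas phase values in kcal/mol, as read from Figure 7.5 (leg assignment assumed).
gly_ala = cycle_closure(-3.19, 3.35, -3.67, 3.84)
ala_phe = cycle_closure(-3.67, 3.11, -4.15, 3.60)
```

A perfectly converged set of perturbations would give zero for both sums; the magnitude of each sum is the closure shown in the box of the corresponding cycle.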

7.2.5

Guest Free Energies in Explicit Chloroform.

By subtracting the free energy change in a perturbation in the gas phase from the corresponding free energy change in chloroform, relative free energies of solvation are obtained by Eq. 2.27. Figure 7.6 contains the relative solvation energies in chloroform between all the amino acid derivatives. Errors are now of the order of 0.1–0.3 kcal mol−1 and the thermodynamic closures of 0.09 and 0.22 kcal mol−1 are less exact than before, reflecting the poorer sampling for the solvated system. The solvent has a small and mixed effect on the cis to trans equilibrium. Compared to the respective trans forms, cis-Gly is slightly destabilised by the solvent, cis-Ala is unaffected, while cis-Phe is stabilised by the moderate amount of 0.88 kcal mol−1. Another observation is that the relative free energies of solvation become more negative with increasing molecular size. This is to be expected given the greater number of favourable interactions a larger molecule can have with the solvent.
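Written out, the subtraction described above has the following form. The symbols are illustrative assumptions rather than the notation of Eq. 2.27: A → B stands for one perturbation between two amino acid derivatives, and the subscripts name the environment in which that perturbation was run.

```latex
\Delta\Delta G_{\mathrm{solv}}(A \to B)
  = \Delta G_{\mathrm{CHCl_3}}(A \to B) - \Delta G_{\mathrm{gas}}(A \to B)
```

A negative value means that B is better solvated than A once the gas phase contribution has been removed.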

[Figure 7.7: The host in explicit chloroform relative binding free energies (kcal mol−1) using two different starting structures with no special MC moves. The square box gives the closure for each cycle.
First structure: cis to trans: l-Ala −0.10 ± 0.44; Gly 0.68 ± 0.73. Gly to l-Ala: cis −2.46 ± 0.41; trans −2.76 ± 0.84. Closure 0.48.
Second structure: cis to trans: l-Ala 3.96 ± 0.86; Gly 3.45 ± 0.49. Gly to l-Ala: cis −3.99 ± 0.26; trans −2.76 ± 0.23. Closure 0.72.]

7.2.6

Host-Guest Free Energies in Explicit Chloroform.

The relative free energies of binding for the cis and trans conformations of Gly and l-Ala are obtained by subtracting the free energy change for the guest perturbation in chloroform from the free energy change for the same perturbation in the host (Eq. 2.25). The results are presented in Figure 7.7 for the two different starting structures. The closures for each of these cycles of 0.48 and 0.72 kcal mol−1 are now a lot worse than for the runs in chloroform even though both runs have the same 10 M configurations of data collection. Obviously the larger system now being studied would require more sampling to make a fairer comparison. However, the discrepancy in free energies obtained for each structure makes it clear that there is a more serious problem. In the first structure, the relative free energies of the cis to trans structures are −0.10 and 0.68 kcal mol−1 . However, in the second structure, the same perturbations have free energies of 3.96 and 3.45 kcal mol−1 . Such a dependence of free energies on starting structure is very unsatisfactory and quite misleading. One structure indicates a stabilisation of the cis structure, while another indicates no stabilisation. This, together with the poor sampling of the host in explicit solvent as discussed in Subsection 6.1.3 led to the development of improved sampling schemes involving additional MC moves.
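The dependence on starting structure can be made concrete by asking whether two estimates of the same quantity agree within their combined statistical error. The criterion below (difference no larger than the errors added in quadrature) is an assumption chosen for illustration, not a test used in this work, and the function name is hypothetical.

```python
import math


def agree_within_error(a, err_a, b, err_b):
    """True if two estimates differ by no more than their combined error."""
    return abs(a - b) <= math.sqrt(err_a ** 2 + err_b ** 2)


# cis -> trans relative binding free energies (kcal/mol) from Figure 7.7,
# first starting structure versus second starting structure.
ala_consistent = agree_within_error(-0.10, 0.44, 3.96, 0.86)
gly_consistent = agree_within_error(0.68, 0.73, 3.45, 0.49)
```

Under this criterion neither pair of values is consistent, which is the quantitative content of the statement that the free energies depend on the starting structure.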

These additional MC moves were incorporated into the second protocol in explicit solvent. The free energies were calculated going from cis-Gly to trans-Gly using 17 different starting structures taken from the annealing runs. Figure 7.8 shows the results for the relative free energies of binding obtained. It can be seen that despite the inclusion of the additional MC moves, the relative free energies of binding still show a marked dependence on starting structure. The positive relative binding free energies around 4 kcal mol−1 for some host structures indicate that the host is stabilising the cis conformation. However, for most structures the relative binding free energies are centred around 0 kcal mol−1. It should be noted that all the structures that did produce stronger binding for the cis conformation came from annealed structures containing a cis guest. This suggests that these host structures were possibly biased towards stabilising cis guests.

[Figure 7.8: Distribution of relative free energies of binding to the host for cis-Gly with respect to trans-Gly using 17 different starting host geometries. Axis: ∆∆Gbind /kcal mol−1 from −1 to 5.]

7.3

Continuum Solvent Free Energy Calculations.

To develop a protocol that could produce good sampling for the system and free energies independent of starting structure, the explicit solvent was replaced by a continuum solvent.

7.3.1

Continuum Chloroform Simulation Protocol.

Continuum calculations were performed using the version of MCPRO now including the continuum chloroform GB/SA model parameterised in Section 6.4. Continuum solvation free energies, equivalent to solvation energies, are added to the Hamiltonian in the FEP equation to give the total free energy. The GB/SA protocol was the same as that described in the continuum chloroform parameterisation. Since the SASA calculation is rather time intensive, the assumption was made that SASA varies sufficiently slowly not to require updating at every configuration. An area calculation

at every configuration would be prohibitively expensive. Therefore, the area was calculated every 100 attempted configurations.

The continuum chloroform protocol implemented was very similar to the gas phase protocol since neither have explicit solvent, apart from the fact that there were 1 M configurations of equilibration and 5 M configurations of data collection per window. 10–15 windows were used with the same spacing as in explicit chloroform simulations.

The protocol in the host had a few more differences compared to the gas phase perturbations. A number of preliminary runs were performed using different starting structures to verify that the sampling of the host was now adequate. With the dependence shown to be minimal, host starting structures were taken from the lowest energy host-guest structures generated from the simulated annealing runs described in Subsection 6.1.1 to enable faster equilibration. The breakdown of move attempts was 26% three part solute, 5% large dihedral (for Phe), 20% conrot and 13% flip, the remainder being host residue moves. There were 1 M configurations of equilibration and 8 M configurations of data collection per window. The end windows were run for 10 M configurations to obtain more sampling information for the real physical states. For the end windows, structures were saved every 0.1 M configurations for later analysis in Chapter 8 to give 100 structures in total for each guest. Such a relatively small number of well-spaced, uncorrelated structures was found to be sufficient to deduce the necessary trends. Free energies this time were calculated for all stereochemistries and conformations of all three amino acid derivatives.

7.3.2

Guest Free Energies in Continuum Chloroform.

The relative free energy of solvation results in continuum chloroform are presented in Figure 7.9. The closures of the cycles at 0.22 and 0.28 kcal mol−1 are reasonable. Errors for each mutation are only 0.1 kcal mol−1 at most. The mutation with the worst error was the d-Phe cis to trans with an error of 0.08 kcal mol−1. It is interesting to note that the errors obtained are greater than those in the gas phase (< 0.01 kcal mol−1), even though the sampling would be expected to be the same. It

may possibly be a result of the shorter equilibration of 1 M configurations for continuum calculations. Otherwise, it may simply be due to the addition of an additional energy term, the solvation free energy.

[Figure 7.9: The relative free energies of solvation of the amino acid derivatives in continuum chloroform (kcal mol−1). The square box gives the closure for each cycle.
cis to trans: Gly 0.35 ± 0.08; d-Ala 0.35 ± 0.07; d-Phe −0.24 ± 0.07.
Gly to d-Ala: cis −0.13 ± 0.01; trans 0.09 ± 0.08.
d-Ala to d-Phe: cis −0.30 ± 0.03; trans −0.61 ± 0.08.
Closure: Gly/d-Ala cycle 0.22; d-Ala/d-Phe cycle 0.28.]

The calculation of relative free energies for the amino acid derivatives in both continuum and explicit solvent allows a speculative comparison of the two models. Compared to the previous explicit solvent simulations (Figure 7.6), the results appear to be somewhat different. The most obvious difference is that the free energy of solvation for Phe of either conformation is only marginally more negative than that for Ala. In explicit solvent, the difference was much larger with Phe’s free energy of solvation lower by 2–3 kcal mol−1. The second difference concerns the cis versus trans stabilisation. trans-Gly appears to be less stable with respect to cis-Gly in continuum, yet the reverse was found in explicit. The same is true for Ala but to a smaller extent. However, the reverse trend is found for Phe. trans-Phe is more stable with respect to cis-Phe in continuum, yet the order was reversed in explicit. These are all relative free energies, so it is not immediately obvious which molecule’s absolute free energy of solvation may be varying with the solvent model. However, the likely candidate is Phe. Anything that may affect the simpler Ala or Gly will probably affect Phe in the same way, leading to little observed difference between the molecules. Phe is a more complex molecule and would be expected to be more sensitive to differences in solvent model. One or both of the solvent models must be failing to some extent. The main

weakness of the explicit solvent model is that the sampling of the solutes may not be adequate. However, the solvation energy for the explicit solvent is expected to be more reliable. Sampling problems are likely to be minimal for the continuum model, while its main weakness is that it may be incorrectly parameterised for certain functionalities in the amino acid derivatives such as the aryl ring which is the key difference in structure between Ala and Phe. On the sampling issue, Phe has two dihedrals whose sampling may be considerably hindered by the explicit solvent. If these dihedrals can only vary over a small range, then it is no surprise that the free energies obtained may be different to the continuum case, which does have good sampling. Poor sampling already appears to be the reason for the wide distribution of free energies shown in Figure 7.8 for the cis to trans mutation for Gly. On the parameterisation issue, it has been noted212 that the ∆GSA free energies of solvation for cyclic groups such as aryl rings should have a more negative dependence on SASA to compensate for the fact that cyclic structures, being more compact, have a smaller SASA. If cyclic atoms were assigned a separate γ term in a GB/SA parameterisation (Section 6.4), then this would give Phe with its cyclic aryl group a more negative GB/SA free energy of solvation as is observed in explicit solvent. This, though, would have required an extra γi parameter in the GB/SA parameterisation protocol, giving three γi parameters in total, too many for fitting to the free energies of solvation of only 20 molecules. However, an inspection of the free energies of solvation in Table 6.3 for the molecules used in the GB/SA parameterisation reveals that aromatic molecules perform quite well.
If this second explanation is the reason, there is cause for some concern in having used this GB/SA model, particularly for Phe, although the host-guest complexes are also in chloroform, so these effects may cancel out to some extent.
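The role of the γ parameters in this argument is easiest to see when the surface-area term is written as a sum over atoms. The expression below is the commonly used generic form and is given only as a sketch; it is not claimed to be the exact expression of Section 6.4, and A_i (the solvent accessible surface area of atom i) is an assumed symbol.

```latex
\Delta G_{\mathrm{SA}} = \sum_i \gamma_i A_i
```

Giving ring atoms their own γi would change this sum only for molecules that contain rings, which is how such a term could alter the Phe value while leaving Gly and Ala untouched.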

7.3.3

Host-Guest Free Energies in Continuum Chloroform.

Before presenting the relative binding free energies for all guests, the relative binding free energy for the cis-Gly to trans-Gly mutation was calculated using four quite

different starting structures. In explicit solvent, the four structures had given relative binding free energies of −0.49, 0.77, 3.09 and 3.94 kcal mol−1 with a maximum error of 0.8 kcal mol−1. In continuum solvent, the same structures gave the results of 0.06, −0.23, −0.39 and 0.11 kcal mol−1, respectively, with a maximum error of 0.45 kcal mol−1. Clearly, within the bounds of error, these results are independent of structure.

[Figure 7.10: The relative free energies of binding of amino acid derivatives in the host in continuum chloroform (kcal mol−1). The square box gives the closure for each cycle.
cis to trans: l-Phe 4.12 ± 0.52; l-Ala 2.21 ± 0.69; Gly −0.23 ± 0.28; d-Ala −0.08 ± 0.27; d-Phe 2.14 ± 0.32.
l-Ala to l-Phe: cis −1.80 ± 0.29; trans −1.05 ± 0.24.
Gly to l-Ala: cis −2.01 ± 0.24; trans −0.74 ± 0.17.
Gly to d-Ala: cis 0.02 ± 0.30; trans −0.15 ± 0.17.
d-Ala to d-Phe: cis −1.65 ± 0.28; trans −0.41 ± 0.50.
Closure: l-Ala/l-Phe cycle 1.16; Gly/l-Ala cycle 1.17; Gly/d-Ala cycle 0.32; d-Ala/d-Phe cycle 0.98.]

Figure 7.10 contains the relative free energies of binding for the guests in the host-guest complex in continuum chloroform. The errors and closure of the cycles with the host present are now somewhat worse than in chloroform, suggesting that the free energy results are not fully converged. The closures range from 0.32 to 1.17 kcal mol−1. The errors range from 0.2 to 0.7 kcal mol−1 and tend to be worse for the cis to trans relative binding free energies. The errors obtained indicate that the sampling appears to be insufficient to obtain fully converged free energies despite the improvements made. However, the reassuring feature is that the errors now appear small enough compared to the size of the free energy numbers to make meaningful qualitative deductions. Another feature of the errors is that they are comparable

to those found in the poorly sampled perturbations performed earlier in explicit solvent. This is probably the result of two competing effects. The improved sampling in continuum leads to a much greater exploration of configurational space. If many different regions are visited, many different terms will contribute to the average in the FEP equation, leading to a larger error. For the explicit solvent case, if the system remains trapped in one conformation, then most terms in the FEP equation will be of similar value. However, the discrete nature of the solvent can produce wide fluctuations in energies not observed in continuum since continuum solvent free energies are averaged over all solvent configurations. These two opposing effects probably lead to errors of similar value. It also emphasises the difficulty in explicit solvent that sampling has to average over both solute and solvent degrees of freedom. As anticipated by the experimental findings, there is a wealth of information in Figure 7.10 addressing the two main points of study in this work regarding enantioselectivity and conformational stabilisation. The first major point of interest is the stabilisation of the cis conformation relative to the trans for l-Ala, l-Phe and apparently d-Phe, but not Gly or d-Ala. The second major point of interest is the stronger binding of the l enantiomer compared to the d enantiomer for both Ala and Phe, particularly for the cis compounds. This suggests that firstly, l enantiomers are likely to complement the host better, and secondly, that the cis stabilisation is selective for the l isomer. These results are in exact accordance with the major findings of experiment. However, on the third point of interest, namely the selectivity of the host, there is more uncertainty.
To more clearly observe the relative binding free energies of each amino acid derivative, the data from Figure 7.10 may be used to construct a table of relative binding free energies with respect to one of the molecules. trans-Gly is selected to be this reference molecule. The relative free energies (∆Gsim ) for most molecules with respect to trans-Gly may be calculated using more than one path. In these cases, either the most direct path was taken or, if there was more than one, the value was taken along the path with the smallest error. For example, ∆Gsim

for l-cis-Phe is found by going from trans-Gly to cis-Gly to l-cis-Ala to l-cis-Phe. Table 7.2 contains those data, together with the experimental values. The stronger binding conformation, either cis or trans, as predicted by simulation is highlighted in bold.

[Table 7.2: Relative Free Energies Obtained From Simulation and Experiment. Columns: Molecule; ∆Gsim /kcal mol−1 (cis and trans); ∆Gexpt /kcal mol−1. Rows: N-Ac-glycine, N-Ac-l-alanine, N-Ac-d-alanine, N-Ac-l-phenylalanine, N-Ac-d-phenylalanine.]

Before interpreting this table, there are two significant differences between the relative binding free energies that are being measured by experiment and simulation. Firstly, simulation relative binding free energies were obtained exclusively for the guest inside the host cavity while the experimental values were obtained for the guest binding to the host anywhere, either inside or outside the cavity. Secondly, simulation gives relative free energies for cis and trans individually, while the experimental values presumably are for the lowest energy conformation whichever that is. However, it is possible to make one meaningful comparison concerning the influence of the side chain. Since experimental NMR data indicate that l–Phe and l–Ala both bind in the cavity, their relative binding free energies may be compared with simulation. It can be seen that both experiment and simulation predict stronger binding for l–Phe over l–Ala, but they differ significantly in predicting the extent to which this happens. Simulation predicts l–Phe to be −1.80 kcal mol−1 more stable than l–Ala, while experiment predicts only −0.14 kcal mol−1 for the same number. Unless the experiment or simulation is in error, such a difference must be put down to l guests binding both inside and outside the cavity, or maybe even exclusively outside. Indeed, experiment has estimated the relative binding populations of inside to outside the cavity at 70:30 for l-Phe. Presumably, binding outside the cavity is much less selective. Hence, since binding

on the non-selective outside position is occurring in experiment, the experimental relative binding free energies should be smaller than in the simulation. Full elucidation of the binding constants can only be made if more simulations are performed to measure the relative binding free energies of all the guests outside the cavity and the relative binding free energy for guests between the inside and outside of the cavity. The large difference in the structure of the end points for the latter simulation would probably necessitate the calculation of absolute binding free energies to each site. This calculation would be very expensive.

Concerning relative binding free energies, one possible source of error for simulation is the GB/SA solvation model. As noted earlier in the comparison between explicit and continuum free energies of solvation, Ala and Phe were not as well solvated in the continuum. A less negative free energy in chloroform leads to stronger relative binding. This may account for some of the difference with experiment, and if it were true, it suggests that a better parameterisation of the GB/SA model is required to understand the effect of different side chains. Enantiomers and conformations do not differ in terms of the atom identities present and so are unlikely to be adversely affected if there is a problem with the GB/SA model.

Looking at Table 7.2 again, it is not possible to compare the d molecules since they are binding on the outside of the cavity. For Gly, experiment has not shown whether the guest binds inside or outside the cavity. If it binds inside, then experiment and simulation are in disagreement: experiment predicts Gly binds 0.81 kcal mol−1 more stably than l-Ala, while simulation predicts that Gly binds less stably by 2.01 kcal mol−1. This contradiction appears to suggest that Gly may bind outside the cavity. However, if Gly does bind inside the cavity, simulation predicts that it will marginally prefer to adopt the trans conformation. This prediction is currently being tested by experiment.

7.4

Conclusion

The application of the simulation protocol to calculate relative free energies of binding for the amino acid derivatives to macrobicycle 12 has been described. Problems were demonstrated with the use of an explicit solvent model. The GB/SA continuum model was able to provide the means by which results could be obtained that largely agreed with the experimental findings. The experimental enantioselectivity and stabilisation of the cis amide conformation has been correctly reproduced. Moreover, it has been tentatively predicted that should Gly bind inside the cavity, it binds in the trans conformation. Experiments to test this prediction are in progress. However, the ability of GB/SA to correctly model relative solvation energies may affect the relative binding free energies obtained. Nevertheless, the agreement with the other experimental findings reinforces the reliability of the simulation data and gives confidence in using this data to examine precisely their physical origins. This is the subject of the next chapter.

Chapter 8

Analysis of the Macrobicycle 12 System

Experiment has shown that simply changing the side chain of the amino acid is able to produce a diverse range of binding behaviour. Thus, to predict binding behaviour, the exact causal relationship between side chain and binding must be determined. The main property of the macrobicycle 12 system that causes the diverse binding behaviour for different guests is the availability of a number of possible alternative binding motifs. The different properties of each guest determine which of these are preferred. The purpose of this chapter is to draw the connection between the properties of the guest and the way it binds. It must be emphasised that this study only examines binding inside the host cavity since this is the place where experiment indicates that interesting binding selectivity occurs.

Initially, a detailed description of the host and guest is given. A model is then proposed to describe all the possible interactions between them. The model will be used to aid the interpretation of the structures observed in the simulation. Following this, a detailed analysis of all the host-guest complexes is given, focusing on the features that differ between guests and using the model to interpret the observed behaviour. Finally, an overall rationalisation is made about the link between the guest, the binding mode and ultimately the binding free energies.

8.1

Description of the Binding Site.

8.1.1

Host Binding Features.

Firstly, consider the host, macrobicycle 12. Figure 8.1 shows a schematic of the host illustrating the layout of its components and a few important distances. The main ring of the host resembles a rhombus of side 8 Å. Two sides are made up of benzamide groups and two are made up of N-benzylamide groups. The diagonal dimensions are approximately 9–11 Å between the tertiary junction carbons and 10–12 Å between the diaryl methane carbons. The thiourea unit lies in the middle of the cavity and is joined to the main ring by two hydrocarbon chains which stretch along the diagonal between the two junction carbons. The depth of the cavity ranges from 3 Å at the junction carbons up to 6 Å at the diaryl methane carbons.

[Figure 8.1: Schematic of the dimensions and important parts of macrobicycle 12. Labelled components: benzamide, N-benzylamide, aryl amide, thiourea. Labelled distances: 8 Å, 9–11 Å, 10–12 Å.]

The definition of the two depths

are shown in Figure 8.2. In this figure, macrobicycle 12 is being viewed along the axis connecting the two junction carbons. Overall, the host molecule possesses C2 symmetry. This allows the guest to reside in two equivalent orientations.

[Figure 8.2: Schematic of the two depths of macrobicycle 12 as viewed along the axis connecting the two junction carbons. Labels: benzamide, N-benzylamide, hydrocarbon, thiourea; depths 4–6 Å and 3 Å.]

The host possesses a number of regions important for binding by the guest. These are highlighted in Figure 8.3. The first of these is the presence of six polar hydrogens in the cavity. The annealed structures (see Section 6.1) revealed two important facts. Firstly, the polar hydrogens of the amide groups almost always pointed into the cavity. Only a few strained structures had the oxygens pointing inwards and so may be discounted. Furthermore, while the thiourea unit preferred to have its hydrogens pointing outwards for the host, when a guest was present, they preferred to point inside the cavity. Therefore, the thiourea and the two pairs of amide groups contribute a total of six polar hydrogens with which the guest can bind inside the cavity. The host also possesses four aryl groups available both for hydrogen bonding to polar hydrogens and π–π interactions to other aryl groups.

These binding possibilities are further complicated by the flexibility of the host. The hydrocarbon chain is capable of adopting a large number of conformations that influence not only where the hydrogen bonding thiourea unit lies but also the shape of the host cavity. The amide units are relatively rigid but their orientation, nevertheless, is still important as it affects the position of their polar hydrogens. The aryl units can adopt various orientations, also influencing the shape of the cavity. This flexibility emphasises the necessity to examine many structures. It is not sufficient

to consider a single host structure and compare how each guest binds to it because different structures will be more suited to some guests than to others. Before describing how these regions come into play in binding, it is necessary to describe the guest.

Figure 8.3: The host molecule with the polar hydrogens and aryl rings highlighted.

8.1.2 Guest Binding Features.

Figure 8.4: The guest molecule (cis-Phe) with the oxygens, polar hydrogens, aryl rings and other important features highlighted. Gly and Ala are similar except in the side chain. (Labels: O oxygen, M methyl group, SC side chain, H polar hydrogen.)

Figure 8.4 highlights the important binding regions in the guest. The guest possesses two highly charged oxygens in a carboxylate group at one end and a carbonyl oxygen on the amide unit. All these oxygens have the potential to hydrogen bond to the polar hydrogens of the host. The guest also contains a polar hydrogen that can repel the host polar hydrogens or form an internal hydrogen bond to one of

the carboxylate oxygens. It is not able to hydrogen bond to the carbonyl oxygens of the host since these oxygens were found to lie outside the cavity.

The variation between guest molecules is also important. The main difference arising from guest conformation about the amide bond is that the carbonyl oxygen lies more on the side of the guest for cis but more on the top for trans. The reverse applies for the methyl group. This can be seen in Figure 8.4 by interchanging the amide oxygen and methyl group. Differences in stereochemistry for the guest alone cause no difference in potential guest binding sites. However, when combined with the chiral host, differences do arise. These are discussed further on in Subsection 8.1.4. The variation with side chain is mainly in size. Gly has no side chain and only a hydrogen, Ala has a larger methyl group, while Phe has a very large phenyl group. Flexible dihedrals in the phenyl group can point it in different directions, and there is also the potential for π–π interactions with the host aryl group.

The guests, as opposed to the host, are much more rigid, partly due to stabilisation by an internal hydrogen bond. All guests contain three significant dihedrals. The lower two dihedrals closer to the carboxylate end are very restrained and change little in value. The third dihedral is the amide dihedral and determines the two conformations of interest for each guest. Phe guests possess two more dihedrals that are moderately flexible. These are the dihedral which swings the aryl group around, and the dihedral which twists the aryl group about its own axis. These two dihedrals are important because they allow Phe to place the aryl group in a number of different positions.

8.1.3 Origins of Selectivity.

Having described all the major binding sites on the host and guest, their binding interactions may now be discussed. There is a wide range of factors determining the interactions possible between the host and guest. The primary interaction stabilising the binding is the hydrogen bonding of the guest carboxylate group to the host, as mentioned in the previous Subsection.

There is a choice of three pairs of hydrogens to which the carboxylate may bind. The carboxylate may choose to bind to the thiourea hydrogen pair in the middle. This allows the rest of the guest to point out into the open end of the cavity. Alternatively, it may bind to an amide pair at one end. In doing this, the guest would have to tip to one side. Either of these cases would involve the formation of between two and four hydrogen bonds. A more complex mode of binding is intermediate between these two extremes. One carboxylate oxygen forms a double hydrogen bond to the thiourea pair and the other oxygen forms a double hydrogen bond to the amide pair, giving four in total. Finally, the carboxylate may be able to have each oxygen bonding to one thiourea hydrogen and two amide hydrogens, giving six hydrogen bonds in total. Which of these binding patterns a guest adopts will depend on both the shape of the guest and the conformation of the host.

Figure 8.5: Three possible ways that cis-Gly may bind to the thiourea of the host.

The thiourea hydrogens would not be expected to have a strong, direct influence on guest selectivity. Since the thiourea lies in the middle of the cavity, it is very accessible to the carboxylate group that all guests possess. Furthermore, there are a large number of orientations that the guest may adopt while still having two hydrogen bonds between its carboxylate oxygens and the two thiourea hydrogens. This is because one of the oxygens may form a double hydrogen bond about which the whole guest may pivot. Figure 8.5 illustrates three possibilities. It is also possible to form up to four hydrogen bonds between the carboxylate group and the thiourea if the guest in the middle picture twists by 90° about its axis. However, the resulting hydrogen

bonds are likely to be weaker and they require a very specific, constrained geometry to form. Thus, thiourea plays an anchor role. It keeps the guest close by offering two easy hydrogen bonds, but has little influence on guest orientation.

The one interesting feature of thiourea that may influence guest selectivity is the exact location of the thiourea group in the host. Since the thiourea group is attached to two flexible hydrocarbon chains, it is able to move relative to the rest of the host to a small but significant extent. Guests that may be trying to improve some other binding interaction elsewhere in the cavity may be able to force the thiourea to another position to obtain both interactions at once. Alternatively, and in a more complex manner, the guest may force a change in the cavity shape. This might lead to a change in hydrocarbon conformation and move the thiourea to a position adverse to the guest.

The amide hydrogens are more relevant for influencing guest selectivity and there are five important points to make about them. The first is that they are potential candidates for hydrogen bonding for both the carboxylate and the carbonyl oxygens of the guest. Thus there is a competition between which oxygens are preferred. The second is that they lie in the cleft at the two shallowest ends of the cavity. Since bulky atoms on the guest might also prefer to reside in this cleft, this results in a competition between hydrogen bond formation and steric relief of large groups. The third point is that in each pair, the N-benzylamide hydrogen lies higher above the thiourea than the benzamide hydrogen. Figure 8.6 illustrates this difference. For the carbonyl oxygen to bond to the lower benzamide hydrogen, the guest must tip over further than for the N-benzylamide hydrogen. By way of contrast, this difference in height is important sterically to guest methyl groups and side chains in the opposite way. It will be easier sterically for these groups to lie over the lower benzamide than the higher N-benzylamide. This is particularly important for the side chains which lie lower down on the guest than do the amide methyl groups. In this way, the side chains are closer to the host and have a stronger influence when the guest binds. The fourth point is that the N-benzylamide unit is more flexible than the benzamide unit

due to the presence of the extra CH2 group. Such flexibility is important since it allows the attached polar hydrogen to reach up a little if necessary to hydrogen bond to the guest carbonyl oxygen. The final point is that the aryl group in the benzamide unit prefers to lie in the same plane as the amide group and so access to the polar hydrogen of this group is somewhat sterically hindered. On the other hand, the aryl group of the N-benzylamide group prefers to lie at an angle to the amide. Thus the aryl group is more out of the way of the guest.

Figure 8.6: The difference in height for the polar hydrogens in the N-benzylamide (left) and benzamide units, and the closed (left) and open aryl groups.

It is clear from the discussion about polar hydrogens that the host aryl groups are also important to binding selectivity. As well as the different heights previously described, one other key feature distinguishing the two types of host aryl groups is in their orientation about the cavity. The benzamide aryl groups tend to lie in a more open manner with respect to the cavity while the N-benzylamide ones are more closed over the cavity. This difference, shown in Figure 8.6, may have a strong influence on steric clash with the guest side chain and methyl groups since open aryl groups can more easily accommodate these groups.

The combination of the different properties of the guest and the intricate features of the host polar hydrogens and aryl groups would be expected to produce a diverse range of binding possibilities. The highly coupled nature of these features further complicates their analysis since they cannot easily be examined in isolation. Changing one apparently small feature such as amide conformation may lead to large structural changes throughout the whole complex.

8.1.4 The V-Model For Binding.

A model is now proposed to describe all the modes of binding. By picking out a few key interactions, the situation may be considerably simplified. Furthermore, all possible binding modes using these key interactions may be considered and evaluated. This is important because in rationalising binding, more often than not it is the binding modes that do not occur that are of greater assistance in explaining the observed structures. Naturally, models have their limitations and so other factors left out of the model may well have to be considered. It should be noted that this model was not completely derived a priori to examining the structures observed in the simulations. However, its presentation now will aid later interpretation of the structures.

A simple model for the shape of the guest is now described. The guest, when viewed down its long axis (see Figure 8.4 for definition) with the carboxylate at the other end, is not flat but has the shape of an open V. This is because the side chains, being attached to an sp3 carbon, are not coplanar with the guest amide group. Figure 8.7 illustrates the V-shape for l-cis-Ala as viewed looking from the top down the main axis. This V shape is taken to represent the guest. One end of the V is the side chain, while the other is either the amide oxygen or methyl group, depending on the amide conformation. For cis, this end is the smaller oxygen with potential for hydrogen bond formation. For trans, it is the more bulky methyl group. An important feature of the V is that the side group end lies lower down the guest and closer to the carboxylate than the amide end, as seen in Figure 8.4. For Ala and Phe, which do have side chains, this V faces opposite ways depending on the guest's stereochemistry. Gly lacks one end of the V since it has no side chain. It is assumed that the

carboxylate group lies parallel to the length of the V as indicated in Figure 8.4. It is assumed that the end of the amide group that points up can be ignored since the side chain, lying below it, is likely to dominate any interactions with the host at this end. Thus the proposed model captures all the important differences present in the guests, apart from the difference due to the size of the side chain.

Figure 8.7: The V-shape of l-cis-Ala as viewed down its long axis. Both ball and stick and van der Waals surface representations are shown. The V faces the other way for d guests.

Now a model for the host is described. The features of the host included in the model are the six polar hydrogens and the four aryl groups. As described in Subsection 8.1.3, two of the hydrogens are high (N-benzylamide hydrogens), while two are low (benzamide hydrogens), and two of the aryl groups are open (benzamide aryl groups), while two are closed (N-benzylamide aryl groups).

When bringing the guest and host together, it is assumed that the carboxylate at the centre of the guest V always bonds with at least one oxygen to the thiourea. Each end of the V will compete to bind with its preferred section of the host. A favourable interaction for oxygen is with a high hydrogen to form a hydrogen bond while a favourable interaction for a side chain or methyl group is to find space over a low amide or open aryl group. It is more important to satisfy the side chain since this end of the V lies closer to the host. Figure 8.8 shows the four possibilities for the four types of V shape. Therefore, this model of binding predicts the binding order as l-cis > l-trans > d-trans > d-cis. These binding modes are termed the primary binding modes for the V model.
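The bookkeeping behind the predicted binding order can be illustrated with a toy score. This is only a sketch and not part of the model as originally formulated: the numerical weights are hypothetical, chosen simply so that the side chain contact counts for more than the contact at the amide end, and the placement of each end of the V (side chain over the low, open benzamide side for l guests and over the high, closed N-benzylamide side for d guests) is taken from the later discussion of the structures.

```python
# Toy encoding of the primary binding modes of the V model.
# Assumption: for an l guest the side chain lies over the low, open benzamide
# side and the amide end lies at the high, closed N-benzylamide side; for a
# d guest the two ends swap sides.
# Assumption: the weights are illustrative only; the side chain counts double
# because that end of the V lies closer to the host.

SIDE_CHAIN_WEIGHT = 2
AMIDE_END_WEIGHT = 1


def contact(group, site):
    """Return +1 for a favourable contact and -1 for an unfavourable one."""
    if group == "O":
        # An oxygen prefers a high hydrogen, with which it can hydrogen bond.
        return 1 if site == "high" else -1
    # A side chain (SC) or methyl group (M) prefers space over a low/open site.
    return 1 if site == "low" else -1


def v_model_score(stereo, amide):
    """Score one guest type, e.g. v_model_score('l', 'cis')."""
    if stereo == "l":
        side_chain_site, amide_site = "low", "high"
    else:
        side_chain_site, amide_site = "high", "low"
    amide_group = "O" if amide == "cis" else "M"
    return (SIDE_CHAIN_WEIGHT * contact("SC", side_chain_site)
            + AMIDE_END_WEIGHT * contact(amide_group, amide_site))


guests = [(s, a) for s in ("l", "d") for a in ("cis", "trans")]
ranking = sorted(guests, key=lambda g: v_model_score(*g), reverse=True)
print(" > ".join(f"{s}-{a}" for s, a in ranking))
```

With these weights the ranking reproduces the order stated above. Any weighting in which the side chain contact outweighs the amide end contact gives the same order.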

Figure 8.8: The four possible binding motifs for a V-shaped guest. (Panels: l-cis, l-trans, d-cis, d-trans. Key: H "high" amide hydrogen; L "low" amide hydrogen; T thiourea hydrogen; O oxygen; M methyl group; SC side chain; aryl ring; favourable; unfavourable.)

If the guests do not bind well in one of these primary modes, then they will most likely adopt some other position. The V may do one of four things. These are to tip forward, backwards, sideways, or do a roll. These motions are illustrated in Figure 8.9. Firstly, the V may tip over. If the part of the host with which the guest is clashing is low, then the guest may tip forward slightly to place the clashing part of the guest above this part of the host. If the clashing part of the host is too high, then the guest will tip backward and fit its clashing part inside the cavity. This back tip is the second type of motion. A tip in either direction may help form a hydrogen bond or remove the clash. Which of these occurs will also depend on whether the clashing part is a lower side chain or a higher amide group. Such a move may be possible if there is a clash at only one end. Thirdly, it may tip sideways into the

side of the host. This may also occur if there is a clash at only one end. Fourthly, it may roll around an axis coming out of the page so that both ends move. This is likely to happen when both ends are clashing. Such a motion may either be a whole body motion for the guest, or it may occur by a rotation of one of the dihedrals in the guest itself, leaving the carboxylate group fixed with respect to the host. These four possibilities are termed secondary relief modes. Forwards and backwards tipping is the most likely secondary mode because it keeps the V aligned in the cavity, with the carboxylate always near polar hydrogens. Sideways tipping and rolling do not do this so much and would only be expected to occur if tipping failed or was not possible. The aryl groups shown in Figure 8.8 are important in determining to where the clashing groups may choose to move. From this point onwards, use of the word "tip" will imply the forwards or backwards tip. Sideways tipping will always be explicitly stated as such.

Figure 8.9: The four possible motions available to the guest. These are the forward tip, back tip, side tip and roll. Only selected cross-sections of the host are shown in each diagram.

It is important to keep in mind that if the guest adopts

a secondary relief mode, the guest amide group at the top that has been previously ignored may influence which relief mode occurs. Therefore, a preliminary prediction of this model is that l-cis guests remain in the favourable position in Figure 8.8, l-trans and d-trans guests will most likely tip to relieve their one bad contact, and d-cis will roll, most likely in a clockwise direction to remove the bad side chain contact. In the binding model, Gly, which is not shown, does not have a side group so only the amide conformation influences the binding mode. In this model, there is nothing to stop the oxygen in either case hydrogen bonding to the host amide hydrogens. Forming this hydrogen bond will most likely involve some forwards tipping to maximise the strength of this bond.

Only so much can be rationalised from the shapes of the guests themselves. The real binding motifs are a complex interplay of all these described features. Their effects are best observed in later analyses. An analysis of the binding structures observed from the computer simulations is now presented using the V model as a guide to elucidate these binding motifs. Pictures of all the main motifs are given at the end of this Chapter on Page 211 and may prove useful to inspect during the analysis.

8.2 Guest Orientation.

The first place to start for such an analysis is to see what the average positions of the guests are in the host. An initial study of the structures revealed that the coordinate that varied the most between different guests was the angle at which they lay in the host. Evidently, this is the degree to which guests tip forwards or backwards. The other roll and sideways tip angles were seen to vary only to a small degree. Of course, although some degree of tipping sideways or rolling is possible, the host restricts their range. This particular tip angle, θfb, is calculated from the dot product of the C–C vector of the guest and the junction carbon–junction carbon vector of the host, as shown in Figure 8.10. Table 8.1 shows the value for θfb for each guest averaged over the 100 structures used in the analysis, together with its standard deviation.
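The tip angle calculation described above can be sketched as follows. This is a minimal illustration rather than the analysis code actually used: the function names are hypothetical, the two vectors are assumed to be supplied as Cartesian triples already extracted from each structure, and the population form of the standard deviation is an assumption, since the text does not say which form was used.

```python
import math


def tip_angle(guest_vec, host_vec):
    """Angle in degrees between two 3-vectors, obtained from their dot product."""
    dot = sum(g * h for g, h in zip(guest_vec, host_vec))
    norm_g = math.sqrt(sum(g * g for g in guest_vec))
    norm_h = math.sqrt(sum(h * h for h in host_vec))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_g * norm_h)))
    return math.degrees(math.acos(cos_theta))


def mean_and_std(values):
    """Mean and population standard deviation of a list of angles."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


# Hypothetical usage: one (guest C-C vector, host junction-junction vector)
# pair per saved structure.
structures = [
    ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    ((1.0, 0.2, 0.0), (0.0, 1.0, 0.0)),
]
angles = [tip_angle(g, h) for g, h in structures]
average, deviation = mean_and_std(angles)
```

Averaging `tip_angle` over the saved structures of one guest gives one entry of the kind reported in Table 8.1.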

Figure 8.10: The definition of θfb used to define the orientation of the guest with respect to the host. θfb is calculated from the dot product of the two dashed vectors shown. An angle of 0° corresponds to the guest lying sideways with the polar hydrogen pointing straight down (side 1), at 90° the guest is vertical, and at 180°, the guest lies the other way with the polar hydrogen pointing straight up (side 2).

Table 8.1: Tip Angle, θfb, For Each Guest in the Host.

Guest                           Angle / degrees
N-Ac-l-cis-phenylalanine         72 ± 4
N-Ac-l-trans-phenylalanine       90 ± 9
N-Ac-l-cis-alanine               82 ± 8
N-Ac-l-trans-alanine            100 ± 9
N-Ac-cis-glycine                115 ± 20
N-Ac-trans-glycine              146 ± 10
N-Ac-d-cis-alanine              130 ± 7
N-Ac-d-trans-alanine            110 ± 14
N-Ac-d-cis-phenylalanine        113 ± 8
N-Ac-d-trans-phenylalanine      116 ± 6

This table shows that l-cis-Phe and l-cis-Ala are tipped to the side containing the carbonyl oxygen at respective angles of 72 and 82°, suggesting the possibility of a hydrogen bond to the amide hydrogens. These angles are average values so some deviation from them does still occur, as indicated by the standard deviations. trans-Gly with a tipping angle of 146°

represents the other extreme. It may be hydrogen bonding to the amide hydrogens on the opposite side of the cavity. d-cis-Ala tips in the same direction as trans-Gly but to a lesser extent. In between lie l-trans-Phe and l-trans-Ala which are fairly vertical while the remainder lie in the range 110–116°. This observed behaviour is consistent with the V model. Clearly, l-cis guests tip forwards, l-trans guests are fairly level, d guests tip backwards.

The angles given are only averaged angles. Their deviations reflect the mobility of each guest. l-cis-Phe appears to be the most constrained, while Gly, lacking a side chain, appears moderately mobile. At the other extreme, the cis-Gly conformation appears to be very mobile, followed by d-trans-Ala. It is interesting to note that trans-Gly is a lot less mobile than cis-Gly. This difference for Gly may be due to a possible hydrogen bond gained by trans-Gly in tipping over. Such behaviour may also be present for l-cis-Phe, which appears a lot less mobile than l-trans-Phe. This difference is not simply due to the amide conformation, since the reverse trend is found for d-Ala. A hydrogen bond analysis will make clearer any trends.

8.3 Hydrogen Bond Analysis.

8.3.1 Hydrogen Bond Patterns.

An explanation for much of the different binding motifs can be found in a hydrogen bond analysis. Hydrogen bonds are strong, energy-lowering interactions that generally favour binding, particularly in non-competitive solvents like chloroform. The more of them there are, the stronger, typically, the overall binding. In this work a hydrogen bond is deemed to exist if the two atoms involved are less than 2.5 Å apart. Two features concerning hydrogen bonds are of particular interest. The first of these is the total number of hydrogen bonds that can occur. The frequency of a given number of hydrogen bonds occurring simultaneously for each guest is given in Figure 8.11. The second feature of interest is the types of hydrogen bonds formed. Figure 8.12 indicates the number of hydrogen bonds of a given type averaged over the simulation

for each guest. The height of the entire bar gives the total number of hydrogen bonds on average. There are three types of hydrogen bonds. The first type is between the carboxylate and thiourea, the second between the carboxylate and the amides, and the third between the carbonyl and the amides.

Figure 8.11: A histogram of the number of hydrogen bonds for each guest with the host. (Panels: cis and trans columns for N-Ac-l-phenylalanine, N-Ac-l-alanine, N-Ac-glycine, N-Ac-d-alanine and N-Ac-d-phenylalanine. Horizontal axis: Number of Hydrogen Bonds, 1 to 7.)

It can be seen that all guests are quite capable of forming three to four hydrogen bonds while some are able to form five, six or even seven. A source of this difference lies in the types of hydrogen bonds formed. Figure 8.12 reveals that most guests form two carboxylate–thiourea hydrogen bonds and two carboxylate–amide hydrogen bonds, while only some are able to form carbonyl–amide hydrogen bonds, and if they do, on average, they form one of these.
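The counting behind Figures 8.11 and 8.12 can be sketched as below. This is an illustration only: the data layout and function name are hypothetical, and the "two atoms involved" are assumed here to be a guest oxygen and a host polar hydrogen. Only the 2.5 Å criterion and the three bond types come from the text.

```python
import math
from collections import Counter

CUTOFF = 2.5  # Angstrom; the distance criterion stated in the text

# The three hydrogen bond types distinguished in the analysis.
BOND_TYPES = {
    ("carboxylate", "thiourea"),
    ("carboxylate", "amide"),
    ("carbonyl", "amide"),
}


def count_hydrogen_bonds(guest_oxygens, host_hydrogens, cutoff=CUTOFF):
    """Count hydrogen bonds of each type in one structure.

    guest_oxygens:  list of (kind, xyz), kind is 'carboxylate' or 'carbonyl'
    host_hydrogens: list of (kind, xyz), kind is 'thiourea' or 'amide'
    """
    counts = Counter()
    for o_kind, o_xyz in guest_oxygens:
        for h_kind, h_xyz in host_hydrogens:
            if (o_kind, h_kind) not in BOND_TYPES:
                continue  # pairings outside the three listed types are skipped
            if math.dist(o_xyz, h_xyz) < cutoff:
                counts[(o_kind, h_kind)] += 1
    return counts


# Hypothetical coordinates for a single structure.
oxygens = [("carboxylate", (0.0, 0.0, 0.0)), ("carbonyl", (5.0, 0.0, 0.0))]
hydrogens = [
    ("thiourea", (0.0, 2.0, 0.0)),
    ("amide", (5.0, 0.0, 2.4)),
    ("amide", (0.0, 0.0, 3.0)),
]
bonds = count_hydrogen_bonds(oxygens, hydrogens)
total = sum(bonds.values())
```

Collecting `total` over all structures of a guest gives a histogram of the kind shown in Figure 8.11, while averaging the per-type counts gives the breakdown of Figure 8.12.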

Figure 8.12: The breakdown of the three types of hydrogen bonds present between each guest and the host. (Bars: Carbonyl–Amide, Carboxylate–Amide, Carboxylate–Thiourea. Vertical axis: Number of Hydrogen Bonds. Horizontal axis: Molecule.)

8.3.2 Interpretation of Hydrogen Bond Patterns.

Both Figures 8.11 and 8.12 contain a wealth of information concerning the differences for each guest. What is immediately evident from them is the general trend that, the larger the side chain in the l position or alternatively, the smaller the side chain in the d position, the more hydrogen bonds there are. The main exception to this rule is trans-Gly which has the largest number of hydrogen bonds of all guests. The reason for this difference becomes clearer when considering the types of hydrogen bonds formed.

Carbonyl to amide hydrogen bonds are only able to form for four guests. Three of these guests are in the cis conformation and are either of l or have no stereochemistry. These guests are cis-Gly, l-cis-Ala and l-cis-Phe. Usually, only one of the two bonds forms and it is typically with the higher N-benzylamide hydrogen. The percentage of carbonyl–amide bonds that involves lower benzamide hydrogens is only 17, 8 and 10% for cis-Gly, l-cis-Ala and l-cis-Phe, respectively. In this binding motif, up to six and occasionally seven hydrogen bonds are observed to form. If two of these

Table 8.2: Average Polar Hydrogen Separation, rH−H, For the Guest Amide and the Nearest Host Amide.

Guest                           rH−H / Å
N-Ac-l-cis-phenylalanine        2.9
N-Ac-l-trans-phenylalanine      2.2
N-Ac-l-cis-alanine              2.5
N-Ac-l-trans-alanine            2.4
N-Ac-cis-glycine                2.9
N-Ac-trans-glycine              3.3
N-Ac-d-cis-alanine              3.1
N-Ac-d-trans-alanine            2.8
N-Ac-d-cis-phenylalanine        2.7
N-Ac-d-trans-phenylalanine      3.0

hydrogen bonds are carboxylate–thiourea, two carboxylate–amide, and two carbonyl–amide, giving six in total, then all this evidence is indicative of the guest tipping over to side 1. This is in agreement with the θfb results and the V model.

The fourth guest that seems to possess the greatest ability to form such carbonyl–amide hydrogen bonds is trans-Gly. Even though all four of these guests can form the carbonyl–amide hydrogen bond, trans-Gly quite commonly forms two carbonyl–amide hydrogen bonds simultaneously, including one to the benzamide hydrogen which now contributes 39% of all carbonyl–amide hydrogen bonds. This indicates that the guest tips over in the opposite way to the l-cis guests allowing the carbonyl to form one to two additional hydrogen bonds to the amide pair. Again, this supports the large θfb value found, particularly compared to cis-Gly. Being in the trans conformation, the polar hydrogen points up away on the other side to the amide hydrogens. Evidently, for trans-Gly there is no polar hydrogen–hydrogen repulsion between host and guest. Instead, trans-Gly is able to approach much closer to the amide hydrogens than the other three guests.

Table 8.2 shows the closest H–H contact, rH−H, for all guests. This table shows that rH−H is 3.3 Å for trans-Gly, 2.9 Å for cis-Gly, 2.5 Å for l-cis-Ala and 2.9 Å for l-cis-Phe. This variation is interesting as the repulsion is smaller for the extreme cases, Phe and Gly, but for different reasons. The lack of tipping for cis-Gly indicates

that it barely attempts to form the carbonyl–amide hydrogen bond and thus the two hydrogens never draw close. l-cis-Phe is so successful at forming this bond that the guest is tipped over far enough to separate the two hydrogens sufficiently. l-cis-Ala lies, on average, in an intermediate state and experiences the strongest H–H repulsion.

Two conclusions may be drawn. Firstly, the absence of the H–H repulsion appears to be the reason why trans-Gly easily forms a carbonyl–amide hydrogen bond, while cis-Gly does not. Secondly, the side chain seems to affect the likelihood of the carbonyl–amide hydrogen bond forming for cis guests, with the larger the side chain, the greater the chance that these hydrogen bonds will form. Figure 8.13 illustrates this effect graphically. It shows a histogram of the N-benzylamide hydrogen bond distance for these three guests. Clearly, hydrogen bond distances increase as the side chain gets smaller. What may be happening is that the larger side chain lying over the favourable benzamide cleft may be forcing the guest amide oxygen closer to the N-benzylamide unit even if the polar hydrogens are also getting closer. Possibly, the guest tips for two reasons, firstly to form the hydrogen bond, and secondly, to relieve the side chain clash. Such tipping is possible since the side chain lies over the lower, more open benzamide group. Thus l guests with larger side chains may favour the cis conformation on steric grounds as well as due to the hydrogen bond gained since the carbonyl oxygen is smaller than the methyl group.

Figure 8.13: Histogram of the distances of the carbonyl–N-benzylamide hydrogen bond for cis-Gly, l-cis-Ala and l-cis-Phe. (Panels: N-Ac-l-phenylalanine, N-Ac-l-alanine, N-Ac-glycine. Horizontal axis: Distance / Å. Vertical axis: Frequency.)

What is particularly significant from Figures 8.11 and 8.12 is that none of the

other six guests are able to form such a carbonyl–amide hydrogen bond. For the trans structures, this result is consistent with the V binding model, which assumes that the guest oxygen is too distant to be considered. However, the d-cis structures must also have trouble forming these hydrogen bonds. One reason for this is that the nearest amide hydrogen is the lower one. A second reason is that their side chain lies over a high N-benzylamide group and so the guest must move to relieve this bad contact. The V model predicts that such guests will tip backwards, forcing the amide oxygen away from the guest amide group. A third reason is the amide hydrogen repulsion.

Three of the trans compounds, l-trans-Phe, l-trans-Ala and to a small extent, d-trans-Ala, appear to be able to form five or six hydrogen bonds. Figure 8.12 reveals that these are likely to be carboxylate–amide hydrogen bonds. This suggests two things. Firstly, these guests must be able to simultaneously hydrogen bond to amide hydrogens on each side of the cavity. Evidently, these three guests must be positioned in a fairly vertical position to achieve this and they must be able to place their side groups over a low aryl group. This is consistent with the θfb angles being close to 90°. The V model shows that this is easy to do for the l compounds, while d-trans-Ala must tip sideways or roll to some extent to achieve this. Secondly, the host is likely to be adopting a more narrow structure to make this possible. The influence of host conformation on binding is discussed later.

In a few instances, trans-Gly and l-trans-Phe are even able to form seven hydrogen bonds. The extra one arises when one carboxylate oxygen simultaneously forms two hydrogen bonds to an amide hydrogen pair and one to a thiourea hydrogen, all the other hydrogen bonds remaining the same. Such a bonding pattern is further evidence that the host conformation is placing the thiourea unit closer to one of the amides to make this possible.

To conclude this section, it can be seen that the hydrogen bond pattern varies quite considerably with the conformation, stereochemistry and side chain of the guest. To more clearly understand why only certain hydrogen bonds form in each case, a steric analysis provides more clues.

8.4 Steric Analysis.

8.4.1 Extracting Meaningful Steric Information.

Since the main two factors determining the overall binding are hydrogen bonds and steric strain, it would be ideal to do a steric analysis looking at how distances vary between certain atoms in an analogous fashion to the hydrogen bond analysis. However, looking at distances for steric analyses is more difficult than for the hydrogen bonding analysis. Firstly, unlike hydrogen bonds, there is hardly any "on" or "off" or "degree" for steric clash. Rather, there is only "off". This is because steric effects cause such great penalties in energy that they simply do not compromise. Thus even if atoms would like to occupy the same space, they never overlap, except only very slightly. In examining structures, the discrepancy in distance between atoms that do not overlap and atoms that do is too marginal to be significant or detectable. Two atoms pressing up close will look virtually the same as two atoms lying comfortably alongside each other. Secondly, the particular atom under consideration can clash sterically with many other atoms rather than just a few easily identifiable ones. This makes necessary the examination of many distances. Thus average distances alone can reveal little. In this section, the wording, "steric clash", is used to indicate a close contact which may or may not have a high energy penalty.

There are two possible approaches for detecting steric clash. The first is by measuring energy. The difficulty with energy is that steric clashes are ambiguous to interpret. In the case of significant steric strain, the energy clearly rises. However, differences in energy between close contacts and moderate contacts are small. These small differences are difficult to interpret because energy also encompasses many different effects making it hard to extract only the steric ones. An energy analysis is described later in Section 8.5. The other method to analyse steric clash is distance-based. It uses a particular radial distribution function about the group of interest called the contact radial probability distribution function (CRP). It is a distribution function of contact distances between Lennard-Jones surfaces. Distribution functions

Distribution functions are useful because they add up many small effects that are unnoticeable individually into something from which differences can be discerned. In this analysis, CRP distribution functions are constructed to show the degree to which particular groups are sterically hindered, while examining which parts of the host contribute to the contact distances indicates which parts of the host and guest are closely interacting.

CRP distribution functions are defined as the radial probability that the closest host atom is at a given contact distance from the guest group. They are constructed as follows. The distance, r, between the centre of every host atom and the nearest atom of the particular guest group is calculated for each configuration. Then from this distance is subtracted the geometric average of the Lennard-Jones radii of each of the two atoms involved to give a contact separation, rC. This distance is binned with a weighting of 1/r². No normalisation is applied since they are only used for comparison between different guests. This function is expected to rise sharply with increasing distance from zero, although it can be non-zero at negative distances since some slight overlap of atoms does occur. The CRP then decays away at larger distances due to the 1/r² weighting and the finite system size.

8.4.2 Probing the Close Contacts for Different Guests

The main parts of the guest that may be involved in steric clash are the side chains and the methyl group on the amide. These are also the components of the guests that most significantly differ with stereochemistry and amide conformation. The CRPs for the side chain with the rest of the host are shown for the four Ala and Phe guests in Figure 8.14.

CRP plots indicate steric clash in two ways. The steepness of the gradient of the CRP at rC around 0 Å indicates repulsive steric clash. The more negative is rC for the first peak, the larger the number of repulsive contacts. This is seen to vary marginally for each guest and may vary sufficiently to be significant. The second steric effect that does differ much more between guests is the degree to which the CRP function clusters around zero. If the guest is constrained in a very tight fit, then the function will tend to cluster around zero.
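The CRP construction described in this section can be sketched in a few lines of Python. This is only an illustrative reading of the text, not the program used in this work. Several points are assumptions: the Lennard-Jones "radii" are taken to be per-atom size parameters combined by a geometric mean, the nearest guest atom is chosen by centre-to-centre distance, only the single closest contact per configuration is accumulated when `closest_only` is true, and the function names, bin width and range are hypothetical.

```python
import math

def contact_separations(host, host_sigma, group, group_sigma):
    """Return (r, rC) for every host atom in one configuration.

    host, group: lists of (x, y, z) coordinates in Angstrom.
    host_sigma, group_sigma: assumed Lennard-Jones size parameters in Angstrom.
    """
    pairs = []
    for h, s_h in zip(host, host_sigma):
        # Nearest atom of the guest group, judged by centre-to-centre distance.
        r, s_g = min((math.dist(h, g), s) for g, s in zip(group, group_sigma))
        # Subtract the geometric average of the two size parameters.
        pairs.append((r, r - math.sqrt(s_h * s_g)))
    return pairs

def accumulate_crp(configs, host_sigma, group_sigma,
                   r_min=-1.0, width=0.1, n_bins=90, closest_only=True):
    """Accumulate an unnormalised CRP histogram with 1/r^2 weighting."""
    hist = [0.0] * n_bins
    for host, group in configs:
        pairs = contact_separations(host, host_sigma, group, group_sigma)
        if closest_only:
            # Keep only the closest contact in this configuration (assumption).
            pairs = [min(pairs, key=lambda p: p[1])]
        for r, rc in pairs:
            k = math.floor((rc - r_min) / width)
            if 0 <= k < n_bins:
                hist[k] += 1.0 / r ** 2
    return hist
```

Because no normalisation is applied, histograms built this way are only meaningful when compared between guests sampled with the same number of configurations.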

Short range clustering can indicate confinement for that part of the guest. However, whether the lack of mobility observed is actually due to this part of the guest cannot be inferred since the guest may be being restrained by an interaction elsewhere in the system.

Figure 8.14: The CRP between the host and any atom on the guest side chain. [Two panels: the four N-Ac-alanine guests and the four N-Ac-phenylalanine guests (l-cis, l-trans, d-cis, d-trans); probability against rC/Å from 0 to 8.]

The CRP functions in Figure 8.14 show that the trends in steric clash for the side chain are clear for Ala. The side chains of both d isomers appear to be more sterically strained and confined than the l isomers. This is seen by the more negative first peak and greater clustering at rC around 0 Å. There appears to be a clear ordering of d-cis-Ala > d-trans-Ala > l-trans-Ala > l-cis-Ala. Such a trend is exactly what the V binding model predicts for side chains. For Phe, the trends are harder to make out. All aryl side groups experience a similar large degree of steric strain and confinement, as would be expected. The trans conformations seem to be experiencing a little more strain and confinement, but it is marginal. Similar plots were examined for Gly whose side chains are only hydrogens. For all four hydrogens the distributions were virtually identical, and so were not shown here. The small amount of strain appeared identical for both cis and trans guests.

The same CRP functions may also be plotted for the methyl groups on the guest amides. These are shown for the same four Ala and Phe guests in Figure 8.15.

Figure 8.15: The CRP between the host and any atom on the guest methyl group. [Two panels: the four N-Ac-alanine guests and the four N-Ac-phenylalanine guests (l-cis, l-trans, d-cis, d-trans); probability against rC/Å from 0 to 8.]

For Ala, the CRP functions of the methyl group are more uniform for all guests. The methyl group of l-trans experiences the most steric clash and confinement, followed by that of d-cis-Ala. The confinement of the methyl group for l-trans but not of the methyl group for d-trans and l-cis is exactly what the V-binding model predicts. However, the clash for d-cis-Ala is possibly due to tipping sideways or rolling caused by the side chain. This motion brings its methyl group into close contact with some other part of the host. Which part of the host will be revealed in the next Subsection.

The story for the methyl group of the Phe guests is quite different to that for the side chains. There is now a distinct ordering of confinement. The methyl groups for l guests appear to be much more confined, while those for the d guests are a lot more mobile, apparently even more than the d-Ala isomers. The ordering here is l-trans-Phe > l-cis-Phe > d-cis-Phe > d-trans-Phe. The large clash for l-trans-Phe and the lack of steric clash for d isomers is again predicted by the V model. However, that the clash is smaller for d-Phe than d-Ala is not predicted. The larger side chain for Phe may possibly be forcing the methyl group more into open space. The V model, however, does not predict the large clash for l-cis-Phe. These clashes are curious. What must be happening is as follows. Evidently, the tipping of l-cis-Phe to hydrogen bond must be bringing its methyl group into contact with the host somewhere.

Since this does not occur for l-cis-Ala, this again appears to be another difference between Phe and Ala. CRP plots for the methyl group of Gly were found to be the same for cis and trans with little strain for either and so no figures are shown.

One final point of interest is to compare the heights of the CRP functions for the side chain and methyl group. The scale on each of these plots is not shown, but the peaks for the side chains are approximately twice as high as those for the methyl groups for both Ala and Phe, indicating that overall, the side chains are more responsible for steric clash. This is expected, given that side chains, being lower down the guest, are closer to the host. In addition, the side group for Phe is larger than the methyl group. This comparison is valid and requires no normalisation because the CRP always accumulates only one closest guest group-host distance.

CRP plots were also made for the oxygen on the amide of the guest. These showed large short range peaks exactly when the carbonyl–amide hydrogen bonds form as discussed in Section 8.3 while plots for all other guests appeared similar to each other.

8.4.3 The Nature of the Close Contacts

While some conclusions have been drawn from CRP distribution functions about which guests have the most steric strain, it would be more informative to know which parts of the host are responsible for this. In order to achieve this, the closest contact distances between the group of the guest and every atom of the host were averaged over all configurations. The host was divided up into residues in a manner similar to that used in the Monte Carlo moves but now there are 11 in total rather than 9. These are the thiourea, two hydrocarbon chains, four amide, and four aryl segments. The extent of clash between the guest group and the host residue was binned according to distance. Three bins were used. These are rC ≤ 0.3 Å, 0.3 < rC ≤ 0.6 Å, and rC > 0.6 Å, showing not too much clash. The results of such an analysis for all guests are presented in Figure 8.16. This figure shows where the close contacts are occurring using a schematic of the host broken down into its residues. Blue indicates the side chain, green the methyl group and red the oxygen.

The darker shade of colour is used to indicate the stronger clash, while grey indicates that there is no particular clash. In order to emphasise the properties of the different amide and aryl residues, N-benzylamide amide and aryl units are respectively referred to as “higher” and “closed” while benzamide amide and aryl units are termed “lower” and “open”. In these structures, the guest is always oriented with its polar amide hydrogen facing the corner of the host containing the residue abbreviations suffixed by “1”.

Figure 8.16: Close contacts with the host for all ten guests. A close contact is indicated for the side chain by blue, for the methyl group by green, and for the oxygen by red. [Ten host schematics. Top row: l-cis-Phe, l-cis-Ala, cis-Gly, d-cis-Ala, d-cis-Phe. Bottom row: l-trans-Phe, l-trans-Ala, trans-Gly, d-trans-Ala, d-trans-Phe. Residue key: TH – Thiourea; AL – Benzamide Amide (Lower); HC – Hydrocarbon; BC – N-Benzylamide Aryl (Closed); AH – N-Benzylamide Amide (Higher); BO – Benzamide Aryl (Open). “1” denotes on the side of the guest amide hydrogen, and “2” the other.]

There is a lot of information concerning the modes of binding in this figure. As mentioned before, care must be taken in interpreting it since a close distance does not necessarily indicate steric clash. It is not only the close contacts that are of interest, but also the lack of them. First of all a few general trends can safely be given. The guest amide and side groups generally interact sterically on opposite sides of the host as would be expected. Another clear trend is that the larger the side group, the greater the number of close contacts, again as expected. This indicates that smaller guests are also more mobile.

Little can be deduced from the data for Gly except for the trans hydrogen bond, the guest’s high mobility and the slight preference for the end opposite to the guest polar hydrogen. To understand the binding for the other guests, applying the V-binding model may prove of use. Starting with the blue side chains, the steric contacts for all four Ala guests are exactly where the V-binding model predicts and the clash for d is greater than that for l. For the green methyl groups of the trans-Ala guests, the prediction of a clash for l-trans-Ala with the high amide, AH1, occurs, while the absence of a clash for d-trans-Ala is again predicted correctly. For the red oxygen atoms in the cis-Ala guests, there is clear close contact around side 1 due to carbonyl–amide hydrogen bond formation for l-cis-Ala, while there is no sign of any clash for the oxygen in d-cis-Ala. What is also evident from these diagrams is some degree of clash due to the group at the top end of the amide. This is seen in the clash of the methyl group with BC2 for l-cis-Ala and the clash with BO2 and HC2 for d-cis-Ala, while the oxygen in d-trans-Ala clashes slightly with BC1, indicating some degree of tipping or rolling. It is evident that the observed binding pattern is somewhat more complex than that predicted by the simple V model.

Now consider Phe. All four guests experience a large degree of clash between their blue side chains and the residues around cleft 2. However, this time, the clash for d guests appears to be reduced compared to that for l-Phe guests. The side chains for l-Phe guests lie near to 4 or 5 residues simultaneously, suggesting that the side chain may lie close to all of them in a close fit, while the d-Phe guests appear to only touch the host at a few points. This indicates that d-Phe guests are unable to approach as closely to the host as l-Phe guests. The methyl group of the l-trans-Phe guest may be seen lying quite close to side 1, particularly to the favourable BO1 residue, while in d-trans-Phe, it only draws marginally near to the unfavourable AH1. This indicates that d-trans-Phe tips, while l-trans-Phe does not. For the red oxygen, again the carbonyl–amide hydrogen bond is seen for l-cis-Phe. Not only is there no such hydrogen bond for d-cis-Phe but surprisingly, the oxygen clashes instead with BC1. This suggests quite a strong degree of tipping of the whole guest to side 2.

In summary, the observations from the steric analysis have agreed with the θfb and hydrogen bond analyses and it has shed more light on the position of each guest and how it fits into the cavity. At this point, it appears as though there are four general possibilities. There are the two mobile Gly modes, the two level l-trans modes, the two forwards tipping l-cis modes with carbonyl–amide hydrogen bonds, and the four d backwards tipping modes. The V model was able to account for most of these trends, with the exception of the nature of the secondary relief modes.

8.5 Energy Analysis

8.5.1 Energy Components

Energies indicate much about the relative stabilities of different complexes. The difficulty is that energies encompass many different effects. These include intermolecular interactions such as hydrogen bonds, π–π interactions and steric effects, and host and guest strain. The coupling of all these contributions makes it rather difficult to deduce particular physical phenomena from energies. The individual force field component energies as given in Eq. 2.1 are of some use in small systems or when differences are obvious and unambiguous such as the torsional profile in the ethane guest. However, in complex systems differences become many, subtle and distributed over many contributions. It becomes almost impossible let alone meaningless to assign individual terms to a particular effect. Nevertheless, the force field components when grouped into types can still serve some use, particularly for confirming any suspected physical effects.

The total energy, E, and a number of its components are listed in Table 8.3 for each host–guest complex. The components listed are Eint, the combined host and guest internal energies, Exx, the host guest interaction energy and Esx, the solvation energy of the whole complex. These three terms combined give E. Edih, the dihedral contribution to Eint is given. Exx is broken down into two further components, ELJ, the Lennard-Jones energy, and ECoul, the electrostatic energy. Finally, Epol, the polarisation term of Esx is also given.
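For reference, the grouping of energy terms described above can be written compactly. This is only a restatement of the definitions in the text in LaTeX notation; the subscript typography is a choice made here.

```latex
\begin{align}
E &= E_{\mathrm{int}} + E_{\mathrm{xx}} + E_{\mathrm{sx}}, \\
E_{\mathrm{xx}} &= E_{\mathrm{LJ}} + E_{\mathrm{Coul}}.
\end{align}
```

Here E_dih is one contribution to E_int and E_pol is one contribution to E_sx, so neither is an additional term in the sum for E.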

Table 8.3: Absolute Total Energy and Components (kcal mol−1) for the Ten Guests.

Guest                          E     Eint   Exx   Esx   Edih   ELJ   ECoul  Epol
N-Ac-l-cis-phenylalanine      -193   -51    -85   -55    20    -19    -66   -49
N-Ac-l-trans-phenylalanine    -195   -48    -91   -55    26    -21    -69   -49
N-Ac-l-cis-alanine            -191   -53    -82   -55    26    -16    -65   -49
N-Ac-l-trans-alanine          -193   -55    -81   -56    25    -16    -64   -50
N-Ac-cis-glycine              -191   -61    -71   -58    24    -13    -58   -52
N-Ac-trans-glycine            -194   -60    -74   -59    24    -13    -61   -53
N-Ac-d-cis-alanine            -188   -58    -70   -59    22    -15    -55   -53
N-Ac-d-trans-alanine          -190   -55    -76   -57    24    -15    -61   -52
N-Ac-d-cis-phenylalanine      -187   -51    -77   -58    23    -17    -59   -52
N-Ac-d-trans-phenylalanine    -190   -55    -73   -61    18    -17    -56   -54

Table 8.4: Relative Total Energy and Components (kcal mol−1) for the Ten Guests. [Same guests and columns as Table 8.3 (E, Eint, Exx, Esx, Edih, ELJ, ECoul, Epol), given to one decimal place relative to the most negative value in each column.]

Absolute energies are rather large and it is not so easy to spot trends for such numbers. Therefore Table 8.4 is also included. It gives the relative energies with the most negative energy component becoming the energy zero. An extra decimal point is also included for precision purposes. Note that now that the energy components are zeroed, the components will not add up to the total energy.

8.5.2 Interpretation of the Energies

Before discussing the energies, it is important to remember that they are not free energies which include entropy. A lower energy value does not necessarily imply that this guest is the strongest binder. It is the free energy data in chloroform and the host that provide this.
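The zeroing used to build Table 8.4 from Table 8.3 can be illustrated with a short sketch. It uses the integer values of four columns of Table 8.3, so the results agree with the one-decimal entries of Table 8.4 only to within rounding; the function name and the dictionary layout are hypothetical.

```python
def relative_energies(table):
    """Shift each column so that its most negative entry becomes zero."""
    return {name: [v - min(values) for v in values]
            for name, values in table.items()}

# Columns of Table 8.3 (kcal/mol), guests in the order listed in the table.
absolute = {
    "E":    [-193, -195, -191, -193, -191, -194, -188, -190, -187, -190],
    "Eint": [-51, -48, -53, -55, -61, -60, -58, -55, -51, -55],
    "Exx":  [-85, -91, -82, -81, -71, -74, -70, -76, -77, -73],
    "Esx":  [-55, -55, -55, -56, -58, -59, -59, -57, -58, -61],
}
relative = relative_energies(absolute)

# Each column has its own zero, so zeroed components no longer sum to zeroed E.
first_guest_sum = sum(relative[c][0] for c in ("Eint", "Exx", "Esx"))
```

For the first guest the zeroed components sum to a value quite different from the zeroed total, which is the point made in the text about components not adding up after zeroing.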

The total energy, E, reveals two interesting facts. Firstly, the energies of all the trans conformers lie lower in energy than the respective cis conformers. This observation probably reflects the fact that the trans conformers have a lower internal energy than the cis conformers. Even though the free energy calculations predict the cis conformation to be more stable for l-Phe, l-Ala and d-Phe, the origin of this stabilisation does not appear to come from the energies alone. Secondly, within each conformation, there is the general trend of rising energy going from l-Phe to d-Phe, with trans-Gly the only exception. This trend indicates a general greater stability of l over d.

When looking at the three components to E, however, the story is much more complex. A few general rules can be extracted nevertheless from each component. The larger the guest, the larger Eint. A large Eint must be evidence of conformational strain and possibly different conformations. Thus the host and guest geometries are more perturbed for larger guests. Most of the cause for this trend interestingly lies in Edih. There is one notable exception in this case and this is l-cis-Phe, which has a small Edih term.

The variation in Exx is much more dramatic. Going down the table, Exx rises but becomes more negative again for larger d compounds. l guests clearly have a more negative Exx, indicating a favourable binding contact. An explanation for this is made clearer by looking at its two components, ELJ and ECoul. Going down the table, ELJ rises as the guest size decreases, reflecting the smaller number of interactions, as might be expected. ELJ becomes more negative again for the larger d guests, but does not recover to the same value for large l due to worse contacts. The variation in the Coulomb term appears to largely correlate with the number of hydrogen bonds observed earlier in Figure 8.11. d-cis-Ala appears to suffer a particular unfavourable binding energy.

The third component, Esx, varies the least between guests. It appears to run in the reverse order, though, with l-Phe guests the least stable and d-Phe the most. This may be rationalised by remembering that d complexes tend to bind less tightly, giving them a larger size and a more negative Esx term.

Interestingly, the cause of this trend lies not so much in the cavity term but in the polarisation term, Epol. More distantly spaced atoms are able to polarise the solvent to a greater degree, leading to a more negative Esx term. The surface area term, which is not shown, barely changes. The only trend it shows is that it is more negative for larger guests, as would be expected. Thus the origins of this trend must be more electrostatic in nature than dispersion and cavitation-based.

In summary, the energy analysis has shown that trans complexes have lower energies than cis complexes and l complexes have a lower energy than d. l guests appear to achieve this principally through a strong host-guest interaction partially cancelled out by some intramolecular strain and a smaller solvation term. Such intramolecular strain, of which the main component is the dihedral term, may be indicative of conformational rearrangement.

8.6 Conformational Analysis

8.6.1 Variation of Host Shape With Different Guests

The final piece in the puzzle for understanding the binding comes from a conformational analysis of the host. There have been a number of indications that the host structure varies for different guests. Indeed, the proper sampling of the host was found to play a major role in obtaining consistent free energies. Therefore it is worthwhile examining the host structure to see what role it plays in binding. Table 8.5 shows how various geometric properties of the host vary when different guests are bound to it. The exact definitions of these parameters are given in Figure 8.17. Most of these differences may be understood by examining the flexible parts of the host. The main dihedrals of interest that bring about different conformations are those in the flexible hydrocarbon chain. However, other dihedrals that still affect binding include the aryl dihedrals of the host and the “swing” dihedral of the aryl group in Phe.

Table 8.5: The Values of Various Geometric Properties For Each Host-Guest Complex. These Properties are Defined in Figure 8.17. Distances Are in Å and Angles in Degrees.

Guest                          θAC   θJC
N-Ac-l-cis-phenylalanine        84    91
N-Ac-l-trans-phenylalanine      74   100
N-Ac-l-cis-alanine              79    93
N-Ac-l-trans-alanine            75    95
N-Ac-cis-glycine                80    90
N-Ac-trans-glycine              78    87
N-Ac-d-cis-alanine              81    84
N-Ac-d-trans-alanine            78    92
N-Ac-d-cis-phenylalanine        84    90
N-Ac-d-trans-phenylalanine      86    88

[Distance columns of Table 8.5: rJC, rHC1, rHC2, rBC1, rBC2 and rd.]

8.6.2 Hydrocarbon Chain Conformation

When a guest binds, it has to organise the host, especially if it is to open up the cavity and bind inside, and may therefore limit the number of unique conformations available to the host. Such a reduction has to be compensated for by the gaining of a favourable energetic host-guest interaction. The starting point for such an analysis is to examine the number of conformations for each guest arising from the host hydrocarbon chain. A conformation was defined similarly to before by dividing the dihedral angle into three sections, gauche+, trans or gauche-. If a conformation consisted of a unique combination of these three specifiers, then it was taken as new conformation. For the 10×100 configurations saved, 92 unique conformations were counted for all twelve dihedrals for all guests. This is marginally less than the 97 unique conformations observed for the free host out of 300 structures. It is not the fairest comparison to compare the total number of unique conformations since for the host-guest complexes they are being selected from 1000 structures while for the host from only 300. Since it is likely that many more conformations will be found for the free host from 1000 structures, the general indications are that the total number of unique conformations is greater for the free host. While only 100 structures would not be expected to provide all the conformations possible, it should still find the dominant ones.
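The counting of unique conformations described above can be sketched as follows. The section boundaries used here (0 to 120° for gauche+, 120 to 240° for trans, 240 to 360° for gauche-) are an assumption, since the text only states that each dihedral is divided into three sections, and the function names are hypothetical.

```python
from collections import Counter

def classify_dihedral(angle_deg):
    """Assign a dihedral angle to gauche+, trans or gauche- (assumed boundaries)."""
    a = angle_deg % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"

def conformation_label(dihedrals_deg):
    """A conformation is the combination of specifiers over all chain dihedrals."""
    return tuple(classify_dihedral(a) for a in dihedrals_deg)

def count_conformations(configurations):
    """Map each unique conformation to the number of times it occurs."""
    return Counter(conformation_label(c) for c in configurations)
```

Applied to the twelve hydrocarbon dihedrals of each saved configuration, the number of keys in the returned counter is the number of unique conformations, and the counts show which conformations dominate.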

Figure 8.17: The definition of the distances and angles given in Table 8.5 illustrated for a schematic of macrobicycle 12. [Schematic of the host residues labelled with the distances rJC, rHC1, rHC2, rBC1, rBC2 and rd and the angles θAC and θJC.]

To give a preliminary idea of the differences in conformation for each guest, the number of unique conformations generated in the simulations for each guest is given in Table 8.6. It can be seen that the number of hydrocarbon conformations the host can access is larger for smaller guests and smaller for larger guests, as would be expected. It is possible that this difference is an artefact due to poorer sampling for larger guests. A comparison with annealed structures can suggest which is the case. The number of unique conformations generated from simulated annealing for each guest is also given in Table 8.6, together with the total number of structures generated for that guest. The trends appear to be the same for both annealed and simulation conformations, indicating that the guests restricting the host is a real physical effect and not a result of poor sampling. The other main trend in Table 8.6 is that there appear to be fewer conformations for the l guests than for d. This may be a consequence of a stronger binding interaction for l guests.

92 conformations are still too many to analyse individually. Most conformations only occur a few times, although some are more prevalent and these warrant closer analysis. Before doing so, more clues can be obtained concerning which are the important conformations from the dihedral distributions. They also can reveal differences for each guest.

Table 8.6: Number of Unique Hydrocarbon Conformations For Each Host-Guest Complex From Annealed Structures and Simulation.

                               Number of Unique Conformations
Guest                          Annealed (Total)   Simulation
N-Ac-l-cis-phenylalanine        7 (8)             12
N-Ac-l-trans-phenylalanine     19 (28)             8
N-Ac-l-cis-alanine             22 (31)            16
N-Ac-l-trans-alanine           20 (30)            18
N-Ac-cis-glycine               24 (32)            22
N-Ac-trans-glycine             29 (40)            32
N-Ac-d-cis-alanine             32 (47)            16
N-Ac-d-trans-alanine           28 (34)            14
N-Ac-d-cis-phenylalanine       26 (39)            17
N-Ac-d-trans-phenylalanine     17 (20)            16

Illustrated in Figures 8.18 and 8.19 are the dihedral distributions of the hydrocarbon chain for l-cis-Phe and l-trans-Phe. The sampling compared to the host alone (see Figure 6.10) can be seen to be somewhat reduced. Almost all of the dihedrals appear to be rather more restricted, especially for the larger Phe guests. Despite the restricted sampling, some very interesting differences were observed. Indeed, it can be seen that there are a number of differences between each distribution. While eight of the dihedrals (unshaded) remain mostly unchanged for each guest, the four dihedrals that are shaded differ quite dramatically between the two guests. The predominant subconformation is g–ttg– for l-cis-Phe and tg–g–t for l-trans-Phe.

An analysis of the dihedral distributions for all other guests indicated that their dominant subconformation resembled either the subconformation of l-cis-Phe, the subconformation of l-trans-Phe, or something in between. The guests similar to l-cis-Phe were d-trans-Phe, d-cis-Ala and d-cis-Phe. The guest similar to l-trans-Phe was l-trans-Ala. All the other guests showed properties common to both. These are cis-Gly, trans-Gly, d-trans-Ala, and l-cis-Ala. A sample dihedral distribution for one of these, trans-Gly, is given in Figure 8.20. It can be seen that the shaded dihedrals lie in an intermediate state between the two extremes. What is also evident is the greater flexibility in the other dihedrals. This must be a consequence of the smaller guest.

Indeed, greater sampling was also found for all the other guests with intermediate subconformation behaviour. On face value, this different subconformation preference for each guest seems rather inexplicable. However, by examining the subconformations more closely, a number of patterns similar to those found in previous analyses may suggest possible reasons.

Figure 8.18: The dihedral distribution for all the hydrocarbon dihedrals in the host–l-cis-Phe complex. [Twelve panels, one for each hydrocarbon chain dihedral; dihedral distribution (×10 configurations) against dihedral/degrees from 0 to 360.]

Figure 8.19: The dihedral distribution for all the hydrocarbon dihedrals in the host–l-trans-Phe complex. [Twelve panels, one for each hydrocarbon chain dihedral; dihedral distribution (×10 configurations) against dihedral/degrees from 0 to 360.]

8.6.3 Dominant Hydrocarbon Subconformations

When only these four dihedrals making up the subconformation are considered, only 26 subconformations are found for all guests. Of these, only six are significantly populated and common to more than one guest. These are listed in Table 8.7 together with the individual guest populations. The dominant subconformations appear to differ by pairs. For example, the first conformation, g–ttg–, and the second conformation, g–tg–t, differ in the third and fourth dihedrals. This pairing reflects the fact that the restraint of chain closure usually requires two dihedrals to adjust in synchronisation.
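The reduction from full conformations to four-dihedral subconformations, and the pairwise comparison mentioned above, can be sketched as follows. The positions of the four shaded dihedrals within the twelve-dihedral label are not given in the text, so `positions` is a hypothetical parameter, as are the function names.

```python
from collections import Counter

def subconformation(label, positions):
    """Keep only the specifiers of the dihedrals at the given positions."""
    return tuple(label[i] for i in positions)

def subconformation_populations(labels_by_guest, positions):
    """Count how often each subconformation occurs for each guest."""
    return {guest: Counter(subconformation(l, positions) for l in labels)
            for guest, labels in labels_by_guest.items()}

def differing_dihedrals(sub_a, sub_b):
    """Return the 1-based positions at which two subconformations differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(sub_a, sub_b)) if a != b]

# The first two subconformations discussed in the text.
first = ("g-", "t", "t", "g-")
second = ("g-", "t", "g-", "t")
pair_difference = differing_dihedrals(first, second)
```

For the two subconformations quoted in the text, the comparison picks out the third and fourth dihedrals, consistent with the pairing described.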

is due to the occurrence of other subconformations not listed here. two particular subconformations are predominant. while the tg–g–t subconformation on the other hand is the most common for guests at the l- N . Any discrepancy between the total number of subconformations and the total. 100.CHAPTER 8. The main subconformations appear to be symmetric between the two hydrocarbon chains.20: The dihedral distribution for all the hydrocarbon dihedrals in the host– trans-Gly complex. N ANALYSIS OF THE MACROBICYCLE 12 SYSTEM 203 C dihedral distribution (x10 configurations) 8 4 0 8 4 0 8 4 0 8 4 0 8 4 0 8 4 0 N H C C C H H H H C C C C H H H H H H C C N N H H H H H H O H H 5 0 90 180 270 360 S S C C N N Du Du H H C 8 4 0 8 4 0 8 4 0 8 4 0 8 4 0 8 4 0 H H C C H H H H C C H H C C H H H H H H H H N H C C C O 0 90 dihedral / degrees 180 270 360 Figure 8. The g–ttg– subconformation dominates for the guests at the l-cis-Phe end of the table. As observed earlier. while the others are asymmetric. Note that the order in which the guests are listed is deliberately set to be the same as that observed in the trend for the dihedral angle distributions.

the g–ttg– subconformation occurs 363 times in total. One of these occurs 136 times.7: The Populations of Each Host Sub-Conformation For Each Guest. it was not truly representative of conformational class since the full conformation for the whole hydrocarbon chain differed to quite an extent for the two d-Phe guests. These two dominant subconformations can go a long way to explaining the geometric differences in host structure. By looking back at the original full conformations. Another conformation occurred 156 times. ANALYSIS OF THE MACROBICYCLE 12 SYSTEM 204 Table 8. it is possible to extract some information concerning the likelihood of these subconformations occurring.g.363 g.CHAPTER 8.t t g.g+ t g75 g. Conformation Total d-Phe l-Phe d-Ala d-Phe trans cis cis cis Gly Gly d-Ala l-Ala l-Ala l-Phe cis trans trans cis trans trans g. Thus there must be many more ways the other dihedrals can arrange themselves for this subconformation. On the other hand.t 168 t g. This one was found to contain the tg–g–t subconformation.t 66 t g.t g. This one was found to contain the g–ttg– subconformation. Even though d-trans-Phe appears to be even more extreme than l-cisPhe with 92 % of it adopting the first subconformation compared to 65 %.t g+ g53 % of total 77 92 0 0 0 2 1 95 65 28 0 0 3 4 100 64 0 0 12 3 2 81 51 2 0 0 4 11 68 27 12 11 14 19 13 96 5 0 1 13 13 1 33 17 10 7 6 17 6 72 34 1 42 4 10 3 94 8 8 43 4 4 5 72 0 5 64 11 0 7 87 trans-Phe end. there is a cross-over region in which other intermediate subconformations also occur.t g64 g. This is done by comparing the geometric proper- . it suggests that there appears to be only one main way that the other eight dihedrals can arrange themselves to form this subconformation. suggesting that this is a rather extreme structure. In between. Two full conformations stand out from the total of 1000. Given that the tg–g–t subconformation only occurs 168 times in total for all full conformations. compared to the other eight. 
This diverse sampling is particularly the case for trans-Gly, which can also access, to a large extent, a number of other subconformations not listed here. Hence this subconformation is likely to be easier to form.
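The argument above compares how often a subconformation occurs in total with how often the single most common full conformation containing it occurs. A minimal sketch of that bookkeeping follows. It assumes each saved conformation is stored as a tuple of dihedral labels and that the subconformation is the first four labels; both the data layout and the function name `analyse` are hypothetical and are not taken from the simulation program used in this work.

```python
from collections import Counter, defaultdict

def analyse(full_conformations, sub_slice=slice(0, 4)):
    """Group full conformations by subconformation.

    Returns {subconformation: (total occurrences,
                               number of distinct full conformations,
                               count of the most common full conformation)}.
    The choice of which dihedrals form the subconformation (sub_slice)
    is an assumption made for illustration.
    """
    by_sub = defaultdict(Counter)
    for conf in full_conformations:
        by_sub[tuple(conf[sub_slice])][tuple(conf)] += 1
    result = {}
    for sub, fulls in by_sub.items():
        total = sum(fulls.values())
        top_count = fulls.most_common(1)[0][1]
        result[sub] = (total, len(fulls), top_count)
    return result

# Hypothetical toy data: six labels per conformation instead of twelve.
confs = [
    ("t", "g-", "g-", "t", "t", "t"),
    ("t", "g-", "g-", "t", "t", "t"),
    ("t", "g-", "g-", "t", "t", "t"),
    ("g-", "t", "t", "g-", "t", "t"),
    ("g-", "t", "t", "g-", "g+", "t"),
    ("g-", "t", "t", "g-", "t", "g-"),
]
summary = analyse(confs)
for sub, (total, distinct, top) in summary.items():
    print(" ".join(sub), total, distinct, top)
```

In this toy set one subconformation is always accompanied by the same remaining dihedrals, while the other is spread over several full conformations, which is the pattern the text uses to argue that one subconformation is "extreme" and the other easier to form.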

For the wide cavity.9 to 3.6 6. for the wide subconformation.6 ties averaged for each conformation with the respective properties averaged for each guest in Table 8.6 82 10. given by the distances.2 80 10. A Subconformation g. The geometric properties concerned with the hydrocarbon chain are listed in Table 8. The two angles that describe the shape of the cavity are θAC and θJC are defined in Figure 8.9 rd 5.5.17.t t g. The main difference between the conformations is their width. rd This distance is defined in Figure 8.17. θAC = 79◦ and θJC = 92◦ .7 5. The shallower cavity depth was not found to vary to any great extent.g+ t gg.g.t g+ grJC θAC 10. Variation in this depth depends on A .3 5. Therefore.3 6.17 and their average values are listed in Table 8. ranging from 2.7 7. Another feature of the host shape that differs between guests is the deep depth of the cavity.9 7.8. the g–ttg– subconformation will be A referred to as the “wide” subconformation.8 8.3 6.9 7.t t g. while tg–g–t will be referred to as the “narrow” one.3 80 θJC 87 92 96 89 90 89 rHC1 6.6 ˚ for g–ttg– but only A 9.4 rHC2 6.1 79 9. For the narrow cavity. from this point on.t g.7 8. this dihedral points directly towards the other junction carbon. The other difference is the length of the hydrocarbon chain segments.3 ˚. There are a number of additional features in the host structure that differentiate between wide and narrow conformations.t gg.6 ˚.2 5.9 7.6 5.5 74 10. rHC1 and rHC2 . rHC1 = rHC2 = 6.8: The Values of Various Geometric Properties For Each Subconformation Averaged For Each Host-Guest Complex.3 5. while the narrow cavity has much shorter hydrocarbon A A A chains with rHC1 = 5. ANALYSIS OF THE MACROBICYCLE 12 SYSTEM 205 Table 8.0 7. For the wide cavity.7 6.2 5.5 5.5 for each guest.CHAPTER 8.0 8.8 5.3 5. θAC = 82◦ and θJC = 87◦ on average over all guests.1 rBC2 8.2 7.9 rBC2 8.5 ˚ for tg–g–t. The average junction carbon to junction carbon distance is 10. 
These properties are defined in Figure 8.6 5. The basic reason why the narrow subconformation is narrow is because the first dihedral of the subconformation points the hydrocarbon chain away from the other junction carbon. Distances are in ˚ and Angles in Degrees.2 7.4 4.8 ˚ and rHC2 = 5. However.t t gg.7 ˚.1 79 10.

Any shift of it from the centre may be revealed by a difference between rHC1 and rHC2 . The rd value varies a lot more. l-trans-Ala and l-cis-Ala. This depth appears to be controlled by the degree to which the two aryl-amide A sides lift up. It is difficult to see any trends in thiourea position for the predominant subconformations since they are averaged over so many different guests. Definitions and values of these distances are also given in Figure 8. the distances may shorten to 7. On average. d-cis-Ala and the two d-Phe guests. the role of rBC1 and rBC2 appears to be quite variable. A Alternatively. For Gly. The complexes with deep cavities A occur for trans-Gly and d-cis-Ala. the distances lie around 8 ˚ but they A can fluctuate by up to 0. However. Another interesting feature concerns the position of the thiourea unit.6 ˚ to make the cavity more narrow. Table 8. which make up opposing sides of the main host ring. An examination of Table 8. ANALYSIS OF THE MACROBICYCLE 12 SYSTEM 206 the hydrocarbon chain conformation.3–8. as seen for d-cis-Phe and d-trans-Phe. as seen for the two l-trans guests. In these instances. However.3 ˚ to enlarge rJC . these two distances are seen to contract if the cavity becomes narrower. all four aryl rings often turn over to closed.6 to 6. the cavity width. It may also increase to 8.17 and Table 8.5 shows clearer trends for each guest.4 ˚. the two benzamide aryl rings lie open. In other words. A lengthening in rHC1 occurs for l-cis-Phe and lengthening occurs for trans-Gly. almost trapping . Alternatively. while the two N-benzylamide aryl rings lie closed. in a few instances this was found to change slightly. rBC1 increases to 8. from 4.CHAPTER 8. This shift to one end of the cavity would make it easier for the guest to tip and make its carboxylate oxygens simultaneously hydrogen bond to both pairs of thiourea and host amide hydrogens. as discussed earlier. A However.3 ˚. the hydrocarbon chains are becoming more trans-like. 
This lifting up appears to be caused by a narrowing of the θAC and θJC angles. This behaviour is in agreement with the presence of a wider or deeper cavity.5. A as occurs for l-trans-Phe. In general.4 ˚ in either direction.5 indicates that rd also appears to be influenced by the length of the two N-benzylamide units.2–8. rBC1 and rBC2 .

the guest inside. The small size of the guest must be behind this feature. The other deviation occurs for the d-Phe guests. The BC2 aryl group was often found to lie open, while the connecting BO1 aryl group lay closed. This appears to be a result of the large phenyl side group pushing against the BC2 group. Evidently, the backward tipping may be sufficient to reduce the bad contact for d-Ala guests, but for d-Phe guests, the phenyl group can still not sit comfortably.

In summary, it appears that l-trans-Phe and l-trans-Ala prefer narrow cavities. However, l-cis-Phe, d-cis-Phe, d-trans-Phe, and d-cis-Ala prefer wide cavities, while the remaining, smaller guests can fit in a cavity of either size. Preferences are intimately tied in with their θfb angles. Guests with θfb close to 90° prefer narrow cavities, while guests which tip over prefer wider cavities. What this conformational analysis has shown is the structural variation in the host for different binding modes. Even within a given binding mode, there are still small differences between each complex in order to optimise the overall host-guest interaction.

8.6.4 Phenyl Ring Conformation of the Guest.

One final dihedral of interest is the swing dihedral of the Phe guests, which is shown in Figure 8.21 together with the three possible conformations available for it: g–, t, and g+. Specifically, this dihedral is defined with respect to the carboxylate carbon rather than the nitrogen. The t conformation has the phenyl ring roughly parallel to the long axis of the guest, g+ points away in the plane of the guest, while g– points perpendicularly away from the plane of the guest. The position of the phenyl group is important in binding to the host for two reasons. Firstly, the phenyl group must be placed into space, while secondly, it may be able to form favourable π–π interactions with the aryl groups of the host ring.

The preferred conformation in chloroform is found to be g+. However, there are indications that the relative energies of each conformation depend on the amide conformation. Figure 8.22 shows the total force field energy for a dihedral drive (see Subsection 3.3.2 for details) around the swing dihedral when the amide conformation is trans and when it is cis. The preference for the g+ swing conformation in the trans amide is clear. However, what is especially interesting is that in the cis amide conformation, the t swing dihedral is not only the most stable conformation of the three but also comparable to the t swing conformation of the trans amide.

Figure 8.21: The three conformations for the swing dihedral of Phe (g+, t and g–).

Figure 8.22: The dihedral angle energy profiles for the swing dihedral in Phe for cis and trans conformations, calculated using force field energies. (Axes: Energy / kcal mol−1 against Dihedral Angle / degrees.)

Table 8.9 shows the populations of the three conformations for each host-guest complex. There appears to be remarkable variety in the conformation adopted. A combination of the internal energy and steric interactions with the host plays a role in determining which conformations occur. Illustrations of how each Phe guest binds in the host are given in Figure 8.23. The d guests adopt the t conformation almost exclusively. This must be a result of the backward tipping that draws the side chain deeper into the cavity. This leaves the phenyl nowhere to go but straight up.

Table 8.9: The Distribution for the Guest C–C–C–C Dihedral Angle Between the Three Conformations For the Four Phe Guests.

Conformation   l-cis-Phe   l-trans-Phe   d-cis-Phe   d-trans-Phe
g+                    18            66           0             0
t                      5            34          99            99
g–                    77             0           1             1

l-trans-Phe prefers to adopt the g+ conformation, the most stable in chloroform. In doing this it places its aryl group over BO2, gaining favourable π–π interactions with it and BC1. However, there is still some inclination for it to go t. The l-cis-Phe, on the other hand, adopts the g– conformation, although it is also able to access the other two. The preferred conformation is interesting, given the higher energy for the g– conformation, as indicated in the dihedral profile Figure 8.22. The forward tipping must raise the phenyl group out of the host to a small degree, making all three conformations accessible. Like l-trans-Phe, l-cis-Phe places its phenyl group over BO2, picking up favourable π–π interactions with it and the adjacent BC1. This threefold partitioning of conformations is slightly misleading for l-cis-Phe since the g– conformation is not well defined. The average dihedral angle comes out to be 247°, on the edge of the shoulder, as seen in Figure 8.22.

The major result that may be drawn from the preferred dihedral conformation is that l-cis-Phe, d-cis-Phe and d-trans-Phe all adopt higher energy conformations. This stabilisation of the swing dihedral is important in itself. However, the even greater implication is that for these higher energy swing conformations, the relative energy of cis and trans amide conformations is greatly reduced compared to that in the lowest energy g+ conformation. Thus for Phe, the cis amide conformation is stabilised internally as a result of a conformational change in the phenyl group induced by a remote steric interaction with the host. One interesting feature about this stabilisation is that the high energy conformation for the swing dihedral is different between l and d Phe. The dihedral energy profiles in Figure 8.22 indicate that the stabilisation is slightly greater for d. This difference may provide a means to selectively stabilise only one stereoisomer. Since this dihedral profile is only a force field energy profile, it may suffer from some inaccuracy for higher energy conformations. This is because force fields are not perfect and usually give greater priority to getting the low energy conformations correct. Therefore, ab initio calculations need to be carried out on all the conformations to better quantify this effect.

Figure 8.23: Particular structures for the four Phe guests, highlighting the different swing conformations. l-cis-Phe is g–, l-trans-Phe is g+, while both d-Phe guests are t. (Panels: N-Ac-l-cis-phenylalanine, N-Ac-l-trans-phenylalanine, N-Ac-d-cis-phenylalanine, N-Ac-d-trans-phenylalanine.)
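The threefold partitioning of the swing dihedral, and the remark that an average of 247° sits on the edge between two classes, can be illustrated with a short sketch. It assumes 120°-wide bins centred on 60° (g+), 180° (t) and 300° (g–) on a 0–360° scale, and uses a circular mean so that angles either side of 0°/360° average sensibly; the bin edges and function names are illustrative assumptions, not the definitions used to produce Table 8.9.

```python
import math
from collections import Counter

def classify_swing(angle_deg):
    """Assign a dihedral angle (degrees) to g+, t or g- using assumed 120-degree bins."""
    a = angle_deg % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"

def circular_mean(angles_deg):
    """Mean of periodic angles, computed from summed unit vectors, in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def populations(angles_deg):
    """Percentage of sampled angles falling in each of the three classes."""
    counts = Counter(classify_swing(a) for a in angles_deg)
    n = len(angles_deg)
    return {k: 100.0 * counts.get(k, 0) / n for k in ("g+", "t", "g-")}

# Hypothetical sampled angles, for illustration only.
sample = [60.0, 180.0, 300.0, 310.0]
pops = populations(sample)
edge_label = classify_swing(247.0)
```

With these assumed bins an angle of 247° is labelled g– but lies only 7° from the t boundary, so a small shift in the bin edge would change the populations reported for a poorly defined conformation.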

Figure 8.24: The four main binding motifs of the guest in the host shown for representative molecules. (Panels: N-Ac-l-cis-phenylalanine (Motif 1), N-Ac-l-trans-phenylalanine (Motif 2), N-Ac-d-cis-phenylalanine (Motif 3), N-Ac-trans-glycine (Motif 4).)

8.7 Rationalisation of Free Energies.

8.7.1 Binding Motifs Observed in The Simulations.

Given the large number of binding observations made, a summary of them all is now given in Table 8.10. Four main binding motifs have been observed for the ten guests studied in the simulations. These are illustrated in Figure 8.24. The different motifs that each guest adopts are shown in Table 8.11. Here is a summary of the main characteristics of each motif.

The Presence of a Space Indicates no Special Behaviour. S = Shallow): S S D D S Number of Host Conformations (F = Few. N = Narrow): N N W Unusual Cavity Depth (D = Deep. M = Many): F F M M Shifting of Thiourea: * * * * Swing Conformation: g– g+ t a W S * t Low energies are the most negative. ANALYSIS OF THE MACROBICYCLE 12 SYSTEM 212 Table 8.10: Summary of the Main Binding Properties For Each Guest.CHAPTER 8. B = Backward): F V F V V. l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis d-Ala trans d-Phe cis d-Phe trans Tipping (V = vertical. H = High): H H L L L Binding Energy: L L L L H H H Solvation Energy: H H H L L l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis d-Ala trans H H L d-Phe cis d-Phe trans Unusual Cavity Width (W = Wide. F = Forward.B B B B B l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis d-Ala trans d-Phe cis B d-Phe trans Carbonyl–Amide Hydrogen Bond: * * * * Carboxylate–Amide Hydrogen Bond. . BOTH SIDES: * * * * Polar Hydrogen Repulsion: * * * l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis d-Ala trans d-Phe cis d-Phe trans Side Group Clash: * * * * Methyl Group Clash: * * * * Oxygen Clash (Including Hydrogen Bonds): * * * l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis * * * * d-Ala trans d-Phe cis * d-Phe trans a Internal Energy (L = Low.

CHAPTER 8. • Motif 4 has the guest tipped backwards completely on its side with the carbonyl oxygen forming one or two hydrogen bonds to the amide hydrogens on the opposite side of the cavity to the guest polar hydrogen. This is particularly so for the d isomers. and usually one or two thiourea hydrogens. while the other forms a double hydrogen bond to the amide hydrogens on the opposite side to the polar hydrogen of the guest. One carboxylate oxygen forms a double hydrogen bond to the thiourea while the other forms a double hydrogen bond to the amide group on the opposite side to the guest polar hydrogen. This motif appears to . Such a bonding pattern is indicative of l-cis guests. Such a bonding pattern is found for l-trans guests. ANALYSIS OF THE MACROBICYCLE 12 SYSTEM 213 Table 8. Motif 1 2 3 4 l-Phe cis trans 94 6 86 14 l-Ala cis trans 60 35 71 5 29 Gly cis trans 7 25 68 28 72 d-Ala cis trans 27 73 d-Phe cis trans 100 100 100 • Motif 1 has the guest tipping forwards moderately with the carbonyl oxygen hydrogen bonding to one or possibly two amide hydrogens of the host. The side chain typically lies over the benzamide.11: The Number of Times a Given Binding Motif Occurs For Each Guest. This motif is accessible to most guests but it is characteristic of those unable to hydrogen bond to the host amides and that are sterically confined. • Motif 2 has the both the carboxylate oxygens hydrogen bonding to at least one amide hydrogen on either side of the cavity. The Dominant Motif is Shown in Bold. One oxygen of the carboxylate forms a double hydrogen bond to the thiourea. • Motif 3 has the guest tipping backwards such that the side chain descends partially into the cavity. Guests in this motif are fairly upright and are unable to form carbonyl–amide hydrogen bonds but are able to place their side chains over the benzamide unit so as to prevent tipping.

have fairly stringent requirements that only trans-Gly can satisfy.

These motifs are fairly well-defined and rarely overlap. Any borderline cases, of which there were only one or two, were assigned to Motif 3. It can be seen that each guest has a preferred binding motif, but some can adopt a second or third binding motif. That some guests adopt more than one motif demonstrates the finely balanced energetics. This is particularly the case for l guests. The table shows the general trend, going from right to left, that as the d side chain gets smaller and the l side chain gets larger, the preferred motif goes from 3 to 1. This is primarily due to the different heights of the host aryl units at each end of the cavity, particularly the ones near the side chain. The anomalous guest is Gly, which lacks a side chain. This allows trans-Gly to adopt Motif 4 all by itself, while cis-Gly surprisingly adopts Motif 3 most of the time rather than Motif 1 with the additional hydrogen bonds. This fact raises the suspicion that the presence of a correctly placed side chain is critical in stabilising the cis conformation. Alternatively, it is the ability of the host to hold the guest in the correct orientation using the other hydrogen bonds and careful side group positioning that forces the guest amide to be more stable in the cis conformation.

From this analysis, it appears fairly clear that guests that bind in Motif 3 are the worst binders. Motif 3 appears accessible to all guests, with the exception of l-cis-Phe, suggesting that this mode is rather non-selective. It is the presence of the other lower energy modes that brings about the host selectivity. This would explain the stereoselectivity of the host.

On the question of cis amide stabilisation, the V model, which has been shown to prove fairly useful in interpreting the results, suggests that the cis conformation is more stable in the host than the trans. Both l-trans and l-cis structures appear to have a number of preferable interactions such as many hydrogen bonds and minimal steric clash, indicating that these conformations may be comparable in stability to each other. However, a telling feature as to which is more stable is in the motifs that l-trans-Ala adopts. Indeed, Table 8.11 reveals that l-cis-Ala is able to sample both Motifs 1 and 2, yet it prefers 1. Assuming that l-trans-Ala and l-cis-Ala have similar properties in

Motif 2, it must be concluded that Motif 1 is more stable than Motif 2. In other words, the cis conformation is more stable than trans. The assumption made for this is reasonable because Motif 2 is characterised by both ends of the amide group being well-placed in space.

On the question of whether trans-Gly or cis-Gly is more stable, it is difficult to predict which binds better. trans-Gly would appear to be more stable since it prefers to adopt Motif 4 with up to six hydrogen bonds. On the other hand, the fact that cis-Gly chooses to form usually no more than four hydrogen bonds suggests that there is something to be gained from being mobile and possibly allowing the host to be flexible too. In any case, all that may be concluded is that no large difference in binding free energy would be expected.

On the question of selectivity between different amino acid derivatives, the d guests aside, the side chain only seems to play a steric role in binding and so all side chains may be expected to behave similarly. Given that l-Phe and l-Ala still seem to be able to fit well in the cavity, there is reason to believe that they bind the strongest simply by virtue of their larger size. Larger guests would be expected to have stronger binding energies, but at the same time there may be some reduced freedom and strain for the host and the guest. Given that the focus of this study was more on the stereochemical and conformational selectivity, this issue was not fully resolved.

8.7.2 Connection Between Binding Free Energies and Motifs.

The basis of rationalisation in this analysis has been shape and energetics. A large amount of information may be obtained and interpreted from such an analysis. However, in terms of using the analysis to predict actual free energies, there are two principal problems. The first is that many effects are qualitative and competing and it is hard to determine which one will dominate. The second problem is that real binding depends on free energies, which include entropy as well as energy. Entropy is much harder to model, apart from qualitative arguments about changes in host flexibility and guest mobility. Apart from a real experiment itself, the only way to determine relative binding free energies is to perform absolute free energy calculations, as has been done in Chapter 5. At this point in this work, both relative free energies and an analysis have been performed, so it is the ultimate test to compare how they match up. Since they have both been performed using the same simulation protocol and for binding inside the cavity, their results may be directly compared.

On the question of stereoselectivity, the free energies are in strong agreement with the analysis. The inability of d guests to bind properly to the host appears to cost them around 2 kcal mol−1 of free energy compared to l for all enantiomers with the exception of l-trans-Ala and d-trans-Ala, for which the difference is only around 1 kcal mol−1. This observation is consistent with the V model, which only predicts one bad clash for each.

The cis amide stabilisation free energies also correlate well with the analysis. For Gly, there appears to be no stabilisation of the cis conformation. Gly is able to bind the best in either conformation with the greatest number of hydrogen bonds and the most comfortable fit. This comes at a small price of straining the host to fully accommodate the guest, particularly for trans-Gly. There is little benefit in adopting the cis amide bond since cis-Gly loses too much in the way of good contacts if it tries to hydrogen bond to the host amide. The apparent better binding for trans-Gly does not seem to carry across to a better relative binding free energy, presumably due to concomitant entropic restrictions. Evidently, the greater ability of the trans conformation to form more hydrogen bonds comes at the price of loss of mobility and host flexibility. These results again agree with the analysis. Considering all the other guests, the stabilisation is particularly strong at around 2 kcal mol−1 for l-Phe, moderately strong at 1 kcal mol−1 for l-Ala, absent for d-Ala and moderate again at 1 kcal mol−1 for d-Phe. The carbonyl oxygen–amide bond stabilises the cis conformation for both l guests, and the phenyl swing dihedral stabilises d-Phe. The strongest stabilisation for l-trans-Phe seems to be due to the fact that it experiences stabilisations due to both effects.

The analysis was not able to predict any significant relative binding between different guests. The free energies, on the other hand, predict that l-Phe binds the strongest, followed by l-Ala and Gly, while there is little difference between d guests. The analysis, being inconclusive, is not inconsistent with this ordering predicted by the free energies.

8.8 Conclusion.

A detailed analysis of the host-guest structures has been presented. Each complex exhibits a range of interesting features including orientation, hydrogen bonds, steric contacts and conformations. For all guests the binding modes could be classified into four motifs. All guests are able to adopt more than one of these motifs. The analysis was able to explain that the origins of enantioselectivity arose mainly from a badly positioned side chain. The stabilisation of the cis amide conformation was attributed to two factors. One was the formation of a well-placed carbonyl oxygen–amide hydrogen bond. The other was an internal stabilisation of Phe due to the stabilisation of the swing dihedral by a remote steric interaction with the host. The analysis was not able to fully elucidate relative binding constants for different guests. This is because experimental relative binding free energies were obtained for guests binding in the non-selective outside position and the selective inside position, and so a direct comparison with experiment is not possible. The results predicted by both the free energy calculations and the analysis agree with experiment on the key questions of conformational stabilisation and enantioselectivity for binding inside the cavity. Overall, this analysis has fulfilled its objectives and made possible the interpretation of the binding in terms of the fundamental structure of each guest.
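To give a feel for the size of the free energy differences quoted in this chapter (around 1 to 2 kcal mol−1), the standard relation ΔG = −RT ln K can be used to convert a free energy gap into a ratio of populations or binding constants. The sketch below is a generic illustration: the temperature of 298.15 K is an assumption made here, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def population_ratio(delta_g_kcal, temperature_k=298.15):
    """Ratio favoured:disfavoured for a free energy gap given in kcal/mol.

    Uses K = exp(dG / RT); the default temperature is an assumed value.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

def favoured_fraction(delta_g_kcal, temperature_k=298.15):
    """Fraction of the population in the favoured state of a two-state system."""
    k = population_ratio(delta_g_kcal, temperature_k)
    return k / (1.0 + k)

for gap in (1.0, 2.0):
    print(f"{gap:.1f} kcal/mol: ratio {population_ratio(gap):.1f}, "
          f"favoured fraction {favoured_fraction(gap):.3f}")
```

Under this assumption a gap of 1 kcal mol−1 corresponds to a ratio of roughly 5:1 and a gap of 2 kcal mol−1 to roughly 29:1, which is why differences of this size are enough to produce clear selectivity.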

Chapter 9

Conclusion

A detailed computer simulation of the binding of amino acid derivatives to macrobicycle 12 has been presented. The primary motivation for this study was to understand the novel binding behaviour that had been observed in experiment. This behaviour was the enantioselectivity and the stabilisation of the cis amide conformation binding inside the host cavity. A number of steps were required to achieve this goal. Firstly, a range of computer simulation methods were assessed to determine the most suitable method for this system. The decision taken was to calculate free energies using FEP, generating equilibrium configurations using Monte Carlo on a system modelled with the OPLS-AA force field. Secondly, a simulation protocol was developed to model the system. A number of improvements were made to the simulation program to improve both sampling and speed. Geometry, dihedral and charge parameters that were missing from the OPLS-AA force field were derived. Thirdly, a new charge parameterisation method, REPD, was developed to produce OPLS-like charges by fitting to the molecular electrostatic potential. Fourthly, the use of REPD charges was validated by calculating their free energies of hydration using FEP and LIE free energy methods. This work highlighted a number of caveats in using the LIE method. Fifthly, a number of more sophisticated MC moves and the GB/SA continuum solvent method were implemented into the simulation protocol to improve sampling of the macrobicycle 12 system. Sixthly, FEP free energy calculations were performed to calculate the relative binding free energies of amino acid derivatives to the host. Finally, a structural and energetic analysis of the macrobicycle 12 system was performed in order to understand the origins of the relative binding free energies. Each of these steps is dealt with in its own chapter. However, it is now worthwhile to comment on the overall findings of this study and its usefulness for the future.

The end result of this study was that the relative binding free energies calculated agreed well with the experimental findings. A detailed explanation for the binding behaviour was also proposed, based largely on a model termed the V binding model. The usefulness of computer simulations was demonstrated, to not only reproduce behaviour but provide a means to directly understand the system's properties.

On the methodological side, a simulation protocol has been developed to perform MC simulations on highly constrained cyclic systems. The success of this procedure hinged on replacing an explicit solvent model with a continuum solvent model. Continuum solvent models have their advantages. They are faster and yield better sampling than in explicit solvent. However, the explicit nature of solvent is important to many binding phenomena. Therefore, it would be desirable for a method to be developed that allowed large structural change in explicit solvent using reasonable simulation lengths.

This analysis led to a number of interesting insights into the binding process. The first insight is the remarkable way the host and guest interact. Contrary to most host-guest systems, the host when alone adopts quite a different structure to that in the host-guest complex. Evidently, the guest is able to force a large change in host structure to bind inside the cavity. As was found in this study, different guests appear to make the host take on slightly different structures. The situation is further complicated by the possibility of binding also occurring outside the cavity. This aspect concerning the location of binding, while not examined in this work, would also be of great interest. However, computationally, it would be a significantly more demanding task, most likely requiring full free energy of binding calculations, either involving a potential of mean force or double decoupling. Furthermore, there would be even greater sampling problems.

The second insight is that even for this relatively simple host-guest system, the binding process inside the cavity is still rather complex. The different binding motifs observed arise due to a number of competing effects. Even having established these effects, there is still considerable difficulty in predicting which one dominates. Free energy calculations may provide a direct answer to this problem, although a more qualitative and informative technique may be to physically alter parts of the host and guest and observe what effect that has on binding.

On the issue of selectivity, four different binding motifs were identified by which the guests bound. All guests appeared to be able to bind in Motif 3. However, selectivity arose because some guests were able to adopt other motifs even lower in energy. One possible approach to further enhance the host selectivity may be to alter the host in some way so as to destabilise Motif 3 while leaving the others unchanged. Thus only guests that bound in the other motifs would be able to bind. This approach may be extended to stabilise only one particular Motif, or it may be used to develop new motifs. Differences in the structures observed for different motifs may provide clues as to what alterations may be appropriate.

One issue that was not adequately resolved in this study was the selective binding of different amino acid derivatives. Part of the problem lay in the fact that experimental binding constants were not available for binding solely inside the cavity. In any case, indications from experiment and simulation are that the host is not very selective. Nevertheless, having established that the side chain lies on top of the benzamide ring for l guests, it may be possible to alter this part of the host to create some side chain selectivity. A proper study of the effect of side chain would require the testing of many more amino acid derivatives and calculation of the relative free energies of binding to the whole host, both inside and outside the cavity.

Stabilisation of the cis amide conformation is quite an achievement. Two means to achieve this have been demonstrated, either by use of a carefully placed hydrogen bond, or by forcing the side chain into a higher energy conformation. However, catalysis of cis–trans isomerism itself in the manner of protein rotamases would be a major advance. This issue was not examined at all in this study and there are no obvious indications that the macrobicycle 12 system is able to stabilise the transition state for such a process. Nevertheless, the study has shed light on how unstable structures may be stabilised. This knowledge may prove useful in moving closer to this goal.

With ever increasing computer power, the scope of systems accessible to computer simulations is rapidly expanding. Many simulations are now being performed on large protein-ligand systems that are pushing computers to their limits. The further simulations move into areas inaccessible by experiment, the greater the care required to ensure that sound methods are being applied. In addition, interpreting the observed behaviour in such complex systems becomes ever more difficult. For the next few years at least, an important bridge between these large scale simulations and experiment, and a reliable means of testing simulation development, will remain the simulation of host-guest systems.

Appendix A

Charges

Table A.1: Listing of OPLS, REPD/6-31G*, EPD/6-31G*, EPD/6-31+G* and REPD/6-31+G* Charges for All 29 Molecules Used in the Parameterization of a.

Columns: Molecule; Atom; q OPLS; q REPD 6-31G*; q EPD 6-31G*; q EPD 6-31+G*; q REPD 6-31+G*.

[Charge values are not recoverable from the extracted text; the molecules and atom labels are listed in table order.]

methane (CH4): C, H
ethene (C2H2): C, H
water (H2O): O, H
methanol (CH3OH): C, O, HO, HC
ethanol (CH3CH2OH): CH2, CH3, OH, HO, H2C, H3C
methanethiol (CH3SH): C, S, HS, HC
ethanethiol (CH3CH2SH): CH2, CH3, SH, HS, H2C, H3C
formaldehyde (CH2O): CO, O, H
acetaldehyde (CH3CHO): C, O, CH3, HC, H3C
acetone ((CH3)2CO): C, O, CH3, H
acetic acid (CH3COOH): C, OH, OC, CH3, HO, H3C
methyl acetate (CH3COOCH3): C, OC2, OC, CH3C, CH3O, H3CC, H3CO
ammonia (NH3): N, H
methylamine (CH3NH2): C, N, HN, HC
ethylamine (CH3CH2NH2): CH3, CH2, N, H3C, H2C, H2N
formamide (HCONH2): C, O, N, HC, HN (cis), HN (trans)
acetamide (CH3CONH2): C, O, N, CH3, H2N (cis), H2N (trans), H3C
trans-N-methylacetamide (CH3CONHCH3): C, O, CH3C, N, CH3N, H3CC, HN, H3CN
dimethyl ether ((CH3)2O): C, O, H
diethyl ether ((CH3CH2)2O): CH3, CH2, O, H3C, H2C
dimethyl sulfide ((CH3)2S): C, S, H
diethyl sulfide ((CH3CH2)2S): CH3, CH2, S, H3C, H2C
chloroethane (CH3CH2Cl): CH2, CH3, Cl, H2C, H3C
benzene (C6H6): C, H
phenol (C6H5OH): C1, O, C2, C3, C4, H2, H3, H4, HO
aniline (C6H5NH2): C1, N, C2, C3, C4, H2, H3, H4, H2N
chlorobenzene (C6H5Cl): C1, Cl, C2, C3, C4, H2, H3, H4
benzonitrile (C6H5CN): C1, CN, C2, C3, C4, H2, H3, H4, N
benzoic acid (C6H5COOH): C1, CO2, C2, C3, C4, H2, H3, H4, OC, OH, HO

