Rachel Manual
SYBYL®-X 2.1
Mid 2013
This material contains confidential and proprietary information of Certara, L.P. and third parties furnished under the
Tripos Software License Agreement. This material may be copied only as necessary for a Licensee’s internal use
consistent with the Agreement. The allowed use includes printing of hardcopy versions hereof as minimally necessary
for Licensee’s internal use. Neither Certara, L.P., nor any person acting on its behalf, makes any warranty or
representation, expressed or implied, with respect to the accuracy, completeness, or usefulness of the material
contained in this manual or in the corresponding electronic documentation, nor in the programs or data described
herein. Certara, L.P. assumes no responsibility nor liability with respect to the use of this manual, any materials
contained herein, or programs described herein, or for any damages resulting from the use of any of the above. Except
for printing of hardcopy versions as stated, no part of this manual may be reproduced in any form or by any means
without permission in writing from Tripos (DE), Inc., 1699 South Hanley Road, Suite 200, St. Louis, Missouri 63144-
2917, USA (314-647-1099).
Selected software programs for methodologies contained or documented herein are covered by one or more of the
following patents: AllChem: US 7,860,657; Comparative Molecular Field Analysis (CoMFA): US 5,025,388; US
5,307,287; US 5,751,605; AT E150883; BE 0592421; CH 0592421; DE 691 25 300 T2; FR 0592421; GB 0592421;
IT 0592421; NL 0592421; SE 0592421. HQSAR: US 6,208,942. Embedded NLM: US 6,675,103. Topomers: US
6,185,506; US 6,240,374; US 7,184,893; US 7,212,951. TopCoMFA: US 7,329,222. DBTop: US 7,330,793. OptiSim:
US 6,535,819. Surflex software programs for chemical analysis by morphological similarity: US 6,470,305 B1.
SYBYL, UNITY, CoMFA, CombiFlexX, Concord, DiverseSolutions, GALAHAD, LeapFrog, OptDesign, StereoPlex,
and Alchemy are registered trademarks of Certara, L.P.
AUSPYX, Benchware, CScore, DISCOtech, Distill, GASP, HQSAR, Legion, MOLCAD, Molecular Spreadsheet,
Muse, OptiDock, OptiSim, Pantheon, ProTable, ProtoPlex, Selector, SiteID, Topomer CoMFA, Topomer Search,
Tuplets, and Tripos Bookshelf are trademarks of Certara, L.P.
RACHEL is a trademark of Drug Design Methodologies.
Surflex, Surflex-Dock, and Surflex-Sim are trademarks of BioPharmics LLC.
“FairCom” and “c-tree Plus” are trademarks of FairCom Corporation and are registered in the United States and other
countries.
All other trademarks are the sole property of their respective owners.
RACHEL Table of Contents
1. Introduction to RACHEL (page 5)
1.1 What is New with RACHEL (page 5)
1.2 License Requirements for RACHEL (page 6)
2. RACHEL Tutorials (page 7)
2.1 Create a RACHEL Project (page 8)
2.2 RACHEL Scoring Functions (page 14)
2.3 Run a RACHEL Combinatorial Search (page 23)
2.4 Using Chemical Templates and Descriptors (page 28)
2.5 Scaffold Replacement Using CHARLIE (page 47)
2.6 Bridge Generation Using CHARLIE (page 61)
2.7 Create a RACHEL Component Database (page 69)
RACHEL and CHARLIE were developed by Chris M.W. Ho, M.D., Ph.D. of
Drug Design Methodologies, LLC.
With this version of RACHEL the generated ligands do not penetrate the space
occupied by the protein. [SYBYL-X 1.2]
Module-Based Licensing
SYBYL continues to run with a license file issued before the SYBYL-X release.
In that context:
• RACHEL requires a “RACHEL” license.
• Concord requires a "ConcordStandalone" license.
1. It is always a good idea to clear the screen and reset the display before starting.
2. Make a local copy of the RACHEL demo files and give yourself write
permission for all the copied files.
! Type cmd cp -r $TA_DEMO/rachel . (Include the space and the
period.)
! Type cmd chmod -R a+w rachel
Conventions:
• The rachel directory mentioned in the tutorials refers to the location
where the RACHEL demonstration files have been copied.
• The instructions in the tutorials assume that rachel is a sub-directory of
your current location, which you can set via Options > Set > Default
Directory.
• Differences in the rounding of floating point numbers on different
platforms will produce slightly different results. All numbers reported in
this tutorial were captured on Windows.
! If you have not yet copied the RACHEL demo files, see Prerequisite
to all RACHEL and CHARLIE Tutorials on page 7.
! Press New.
The project name is listed on the Project line in the RACHEL - Setup New
Project dialog.
Upon completion, the RACHEL - Setup New Project dialog will resemble the
following:
8. Designate the anchor bond in the ligand. Note: The order in which the atoms are
selected is important.
The following dialog appears, prompting you to select the first atom in the
anchor bond.
In this tutorial, you will determine what other chemical components might bind
in place of the arginine sidechain of the tripeptide ligand.
! In the SYBYL window, rotate and scale the molecules until you can
clearly see atoms N19 and C20 of the ligand.
! Click ligand atom N19 or type 19 in the dialog prompting you for the
first anchor atom.
A sphere of green dots confirms this selection; the next atom you select is
marked the same way.
! Click ligand atom C20 or type 20 in the dialog prompting you for the
second anchor atom.
You have designated the bond from N19 to C20 as the anchor bond of the
optimization site. The amide bond and terminal methyl groups are colored green
to indicate that this region will be replaced by the combinatorial addition of
chemical components.
Usage Note: If you accidentally selected the wrong bond, press Cancel in the
next dialog (atom selection for the target area) and restart the anchor bond
selection.
Thus, you must specify a target atom that will be used to:
• direct the growth of the derivative structure;
• focus the conformational search to increase search efficiency;
• designate the approximate length to which derivative growth will occur.
9. To make the selection of the target atom in the receptor easier, label the residues
by substructure name.
The selected receptor atom is labeled SITE_1, designating it as the target for
ligand growth.
Notes:
• You can choose a receptor or a ligand atom to serve as the target atom.
Often, a receptor atom is either too far away or located in the wrong
position. In this tutorial, you could have designated one of the terminal
atoms of the arginine sidechain (in green) in the ligand to serve as the
target.
• You can also specify up to five ligand sites to be optimized
simultaneously. However, the search engine is limited to 10 rotatable bonds
spread out over the total number of sites defined. Thus, if two sites do
not influence one another (the structures generated for one site do not
enter into contact with those generated for the other), it is better to
conduct two separate searches (each with one site defined) as the search
engine will be able to conduct a more detailed conformational search.
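The limits described in the second note can be checked before setting up a search. The following is a minimal planning sketch, not part of RACHEL or SYBYL; the names MAX_SITES, MAX_ROTATABLE_BONDS, and fits_search_limits are hypothetical, and the sketch assumes that the 10-bond limit applies to the sum over all defined sites.

```python
# Hypothetical planning helper; not part of RACHEL or SYBYL.
MAX_SITES = 5             # "up to five ligand sites" (from the note above)
MAX_ROTATABLE_BONDS = 10  # assumed to be shared by all defined sites


def fits_search_limits(rotatable_bonds_per_site):
    """Return True if the planned sites fit within both stated limits."""
    n_sites = len(rotatable_bonds_per_site)
    if n_sites < 1 or n_sites > MAX_SITES:
        return False
    return sum(rotatable_bonds_per_site) <= MAX_ROTATABLE_BONDS


def even_share(n_sites):
    """Rotatable bonds available per site if the budget is split evenly."""
    return MAX_ROTATABLE_BONDS // n_sites


# Two independent sites that each need 6 rotatable bonds exceed the budget
# when searched together, but each fits when searched on its own.
print(fits_search_limits([6, 6]))
print(fits_search_limits([6]))
```

This illustrates why two separate single-site searches can explore each site in more detail than one combined search.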
! Press End in the dialog prompting you to select a site 2 anchor atom.
The project directory contains all the files necessary to perform the virtual
combinatorial chemistry experiment.
• key.mol2 and lock.mol2 = copies of the ligand and receptor files
• Rachel_setup = the information displayed in the dialog plus the
active site definition
• Rachel_builddef = the RACHEL chemical descriptors
• Rachel_scoredef = the RACHEL scoring function
• Rachel_searchdef = the conformational search engine parameters
Note: Do not move the project directory once you have created it. If you need
the project in a different location, delete this directory and the files within
it, then regenerate the project using the RACHEL setup process detailed above.
! If you have not yet copied the RACHEL demo files, see Prerequisite
to all RACHEL and CHARLIE Tutorials on page 7.
Note: RACHEL uses SYBYL’s atom definitions to calculate van der Waals
complementarity and strain. For that reason, we recommend that you use the
Tripos force field if you want to minimize the structures when you use your
own data to train the RACHEL scoring function.
1a30_key1.mol2
1a30_lock.mol2
1aaq_key1.mol2
1aaq_lock.mol2
1dmp_key1.mol2
1dmp_lock.mol2
1hsg_key1.mol2
1hsg_lock.mol2
1hvi_key1.mol2
1hvi_lock.mol2
1hvr_key1.mol2
1hvr_lock.mol2
4hvp_key1.mol2
4hvp_lock.mol2
4phv_key1.mol2
4phv_lock.mol2
5hvp_key1.mol2
5hvp_lock.mol2
PLS_hiv.txt
The ligand (key) and receptor (lock) structures for each complex are stored in
separate files. Thus, 1aaq_key1.mol2 and 1aaq_lock.mol2 are a matched
pair.
The text file PLS_hiv.txt contains the information used by RACHEL to identify
the molecules in the training set.
PLS_hiv.txt contains one line per matched pair, consisting of the ligand and
receptor file names followed by the binding affinity of the ligand for the
receptor (units = -log Ki).
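When you prepare an activity file for your own data, it can help to check the records before training. The sketch below assumes that the three fields on each line are separated by whitespace, which this manual does not state explicitly; the function names and the sample affinity values are hypothetical.

```python
import math
from typing import List, NamedTuple


class Complex(NamedTuple):
    ligand_file: str
    receptor_file: str
    affinity: float  # expressed as -log Ki


def parse_activity_lines(lines) -> List[Complex]:
    """Parse 'ligand receptor affinity' records, skipping blank lines.

    ASSUMPTION: fields are whitespace-separated.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
        ligand, receptor, affinity = fields
        records.append(Complex(ligand, receptor, float(affinity)))
    return records


def pki_from_ki(ki_molar: float) -> float:
    """Convert a Ki given in mol/L to -log Ki."""
    return -math.log10(ki_molar)


# Illustrative lines only: the affinity values are made up.
sample = [
    "1hvi_key1.mol2 1hvi_lock.mol2 9.0",
    "1dmp_key1.mol2 1dmp_lock.mol2 8.5",
]
for rec in parse_activity_lines(sample):
    print(rec.ligand_file, rec.receptor_file, rec.affinity)
```

With this convention, a Ki of 1 nM corresponds to a -log Ki of about 9.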
! In the Specify Activity File dialog, select the PLS_hiv.txt file described
above and press OK to continue.
1) 1hvi_key1.mol2
1hvi_lock.mol2
2) 1dmp_key1.mol2
1dmp_lock.mol2
3) 1hvr_key1.mol2
1hvr_lock.mol2
...
At this point, you have a choice to select either a Scoring function or a Target
function. Judging from the data, the predictive power (Q2cum = 0.805) is
adequate using 1 principal component in the PLS derived model. If the data is
not sufficient to generate a scoring function with adequate predictive power
(Q2cum > 0.50), RACHEL will automatically default to a Target function.
The normalized regression coefficients for each scoring function descriptor are
listed.
Descriptor    Coefficient
NPO_NPO        0.355
E_INTER        0.129
STERIC         0.007
STRAIN        -0.458
MW             0.246
NUM_RBD       -0.166
LOGP_EST       0.495
NPO_FRAC       0.515
Study the regression coefficients listed above. They are applied to the various
chemical descriptors of ligand-receptor binding in order to calculate (estimate)
binding affinity.
• NPO_NPO = non-polar non-polar ligand receptor interactions.
• E_INTER = electrostatic interaction energy.
• STERIC = steric complementarity.
• STRAIN = steric strain.
• MW = molecular weight.
• NUM_RBD = number of rotatable bonds.
• LOGP_EST = estimate of Log P (ligand).
• NPO_FRAC = non-polar fraction of ligand atoms.
Artifacts in the analysis or in the crystal structure data itself can generate
scoring functions that may produce poor structures. In this case, notice that the
coefficient for E_INTER is positive. Using this scoring function, a ligand would
be penalized for complementary electrostatic interactions with the receptor.
This is because negative electrostatic interaction values mean favorable electro-
static interactions and tighter binding. In essence, ligands with more favorable
electrostatic interactions with the receptor will be scored lower.
The solution is either to flip the sign on the E_INTER (electrostatic interaction)
term by editing the scoring function or to implement a target function.
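The effect of the E_INTER sign can be illustrated with a small calculation. This sketch assumes a plain linear model (sum of coefficient times descriptor value, with no intercept) and uses hypothetical, already-normalized descriptor values; how RACHEL normalizes descriptors internally is not described here.

```python
# Normalized regression coefficients reported in this tutorial.
coefficients = {
    "NPO_NPO": 0.355, "E_INTER": 0.129, "STERIC": 0.007, "STRAIN": -0.458,
    "MW": 0.246, "NUM_RBD": -0.166, "LOGP_EST": 0.495, "NPO_FRAC": 0.515,
}


def linear_score(coeffs, descriptors):
    """Sum of coefficient * descriptor value (ASSUMED linear form, no intercept)."""
    return sum(coeffs[name] * descriptors.get(name, 0.0) for name in coeffs)


def flip_sign(coeffs, name):
    """Return a copy of the coefficients with the sign of one term reversed."""
    flipped = dict(coeffs)
    flipped[name] = -flipped[name]
    return flipped


# Hypothetical ligands that differ only in electrostatic interaction energy.
weak_electrostatics = {"E_INTER": -0.5}
strong_electrostatics = {"E_INTER": -2.0}  # more negative = more favorable

edited = flip_sign(coefficients, "E_INTER")

# With the positive coefficient, the more favorable ligand scores lower.
print(linear_score(coefficients, strong_electrostatics)
      < linear_score(coefficients, weak_electrostatics))
# After the sign flip, the more favorable ligand scores higher.
print(linear_score(edited, strong_electrostatics)
      > linear_score(edited, weak_electrostatics))
```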
The data presented is fairly typical when crystal data is generated for a
particular drug discovery project. In the graph generated for the training set,
notice the cluster of ligands that bind the receptor with high affinity (nM or
better). There are far fewer compounds that exhibit lower binding affinities (µM
or worse). This is often the case as time is rarely spent elucidating coordinates
for poorer binding structures. Thus, the available data may be skewed. This is
unfortunate because the poorer binding ligands may give better insight into
improving the activity of the lead compounds.
A molecular spreadsheet is created that contains one row for each compound in
the training set. The columns contain the following information:
• CMPD = names of the files containing the ligands in the training set (in
the rachel/TSET directory).
• PREDICTED = binding affinity value predicted by the model.
• OBSERVED = binding affinity value from the file PLS_hiv.txt.
• DIFFERENCE = OBSERVED - PREDICTED
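The DIFFERENCE column is simple arithmetic on the other two numeric columns. The sketch below reproduces it for a list of rows; the row layout and the values are hypothetical and are not the tutorial's results.

```python
def difference_column(rows):
    """Add DIFFERENCE = OBSERVED - PREDICTED to each spreadsheet-like row."""
    return [
        {**row, "DIFFERENCE": row["OBSERVED"] - row["PREDICTED"]}
        for row in rows
    ]


# Made-up values for illustration only.
rows = [
    {"CMPD": "ligand_a.mol2", "PREDICTED": 8.0, "OBSERVED": 9.0},
    {"CMPD": "ligand_b.mol2", "PREDICTED": 7.5, "OBSERVED": 7.0},
]
for row in difference_column(rows):
    print(row["CMPD"], row["DIFFERENCE"])
```

A positive DIFFERENCE means the model underestimates the observed affinity; a negative value means it overestimates it.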
The graph plots the PREDICTED versus OBSERVED values for the 9 rows. It
shows a good correlation between the predicted and observed binding affinities.
At first glance, the numbers indicate that this model is a good scoring function.
However, there may be shortcomings when scoring functions that were derived
from crystal structures are applied to real-world data.
RACHEL prompts you to select the ligand structure for which you wish to
predict the binding affinity. Often you will be iteratively modifying the
structure of a ligand and wish to determine whether changes are improving
receptor affinity or diminishing it. If you do not have the ligand in question in a
molecule area, simply press End in order to load it from a Mol2 file.
! Press End in the dialog prompting you for a ligand to load it from a
Mol2 file.
! Navigate to the rachel/PREDICT directory, select 1hiv_key1.mol2
and press OK.
RACHEL prompts you to select the receptor structure involved in the
binding. If the receptor in question is in a molecule area, simply click it and
press OK.
! Press End in the dialog prompting you for a receptor to load it from a
Mol2 file.
! Navigate to the rachel/PREDICT directory, select 1hiv_lock.mol2
and press OK.
1 /home/nicole/rachel/tutorial//.PLS/key.mol2
/home/nicole/rachel/tutorial//.PLS/lock.mol2
This information is also stored in the PREDICT file in the project directory.
The target function is similar to the scoring function in that the same chemical
descriptors are employed. However, instead of scoring the ligand-receptor
interaction like a force field, the target function stores the ideal values for each
descriptor.
The advantage of target functions is that they are not prone to any statistical
artifacts. They allow RACHEL to rapidly generate structures that mirror the
chemical characteristics of the most active ligands. They can also be elucidated
from as few as one ligand-receptor complex. In addition, they are easier to
tweak in order to direct the genesis of new classes of compounds. Using target
functions, structures will be generated that exploit receptor binding character-
istics similar to the training set compounds. Thus, it is recommended that you
use only the best compounds in the training set.
You will generate a target function using only the ligand and receptor that you
used to set up this tutorial.
This is because you cannot perform a PLS analysis with just one ligand-receptor
complex. RACHEL automatically defaults to a target function when too few
complexes exist or if the predictive power (q2) of the resulting scoring function
is below 0.3.
A message in the console indicates that the target function has been saved in
rachel/tutorial/Rachel_scoredef.
The last successful training activity determines the values entered in the
RACHEL - Adjust Target Function dialog (dialog description on page 90):
Since you have just completed the generation of a target function, that is what is
presented here. The eight descriptors used by RACHEL to describe ligand-
receptor interactions are listed along with their target values and weighting
factors.
In your own work you may modify any of these parameters in order to stress
one chemical property over another. Initially, you should adjust the weighting
factors of the chemical descriptors in scoring a particular compound. For
example, to stress electrostatic interactions, simply set the corresponding scalar
to 2.0. To diminish the impact of a particular parameter, set the corresponding
scalar to a value below 1.0. The practical range of weighting factors is 0.0–5.0.
You will use this target function to perform combinatorial optimization of a test
compound in the next section of this tutorial (Run a RACHEL Combinatorial
Search on page 23).
! Press Cancel to close the RACHEL - Adjust Target Function dialog.
More often than not, you will be running searches on a project created during a
previous session.
The information about the RACHEL project is loaded at the top of the RACHEL
dialog.
You must define the storage location for all successfully generated structures
(hits) within the project directory. This allows you to store multiple runs, each
utilizing different parameter setups, in an organized fashion.
Note: Because all the hits are saved in individual .mol2 files, adding the
extension .mdb to the directory name makes it easier later to review the hits in
a Mol2 database or in a molecular spreadsheet.
! Press OK to start the search.
The top scoring ligand is highlighted in the RACHEL - Status dialog. Its corre-
sponding structure and highest scoring conformation are displayed within the
receptor in the SYBYL window.
In this tutorial you are using a target function. The maximum score for the
target function is 10.0 (see Understand the RACHEL Score Values on page 26).
Thus, as compounds that are generated improve iteratively, their scores will
approach this value.
5. You can now view the other structures that RACHEL has produced so far.
The Viewer acts as a remote control, enabling you to quickly cycle through the
hits while permitting full interaction with the structures.
! Pressing the left and right arrows allows you to move up and down
the list of hits while the double arrows move one “page” at a time.
! Toggle Original Ligand on to display the original ligand for
reference.
Study the hits that RACHEL has generated. You should see a variety of deriv-
ative structures to replace the original ligand region that was specified in the
setup. With the chemical diversity present in the NCI 3D database, many
different replacement derivatives are possible. Because you did not define any
chemical descriptors (the topic of the next section in this tutorial), any chemical
structure is allowed, and some of the hits may be chemically improbable.
6. While you are viewing the structures, RACHEL continues to search for new
hits. At any time, you may load new hits from the project run.
The scoring function scores reflect the values (and units) used in the training set
of compounds used for the RACHEL analysis. If you are using the default
scoring function, the scores indicate the ligand binding affinity (-log K). If the
default scoring function is generating high values (>11), it may not be suitable
for the unique characteristics of your receptor-binding site. Try training a
scoring function using your own data, or switch to a target function.
Target functions have a maximum score of 10.0. Any deviation in any of the
measured scoring function descriptors from the values derived from the test set
compounds subtracts from the maximum score. Thus, the resulting scores
simply relate the characteristics of the ligands to the ideal test set of
compounds. Although the author postulates that a higher score should indicate a
more desired compound, one cannot derive any direct measure of binding from
these numbers.
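The exact formula RACHEL uses for target functions is not given in this manual. The following sketch shows one way such a score could behave, assuming that each descriptor contributes a penalty equal to its weighting factor times the absolute deviation from its target value. The function name, descriptor targets, and weights are hypothetical.

```python
MAX_TARGET_SCORE = 10.0  # maximum score stated in this manual


def target_score(values, targets, weights):
    """Subtract weighted deviations from the maximum score.

    ASSUMPTION: the penalty per descriptor is weight * |value - target|.
    The manual states only that deviations subtract from 10.0.
    """
    penalty = sum(
        weights.get(name, 1.0) * abs(values[name] - target)
        for name, target in targets.items()
    )
    return MAX_TARGET_SCORE - penalty


# Hypothetical targets and weights for two of the eight descriptors.
targets = {"E_INTER": -3.0, "NUM_RBD": 5.0}
weights = {"E_INTER": 2.0, "NUM_RBD": 1.0}  # 2.0 stresses electrostatics

# A ligand that matches every target keeps the maximum score.
print(target_score({"E_INTER": -3.0, "NUM_RBD": 5.0}, targets, weights))
# Deviations lower the score; the doubled weight makes E_INTER count twice.
print(target_score({"E_INTER": -2.0, "NUM_RBD": 6.0}, targets, weights))
```

Under this assumed form, the score measures only closeness to the ideal descriptor values, which is why it cannot be read as a binding affinity.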
Early in the search, you may see structures with negative scores. The reason is
that RACHEL implements a distance penalty function in order to generate
compounds that fill the active site region. Otherwise, numerous ligands that
barely fill the target region may be generated. RACHEL penalizes developing
ligands until they reach 66% of the distance between the anchor bond and the
target. After this distance has been reached scoring is performed as usual.
Compounds with negative scores are deemed pseudo hits. RACHEL does not
discard these pseudo hits because they provide important data about steric and
electrostatic complementarity (or disparity). RACHEL uses this information to
determine heuristically which chemical groups to utilize in succeeding
generations. The number of pseudo hits is displayed in parentheses after the
number of true hits in the RACHEL monitor window.
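The 66% rule can be pictured as a simple geometric test. This sketch checks only whether the distance penalty would still apply; the size of the penalty is not documented here. It assumes that progress is measured from the anchor to the farthest atom of the growing ligand (called the growth tip below), and all names are hypothetical.

```python
import math

PENALTY_FRACTION = 0.66  # "66% of the distance" from the text above


def penalty_applies(anchor_xyz, target_xyz, growth_tip_xyz):
    """True while growth covers less than 66% of the anchor-target distance.

    ASSUMPTION: progress is the straight-line distance from the anchor
    to the growth tip of the developing ligand.
    """
    full = math.dist(anchor_xyz, target_xyz)
    reached = math.dist(anchor_xyz, growth_tip_xyz)
    return reached < PENALTY_FRACTION * full


anchor = (0.0, 0.0, 0.0)
target = (10.0, 0.0, 0.0)
print(penalty_applies(anchor, target, (5.0, 0.0, 0.0)))  # halfway there
print(penalty_applies(anchor, target, (7.0, 0.0, 0.0)))  # past the threshold
```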
! Close the RACHEL - Status dialog and return to the RACHEL dialog.
Chemical Templates
Chemical Descriptors
The information about the RACHEL project is loaded at the top of the RACHEL
dialog.
Because you have also run a RACHEL combinatorial search for this project
(Run a RACHEL Combinatorial Search on page 23), the name of the database
used for the search (rachel/DBASE/nci3d) is also posted at the bottom of the
dialog.
! At the top of the dialog toggle both Original Ligand and Receptor
on.
From a steric perspective, the arginine sidechain fits tightly into a very defined
pocket. This pocket is quite flat, forming a narrow cavity into which the
arginine guanido terminus is wedged. One can clearly see that enough room
exists for a cyclic system to substitute for the guanido terminus; however, it
must be planar.
possibly with the hydroxy group of Tyr 228. This binding pocket is highly polar
with an abundance of negatively charged functional groups. Thus, the ideal
component to complement this region should contain numerous hydrogen bond
donors.
The carboxyl terminus of the arginine residue also resides in a region where
growth can occur. In addition a hydrogen bond is made with the amide nitrogen
of Gly 193. Any derivative components that are placed in this region should
maintain this hydrogen bond as well.
Given this knowledge of the ligand-receptor interactions within the active site,
the first task is to formulate a RACHEL template that will describe the
chemistry of the ligand derivatives. The template consists of a user-determined
arrangement of defined components and wildcard designations. RACHEL
chemical descriptors are then assigned to the defined components. These
descriptors act as filters to enrich the database for the functionality desired at
the various positions in the template. The diagram below depicts the RACHEL
template that you will generate and the various chemical descriptors that you
will assign to the defined components.
You will now implement these tools to constrain RACHEL’s process of gener-
ating derivative chemical structures to replace the arginine sidechain of the
ligand tripeptide. These descriptors, in conjunction with the knowledge of the
chemical interactions within the active site, govern the structure-based drug
design and refinement.
! At the top of the dialog toggle both Original Ligand and Receptor
off.
• The “SITE1” label denotes the target that was selected to direct the
growth of derivative fragments.
• The “W” atom represents a wildcard component. When a search is set
up, RACHEL places a wildcard at each anchor bond by default so that
growth is allowed to proceed unhindered should you decide to forgo
any descriptors or constraints.
! In the RACHEL - Modify Chemical Descriptors dialog, make sure that
the Component Type is set to Defined.
! Press Attach Component.
! When the Select Atom dialog pops up, click the W wildcard component
to attach a defined component.
In the SYBYL window, defined component C2 has been inserted between the
anchor and the wildcard component.
8. To complete the template, you must attach one more defined component that
branches off the main chain at component C2.
RACHEL assigns a series of default site level descriptors to limit the generation
of chemically inappropriate structures. These can be seen easily.
! In the RACHEL - Modify Chemical Descriptors dialog, activate Site
Descriptor.
The dialog lists the descriptors that have been assigned to the selected site.
These descriptors all act as constraints to eliminate chemically inappropriate
atom types or undesired linkages (see the glossary of chemical descriptors on
page 28).
Notice that the ID number of the currently displayed site (Site: 1) is shown in
the lower left corner of the dialog.
Often you will invoke numerous descriptors for either a site or a component.
This menu enables you to filter and view descriptors of the same type. If you
select ATYPES, only the single ATYPES descriptor will be listed.
Component C1
12. Add a component descriptor of type RATOMS to specify the minimum and
maximum number of ring atoms for this component.
! In the Edit Chemical Descriptors section of the RACHEL - Modify
Chemical Descriptors dialog, press Add Descriptor.
The Number of Ring Atoms dialog appears (dialog description on page 88).
! Set the Low and High values to 5 and 6, respectively, then press
OK.
13. Use the BONDS descriptor to indicate that candidates for the specified
component must contain a potential hydrogen bond donor.
The Bonded Atom Constraints dialog appears (dialog description on page 85).
! For the first pair of atom types shown, select O.3 on the left and H on
the right.
! Set the remaining pairs of atom types as follows:
- O.co2 and H
- N.3 and H
- N.2 and H
- N.am and H
- N.pl3 and H
This descriptor will allow RACHEL to isolate components that contain potential
hydrogen bond donors.
! Set the Operator to >.
! Click the right arrow next to the top slider to set the Value to 1.
! Press OK.
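Conceptually, a BONDS descriptor counts the bonds in a candidate component whose two atom types match one of the listed pairs, then compares that count with the chosen value. The sketch below illustrates the idea on a simple list-based structure; the data layout and function names are hypothetical, and the operator and value are left as parameters rather than taken from the dialog.

```python
import operator

# Atom-type pairs entered in the Bonded Atom Constraints dialog in this step.
DONOR_PAIRS = {
    ("O.3", "H"), ("O.co2", "H"), ("N.3", "H"),
    ("N.2", "H"), ("N.am", "H"), ("N.pl3", "H"),
}
# Only the two operators mentioned in this tutorial are modeled.
OPERATORS = {"<": operator.lt, ">": operator.gt}


def count_matching_bonds(atom_types, bonds, pairs=DONOR_PAIRS):
    """Count bonds whose two atom types form one of the listed pairs."""
    count = 0
    for i, j in bonds:
        a, b = atom_types[i], atom_types[j]
        if (a, b) in pairs or (b, a) in pairs:
            count += 1
    return count


def passes_bonds_filter(atom_types, bonds, op, value):
    """Compare the match count against a value using the chosen operator."""
    return OPERATORS[op](count_matching_bonds(atom_types, bonds), value)


# Hypothetical fragment: methanol (CH3-OH) written with SYBYL atom types.
atom_types = ["C.3", "O.3", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5)]
print(count_matching_bonds(atom_types, bonds))  # only the O.3-H bond matches
```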
Component C2
15. Add a component descriptor of type ATOMS to specify the minimum and
maximum number of atoms for this component.
! Press Add Descriptor.
! Set the Low and High values to 1 and 5, respectively, then press
OK.
The ATOMS descriptor has been assigned to Component C2 and appears in the
list.
16. Add a component descriptor of type ATTACH to specify that the new ligand
scaffold must connect to the fixed part of the ligand through an sp3 carbon.
ATOMS 1 - 5
ATTACH C.3 -> ANCHOR
Component C3
Chemical descriptors for component C3 will indicate that all the database
components that RACHEL substitutes in this position must have at least one
oxygen that is a potential hydrogen bond acceptor.
18. Add a component descriptor of type ATYPES to specify the atom types that
this component must contain.
The Atom Type Constraints dialog appears (dialog description on page 84).
! Press OK.
Because there is only one site, it is highlighted in the SYBYL window. All the
descriptors that RACHEL defined automatically for this component are listed in
the dialog.
20. Use the RBONDS descriptor to indicate that the scaffold of three components
for Site 1 must have between 4 and 6 rotatable bonds.
The Number of Rotatable Bonds dialog appears (dialog description on page 89).
! Set the Low and High values to 4 and 6, respectively, then press
OK.
The RBONDS descriptor has been assigned to Site 1 and appears at the bottom
of the list.
21. Use the MW descriptor to indicate that the scaffold must have a molecular
weight between 50 and 300.
! Press Add Descriptor.
! Set the Low and High values to 50 and 300, respectively, then press
OK.
The MW descriptor has been assigned to Site 1 and appears in the list.
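Descriptors that take Low and High values, such as RBONDS and MW, act as range filters on candidate scaffolds. The sketch below shows the idea using the values entered for Site 1; it assumes the bounds are inclusive, and the function names and candidate properties are hypothetical.

```python
def in_range(value, low, high):
    """Range test for a Low/High descriptor (ASSUMED inclusive at both ends)."""
    return low <= value <= high


# Low/High settings entered in this tutorial for Site 1.
SITE_RANGES = {"RBONDS": (4, 6), "MW": (50, 300)}


def passes_site_ranges(properties, ranges=SITE_RANGES):
    """True if every ranged property of a candidate scaffold is within bounds."""
    return all(
        in_range(properties[name], low, high)
        for name, (low, high) in ranges.items()
    )


# Hypothetical candidate scaffolds.
print(passes_site_ranges({"RBONDS": 5, "MW": 180.2}))  # inside both ranges
print(passes_site_ranges({"RBONDS": 8, "MW": 180.2}))  # too many rotatable bonds
```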
22. Use the ATYPES descriptor to eliminate halogens. The NCI-3D database
contains a few compounds with sp carbons (atom type C.1). These too will be
eliminated.
! Press Add Descriptor.
The Atom Type Constraints dialog appears (dialog description on page 84).
! In the list of atom types select C.1, F, Cl, and Br.
! Press OK.
The ATYPES descriptor has been assigned to Site 1 and appears in the list.
23. Use the ATYPES descriptor to prevent RACHEL from adding too many
heteroatoms to the entire ligand scaffold for this site.
! In the list of atom types select N.4, N.3, N.2, N.1, N.ar, N.am, N.pl3,
O.3, O.2, and O.co2.
! Set the Operator to <.
! Use the arrows next to the integer slider to set the Value to 6.
! Press OK.
Another ATYPES descriptor has been assigned to Site 1 and appears in the list.
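The two ATYPES descriptors just added can be thought of as counts over SYBYL atom types. The sketch below is an illustration only: it assumes the first descriptor rejects any scaffold that contains one of the selected types, and it applies the "< 6" setting of the second descriptor to the selected heteroatom types. The function names and the example scaffold are hypothetical.

```python
# Atom types selected in steps 22 and 23 of this tutorial.
EXCLUDED_TYPES = {"C.1", "F", "Cl", "Br"}
HETEROATOM_TYPES = {
    "N.4", "N.3", "N.2", "N.1", "N.ar", "N.am", "N.pl3",
    "O.3", "O.2", "O.co2",
}


def count_types(atom_types, selected):
    """Number of atoms whose SYBYL type is in the selected set."""
    return sum(1 for t in atom_types if t in selected)


def passes_atypes(atom_types, heteroatom_limit=6):
    """Apply both filters to a scaffold given as a list of atom types.

    ASSUMPTION: the first descriptor allows zero atoms of the excluded types.
    """
    if count_types(atom_types, EXCLUDED_TYPES) > 0:
        return False
    return count_types(atom_types, HETEROATOM_TYPES) < heteroatom_limit


# Hypothetical scaffold with three heteroatoms and no excluded types.
scaffold = ["C.3", "C.3", "N.am", "C.2", "O.2", "O.3", "H", "H"]
print(passes_atypes(scaffold))
print(passes_atypes(scaffold + ["Cl"]))
```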
Note how the star-labeled SITE1 is next to the carboxylate carbon of the Asp
189 sidechain, which was the target atom used to define the site when you
created the project (see Designate the RACHEL Target Atom on page 11).
25. Use a PHARM descriptor to specify that any derivative of this site must place a
hydrogen-bond donor within 3.5 Å of the carboxylate carbon in Asp189.
! Click the carboxylic acid carbon of the Asp189 sidechain (the atom
closest to SITE1).
The selected atom is highlighted by a sphere of colored dots, and its XYZ
coordinates are shown in the dialog.
! In the list of atom types select H.
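A PHARM descriptor of this kind amounts to a distance test between a fixed point and the atoms of a candidate structure. The sketch below checks whether any atom of the required type lies within the cutoff of the selected point. It assumes the cutoff is inclusive and uses a hypothetical list of (atom type, XYZ) tuples; none of these names come from RACHEL.

```python
import math

CUTOFF_ANGSTROM = 3.5  # distance used in this step


def satisfies_pharm(point_xyz, atoms, required_type="H", cutoff=CUTOFF_ANGSTROM):
    """True if any atom of the required type lies within the cutoff.

    ASSUMPTION: an atom exactly at the cutoff distance still counts.
    `atoms` is a list of (sybyl_type, (x, y, z)) tuples.
    """
    return any(
        atom_type == required_type and math.dist(xyz, point_xyz) <= cutoff
        for atom_type, xyz in atoms
    )


# Hypothetical coordinates: the pharmacophore point sits at the origin.
point = (0.0, 0.0, 0.0)
candidate = [
    ("N.pl3", (3.0, 0.0, 0.0)),
    ("H", (2.0, 0.0, 0.0)),   # hydrogen within 3.5 angstroms of the point
    ("H", (6.0, 0.0, 0.0)),
]
print(satisfies_pharm(point, candidate))
```

As noted later in this tutorial, structures that fail a pharmacophore descriptor are penalized rather than silently accepted.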
27. Add a PHARM descriptor for the hydrogen bond acceptor at the carbonyl end of
the arginine residue.
The selected atom is highlighted by a sphere of colored dots, and its XYZ
coordinates are shown in the dialog.
The definitions for all the descriptors associated with this project are saved in
the file Rachel_builddef in the project directory.
Because you have specified a large number of chemical descriptors, the search
may be highly constrained. To ensure a significant number of hits you will
reduce the diversity index. A high diversity index (0.85–0.99) will result in
fewer, but more diverse hits. The default value should be used in searches that
have few chemical constraints. Otherwise, the resulting hits may all appear very
similar.
The Adjust Search Parameters dialog appears (dialog description on page 78).
! Press OK.
The search parameters associated with this project are saved in the file
Rachel_searchdef in the project directory.
However, the strength of a target function is that it is far easier to adjust than a
true scoring function. In this case, you can simply lower the electrostatic inter-
action target value (which will favor hydrogen bond formation) and increase the
associated scalar. The end result is that the importance of electrostatic interac-
tions is increased.
! In the RACHEL dialog, press Scoring Function: Edit.
The Adjust Target Function dialog appears (dialog description on page 90).
! Press OK.
The scoring function parameters associated with this project are saved in the file
Rachel_scoredef in the project directory.
You must define the storage location for all successfully generated structures.
These will be stored in the .mol2 format.
Note: Because all the hits are saved in individual .mol2 files, adding the
extension .mdb to the directory name makes it easier later to review the hits in
a Mol2 database or in a molecular spreadsheet.
! Press OK to start the search.
32. Let the calculation run for a few iterations, then view the results.
This indicates that you have successfully used the chemical descriptors to enrich
the component lists for each of the defined structures in the template. Notice
that component C3 contains a large number of substituent candidates. This
indicates that a substantial portion of the NCI-3D database contains ring struc-
tures.
33. You can view the structures that RACHEL has produced so far.
! Press Viewer at the bottom of the RACHEL - Status dialog.
The Viewer dialog acts as a remote control, enabling you to quickly cycle
through the hits while permitting full interaction with the structures.
! Pressing the left and right arrows in the Viewer dialog allows you to
move up and down the list of hits while the double arrows move one
“page” at a time.
! Toggle the Original Ligand on to display the original ligand for
reference.
You should see a great deal of variation in the sidechain components utilized.
However, all of the derivatives that have positive scores meet all the chemical
criteria.
You could also potentially see a large variation in scores. Depending upon how
long the search is allowed to run, structures with negative scores may be
present. A distance-based penalty score is implemented until the structure
attains a distance of 0.67 times the target distance. The target distance is deter-
mined by the location of the target atom. In addition, structures are penalized if
they do not meet all of the pharmacophore descriptors.
You may see structures that contain improper chemistry. These are due to struc-
tural errors in the NCI-3D database itself or in its translation to 3D coordinates.
Appropriate chemical descriptors can be used to eliminate unwanted structures.
It is very important to know the composition of the compounds in your
database. As you become more familiar with these constraints, you will
probably develop and tailor your own default list of specifications to use.
34. When you are finished viewing the hits, terminate the combinatorial chemistry
search.
! Press [X] in the Viewer dialog to close it.
! In the Status dialog press STOP Search then press Yes to confirm
this.
The receptor and ligand used in this tutorial are alpha-thrombin and a tripeptide
inhibitor (PDB code 1dwe). For the sake of this exercise, you will assume that
the arginine guanido group and the phenyl ring of the phenylalanine are critical
pharmacophoric elements for recognition and binding. Thus, you will use
CHARLIE to generate novel chemical scaffolds that span these two groups.
2. Make sure that you have all the necessary files. These are the same as those
used in the other RACHEL tutorials.
! If you have not yet copied the RACHEL demo files, see Prerequisite
to all RACHEL and CHARLIE Tutorials on page 7.
3. Start RACHEL.
! Append the name of the new project, charlie, to the directory name
in the adjacent field.
! Press OK.
The project name is listed on the Project line in the RACHEL - Setup New
Project dialog.
Upon completion, the RACHEL - Setup New Project dialog will resemble the
following:
The ligand is a tripeptide inhibitor of Thrombin. For the sake of this exercise,
you will assume that the arginine guanido group and the phenyl ring of the
phenylalanine are critical pharmacophoric elements for recognition and binding.
Thus, you will use CHARLIE to generate novel chemical bridges that span
these two groups.
9. Designate the anchor bond. This indicates where CHARLIE will begin building
the scaffold.
Note: The order in which the atoms are selected is important. Always select the
atoms sequentially from the anchor bond towards the target bond (as illustrated
by the arrows in the Select Atom dialog).
! In the SYBYL window, rotate and scale the molecules until you can
clearly see atoms N26 and C25 of the ligand.
You have designated the bond from N26 to C25 as the anchor bond of the
scaffold. The bond and the rest of the molecule are colored green. Note: If you
accidentally selected the wrong bond, press Cancel in the next dialog (atom
selection for the target area) and restart the anchor bond selection.
10. Designate the splice target bond. This indicates where CHARLIE must
terminate the scaffold and link it with the remainder of the ligand.
As in the figure above, the portions of the ligand that you wish to retain are
colored by atom type once again. The region that will be replaced by the new
scaffold is colored green.
11. The RACHEL setup is complete: only one site will be defined for this tutorial.
! Press End in the dialog prompting you to select an anchor atom in
the ligand.
A message in the console acknowledges that the project directory has been set
up successfully.
Note: Do not move the project directory once you have created it. If you need
the project in a different location, you must erase this directory and the files
within it and regenerate the project using the RACHEL setup process detailed above.
For each defined site the following information is listed in the RACHEL dialog:
• Site: 1 = The site’s ID number.
• Anchor = 26-25: The ligand atoms defining the anchor bond. The first
atom will remain fixed, the second atom determines the region that will
be optimized.
• –> 5-6 = The ligand atoms defining the target bond. CHARLIE will
attempt to engineer a derivative linker to join the anchor bond to the
target bond.
CHARLIE can build linkers to replace up to five different ligand scaffolds
simultaneously. The only requirement is that each anchor and target bond pair must be
unique. No two linkers may share the same anchor bond nor terminate on the
same target bond. In this exercise, you will replace only one region.
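The uniqueness rule above can be checked mechanically before a search is set up. The following is a minimal sketch, not RACHEL's own validation: the function name and the data layout (each bond as an ordered pair of atom numbers) are assumptions made for illustration, and a bond selected in the opposite atom order is treated here as a different bond.

```python
MAX_SITES = 5  # the text states that five scaffolds can be replaced at once

def validate_sites(sites):
    """Check a list of (anchor_bond, target_bond) pairs.

    Each bond is an ordered pair of atom numbers, e.g. ((26, 25), (5, 6)).
    Returns a list of problems; an empty list means the setup is acceptable.
    """
    problems = []
    if len(sites) > MAX_SITES:
        problems.append(f"{len(sites)} sites defined; at most {MAX_SITES} allowed")
    seen_anchor, seen_target = set(), set()
    for number, (anchor, target) in enumerate(sites, start=1):
        if anchor in seen_anchor:
            problems.append(f"site {number}: anchor bond {anchor} already used")
        if target in seen_target:
            problems.append(f"site {number}: target bond {target} already used")
        seen_anchor.add(anchor)
        seen_target.add(target)
    return problems
```

For the single site used in this tutorial, validate_sites([((26, 25), (5, 6))]) returns an empty list.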
14. You can improve the chances of finding hits by increasing the number of
conformations per structure to one million:
15. A high diversity index (0.85 – 0.99) will result in fewer, but more diverse hits.
A higher value should be used in searches that have few chemical constraints.
Otherwise, a multitude of hits may result that all appear very similar. A lower
diversity index (0.25 – 0.5) will allow a greater number of hits; however, they
will be more chemically similar. When running CHARLIE, it is a good idea to
begin with a low diversity index value. This value can be raised in succeeding
runs if necessary.
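The trade-off described in this step can be illustrated with a generic diversity filter. This is only a sketch under stated assumptions: this manual does not describe how RACHEL computes its diversity index, so the code substitutes a Tanimoto similarity on hypothetical feature sets and keeps a hit only when its similarity to every previously kept hit is at most 1 minus the diversity index.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def filter_by_diversity(hits, diversity_index):
    """Keep a hit only if it is dissimilar enough from every hit kept so far.

    hits: list of feature sets (a hypothetical stand-in for real structures).
    Assumption: the similarity ceiling is 1 - diversity_index.
    """
    max_similarity = 1.0 - diversity_index
    kept = []
    for features in hits:
        if all(tanimoto(features, other) <= max_similarity for other in kept):
            kept.append(features)
    return kept
```

With this stand-in, raising the index shrinks the list of retained hits, which mirrors the behavior described above: fewer hits, but less alike.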
16. The maximum splice tolerance is the RMS error allowed when CHARLIE joins
two ligand fragments by generating a linker structure. The tolerance is measured
at the linker bond - target bond overlap. A certain amount of error is necessary
to compensate for the fixed bond lengths and angles that are employed by the
search engine. The smaller the tolerance, the better the fit between the linker
and the static portions of the ligand, and the better the structure. However,
CHARLIE may have more difficulty producing a true hit.
As a general rule, a longer bridge will require a larger splice atom tolerance. If
you get a plethora of hits, you should decrease the splice atom tolerance.
Conversely, if you obtain no hits after 50 or more iterations, consider increasing
this value. In practice, you will alter this parameter value depending upon the
number and quality of hits you obtain.
! Increase Maximum splice atom tolerance to 0.75
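To make the tolerance concrete, the sketch below computes an RMS error between the two atoms of a linker's terminal bond and the two atoms of the target bond, then compares it with the tolerance. It is an illustration only: the function names and the choice of which atoms are paired are assumptions, and RACHEL's internal calculation may differ.

```python
import math

def splice_rms_error(linker_atoms, target_atoms):
    """RMS distance (in Å) between paired linker-bond and target-bond atoms."""
    if len(linker_atoms) != len(target_atoms) or not linker_atoms:
        raise ValueError("need the same non-zero number of atoms in both lists")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(linker_atoms, target_atoms))
    return math.sqrt(total / len(linker_atoms))

def within_tolerance(linker_atoms, target_atoms, tolerance=0.75):
    """True if the splice error does not exceed the maximum tolerance."""
    return splice_rms_error(linker_atoms, target_atoms) <= tolerance
```

A smaller tolerance rejects more candidate linkers, which is why a search with a tight tolerance may need more iterations to produce a hit.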
Notice that the template consists of a wildcard (red) component plus a linker
(green). The linker tells CHARLIE that this is the terminus of the scaffold, and
that the terminal bond must be overlapped with the target bond to link with the
remainder of the ligand.
19. For a better perspective, display the original ligand to show how the linker
region will join the desired portions of the original ligand.
Keep in mind that the template is simply a schematic. Do not be alarmed that
the anchor bond appears altered in comparison to the original ligand. CHARLIE
will maintain all bond angles and lengths while adding the derivative
components.
• Defined Component #1
    • Must join with anchor bond using an sp3 carbon (C.3). This ensures
      appropriate chemistry.
• Wildcard
    • RACHEL will substitute freely at this position given the steric and
      electrostatic environment.
    • RACHEL will add components to this position with varying
      connectivity as necessary.
• Defined Component #2
    • Must contain between 8 and 10 ring atoms.
• Defined Component #3
    • Must join with target bond using an sp3 carbon (C.3).
20. You will now define the template and associated component descriptors. The
above schematic illustrates the strategy. The main group is component #2. This
component must contain a bicyclic ring. Thus, you will specify an 8-10 ring
atom descriptor, which should allow for bicyclic 5 and 6 membered rings in
various combinations. On either side of component #2, the wildcards will allow
CHARLIE to substitute components, as necessary. Components #1 and #3
simply ensure that sp3 carbon (C.3) atoms join with the guanido and phenyl groups.
! Toggle off the Display Original Ligand check box to allow access to
the template.
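The 8-10 ring atom range chosen in step 20 follows from simple counting. Assuming the two rings are fused along one bond, they share two atoms, so the total is the sum of the ring sizes minus two. The short sketch below (function name chosen for illustration) lists the combinations of 5 and 6 membered rings.

```python
from itertools import combinations_with_replacement

def fused_ring_atoms(size_a, size_b):
    """Ring atoms in two rings fused along one bond (two shared atoms)."""
    return size_a + size_b - 2

for a, b in combinations_with_replacement((5, 6), 2):
    print(f"{a}-{b} fused system: {fused_ring_atoms(a, b)} ring atoms")
# 5-5 fused system: 8 ring atoms
# 5-6 fused system: 9 ring atoms
# 6-6 fused system: 10 ring atoms
```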
21. Insert the first component between the nitrogen of the anchor bond and the
wildcard component.
! When the Select Atom dialog pops up, click the blue atom connected
to the wildcard to designate the insertion point for this component.
22. Insert the second component (C2) between the wildcard and the linker terminus.
! Press Insert Component.
23. Insert the third component (C3) between component C2 and the linker terminus.
! Press Insert Component.
Do not be concerned that the template seems large. This is a schematic
representation. The actual bridging components will rotate and stagger to fit the
space. Also keep in mind that CHARLIE will fill the wildcard components only
if necessary. They were inserted solely to give CHARLIE more freedom in
choosing components.
! Toggle on the Display Original Ligand check box to see how many
rotatable bonds were present in the original molecule.
! Toggle it off to bring back the display of the template before you go
on.
! Press Select and click on the neighboring anchor bond atom (blue)
to select it as the target for the attachment descriptor.
26. Define a descriptor of type ATTACH for component C3 to specify that the new
scaffold must connect to the linker through an sp3 carbon.
28. Review the list of descriptors in the RACHEL - Modify Chemical Descriptors
dialog.
! On the Select line, press All.
! Scroll to the bottom of the descriptor list where you will see:
Site 1 Cmpnt 1: ATTACH C.3 -> ANCHOR
Site 1 Cmpnt 2: RATOMS 8-10
Site 1 Cmpnt 3: ATTACH C.3 -> LINKER
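If you keep notes on descriptor sets across projects, lines in the form shown above are easy to turn into structured records. The following is a minimal sketch that handles only the component-level layout displayed in this step; the function name and the dictionary keys are assumptions, and other descriptor layouts may not match the pattern.

```python
import re

# Matches lines such as "Site 1 Cmpnt 2: RATOMS 8-10".
LINE = re.compile(r"Site\s+(\d+)\s+Cmpnt\s+(\d+):\s+(\w+)\s+(.*)")

def parse_descriptor(line):
    """Split one descriptor line into site, component, type, and argument."""
    match = LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized descriptor line: {line!r}")
    site, component, kind, argument = match.groups()
    return {"site": int(site), "component": int(component),
            "type": kind, "argument": argument.strip()}
```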
You may either import the parameters from another successful run or edit them
interactively.
To Import an Existing Target Function: If you have already run the RACHEL
tutorial (see RACHEL Tutorials on page 7), you already have a target function
stored in that project’s directory (rachel/tutorial/Rachel_scoredef) and
you may simply import it.
! On the RACHEL dialog’s Scoring Function line press Import.
A message dialog displays the source and target directories for this operation.
! Press Yes to confirm that you want to overwrite the current scoring
function.
32. Modify the target function for building scaffolds. Ideally, the derivative
scaffolds should link the anchor and target regions with the most direct bridge,
producing steric and electrostatic complementarity. You can direct CHARLIE
to accomplish this by altering key target function parameters.
! On the RACHEL dialog’s Scoring Function line press Edit.
The Adjust Target Function dialog appears (dialog description on page 90).
! Press OK.
The scoring function parameters associated with this project are saved in the file
Rachel_scoredef in the project directory.
34. Start the CHARLIE search and specify where the results will be stored.
You must define the storage location for all successfully generated structures
(hits) within the project directory. These will be stored in the .mol2 format.
Note: Because all the hits are saved in individual .mol2 files, adding the
extension .mdb to the directory name makes it easier later to review the hits in
a Mol2 database or in a molecular spreadsheet.
! Press OK to start the search.
In this tutorial you are using a target function. The maximum score for the
target function is 10.0 (see Understand the RACHEL Score Values on page 26).
Thus, as compounds that are generated improve iteratively, their scores will
approach this value.
Note: This search may take a while to generate the first successful scaffold.
Approximately 30 - 40 generations may be necessary. If no successful scaffolds
(scores > 0.00) have been produced in 50 iterations, try terminating the search
and re-starting. Each time you start RACHEL (CHARLIE), a new random seed
value is generated. This gives RACHEL or CHARLIE a different set of starting
components, selected from the database, to begin derivatization or scaffold
building.
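The restart rule in the note above can be written as a one-line test. This sketch assumes you have recorded the best score of each iteration in a list (RACHEL does not provide such a list in this form; the name best_scores is hypothetical).

```python
def should_restart(best_scores, patience=50):
    """True if at least `patience` iterations have run and none scored above 0.00."""
    return len(best_scores) >= patience and all(s <= 0.0 for s in best_scores)
```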
36. You can now view the other structures that CHARLIE has produced so far.
! Press Viewer at the bottom of the RACHEL - Status dialog.
The Viewer dialog acts as a remote control, enabling you to quickly cycle
through the hits while permitting full interaction with the structures.
! Press the left and right arrows to move up and down the list of hits.
Use the double arrows to move one “page” at a time.
! Toggle Original Ligand on to display it (in green) for reference.
37. While you are viewing the structures, CHARLIE continues to search for new
hits. At any time, you may load new hits from the project run.
38. When you are finished viewing a few successful scaffolds, terminate the search.
! In the Status dialog press STOP Search then press Yes to confirm
this.
The scientific context for this tutorial is bridge generation. In this scenario,
several fragments from two or more well characterized lead compounds bind to
separate regions of the active site. The task is to generate appropriate linker
structures to join the separate fragments into a single compound. In doing so,
you must also consider the receptor cavity and optimize both steric and
electrostatic complementarity.
The ligand used in this tutorial consists of two separate fragments taken from
the tripeptide inhibitor of Thrombin (PDB code 1dwe). You will use this as the
test ligand to demonstrate the formation of chemical bridges between fragments.
! If you have not yet copied the RACHEL demo files, see Prerequisite
to all RACHEL and CHARLIE Tutorials on page 7.
2. Start RACHEL.
The project name is listed on the Project line in the RACHEL - Setup New
Project dialog.
The ligand used in this tutorial consists of two separate fragments taken from
the tripeptide inhibitor of Thrombin. You will use this as the test ligand to
demonstrate the formation of chemical bridges between fragments. You will
bridge from the anchor bond [C10 –> C2] to the target bond [C1 –> C3].
8. Select the two atoms forming the anchor bond for the bridge. The order of atom
selection is important.
! Click on atom 10 (it is then highlighted with a green sphere).
! Click End when prompted for Site 2 anchor bond. This will signify
that you are finished with the setup.
You may either import the parameters from another successful run or edit them
interactively.
To Import: If you have already run the other CHARLIE tutorial (see Bridge
Generation Using CHARLIE on page 61), you may simply import the search
parameters used in that run:
! On the RACHEL dialog’s Search Parameters line press Import.
11. To Edit:
14. Define a descriptor of type RATOMS for component C1 and specify the
minimum and maximum number of ring atoms for this component.
The Number of Ring Atoms dialog appears (dialog description on page 88).
! Set the Low and High values to 6 and 10, respectively, then press
OK.
A message dialog displays the source and target directories for this operation.
! Press Yes to confirm that you want to overwrite the current scoring
function.
For more information about the specific parameters, see Prepare the Target
Function on page 57.
17. Start the CHARLIE search and specify where the results will be stored.
You must define the storage location for all successfully generated structures
(hits) within the project directory. These will be stored in the .mol2 format.
Note: Because all the hits are saved in individual .mol2 files, adding the
extension .mdb to the directory name makes it easier later to review the hits in
a Mol2 database or in a molecular spreadsheet.
! Press OK to start the search.
In this tutorial you are using a target function. The maximum score for the
target function is 10.0 (see Understand the RACHEL Score Values on page 26).
Thus, as compounds that are generated improve iteratively, their scores will
approach this value.
Note: This search may take a while to generate the first successful bridge.
Approximately 30 - 40 generations may be necessary. If no successful bridges
(scores > 0.00) have been produced in 50 iterations, try terminating the search
and re-starting. Each time you start RACHEL (CHARLIE), a new random seed
value is generated. This gives RACHEL or CHARLIE a different set of starting
components, selected from the database, to begin derivatization or bridge
building.
19. You can now view the other structures that CHARLIE has produced so far.
The Viewer dialog acts as a remote control, enabling you to quickly cycle
through the hits while permitting full interaction with the structures.
! Press the left and right arrows to move up and down the list of hits.
Use the double arrows to move one “page” at a time.
! Toggle Original Ligand on to display it (in green) for reference.
20. While you are viewing the structures, CHARLIE continues to search for new
hits. At any time, you may load new hits from the project run.
! Press Refresh at the bottom of the RACHEL - Status dialog to reload
the most current hits from the search.
21. When you are finished viewing a few successful bridges, terminate the search.
! In the Status dialog press STOP Search then press Yes to confirm
this.
Typically, you will generate the RACHEL database from a corporate database
or from a publicly or commercially available compound database.
The RACHEL demo files include a set of 25 ligands extracted from the Protein
Data Bank. The ligands are stored in a Multi-Mol2 file,
rachel/DBASE/Demo_dbase_multi.mol2.
! If you have not yet copied the RACHEL demo files, see Prerequisite
to all RACHEL and CHARLIE Tutorials on page 7.
2. Start RACHEL.
! Press OK.
! Press Yes in the small dialog that pops up to confirm the creation of
a new RACHEL database.
RACHEL extracts the components from the Multi-Mol2 file and adds them to
the new test_dbase database. A small window monitors the process by
showing the number of structures in the Multi-Mol2 file processed and the
number of components extracted. RACHEL extracts only unique components.
Thus, the number of extracted components will rise rapidly. However, as
common components are stored, the number of novel components added will
gradually diminish. Individual components with more than 256 atoms or bonds
are rejected.
Because there are only a few structures in this demo Multi-Mol2 input file, the
extraction process may appear instantaneous. A structural database of 500,000
compounds may actually require 30–60 minutes of real time to process
(depending upon processor, disk, and network conditions), resulting in
25,000–50,000 unique components depending upon the inherent chemistry.
RACHEL has a limit of 100,000 unique components that can be registered and
stored in any single component database.
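The extraction rules described above (store only unique components, reject components with more than 256 atoms or bonds, stop at 100,000 registered components) can be sketched as follows. This is an illustration, not RACHEL's implementation: the key used to recognize identical components is a hypothetical placeholder, and silently skipping new components once the limit is reached is an assumption.

```python
MAX_ATOMS_OR_BONDS = 256   # larger components are rejected
MAX_COMPONENTS = 100_000   # limit for a single component database

def register_components(components, limit=MAX_COMPONENTS):
    """Register unique components.

    components: iterable of (key, n_atoms, n_bonds), where key is any
    hashable value that identifies identical components (hypothetical).
    Returns (registry mapping key -> times seen, number rejected for size).
    """
    registry = {}
    rejected = 0
    for key, n_atoms, n_bonds in components:
        if n_atoms > MAX_ATOMS_OR_BONDS or n_bonds > MAX_ATOMS_OR_BONDS:
            rejected += 1
            continue
        if key in registry:
            registry[key] += 1        # already stored: nothing new is added
        elif len(registry) < limit:   # assumption: skip new ones when full
            registry[key] = 1
    return registry, rejected
```

Because repeated components only increment a counter, the number of newly stored components rises quickly at first and then levels off, as described above.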
When the extraction has completed, you will see a status window describing the
number of unique components that were stored.
Extraction of databases components completed.
Database contains 43 unique components.
Press =Enter= to finish.
! Use the Next and Prev buttons to display a few of the database
components.
Project Definition
Site Information
Parameters
Database
Action Buttons
Run Search Start the RACHEL combinatorial search. You will need
to specify a directory to store the hits (.mol2 files) and
associated information files.
Note: Because all the hits are saved in individual .mol2
files, adding the extension .mdb to the directory name
makes it easier later to review the hits in a Mol2
database or in a molecular spreadsheet.
Status Select the directory containing the compounds
generated by the search and access the RACHEL - Status
dialog.
Note: The ligand and receptor molecules must share the same coordinate space.
The default values for the parameters accessible through this dialog are stored
in the text file $RACHEL_HOME/Rachel_searchdef. If you modify the
search parameters before a RACHEL run, their new values are stored in the
Rachel_searchdef file within the project directory.
Access: In the RACHEL dialog press Search Parameters Import then use a
file browser to select the project directory from which you want to import the
search parameters.
Edit Templates
Insert Component Press the button then click in the SYBYL window on
the two insertion points (atoms or components).
Delete Component Press the button then click in the SYBYL window on
the component to be deleted.
Access:
• In the RACHEL dialog press Chemical Descriptors: Edit.
• Then, in the RACHEL - Modify Chemical Descriptors dialog, select Site
Descriptor or Component Descriptor or All.
• Then, press Add Descriptors.
Site-level descriptors are parameters that govern an entire optimization site,
that is, the combination of components that complement a user-defined region.
Component-level descriptors are used to restrict the selection of substituent
candidates for each defined component.
Descriptors Applicability
The descriptors you are likely to use most often are LINKS, ATYPES, and
BONDS. These will mainly be used to filter out unwanted chemical
constructions or components from use in derivative structures.
Access:
• In the Select Component Descriptor Type dialog, set the Descriptor Types
to ATOMS and press OK.
• Or in the RACHEL - Modify Chemical Descriptors dialog, select a
descriptor of type ATOMS in the list and press Edit Descriptor.
Access:
• In the Select Component Descriptor Type dialog, set the Descriptor Types
to ATTACH and press OK.
• Or in the RACHEL - Modify Chemical Descriptors dialog, select a
descriptor of type ATTACH in the list and press Edit Descriptor.
Atom Type List Select the desired atom type for the attachment atom of
the component.
Select Press the button then click the atom in the ligand that is
the anchor point for this component.
Access:
• In the Select Component Descriptor Type dialog, set the Descriptor Types
to ATYPES and press OK.
• Or in the RACHEL - Modify Chemical Descriptors dialog, select a
descriptor of type ATYPES in the list and press Edit Descriptor.
Atom Types Select one or more SYBYL atom types in the list.
Operator =, <, or >.
Value Click the appropriate arrow or drag the appropriate
slider to specify either integer or fractional values.
Examples:
N.4 N.3 N.2 N.1 N.ar N.am N.pl3 > 1
Two or more atoms in the specified component or site must be nitrogens.
F Cl Br I = 0
The desired component or site must be void of halogens.
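The two examples above amount to counting atoms whose type is in the selected list and comparing the count with the value. The following is a minimal sketch of that reading; the function name and the plain list of SYBYL atom type strings are assumptions made for illustration.

```python
import operator

OPERATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def atypes_satisfied(atom_types, wanted, op, value):
    """Count atoms whose SYBYL type is in `wanted` and compare with `value`."""
    count = sum(1 for t in atom_types if t in wanted)
    return OPERATORS[op](count, value)

NITROGENS = {"N.4", "N.3", "N.2", "N.1", "N.ar", "N.am", "N.pl3"}
HALOGENS = {"F", "Cl", "Br", "I"}

# A hypothetical component with two nitrogens and no halogens:
component = ["C.3", "N.3", "N.am", "O.2"]
print(atypes_satisfied(component, NITROGENS, ">", 1))  # True
print(atypes_satisfied(component, HALOGENS, "=", 0))   # True
```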
Access:
• In the Select Component Descriptor Type dialog, set the Descriptor Types
to BONDS and press OK.
• Or in the RACHEL - Modify Chemical Descriptors dialog, select a
descriptor of type BONDS in the list and press Edit Descriptor.
Examples:
C.ar C.ar = 6
If used as a component-level descriptor, the specified component must
contain six aromatic bonds.
If used as a site-level descriptor, a component with six aromatic bonds must
be present in the site.
Access:
• In the Select Component Descriptor Type dialog, set the Descriptor Types
to LINKS and press OK.
• In the RACHEL - Modify Chemical Descriptors dialog, select a descriptor
of type LINKS in the list and press Edit Descriptor.
At the bottom of the standard list of atom types is another one where all the
names are in parentheses. Use these to specify the type of any atom associated
with a ring.
Examples:
C.3 (C.3) > 0
At least one rotatable bond must be present between an sp3 carbon and an
sp3 carbon located in a ring structure in the site derivative.
O.3 O.3 = 0
No bonded oxygens may be present between site components.
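Pair-based descriptors such as LINKS can be read as a count of bonds whose two atom types match the specified pair. The sketch below assumes the pair may match in either order and that ring atoms carry their type in parentheses, as in the list described above; the data layout and function name are hypothetical.

```python
def count_type_pairs(bonds, type_a, type_b):
    """Count bonds whose two atom types match the pair, in either order."""
    wanted = {(type_a, type_b), (type_b, type_a)}
    return sum(1 for pair in bonds if tuple(pair) in wanted)

# Hypothetical rotatable bonds of a site derivative, as (type, type) pairs:
rotatable = [("C.3", "(C.3)"), ("O.3", "C.3"), ("(C.3)", "C.3")]
print(count_type_pairs(rotatable, "C.3", "(C.3)") > 0)   # True: C.3 (C.3) > 0
print(count_type_pairs(rotatable, "O.3", "O.3") == 0)    # True: O.3 O.3 = 0
```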
3.3.7 MW Descriptor
A site descriptor to specify the range of molecular weight.
Access:
• In the Select Component Descriptor Type dialog, set the Descriptor Types
to MW and press OK.
• In the RACHEL - Modify Chemical Descriptors dialog, select a descriptor
of type MW in the list and press Edit Descriptor.
Low, High The extreme values of molecular weight for each com-
ponent to be added to the selected site.
Access:
• In the Select Component Descriptor Type dialog, set the Descriptor Types
to PHARM and press OK.
• In the RACHEL - Modify Chemical Descriptors dialog, select a descriptor
of type PHARM in the list and press Edit Descriptor.
Pick Atom Press this button then click the atom that RACHEL will
attempt to replace with another scaffold. Usual candidates
are a hydrogen bond donor or acceptor in the
original ligand, or a receptor atom, or a water atom in
the receptor active site. The selected atom will be
bridged to the growing chain.
XYZ As an alternative to selecting an existing atom, enter a
position’s 3D coordinates in the fields.
Mol2 Press this button to select an atom in another molecule.
Desired Atom Types Select at least one SYBYL atom type in the list.
Error Use the arrows or the slider to define the radius (in Å)
of the sphere centered on the designated coordinates.
Example:
1.223, -2.546, 0.443 O.3 O.2 O.co2 0.50
RACHEL will attempt to select components that will place a hydrogen bond
acceptor (O.3, O.2, or O.co2) within 0.50 Å of the 3D coordinate position
{1.223, -2.546, 0.443}.
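The example above can be expressed as a simple geometric test: at least one atom of a desired type must lie inside the sphere defined by the coordinates and the Error radius. The following sketch shows that test only; the function name and the (type, coordinates) layout are assumptions, and how RACHEL scores a structure that fails the test is not shown.

```python
import math

def pharm_satisfied(atoms, center, desired_types, error):
    """True if an atom of a desired type lies within `error` Å of `center`.

    atoms: iterable of (sybyl_type, (x, y, z)) pairs (hypothetical layout).
    """
    return any(t in desired_types and math.dist(xyz, center) <= error
               for t, xyz in atoms)

center = (1.223, -2.546, 0.443)
acceptors = {"O.3", "O.2", "O.co2"}
# Hypothetical candidate atom about 0.24 Å from the requested position:
print(pharm_satisfied([("O.3", (1.4, -2.5, 0.6))], center, acceptors, 0.50))  # True
```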
Access:
• In the Select Component Descriptor Type dialog, set the Descriptor Types
to RATOMS and press OK.
• Or in the RACHEL - Modify Chemical Descriptors dialog, select a
descriptor of type RATOMS in the list and press Edit Descriptor.
Access:
• In the Select Component Descriptor Type dialog, set the Descriptor Types
to RBONDS and press OK.
• In the RACHEL - Modify Chemical Descriptors dialog, select a descriptor
of type RBONDS in the list and press Edit Descriptor.
Base offset value Sets the appropriate scale and range given the training
set data. Available only for Scoring Functions.
Nonpolar Interactions Estimates the non-polar interaction energy between
ligand and receptor. Higher values reflect the appropriate
association between hydrophobic regions of the
receptor with hydrophobic portions of the ligand.
Higher values are conducive to increased ligand receptor
affinity.
Electrostatic Interactions Estimates the electrostatic interaction energy between
ligand and receptor. Lower values reflect more complementary
associations between oppositely charged
ligand and receptor atoms. Thus, lower values are conducive
to increased ligand receptor binding.
Steric Interactions Estimates the steric complementarity between ligand
and receptor. Higher values indicate a tighter association
between ligand and receptor.
Steric Strain Estimates the steric strain between ligand and receptor.
In contrast to the above term, lower values indicate a
tighter association and fewer inappropriate steric contacts
between ligand and receptor.
The default values for the parameters accessible through this dialog are stored
in the text file $RACHEL_HOME/Rachel_scoredef.
3.4.2 Train
To select the type of RACHEL training function.
3.4.3 Import
With experience, you may become comfortable with a set of parameters that
produces desired search results. You may also wish to use the same values for
another project.
Access: In the RACHEL dialog press Scoring Function: Import then use a
file browser to select the project directory from which you want to import the
scoring function.
3.4.4 Predict
To use the current scoring function (established with the training set or
imported from another project) to predict the binding affinity of ligands. You
will be prompted for the location of the ligand(s) and of the receptor.
Note: You must have write permission to the directory and database file that
you will create or add components to.
Database Full path of the currently open database. Use the file
browser to select another database.
Num Atoms Specify the minimum and maximum number of atoms
in the components to be extracted by Search.
Ring Atoms Specify the minimum and maximum number of ring
atoms in the components to be extracted by Search.
Mol Wt Specify the range of molecular weight for the compo-
nents to be extracted by Search.
Num Links Specify the minimum and maximum number of links in
the components to be extracted by Search.
Descriptors
Hit Directory The full path to the directory containing the derivative
compounds.
Note: Because all the hits are saved in individual .mol2
files, adding the extension .mdb to the directory name
makes it easier later to review the hits in a Mol2
database or in a molecular spreadsheet.
Summary Name and location of the project files and the full
description of the search parameters, chemical descriptors,
and scoring function.
Cavity Generate an extended radius active site cavity
surrounding the generated ligand space.
Export Structure Select a structure in the results list to save it to a .mol2
file in a SAVE subdirectory within the location defined
by the Hit Directory.
Results For each successful hit (compound derivative), the
name of the .mol2 file and its RACHEL score.
Viewer Access the Viewer dialog.
Refresh Reload all the structures found so far. Note that this
operation clears all molecule areas and associated
background images (such as MOLCAD surfaces).
Stop Search Stops the search at the end of the current iteration. You
will be prompted to confirm this action.
Close Close this dialog and return to the RACHEL dialog.
Pseudo Receptor Name of the output file that will contain the pseudo
receptor. The default file extension is .mol2.
Modify vdW Select one or more atoms in the Atom Expression dialog
then enter the amount of “additional vdW clearance” at
the keyboard.
Atoms of the ligand involved in hydrogen bonds with the receptor penetrate the
mesh surface.
Directory With the browser, open the SAVE directory within the
project’s directory.
Receptor File containing the receptor used for the project. This
file is retrieved automatically.
Generate Extended Radius Cavity Display a mesh surface of the receptor cavity.
The surface is drawn by adding 1.5 Å to the van der Waals
radii of the receptor atoms near the ligand.
Slider Use Prev and Next to scroll through the saved ligands.
The corporate structural database (on the left) may contain hundreds of
thousands of compounds. All structures are composed of non-rotatable chemical
groups separated by rotatable bonds as defined by the laws of chemistry. These
non-rotatable groups represent the components or fundamental building blocks
that will be used to generate new derivative compounds (on the right).
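The decomposition described above can be pictured as cutting a molecular graph at its rotatable bonds and collecting what remains connected. The following is a minimal sketch of that idea, not RACHEL's extraction code: atoms are plain indices, and whether a bond is rotatable is supplied as an input flag because the rules RACHEL uses to decide this are not given here.

```python
def split_into_components(n_atoms, bonds):
    """Group atoms that stay connected once rotatable bonds are removed.

    bonds: iterable of (i, j, rotatable) with 0-based atom indices.
    Returns a list of sorted atom-index lists, one per component.
    """
    neighbors = {i: set() for i in range(n_atoms)}
    for i, j, rotatable in bonds:
        if not rotatable:              # keep only non-rotatable connections
            neighbors[i].add(j)
            neighbors[j].add(i)
    seen, groups = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:                   # depth-first walk of one component
            atom = stack.pop()
            group.append(atom)
            for nxt in neighbors[atom] - seen:
                seen.add(nxt)
                stack.append(nxt)
        groups.append(sorted(group))
    return groups
```

For a five-atom chain in which only the bond between atoms 1 and 2 is rotatable, the function returns [[0, 1], [2, 3, 4]].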
The RACHEL software has a far greater problem. While other builder-type
applications contain databases with 100 components or fewer, RACHEL can
extract upwards of 40,000–50,000 components, depending upon the size and
diversity of the corporate database. Thus, the number of potential fragment
combinations is nearly immeasurable. Clearly, a method is needed to rapidly
focus on the appropriate combinations that are likely to satisfy binding
requirements.
The figure above demonstrates how this fragment property index is generated.
The image on the left depicts a representative component database. Using the
stored chemical attributes, the database is sorted and mapped into a multi-
dimensional array, where each axis represents a different descriptor. In this
example, only size, polarity, and valence (number of connections) are shown for
simplicity. Each axis provides a gradient along which components can be
distinguished. As a result, components that are similar with respect to the various
descriptors are grouped together.
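One simple way to picture such an index is a grid in which each component is filed under the cell given by its binned descriptor values, so that components in the same cell are similar. The sketch below illustrates this with the three descriptors named above; the bin widths, descriptor values, and function names are hypothetical, and RACHEL's actual index structure is not described in this manual.

```python
from collections import defaultdict

def cell_of(descriptors, bin_widths):
    """Map a (size, polarity, valence) tuple to its grid cell."""
    return tuple(int(v // w) for v, w in zip(descriptors, bin_widths))

def build_property_index(components, bin_widths):
    """components: dict of name -> (size, polarity, valence)."""
    index = defaultdict(list)
    for name, descriptors in components.items():
        index[cell_of(descriptors, bin_widths)].append(name)
    return index

def similar_components(index, descriptors, bin_widths):
    """Names of stored components that fall in the same cell."""
    return index.get(cell_of(descriptors, bin_widths), [])

# Hypothetical descriptor values, for illustration only:
widths = (5, 0.5, 1)
index = build_property_index(
    {"naphthyl": (10, 0.1, 1), "quinolinyl": (10, 0.4, 1), "carboxyl": (3, 2.5, 1)},
    widths,
)
print(similar_components(index, (11, 0.3, 1), widths))  # ['naphthyl', 'quinolinyl']
```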
This fragment property index offers a powerful means to improve the
generation of complementary ligands. Over time, builder-type programs evolve
compounds with improved binding. A moderate affinity structure has
In the example above, RACHEL determines that the naphthalene group (blue)
and carboxylic acid group (red) of a ligand derivative should be replaced with
other components to improve binding. The naphthalene group is large and very
non-polar since it consists strictly of hydrocarbons. Conversely, the carboxylic
acid group is quite small, but highly polar. Using the active site map as
described above, RACHEL determines that these characteristics are indeed ideal
for complementing the receptor at each respective component. Using the
fragment property index, RACHEL can cross-reference other database
components that exhibit similar characteristics, as shown in the red and blue boxes on
the right. These components are then combinatorially used to generate a new
family of derivatives for testing. Each derivative retains the optimal receptor
binding characteristics. However, enough variability is generated to potentially
improve receptor complementarity.
The figure below demonstrates this with an example. The lead compound
scaffold in the center contains an amide bond with various sidechains extending
from it.
region is smaller and more spherical. Thus, only single rings are
acceptable although they need not be aromatic. In addition, this region is
very hydrophobic; thus, only hydrocarbon components are acceptable.
• The third group, shown in red, is quite different from the other two. This
region of the active site is highly charged and requires a small polar
group to interact with. Thus, no ring structures are acceptable.
Furthermore, heteroatoms (nitrogen, oxygen) are required.
Using the individual databases, shown in Figure 7 as the blue, green, and red
boxes, RACHEL combinatorially generates all possible derivatives within the
constraints of the active site. In so doing, an immense number of diverse
chemical structures may be constructed and tested in a defined and controlled
manner.
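The combinatorial step can be pictured as a Cartesian product over the per-site component lists. The following Python sketch assumes three sites with small, made-up component lists; the site names, component names, and the enumerate_derivatives function are hypothetical and only illustrate the counting.

```python
from itertools import product

# Hypothetical per-site component lists (one list per colored region).
SITE_DATABASES = {
    "blue": ["naphthyl", "biphenylyl"],
    "green": ["cyclohexyl", "phenyl", "cyclopentyl"],
    "red": ["carboxylate", "amine"],
}


def enumerate_derivatives(scaffold, site_databases):
    """Yield one derivative per combination of components, one per site."""
    sites = list(site_databases)
    for choice in product(*(site_databases[site] for site in sites)):
        derivative = {"scaffold": scaffold}
        derivative.update(zip(sites, choice))
        yield derivative


derivatives = list(enumerate_derivatives("amide", SITE_DATABASES))
print(len(derivatives))  # 2 * 3 * 2 combinations
print(derivatives[0])
```

Because the number of derivatives is the product of the list sizes, even modest per-site databases produce a very large family of structures.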
• The LINK constraint limits the atom types that can be utilized in
rotatable bonds.
• The PHARM constraint signifies that a specific atom type must be
present at a precise location in the active site.
• The #CMPNTS restriction places upper and lower bounds on the total
number of components a structure can possess.
• The ATYPE constraint stipulates how many atoms of a specific type can
be present in individual components as well as in the entire structure.
• The BOND constraint places limits on the types of bonded atoms that
can be present within a component.
As one can see, this again gives the user a tremendous amount of control over
the structures generated by RACHEL.
RACHEL will then generate chemically diverse structures using the template as
shown in Figure 9. The static portions of the template are left untouched and
they are incorporated into every generated derivative. However, the wildcard
For example,
• The steric interaction energy is calculated as the number of receptor
atoms that are within a specific distance (e.g., 5 Å) of any ligand atom.
The higher the value, the more interactions between ligand and receptor
atoms.
• The electrostatic interaction energy is computed using Coulomb’s law.
• The hydrophobicity is represented by LogP, the logarithm of the
compound’s partition coefficient between an oily phase (typically
octanol) and water. The higher the value, the more lipophilic (greasy)
the compound.
In short, these descriptors are simple and very easy to calculate. This allows for
the rapid determination of characteristics that relate to ligand binding strength.
It is important to note that this example is very simplistic. In reality, some
scoring functions contain over twenty terms.
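As a rough illustration of how simple the first two descriptors are, the Python sketch below counts receptor atoms near the ligand and sums pairwise Coulomb terms. The atom representation (dictionaries with "xyz" and "charge" keys), the function names, the unit dielectric, and the conversion factor of 332.06 kcal·Å/(mol·e²) are assumptions for this example; RACHEL's actual parameters are not given here.

```python
import math

# Commonly used factor converting e^2/Angstrom to kcal/mol (assumed units).
COULOMB_CONSTANT = 332.06


def steric_contacts(ligand, receptor, cutoff=5.0):
    """Count receptor atoms within `cutoff` Angstroms of any ligand atom."""
    return sum(
        1
        for r in receptor
        if any(math.dist(r["xyz"], l["xyz"]) <= cutoff for l in ligand)
    )


def coulomb_energy(ligand, receptor, dielectric=1.0):
    """Sum Coulomb's law over all ligand-receptor atom pairs."""
    total = 0.0
    for l in ligand:
        for r in receptor:
            distance = math.dist(l["xyz"], r["xyz"])
            total += COULOMB_CONSTANT * l["charge"] * r["charge"] / (dielectric * distance)
    return total


# Toy complex: one negatively charged ligand atom, two receptor atoms.
ligand = [{"xyz": (0.0, 0.0, 0.0), "charge": -1.0}]
receptor = [
    {"xyz": (3.0, 0.0, 0.0), "charge": 1.0},   # close, oppositely charged
    {"xyz": (10.0, 0.0, 0.0), "charge": 0.0},  # distant, neutral
]

contacts = steric_contacts(ligand, receptor)
energy = coulomb_energy(ligand, receptor)
print(contacts, energy)
```

In the toy complex only the first receptor atom lies within the cutoff, and the opposite charges give a negative (favorable) electrostatic energy.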
The figure above presents four complexes whose binding affinity has been
measured and whose descriptors have been calculated. Statistical tools, such as
partial least squares regression, are then employed to relate the numerical trends
in the descriptors with the corresponding binding affinities. In the resulting
equation, estimated affinity is a function of the calculated descriptors (steric,
electrostatic, and logP). Coefficients (A, B, C) relate the calculated descriptors
to the actual affinities and are determined by the statistical analysis. In the
example, as steric interaction energy increases, so does the biological binding
activity. Thus, the coefficient A is positive. On the other hand, a negative
electrostatic interaction energy is conducive to tighter binding since opposite
charges attract. Therefore, the corresponding coefficient B is negative. LogP
follows a similar trend as steric interaction energy; thus, coefficient C is
positive.
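Assuming the simplest linear form (the text states only that estimated affinity is a function of the three descriptors, so the linear combination and the omission of a constant term are assumptions), the relationship described above can be written as:

```latex
\[
\text{estimated affinity}
  = A \cdot E_{\text{steric}}
  + B \cdot E_{\text{electrostatic}}
  + C \cdot \log P,
\qquad A > 0,\quad B < 0,\quad C > 0
\]
```

With B negative, a negative (attractive) electrostatic interaction energy makes a positive contribution to the estimated affinity, in line with the sign conventions given in the text.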
• Molecular weight
• Number of rotatable bonds
• LogP estimation
• Nonpolar atom fraction
By limiting the training set to structures binding within the same receptor, a
focused scoring function is biased towards the interactions that govern ligand
association with the target active site. If hydrophobic contacts predominate, the
hydrophobic descriptors will be emphasized. Conversely, if electrostatic forces
are important to binding, those descriptors will be accentuated. Even something
as simple as the size of the active site can have a tremendous impact on the
allowable ligands. This is a descriptor that cannot be adequately represented in
generalized scoring functions. Given their built-in adaptability, focused scoring
functions have a greater predictive power when estimating ligand-receptor
binding.
Imagine the green and red dots to be structure-activity data points for individual
ligand-receptor complexes. The lines passing through them represent potential
scoring functions attempting to describe their distribution.
• In the graph on the left, the dataset contains a large number of
complexes whose activities cover a wide range of values. This wide
distribution allows for an easy determination of a best-fit line. The
scoring function generated from this set thoroughly represents the data.
• The dataset in the middle graph contains too few compounds to generate
an accurate fit of the data. Notice the ambiguity that exists in
determining the best-fit line. Any scoring function derived from this dataset
has little predictive value.
• In the graph on the right there is no lack of data. However, money and
time constraints may have limited studies of poorly binding compounds,
resulting in a cluster of high-affinity data points. This graph shows that
it is difficult to elucidate an accurate scoring function when the
structure-activity data is not broad enough.
In situations where the dataset is either too small or too clustered, RACHEL
offers another means of generating a focused scoring system from proprietary
structure-activity data. When RACHEL determines that the derived scoring
function offers little predictive value, it switches to a target function. A target
function is formed by simply averaging the descriptor values of the highest
affinity complexes in the training set. These “ideal” descriptor values are then
used as a guide to determine if newly generated derivative structures will be
kept or discarded. This is illustrated in the figure below.
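A minimal Python sketch of the target-function idea follows. It assumes that affinity is stored so that larger numbers mean tighter binding, that a fixed number of top complexes is averaged, and that a derivative is kept when every descriptor lies within a relative tolerance of the ideal value. The names target_function, keep_derivative, top_n, and tolerance, and the acceptance rule itself, are hypothetical; the manual does not specify how RACHEL makes this comparison.

```python
def target_function(training_set, top_n=3):
    """Average each descriptor over the top_n highest-affinity complexes."""
    best = sorted(training_set, key=lambda c: c["affinity"], reverse=True)[:top_n]
    names = best[0]["descriptors"].keys()
    return {n: sum(c["descriptors"][n] for c in best) / len(best) for n in names}


def keep_derivative(descriptors, target, tolerance=0.25):
    """Keep a derivative if every descriptor is within a relative tolerance
    of the corresponding ideal value (assumed acceptance rule)."""
    return all(
        abs(descriptors[name] - ideal) <= tolerance * abs(ideal)
        for name, ideal in target.items()
    )


# Toy training set; affinity is on a scale where larger means tighter binding.
training_set = [
    {"affinity": 9.0, "descriptors": {"steric": 40.0, "logp": 3.0}},
    {"affinity": 8.5, "descriptors": {"steric": 44.0, "logp": 3.4}},
    {"affinity": 5.0, "descriptors": {"steric": 20.0, "logp": 1.0}},
]

target = target_function(training_set, top_n=2)
kept = keep_derivative({"steric": 45.0, "logp": 3.0}, target)
discarded = keep_derivative({"steric": 20.0, "logp": 1.0}, target)
print(target, kept, discarded)
```

Only the two tightest binders contribute to the ideal values, so a derivative resembling the weak binder is rejected while one close to the averaged profile is retained.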
Fortunately, this is often the exact task at hand for pharmaceutical chemists. By
the time a drug development project has reached maturity, the ligands that have
been developed are often optimal binding compounds. Therefore, a target
function is usually sufficient as it allows the drug designer to construct alternate
chemical architecture that retains optimal binding characteristics.