
PART I FOUNDATIONS AND BASIC TECHNIQUES OF DOCKING

CHAPTER 1

Modern Tools and Techniques in Computer-Aided Drug Design
TAMANNA ANWAR • PAWAN KUMAR • ASAD U. KHAN

1 OVERVIEW OF COMPUTER-AIDED DRUG DESIGN
Despite tremendous technological advances in drug discovery, the approaches currently applied in drug development remain very expensive and slow. Under rising pressure to reduce the time and cost of discovering safe and effective drugs, the focus has moved toward the initial phases of drug discovery and development. Computer-aided drug design (CADD) approaches are now widely used to discover drugs more efficiently and accurately. The cost of discovery and development of drugs can be reduced by 50% with the use of CADD (Xiang et al., 2012).
For more than three decades, CADD approaches have been applied in various stages of drug discovery (Fig. 1.1). Several of the marketed drugs discovered to date have been developed with the help of CADD techniques (Table 1.1). Furthermore, CADD helps in predicting novel therapeutic uses of drugs approved by the FDA (Food and Drug Administration); this strategy is termed “drug repurposing” and will be discussed later in the chapter.
The aim of using CADD approaches is to predict a promising compound that produces a desired effect after binding to a particular biological target. Conventionally, high-throughput screening is used to test large numbers of compounds in automated assays to achieve the required effects. In this case, the drug development procedure is not only time-consuming but also requires extensive investment. Therefore, to reduce this burden, CADD approaches are applied so that the chemical compounds can be virtually screened first, which significantly reduces the number of compounds going forward to experimental screening (Yu & Mackerell, 2017). With advances in information technology (IT), computational power, and the availability of big data, new approaches have recently been applied in CADD, including machine learning (ML), deep learning (DL), artificial intelligence techniques, and data mining, to further enhance the speed and accuracy of drug discovery. In the future, drug discovery strategies will rely heavily on these advanced IT techniques, which will help in the selection of features (drug and receptor features), image processing, clustering of compounds, etc. For example, ML approaches are used to assess a drug’s impact on patients, which benefits the development of drugs that are safe and effective and that take less time to develop than with conventional methods. The importance of ML in CADD is well recognized, and there are several reports on its successful applications (Khamis & Gomaa, 2015; Vamathevan et al., 2019). In the ML-based approach, a mathematical framework is trained on large data sets and is then applied for the prediction or classification of a new data set (Deo, 2015).
Advances in different aspects of computational approaches aid CADD. ML approaches help in modeling complex systems, providing insight into the design and essential knowledge of molecules. DL approaches, in turn, help in quickly selecting compounds based on pattern recognition and can also be used for the early detection and management of disease. Traditional CADD approaches can be broadly divided into two groups depending upon the availability of the target protein structure: (1) structure-based drug design (SBDD) and (2) ligand-based drug design (LBDD). Availability of the target protein structure provides an additional edge in the direct hit-to-lead optimization process. SBDD includes approaches such as molecular docking, virtual screening (VS), structure-based pharmacophore modeling, and de novo drug design, whereas LBDD approaches include similarity-based screening, quantitative structure-activity relationship (QSAR) modeling, ligand-based pharmacophore modeling, and scaffold hopping (Fig. 1.2).

Molecular Docking for Computer-Aided Drug Design. https://doi.org/10.1016/B978-0-12-822312-3.00011-4


Copyright © 2021 Elsevier Inc. All rights reserved.

FIG. 1.1 Computer-aided drug design approaches applied in various stages of drug discovery.

TABLE 1.1
List of Drugs Developed with Computer-Aided Drug Design (CADD) Approaches.
Drug | Indication | CADD Approach | Status | References
Saquinavir | Inhibitor of HIV proteases | Structure-based drug design | Approved 1995 | Drie (2007)
Nelfinavir | Inhibitor of HIV | Structure-based drug design | Approved 1997 | Fischer & Robin Ganellin (2006)
Norfloxacin | Bacterial DNA gyrase inhibitor | Quantitative structure-activity relationship | Approved 1998 | Roy (2015)
Zanamivir | Antiviral (influenza A and B) | Modeling de novo design | Approved 1999 | Clark (2006)
Amprenavir | HIV | Protein modeling and molecular dynamics | Approved 1999 | Wlodawer & Vondrasek (1998)
Zolmitriptan | Migraine | Pharmacophore modeling | Approved 2003 | Clark (2006), Glen et al. (1995)
Dorzolamide | Glaucoma and ocular hypertension | Fragment-based screening | Approved 2012 | Grover et al. (2006)

FIG. 1.2 Classification of computer-aided drug design (CADD).

2 CHEMICAL LIBRARIES
Traditionally, for finding a hit against any target in drug discovery, the structures of compounds that can act as inhibitors or activators are required for docking/VS. A high-throughput virtual screening (HTVS) method utilizes chemical databases containing millions of compounds to shortlist potential compounds for synthesis. A large number of databases offer structures of chemical compounds, biological targets, and data pertaining to bioactivity for drug discovery. These databases are an exclusive source for identifying new chemical structures against biological targets. Beyond their role as conventional, diverse collections of chemical structures, considerable attention is given to annotating chemical libraries so that they provide information on the correlation between a chemical compound and its biological function. Several public and commercial repositories of chemical compounds essential for CADD are highlighted here. Information on drug-like compounds and their physicochemical properties can be retrieved from various freely available databases, e.g., PubChem, ZINC, ChEMBL, DrugBank, etc. (Table 1.2). Many resources are also available commercially, such as Jubilant BioSys, GVK Bio, and Aureus Pharma. These are large databases of target-centric compounds, focusing mainly on kinases, G protein-coupled receptors, nuclear hormone receptors, or ion channels. The major source of chemical data in these databases is patents.

3 STRUCTURE-BASED APPROACHES AND SCREENING
The SBDD method utilizes knowledge of the 3D structure of the receptor or target for VS and lead optimization. Thus, this method can be applied to receptors/targets for which a crystal structure or modeled structure is available. Types of SBDD methods include molecular docking, structure-based 3D pharmacophore modeling, and de novo drug design methods. It is imperative to check whether the selected target is “druggable,” i.e., whether its biological behavior can be altered by binding a small molecule. A target with a very deep, large, and/or highly charged binding pocket is considered unsuitable for SBDD (Fauman et al., 2011). Generally, a structure with high resolution (1.5 Å) and a large ligand bound in its active site is preferred (Rueda, Bottegoni, & Abagyan, 2010).

3.1 Target Structure and Validation
The most extensively used resource for 3D structures determined by either X-ray crystallography or nuclear magnetic resonance (NMR) is the Protein Data Bank (PDB) database, available at http://www.rcsb.org/pdb. The current version contains 162,529 structures, most of which were determined by X-ray crystallography (88.9%); the fraction of structures determined by NMR spectroscopy and electron microscopy (EM) is very low (https://www.rcsb.org/stats/summary) (Berman et al., 2002). In cases where the protein structure has not been determined experimentally, computational approaches can again be applied to model the protein structure by homology modeling. The homologous structure is modeled on the basis of sequence similarity to the experimentally determined structure of a similar protein. One of the most frequently used freely available software packages for homology modeling is MODELLER (Andrej Šali, 1993). Several other homology modeling tools/servers are freely available, e.g., Swiss Model, Phyre2, LOMETS, CPHmodels 3.2, I-TASSER, etc.
Among the available solved structures in PDB, X-ray-based crystal structures still dominate over the other experimental approaches such as NMR and cryo-EM (Cooper et al., 2011). In the drug design

TABLE 1.2
General Resources for Retrieving Chemical Compounds for Docking and Virtual Screening.
Database | URL | Description | License type
ChemSpider | http://www.chemspider.com | A free database of chemical structures that provides fast text and structure-based searches across 81 million chemical compounds gathered from 278 data sources. | Free
eMolecules Plus | https://www.emolecules.com | Contains more than 8 million chemical compounds obtained from a network of global chemical suppliers. The chemicals can be ordered from the website, as suppliers are directly connected. | Commercial
ACD (BIOVIA Available Chemicals Directory) | https://www.3ds.com/products-services/biovia/products/scientific-informatics/biovia-databases/ | One of the largest structure-searchable collections of commercially available chemicals in the world, with 10 million unique chemical structures. | Commercial
iResearch Library | https://www.chemnavigator.com/cnc/products/iRL.asp | Consists of over 160 million commercially available chemical structures. | Commercial
PubChem | https://pubchem.ncbi.nlm.nih.gov/ | A huge collection of chemical compounds that mostly includes small molecules, although macromolecules are also included. PubChem Substance (253 million), PubChem Compound (103 million), and PubChem Bioactives (268 million) are the three components of the dynamically expanding PubChem database. | Free
ZINC | https://zinc.docking.org/ | A large database of 230 million purchasable compounds along with their physicochemical properties. The molecules are available in ready-to-dock 3D formats. | Free
ChEMBL | https://www.ebi.ac.uk/chembl/ | Includes bioactive molecules with drug-like properties, together with data on their chemical, bioactivity, and genomic properties. It consists of around 2 million compounds, 13377 drug targets, and 15996368 activities. | Free
BindingDB | www.bindingdb.org/bind/index.jsp | A publicly available database of binding affinities of small drug-like molecules with their corresponding candidate drug targets. It includes 1,854,767 binding data for 7493 protein targets and 820,433 small molecules. | Free
PDBeChem | https://www.ebi.ac.uk/pdbe-srv/pdbechem/ | A database of ligands, small molecules, and monomers referred to in Protein Data Bank (PDB) entries. It contains data on 30899 ligands. | Free
SuperNatural II | http://bioinf-applied.charite.de/supernatural_new/index.php | A database of naturally occurring products. It consists of 325,508 natural compounds. | Free
NPACT | http://crdd.osdd.net/raghava/npact/ | A database of 1574 phytochemicals with anticancer activity. | Free
DrugBank | https://www.drugbank.ca/ | The latest version, 5.1.5, contains 13548 chemical compounds including 2628 FDA-approved molecules, 1372 approved biologics, 131 nutraceuticals, and over 6363 experimental drugs. | Free
SuperDRUG2 | http://cheminfo.charite.de/superdrug2/index.html | A database of marketed drugs that consists of 4600 active pharmaceutical ingredients. | Free
GDB-17 | http://gdb.unibe.ch | Consists of 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens. | Free
KEGG Drug Database | https://www.genome.jp/kegg/drug/ | A comprehensive database of drugs approved in Japan, the United States, and Europe. It consists of 11,274 drug entries. | Free
SPECS | https://www.specs.net/ | Contains more than 350,000 compounds suitable for synthesis. | Commercial
Maybridge | https://www.maybridge.com | Consists of over 53,000 hit-like and lead-like organic compounds. | Free

pipeline, crystallography has gained more importance, as this technique is at the heart of SBDD and fragment-based drug design approaches (Cooper et al., 2011). According to the study published by Westbrook et al., 210 new molecular entities (NMEs) were approved by the FDA between 2010 and 2016, and for these NMEs, around 94% of molecular targets are available in the PDB database (Westbrook & Burley, 2019). Very recently, the wwPDB OneDep system has been set up as a single channel for deposition, validation, and biocuration of all incoming structures (Young et al., 2017). OneDep will ensure consistency in the process at the data deposition as well as the internal biocuration level.
As the starting structure influences the outcome of the drug design process, several quality checks have now been introduced, in addition to the structural resolution and R-factor, to assess the quality of the experimental structure (Table 1.3). To maintain the data accuracy of PDB structures, several measures have been taken: theoretical structures have not been accepted since 2006, structure factor amplitudes/intensities are required with each crystal structure deposition, and each submitted structure should be published in a journal (Kirchmair et al., 2008). At the structure level, a validation matrix is provided to show the accuracy at the structural, geometric, and electron density (ED) level (Fig. 1.3).
ED maps are now provided for all deposited structures and can be used by both experts and novices to assess the quality and characteristics of the protein under consideration. Insight gained from the ED maps also helps the user rule out possible biases introduced by the modeling procedure used and by the crystallographer’s expertise and familiarity. Although ED maps give the user the flexibility to analyze the experimental structure carefully, the correct representation of small ligand molecules at the binding site is still a matter of concern. Interpreting the ligand position (bound partly or fully, with or without water) from the available ED maps is a laborious task (Smart et al., 2018). Low-resolution structures, especially those with resolution poorer than 3 Å, tend to be trickier where water-mediated interactions play a crucial role between ligand and protein. To emphasize the critical challenges associated with protein-ligand complex crystallography, Smart et al. (2018) analyzed PDB ligands, assessed the validation reports in detail, and examined the geometric and ED fit for the same.

3.2 Molecular Docking and Virtual Screening
One of the most extensively used computational tools in CADD is molecular docking, which is used for determining the complex structure produced by two or more

TABLE 1.3
Tools/Web Servers Generally Used for Protein Structure Validation and Quality Assessment.
Program | Description | Stand-alone/Web server
PQS | Analyzes the quaternary protein structures deposited in the Protein Data Bank | Web server
WHAT IF | Tool for protein structure quality checks | Both
Prosa-web | Assesses the quality score with respect to known protein structures | Web server
PROCHECK | Tool to check the stereochemical quality of the protein structure | Stand-alone
PROCHECK-NMR (nuclear magnetic resonance) | Tool to check the stereochemical quality of the NMR protein structure | Stand-alone
MolProbity | Validates the protein structure at different levels | Web server
NQ Flipper | Erroneous Asn and Gln rotamer detection | Web server
PSVS | Protein structure assessment suite | Web server

FIG. 1.3 Summary quality metrics available in the wwPDB validation reports for PDB-ID 6GUK (A & C) and 6Q3C (B & D). Residues showing deviation from the experimental electron density map are shown in red (C & D).
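
As a rough illustration of the resolution and R-factor checks discussed in Section 3.1, the following Python sketch triages candidate structures before docking. The 1.5 Å and 3 Å values echo the text; the R-factor cutoff of 0.25, the `StructureEntry` record, and the entry names are assumptions made for this example and are not taken from the wwPDB validation pipeline.

```python
from dataclasses import dataclass


@dataclass
class StructureEntry:
    """Hypothetical record holding the few quality fields used in this sketch."""
    name: str
    resolution: float      # in Å; smaller numbers mean a sharper structure
    r_factor: float        # crystallographic R-factor, between 0 and 1
    has_bound_ligand: bool


def triage(entry, preferred_res=1.5, max_res=3.0, max_r_factor=0.25):
    """Label a structure as 'preferred', 'usable', or 'reject' for docking."""
    # Reject structures that are too coarse or fit the data too poorly.
    if entry.resolution > max_res or entry.r_factor > max_r_factor:
        return "reject"
    # High resolution plus a ligand in the active site is the preferred case.
    if entry.resolution <= preferred_res and entry.has_bound_ligand:
        return "preferred"
    return "usable"


entries = [
    StructureEntry("demo_a", 1.4, 0.18, True),
    StructureEntry("demo_b", 2.2, 0.21, True),
    StructureEntry("demo_c", 3.4, 0.24, False),
]
labels = {e.name: triage(e) for e in entries}
print(labels)
```

A filter of this kind only narrows the candidate list; the validation report and the ED fit around the ligand still need to be inspected for the structures that pass.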

interacting molecules. The docking process involves predicting the 3D conformation of the hit or ligand inside the binding cavity of the target. Several possible ligand poses are generated through molecular docking, which are then ranked on the basis of a scoring function (SF). The process of simulating the ligand and the receptor to form a stable complex can be considered a “lock-and-key model,” where the position of the key (ligand) is optimized to fit into the lock (target binding pocket). The three vital components of molecular docking are the “receptor,” the “ligand,” and the docking program. Docking programs predict the binding interaction between the protein target and the ligand, determine the orientation of the ligand in the target’s binding pocket, and score the interaction. The conformational search algorithm explores the poses inside a particular conformational space, while the role of the SF is to assign each pose a score that reflects its relative binding affinity (Meng et al., 2012). Typically, the docking program generates a group of poses for each ligand such that every pose has its own docking score. Generally, the top-ranked pose is considered the best docking pose; however, the selection of the final pose should depend not only on the docking score but also on chemical knowledge and experimental data, if available. The docking program generates the poses by treating the ligand molecule as flexible: the conformational search algorithm samples the ligand’s torsional degrees of freedom while keeping the target rigid. The accuracy of docking relies on the conformational sampling coverage as well as the SF. Structure-based virtual screening (SBVS) can be carried out by docking to identify potential active compounds in a large chemical compound database (Clark, 2008; Schneider, 2010).

3.2.1 Sampling algorithm
The ligand and target can bind in several ways, as they have six degrees of translational and rotational freedom in addition to conformational degrees of freedom. Generating all possible conformations computationally would be highly expensive. Thus, several sampling algorithms have been proposed and extensively applied in molecular docking tools (Table 1.4). The ligand is mapped into the active

TABLE 1.4
Docking Programs Used in Computer-Aided Drug Design (CADD) and Their Features.
Docking Program | Characteristic | Sampling Algorithm | Scoring Function | License | References
AutoDock | It is an automated tool for docking consisting of an autogrid, which is used to compute grid, and an autodock, which is used for docking ligands on the grid created by autogrid. | Genetic algorithms, Monte Carlo | Force field based | Open source | Forli et al. (2016)
DOCK | The latest release is built with an improved algorithm to predict binding poses by adding new features like force field scoring enhanced by solvation and receptor flexibility. | Incremental construction, energy minimization | Force field based | Academic | Ewing et al. (2001)
FRED | An exhaustive search (ES) algorithm is used to identify the ligand’s best binding pose in the receptor binding site. | Exhaustive search | Knowledge based | Academic | McGann (2011)
FlexX | It is a tool provided by BioSolveIT for flexible ligand docking. It is fully automated and docking is performed with an incremental construction algorithm. | Incremental construction | Empirical | Commercial | Kramer et al. (1999)
Glide | Glide is a molecular docking suite of software provided by Schrödinger. It offers several modes for virtual screening such as high-throughput virtual screening, standard precision, and extra precision. | Exhaustive search, energy minimization, Monte Carlo | Empirical | Commercial | Friesner et al. (2004)
GOLD | It applies a genetic algorithm for predicting poses of the ligand. It can be configured. | Genetic algorithms | Empirical, knowledge based | Commercial | Verdonk et al. (2003)
ICM | This is an easy-to-use software provided by Molsoft, LLC. The software can be used for chemical clustering, chemical similarity searching, molecular modeling, virtual screening of ligands, fully flexible docking, etc. | Monte Carlo | Empirical | Commercial | Neves et al. (2012)
Surflex-Dock | In Surflex-Dock, the active site ligand is used to produce putative poses, and a combination of similarity search methods is applied to predict the probable pose of the ligand in the binding site. | Incremental construction | Empirical | Commercial | Jain (2007)
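
To make the cost argument from Section 3.2.1 concrete, this Python sketch counts and enumerates the torsion-angle combinations produced by a systematic search over a ligand's rotatable bonds. The function names and the 60° and 30° step sizes are illustrative assumptions; the count covers torsions only and ignores the six rigid-body degrees of freedom, and real docking programs prune this grid with geometric and chemical constraints instead of enumerating it in full.

```python
from itertools import product


def count_torsion_states(n_rotatable_bonds, step_degrees):
    """Number of torsion combinations when each bond is rotated in fixed steps."""
    if step_degrees <= 0 or 360 % step_degrees != 0:
        raise ValueError("step_degrees must be a positive divisor of 360")
    return (360 // step_degrees) ** n_rotatable_bonds


def enumerate_torsion_states(n_rotatable_bonds, step_degrees):
    """Yield every combination of torsion angles (in degrees) on the grid."""
    angles = range(0, 360, step_degrees)
    return product(angles, repeat=n_rotatable_bonds)


# A small ligand is still tractable: 3 bonds at 60-degree steps.
small = list(enumerate_torsion_states(3, 60))
print(len(small))                      # 216
# A drug-sized ligand is not: 10 bonds at 30-degree steps.
print(count_torsion_states(10, 30))    # 61917364224
```

The exponential growth in the number of rotatable bonds is the reason stochastic and incremental strategies are used in place of a full enumeration.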

site of the target with the help of matching algorithms, on the basis of its shape features and chemical properties. The benefit of matching algorithms is their speed; therefore, enrichment of active compounds from vast libraries can be done using this method (Moitessier et al., 2008). This algorithm was used in the older versions of DOCK (Kuntz et al., 1982). The incremental construction algorithm uses a fragment-based, incremental method to place the ligand in the active site. The ligand is fragmented along its rotatable bonds; the largest fragment is docked first inside the binding pocket, and the rest of the fragments are then added incrementally (Rarey et al., 1996). Other fragment-based algorithms include multiple copy simultaneous search (Eisen et al., 1994) and LUDI (Böhm, 1992a). Programs that implement fragment-based methods include DOCK 4.0 (Ewing et al., 2001), FlexX (Rarey et al., 1996), and Surflex (Jain, 2003).
Exhaustive search (ES) is a type of systematic search algorithm used for flexible ligand docking. To perform ES, the ligand’s rotatable bonds are systematically rotated at a certain interval, which results in a huge number of ligand conformations. Thus, geometric/chemical constraints are applied for initial screening, after which more accurate refinement procedures are used. FRED (McGann et al., 2003) and Glide (Friesner et al., 2004) are examples of programs that use the ES algorithm.
Monte Carlo (MC) and genetic algorithms (GA) belong to the class of stochastic methods. In this class, the conformational space is searched by randomly changing the conformation of the ligand. Both of these algorithms produce a series of random modifications to a ligand or an ensemble of ligands, which are then evaluated on the basis of a probability or fitness function. Because conformational sampling is random, docking is run several times to confirm that convergence has been reached. The programs that apply MC methods include an earlier version of AutoDock (Goodsell & Olson, 1990), ICM (Abagyan et al., 1994), and Glide (Friesner et al., 2004). GA have been applied in programs such as AutoDock (Morris et al., 1998) and GOLD (Verdonk et al., 2003).
Molecular dynamics (MD) (Cornell et al., 1995; Weiner et al., 1984) and energy minimization are powerful simulation methods used extensively in molecular docking. These methods are computationally expensive; thus, they are applied for refining or rescoring ligand poses produced by other methods. The simulation method is used by the programs DOCK (Kuntz et al., 1982) and Glide (Friesner et al., 2004).

3.2.2 Scoring function
The SF is applied to evaluate the docking poses generated by docking programs to quantitatively measure the quality of the fit (Rajamani & Good, 2007). Along with evaluating ligand poses, the SF also estimates the ligand binding energy and ranks the ligands accordingly to select the best binding ligand. The two main attributes of any SF are its speed and accuracy. There are three classical categories of SF, i.e., force field (FF)-, empirical-, and knowledge-based SFs. The SF based on FF is calculated from physical atomic interactions like van der Waals (VDW) and electrostatic interactions as well as from bond lengths, bond angles, and dihedrals (Aqvist et al., 2002; Kollman, 1993). The disadvantage of the FF-based SF is its very slow computational speed. Extensions of FF-based SFs include hydrogen bond, solvation, and entropy contributions. The result of FF-based docking can be refined further by applying techniques like linear interaction energy and free energy perturbation (FEP) methods. Empirical SFs are applied to measure the binding free energy (FE) by utilizing various aspects of a protein-ligand complex, for example, hydrogen bonds, VDW energy, ionic interactions, hydrophobic effect, binding entropy, etc. (Guedes et al., 2018). Knowledge-based SFs use experimentally determined structures to obtain information on the frequencies as well as the distances of interatomic contacts in the ligand-protein complex. To improve the accuracy of docking prediction, two or more SFs are applied in some programs, which is referred to as “Consensus Scoring” (Huang et al., 2010). The docking programs applying different SFs are cited in Table 1.4.
Recently, ML-based SFs trained on the complex structures of protein and ligand have gained much attention. Such models do not rely on predetermined functional forms but are instead developed by supervised learning algorithms (Li et al., 2020). With ML-based SFs, intermolecular binding interactions that are difficult to model explicitly can be captured implicitly. ML-based applications have sped up the process of designing inhibitors with desired pharmacodynamic and pharmacokinetic properties compared with the rational in silico approaches (Mak & Pichika, 2019). Owing to the enormous possibilities offered by the available chemical, genomic, and structural data, applications of ML now range from VS-based inhibitor identification, target protein prediction (Kaushik et al., 2020; Zheng et al., 2020), improved consensus docking score development (Ericksen et al., 2017), protein structure prediction (Torrisi et al., 2020), and protein-protein interaction prediction (Du et al., 2017) to de novo molecule design (Kadurin et al., 2017; Olivecrona et al., 2017), and many more.
ML-based SFs used for the prediction of binding affinity have performed better than several classical SFs (Ain et al., 2015; Ballester et al., 2014; Khamis et al., 2015). In a very recent study, Su et al. (2020) examined the performance of six different ML-based SF models to test the assumption of overlapping training and test sets. The study reports that the performance of the ML models depends mostly on the size of the training set used as well as on its content (Su et al., 2020). However, docking software does not implement ML-based SFs directly; rather, these are generally used for rescoring, as these SFs are dependent on training data sets (Zhang, Ai, et al., 2017). Through rescoring, ML-based SFs help improve the precision of docking done by classical methods.

3.2.2.1 Support vector machine. In SBVS, the support vector machine (SVM) is often applied to separate active and inactive ligand poses, and the regression model of SVM is applied to predict binding affinities (Zhang, Ai, et al., 2017). In one study, SVM was combined with an empirical function on the basis of energy terms; as a result, the accuracy of prediction in VS increased, and a correlation between SVM-based and experimental binding affinities was reported (Brylinski, 2013; Kinnings et al., 2011).
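
The agreement between predicted and experimental binding affinities mentioned above is commonly summarized with a Pearson correlation coefficient. A minimal Python sketch follows; the affinity values are invented purely for illustration and do not come from any of the cited studies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical binding affinities (e.g., pKd) for five complexes.
experimental = [5.2, 6.1, 7.4, 8.0, 9.3]
predicted = [5.6, 5.9, 7.0, 8.4, 8.8]
r = pearson(experimental, predicted)
print(round(r, 3))
```

A value near 1 indicates that the SF ranks and spaces the complexes much as the experiment does, while a value near 0 indicates little linear relationship.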

Analysis of the HIV protease by ML-based SF SVM-SP Stepniewska-Dziubinska et al., 2018). It has been
performed better than Glide, ChemScore, GoldScore, revealed that convolutional neural network models
and X-Score (Li et al., 2011). In another study on 40 perform better when compared with classical
DUD2 targets, MIEC-SVM proved to be better than ML models (Bengio et al., 2013), but it is more time-
Glide and X-Score (Ding et al., 2013). consuming due to the increase in the network
complexity of model.
3.2.2.2 Random forest. In this classification algo- Given a set of training data consisting of an active
rithm, learning is based on multiple decision trees, and inactive compounds, the data can be trained by
which is used for classification, regression, etc. The applying ML-based SFs such as RF-Score (Ballester &
randomness of features is used while building each Mitchell, 2010), NNScore (Durrant & McCammon,
tree to produce uncorrelated forest with multiple trees, 2011) and SFCscore (Sotriffer et al., 2008; Zilian &
the prediction accuracy of the ensemble of trees is much Sotriffer, 2013) to find out the known ligands by
more than any of the individual trees. Random forests potency with high accuracy (Wójcikowski et al.,
(RFs) have been shown to increase the accuracy of 2017). As mentioned earlier, the SF’s accuracy can be
conventional SF by replacing multiple linear regression further improved by applying a hybrid SF that is an
(Afifi & Al-Sadek, 2018; Wang & Zhang, 2017). In a integration of different SFs. However, the hybrid SFs
recent study, RF-based score was developed and are more efficient but more time taking.
compared with five classical SFs. The ML-based SF achieved a very high hit rate at the 1% level (55.6%) compared with Vina, which showed a hit rate of only 16.2%. Whereas the activity predicted by Vina had a Pearson correlation of 0.18, the RF score reached a Pearson correlation of 0.56 (Wójcikowski et al., 2017).

3.2.2.3 Artificial neural network. Recently, artificial neural networks (ANNs) have been used extensively in CADD. An ANN is a computational model inspired by biological neural networks. ANNs are generally used for QSAR modeling (Cang et al., 2018), but they are often also used to predict binding affinities. The ANN-based SF “NNScore 2.0” predicts binding affinity, and the latest version considers more binding properties (Durrant & McCammon, 2011). Moreover, the NNScore rescoring function can be applied to improve scoring performance (Durrant et al., 2013). The prediction accuracy of classical ANN-based SFs can be greatly increased by incorporating techniques such as boosting or bagging (Ashtawy & Mahapatra, 2018). Despite their high precision in predicting binding affinity, ANN-based SFs do not work well with high-dimensional data, which limits their application in commercial docking tools.

3.2.2.4 Deep learning. DL-based SFs can extract features from unsupervised data, which is unstructured or unlabeled, along with model fitting. The most common model for DL-based SFs is the convolutional neural network (Ragoza et al., 2017; Wallach et al., 2015), which can be applied to classify drug binding and to predict binding affinity (Gomes, Ramsundar, et al., 2017;

3.3 De Novo Drug Design
De novo drug design is another highly promising SBDD method; it allows chemical compounds with desirable drug-like properties to be generated from scratch in the receptor binding site (Mauser & Guba, 2008; Schneider & Fechner, 2005). Although this approach to novel molecular design is nearly two decades old, its contribution to drug discovery projects has recently been increasing owing to its sound applicability and the availability of de novo design programs (Schneider & Fechner, 2005). The approach attempts to explore the virtually infinite chemical search space and captures only the building blocks that are necessary for filling the available interaction space in the substrate binding site (Schneider et al., 2009). Thus, in the de novo approach, the virtual compound generation protocol attempts to imitate the way a medicinal/synthetic chemist designs a compound, while the applied SF performs as a virtual assay (Lameijer et al., 2007). To facilitate the de novo drug design process, many different tools have been published to adapt the multiobjective optimization process (Devi et al., 2015; Nicolaou et al., 2012), and so this approach yields many solutions depending on the initial parameters chosen. Ludi (Böhm, 1992b), LEGEND (Honma et al., 2001), LigBuilder (Wang et al., 2000), BIBuilder (Teodoro & Muegge, 2011), and LiGen (Beccari et al., 2013) are some of the programs developed to assist the de novo drug design process. As this approach uses all possible combinations to link the available blocks in the respective protein substrate binding site, different sets of rules are formulated to reduce the generated chemical space to a very feasible number of
compounds. The following rules can be implemented to select de novo chemical hit compounds:
(1) The compound should be synthetically accessible
(2) The compound should follow drug-like/lead-like properties
(3) The generated compounds should be diverse in scaffold

4 LIGAND-BASED APPROACHES AND SCREENING
Contrary to SBDD, LBDD does not require the 3D structure of the target; rather, the minimum information critical for an LBDD method is knowledge of at least one active compound, which is then used in ligand-based virtual screening (LBVS) to pull similar compounds out of databases. This method collects information from a set of reference compounds that are reported in different studies to interact with the target of interest or to possess the desired activity. The compounds are represented such that the physicochemical properties relevant to the preferred interaction are retained, while irrelevant information is excluded. The LBDD method for drug discovery is based on the “similar property principle,” according to which compounds having structural similarity (structure, pharmacophoric features, molecular fields, etc.) will have similar properties. The fundamental approaches for LBDD to identify known actives are based either on chemical similarity or on building a model to predict biological activity from chemical structures. LBDD techniques include ligand-based pharmacophore, fingerprint-based similarity methods, and QSAR. The techniques used in LBVS, such as substructure mining and fingerprint searches, are faster than SBVS methods such as molecular docking. The LBVS technique has helped in finding several promising compounds on the basis of properties such as physicochemical or thermodynamic properties (Forli, 2015). However, SBVS is considered better than LBVS when the target’s 3D structure is available (Lyne, 2002). In some cases, where both the target and ligand are known, a hybrid method is used that combines SBVS and LBVS to achieve better results.

LBVS methods represent compounds with a set of features/descriptors; these descriptors can be either structural or physicochemical and are generated with tools based on mechanisms such as knowledge-based approaches, molecular mechanics, or quantum mechanics. The molecular descriptors are classified as 1D, 2D, 3D, 4D, etc., according to the dimensionality of the chemical structure they are computed from. Several tools are available for computing molecular descriptors, which will be discussed later in the chapter. Molecular fingerprint and similarity searches, pharmacophore modeling, and QSAR are the popular approaches of LBDD (Acharya et al., 2010).

4.1 Molecular Fingerprint and Similarity Searches
In this technique, compound libraries are screened with the molecular fingerprints of known ligands of a particular target to search for compounds with similar fingerprints (Vogt & Bajorath, 2011). The theory behind this approach is that molecules with chemical or physicochemical similarity ought to have similar binding properties (Gomes, Muratov, et al., 2017; Yu & Mackerell, 2017). This approach does not consider the biological activity of the known ligands. Similarity searches are simple but effective, and computationally less expensive than pharmacophore modeling and QSAR. In VS, the similarity search method is advantageous when only a few distinct ligands are known to inhibit a particular target and other methods, such as pharmacophore screening or structure-based design, cannot be applied. The most widely used tool for similarity searching is the molecular fingerprint, in which the molecular structure and properties are represented as bit strings. The bit string identifies the presence or absence of molecular features (Xue et al., 2003) in a quantifiable manner. Every bit in the bit string denotes one molecular substructure/fragment or feature. The bit is set to 1 if the fragment is present and to 0 if it is absent (Fig. 1.4). The fingerprint-based methods include substructure key-based fingerprints, topological or hashed fingerprints, and circular fingerprints (Cereto-Massagué et al., 2015). The basic difference among these approaches is the method of translating structural information into the bit string. In substructure key-based fingerprints, each bit represents a certain descriptor or value (Fig. 1.4a) (James et al., 2011). In topological fingerprints, all the fragments of a molecule are analyzed. Generally, paths are created up to a predefined number of bonds, and all the paths are then hashed to build the fingerprint. In this method, the same bit is likely to be set by multiple fragments (Fig. 1.4b). Circular fingerprints are also hashed, but instead of considering paths in the molecule, each atom’s environment is recorded up to a defined radius. This method is widely applied in VS on the basis of full structure similarity (Fig. 1.4c) (Cereto-Massagué et al., 2015).
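The topological (hashed) fingerprint idea described above can be illustrated with a small sketch. It is not the algorithm of any particular toolkit: the molecule is assumed to be given as a simple heavy-atom graph, the function names `enumerate_paths` and `path_fingerprint` are hypothetical, and the choice of `zlib.crc32` as the hash and of 64 bits as the fingerprint length is arbitrary. The sketch enumerates all linear paths up to a preset number of bonds and hashes each one onto a bit, so that different fragments may collide on the same bit.

```python
import zlib


def enumerate_paths(atoms, bonds, max_bonds):
    """Yield every linear atom path with at most `max_bonds` bonds as a string."""
    neighbors = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        # A path and its reverse describe the same fragment; keep one spelling.
        yield min(forward, backward)
        if len(path) - 1 < max_bonds:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:          # simple paths only, no revisits
                    yield from walk(path + [nxt])

    for start in range(len(atoms)):
        yield from walk([start])


def path_fingerprint(atoms, bonds, max_bonds=3, n_bits=64):
    """Hash every unique path fragment onto one position of a bit list."""
    bits = [0] * n_bits
    for fragment in set(enumerate_paths(atoms, bonds, max_bonds)):
        # Several fragments may hash to the same position (bit collision).
        bits[zlib.crc32(fragment.encode()) % n_bits] = 1
    return bits


# Hypothetical input: heavy atoms of ethanolamine, N-C-C-O.
atoms = ["N", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3)]
fp = path_fingerprint(atoms, bonds)
print("".join(str(bit) for bit in fp))
```

Because every fragment of a substructure is also a fragment of the parent molecule, the bits set for the substructure are a subset of the bits set for the parent, which is the property that fingerprint-based screening relies on.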
FIG. 1.4 (A) An illustration of a substructure key-based fingerprint; molecular substructures represented by bits that are present in the molecule (encircled) are set to 1 and those absent are set to 0. (B) Representation of a topological fingerprint. All atoms starting from the amino group of the molecule are shown; the fragment length and subsequent bit in the fingerprint are denoted. Different linear pathway fragments are generated based on the preset number of bonds that are translated into bit strings. (C) Representation of a circular fingerprint in which fragment generation starts from a central atom and considers the fragments within a preset radius (e.g., two or four bonds); these fragments are then transformed into bit strings.

Apart from the substructure fingerprint, properties of molecules can also be defined as fingerprints; these property-based fingerprints include functional class fingerprints, pharmacophore fingerprints, reaction fingerprints, etc. Pharmacophore models can also be used as a type of molecular fingerprint. The fragments of the molecule can be transformed into pharmacophoric features; the existence or nonexistence of these features aids in fingerprint creation. However, 3D pharmacophore models are frequently applied to detect chemical functionalities necessary for biological activity as well as for searching large databases of 3D compounds (Cereto-Massagué et al., 2015).

Once the bit strings have been created using any of the individual approaches described, the similarity between two molecules is quantified. Molecular similarity can be assessed in different ways; several similarity and distance-based metrics used with fingerprints are mentioned in Table 1.4. Generally, Euclidean distance is used for this purpose, but as per the industry standard for molecular fingerprints, the Tanimoto coefficient is usually used (Bajusz et al., 2015), which can be evaluated by the formula given in Table 1.5. The Tanimoto coefficient lies in the range of 0 to 1; however, it is sometimes also expressed as a percentage. A Tanimoto coefficient of 0.85 indicates two compounds that are reasonably similar (Martin et al., 2002).

It has been observed that longer bit strings perform better in similarity searching because they store a greater amount of information (Sastry et al., 2010). Fingerprint similarity search has been
implemented in various chemical databases for searching similar compounds within a defined range of the Tanimoto coefficient, for example, PubChem (Wang, Bryant, et al., 2017), ChEMBL (Bento et al., 2014), ZINC (Irwin & Shoichet, 2005), ChemSpider (Pence & Williams, 2010; Royal Society of Chemistry, 2015), etc. The fingerprint method can also be used to study databases for compound diversity by grouping similar compounds. The software and web servers used for fingerprint-based VS are listed in Table 1.6.

TABLE 1.5
List of Similarity Coefficients and Distances Used for Fingerprint Search.

Similarity/Distance coefficient    Expression                         Range
Tanimoto/Jaccard coefficient       Nc/(Na + Nb − Nc)                  0–1
Dice coefficient                   2Nc/(Na + Nb)                      0–1
Cosine similarity                  Nc/√(Na·Nb)                        0–1
Euclidean distance                 √(Na + Nb − 2Nc)                   0–N
Hamming distance                   Na + Nb − 2Nc                      0–N
Russell–Rao coefficient            Nc/m                               0–1
Forbes coefficient                 Nc·m/(Na·Nb)                       0–1
Soergel distance                   (Na + Nb − 2Nc)/(Na + Nb − Nc)     0–1

Note: For the fingerprints of two compounds a and b, Na represents the total number of bits set to 1 in compound a, Nb is the total number of bits set to 1 in compound b, Nc is the number of bits set to 1 in both a and b, and m represents the total number of bits present in the fingerprint.

The latest approach in fingerprint-based similarity searching is to use a combination of different VS methods (either fingerprint-based or other VS methods), specifically combining the molecular fingerprint similarity method with SBVS (Ahmed et al., 2014; Broccatelli & Brown, 2014; Willett, 2013). When a combination of approaches is applied, the best-performing compounds are those that are ranked highest by the different methods, which increases the performance of the VS. Fingerprint-based methods are used very extensively for activity prediction because of their speed, particularly in the area of target fishing, where the query compound is compared with millions of compounds having known activities.

4.2 Pharmacophore Modeling
Most biological structures, such as proteins or DNA, respond to the binding of small chemical molecules, and this response modulates the biological outcomes. How compounds interact with their respective protein receptors depends on the combination of interaction patterns available between the protein and ligand molecules. Chemical interactions or chemical features such as hydrophobic, hydrogen bond acceptor, hydrogen bond donor, and ring are the major driving forces in defining protein-ligand interactions. In the computational drug discovery pipeline, chemical features encoded at a high degree of abstraction are known as 3D pharmacophore features. The term “3D pharmacophore” came into the picture at the beginning of the 20th century; however, the concept gradually progressed through many stages, and around the late 1980s and early 1990s, VS experiments were performed with the help of computational programs (Table 1.7). With time, the pharmacophore concept has evolved from ligand-based and receptor-ligand-based approaches to the ab initio receptor-based approach (Kumar et al., 2017; Yang, 2010). With the help of this approach, many successful applications in lead optimization and in finding active molecules have been achieved (Neves et al., 2009; Schuster et al., 2008). Apart from drug discovery-based applications, pharmacophore features are now also used to design focused chemical libraries and for scaffold hopping (Shin & Seong, 2013). Apart from ligand-based pharmacophore modeling, protein-ligand complex-based pharmacophore features have also been found to be very valuable in finding novel inhibitors (Salam et al., 2009; Yang et al., 2009). Apart from the ligand- and protein-ligand interaction-based pharmacophore approaches, many other pharmacophore perceiving approaches are reported in the literature, and some are detailed below.

4.2.1 Water pharmacophore approach
Water molecules occupying the unliganded protein binding site are mostly engaged through directional forces or hydrophobic forces, and over 85% of protein-ligand complexes have been identified to have one or more bridging waters interacting with both protein and ligand (Lu et al., 2007). Most of the time, water-mediated interactions are found to affect the thermodynamic signature of the binding affinity of the ligand (Duan et al., 2017; Spyrakis et al., 2017). An incoming ligand displaces the ordered water molecules from the receptor binding site and consequently disturbs the hydrogen bond network between water and protein. This displacement of water to the bulk solvent affects the entropy-driven thermodynamic properties of the system (Dunitz, 1994). It thereby

TABLE 1.6
Software and Web Resources for Fingerprint-Based Virtual Screening.

Software/Web server                      License Type   Web Address
Instant JChem                            Free           https://chemaxon.com/products/instant-jchem
Open Babel                               Free           http://openbabel.org
RDKit                                    Free           http://www.rdkit.org
Chemistry Development Kit                Free           http://sourceforge.net/projects/cdk/
Indigo Toolkit                           Free           http://ggasoftware.com/opensource/indigo
ChemFP                                   Free           http://chemfp.com
DecoyFinder                              Free           http://urvnutrigenomica-ctns.github.io/DecoyFinder/
FLAP                                     Free           http://www.moldiscovery.com/soft_flap.php
jCompoundMapper                          Free           http://jcompoundmapper.sourceforge.net/
MayaChemTools                            Free           http://www.mayachemtools.org/
OEChem TK                                Commercial     https://www.eyesopen.com/oechem-tk
Canvas from Schrödinger                  Commercial     http://www.schrodinger.com/Canvas/
Molecular Operating Environment (MOE)    Commercial     https://www.chemcomp.com/Products.htm
SYBYL-X                                  Commercial     http://www.tripos.com/
Pipeline Pilot                           Commercial     http://accelrys.com/
PubChem                                  Free           http://pubchem.ncbi.nlm.nih.gov/
AURAmol                                  Free           https://www.cs.york.ac.uk/auramol/
ChemSpider                               Free           http://www.chemspider.com/
ZINCPharmer                              Free           http://zincpharmer.csb.pitt.edu/
ChemDes                                  Free           http://www.scbdd.com/chemdes
wwLigCSRre                               Free           https://bioserv.rpbs.univ-paris-diderot.fr/services/wwLigCSRre/
SwissSimilarity                          Free           http://www.swisssimilarity.ch/
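Several of the coefficients defined in Table 1.5 reduce to simple bit counts, so they can be computed without any cheminformatics toolkit. The following is a minimal sketch under stated assumptions: fingerprints are plain lists of 0/1 values of equal length, the two example bit strings are hypothetical, and the function names `bit_counts` and `similarity_scores` are illustrative only. The Forbes coefficient is left out of the sketch.

```python
import math


def bit_counts(fp_a, fp_b):
    """Return (Na, Nb, Nc, m) as defined in the note to Table 1.5."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    na = sum(fp_a)                                   # bits set in a
    nb = sum(fp_b)                                   # bits set in b
    nc = sum(x & y for x, y in zip(fp_a, fp_b))      # bits set in both
    if na == 0 or nb == 0:
        raise ValueError("each fingerprint needs at least one bit set")
    return na, nb, nc, len(fp_a)


def similarity_scores(fp_a, fp_b):
    """Compute a subset of the Table 1.5 coefficients from bit counts."""
    na, nb, nc, m = bit_counts(fp_a, fp_b)
    return {
        "tanimoto": nc / (na + nb - nc),
        "dice": 2 * nc / (na + nb),
        "cosine": nc / math.sqrt(na * nb),
        "euclidean": math.sqrt(na + nb - 2 * nc),
        "hamming": na + nb - 2 * nc,
        "russell_rao": nc / m,
        "soergel": (na + nb - 2 * nc) / (na + nb - nc),
    }


# Hypothetical 20-bit fingerprints of two compounds.
fp_a = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]
fp_b = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0]
scores = similarity_scores(fp_a, fp_b)
for name, value in scores.items():
    print(f"{name:12s} {value:.3f}")
```

For these two bit strings Na = 7, Nb = 9, and Nc = 7, so the Tanimoto coefficient is 7/9 (about 0.78) and the Soergel distance is its complement, 2/9.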

renders the water molecule a crucial mediator in deciphering protein-ligand binding (Cappel et al., 2017; Wong & Lightstone, 2011). Studies suggest that not all binding site water is displaceable, and so some unsuccessful attempts during lead optimization have also been reported (Clarke et al., 2001; Kadirvelraj et al., 2008). Strongly bound conserved water molecules are found to be difficult to replace, so weakly bound waters are considered a good choice for lead optimization.

In the last decade, several computational approaches have been employed to estimate the thermodynamic components of the FE and to assess accurate binding FE (Bucher et al., 2018). These methods have been used to pinpoint potential hotspots for novel inhibitor design as well as for the optimization of existing inhibitors. Apart from the thermodynamics-based approach, many crystal structure- and simulation-based strategies were also established to identify and characterize the binding

TABLE 1.7
List of 3D Pharmacophore Generation Tools/Web Interfaces Commonly Employed for Pharmacophore Features Extraction From the Ligand, Apo Protein Binding Site, or the Protein-Ligand Complex.

Program Name   Input Type                       Scoring Method   References
Pharao         Ligand                           Overlay          Taminau et al. (2008)
Pharmagist     Ligand                           Overlay          Schneidman-Duhovny et al. (2008)
Pharmer        Ligand, complex                  RMSD             Koes & Camacho (2011)
Phase          Ligand, apo structure, complex   RMSD             Dixon et al. (2006)
MOE            Ligand, apo structure, complex   RMSD             Inc. (2015)
Catalyst       Ligand, apo structure, complex   Overlay          BIOVIA Discovery Studio | Pharmacophore and Ligand-Based Design (n.d.)
LigandScout    Ligand, apo structure, complex   Overlay          Wolber & Langer (2005)

TABLE 1.8
Advanced Pharmacophore Perceiving Approaches Based on the Apo Protein Ensemble Structures.

Category                                          Approach Used                                                                         References
Molecular dynamics-based approach                 Hydration site-restricted pharmacophore                                               Hu & Lill (2012)
                                                  SLICS-pharmacophore with multiprobe molecules                                         Yu et al. (2015)
                                                  MixMD: mixed solvent simulations-based hotspot mapping                                Lexa & Carlson (2011)
                                                  Water pharmacophore                                                                   Jung et al. (2018)
Molecular interaction field grid-based approach   CliquePharm: a clique-based ab initio multiclass pharmacophore generation program     Kaalia et al. (2016)
                                                  WaterMap                                                                              Cappel et al. (2017)

hotspots for novel inhibitor design (Table 1.8). Very recently, Jung et al. (2018) demonstrated water-based ab initio pharmacophore modeling from the receptor binding site only. In this approach, protein conformations were sampled using MD, and water at the binding pocket was explored for hydration site analysis and pharmacophore feature annotation. This approach was successfully implemented on seven different proteins (Jung et al., 2018).

4.2.2 Dynamic pharmacophore approach
Proteins are the main apparatus for biological function, and flexibility is the key that determines their function (Teague, 2003). SBDD methods mainly neglect this factor (Lexa & Carlson, 2011). Many reports have shown that incorporating protein dynamics improves the accuracy of the predicted hit molecules (Lexa & Carlson, 2011). Many ensemble-based approaches have been demonstrated to identify pharmacophore models by using many crystal structures (Wieder et al., 2016), NMR structures (Hornak & Simmerling, 2007), or MD (Carlson et al., 2000). These approaches were successfully implemented against diverse proteins such as HIV integrase (Carlson et al., 2000), HIV protease (Hornak & Simmerling, 2007), and DHFR proteins (Lerner et al., 2007).

4.2.3 Ab initio specificity/selectivity pharmacophore approach
Interaction complementarity is the necessity that is mostly utilized by the pharmacophore features. Ligand- and protein-ligand-based pharmacophore modeling are mostly limited by the availability of appropriate bioactive assays and diverse scaffold complexes, respectively. So, to circumvent this limitation, many new methods have emerged in the last decade utilizing

FIG. 1.5 Specificity and selectivity pharmacophore models for malarial protease (plasmepsin class of protein)
generated by the CliquePharm approach. The left side figure shows the five-point specificity pharmacophore
model carrying two hydroxyl probes (OH), two amides (N), and one carbonyl probe (O); while on the right side,
five-point selectivity pharmacophore model is shown having the same size and feature types; however, this
model is selective for malarial aspartic protease, not for human aspartic protease. So, the selective model will
exclude the features that are common in both malarial and human protease and only design the model from the
features that are only available to the malarial class of protease.

only receptor binding site information itself, and many new methods have been developed to encompass the possible combinations of available interactions to design and improve inhibitor/lead molecules (Schaller et al., 2020). Molecular interaction field (MIF)-based methods have dominated this field, and the FLAP program (Baroni et al., 2007) from Molecular Discovery was developed to elucidate pharmacophore features from the apoprotein structure. Very recently, one more MIF-based method, named CliquePharm, was developed to design specificity and selectivity ab initio pharmacophore models for the aspartic protease class of proteins (Kaalia et al., 2016). In this approach, a clique-based method is employed to identify the most frequent cliques across the selected proteases, followed by rule-based selectivity pharmacophore modeling with features selective to malarial aspartic protease over human protease (Fig. 1.5). The study reported an ensemble of pharmacophore models of different sizes and combination types, which can aid in fragment-to-lead molecule generation (Kumar et al., 2017).

4.3 Quantitative Structure-Activity Relationship
The QSAR approach is based on the fact that the biological activity of a ligand depends on the arrangement of atoms in its molecular structure. Putting it differently, making changes in the structure of the molecule will modify its activity. QSAR is a widely used technique in LBDD. The structural information is denoted as molecular descriptors, and the biological activity in QSAR is estimated as a function of the molecular descriptors (biological activity = f(molecular descriptors)). A large training data set is required in this method to extract descriptors or molecular features. The model developed from the biological activities of similar known ligands is used for predicting new compounds. Some of the techniques used in model generation include multiple linear regression, principal component regression, partial least squares (PLS) regression, ML, neural networks, etc. The main difference between the pharmacophore model and QSAR approaches is that the pharmacophore model considers only the features of the ligand, whereas QSAR considers the ligand features as well as the features correlated with the biological activity between the ligand and the receptor. One of the advantages of the QSAR method over the pharmacophore model is that it can recognize whether a particular feature of a drug influences its activity positively or negatively (Leelananda & Lindert, 2016). The QSAR method is generally classified either on the basis of the dimensionality of the descriptors or on the basis of biological activity, which includes chemical measurements and biological assays. The dimensionality of QSAR descriptors ranges from 0D to 6D, but generally 2D QSAR and 3D QSAR methods are used for model generation. A 2D QSAR method includes geometric and topological properties, molecular fingerprints, and polar surface area, but it excludes the 3D orientation of the molecule. However, in the 3D QSAR method,
conformation of the molecule and alignment are used for the descriptor calculation (Dudek et al., 2006; Todeschini & Consonni, 2010). A 3D QSAR utilizes different methods, including principal component analysis, ANNs, the PLS method, cluster analysis, etc., and 3D GRID-based methods, such as comparative molecular field analysis, comparative molecular similarity indices analysis, hypothetical active site lattice, etc., are used for predicting the desired properties quantitatively (Arulsudar, Subramanian, & Murthy, 2005; Hemmateenejad et al., 2004; Oprea, 2003).

In drug discovery, QSAR approaches aid in pulling hits out of large molecular compound libraries. The strength of QSAR methods can be further improved by incorporating varying chemical and biological data using ANNs. This technique was found to be very effective in the development of antistreptococcal drugs (Speck-Planche et al., 2013). Several software packages are available for ANN analysis, such as Matlab, Mathematica, Stuttgart Neural Network Simulator, etc. Molecular descriptors can be calculated using tools such as PaDEL-Descriptor, Dragon, etc. PaDEL-Descriptor is open source software for the generation of descriptors and fingerprints, developed using the Chemistry Development Kit. It can generate 1875 descriptors, including 1D, 2D, and 3D descriptors, and 12 types of fingerprints (Yap, 2011). Dragon is capable of generating more than 4000 descriptors for a single molecule. It also has a web-based version, which is freely available but only for a limited number of compounds and with restricted features (Tetko et al., 2005).

An abundance of tools is available for performing QSAR. Apart from the available software packages, workflow automation tools such as Taverna, Pipeline Pilot, Galaxy, KNIME, etc., are used to develop complete QSAR workflows. This is a convenient and more proficient way to manage large chemical data sets, automate lengthy processes, and assist in data analysis. The automated QSAR modeling workflow of KNIME integrates all the tools required to perform the various steps in QSAR analysis. The advantage of these workflows is that QSAR models can be easily built by directly accessing online or private chemical databases without proficiency in ML or programming. Some of the widely used tools for building and analyzing QSAR models are listed in Table 1.9.

The success of QSAR has been reported in a great number of studies. A QSAR model was built from 3133 compounds that were either active or inactive against the malaria-causing parasite Plasmodium falciparum. The models were developed using 0D, 1D, and 2D descriptors, ISIDA-2D fragment descriptors, and the SVM method. VS of these compounds resulted in the shortlisting of 176 possible antimalarial compounds, and after validation 25 hits showed antimalarial activity with very low cytotoxicity to mammalian cells (Zhang et al., 2013). In another study, different ML methods were combined with molecular fingerprints to develop QSAR models, which were used to identify potential hits against Mycobacterium tuberculosis (the causative agent of tuberculosis). The model was used for screening all the chalcone compounds retrieved, and the results indicate that the designed heteroaryl chalcones identified can be promising lead candidates against tuberculosis (Gomes, Braga, et al., 2017). In another study, adamantane-based inhibitors of Ebola virus envelope glycoprotein were identified using docking, pharmacophore, and 3D-QSAR approaches. The predicted molecule performed better than the standard drugs oseltamivir and zanamivir (Mali & Chaudhari, 2019). In a recent study, a QSAR approach combined with scaffold hopping was used to identify three potent inhibitors of FABP4 (adipocyte fatty acid binding protein 4), which was later identified as a molecular target for treating certain types of cancer, type 2 diabetes, and other metabolic diseases (Floresta et al., 2019).

5 APPLICATIONS OF CADD IN DRUG DISCOVERY
5.1 Virtual Screening
VS is an integral part of CADD and is used to screen novel active compounds from chemical libraries. VS is routinely used by scientists and pharmaceutical companies as one of the methods in the process of drug discovery (Lavecchia & Giovanni, 2013). Presently, the methods used in VS have improved immensely in terms of performance, utility, and user-friendliness, leading to the extensive use of VS in drug discovery. With the advent of supercomputing and cloud computing, it is now possible to narrow down a huge chemical space within a few hours. Both the structure-based and ligand-based approaches discussed above are used in VS to discover lead compounds.

5.2 Lead Optimization
The role of lead optimization is to preserve or improve the required characteristics of the main components of the drug while minimizing its toxicity. The lead compounds identified after VS can be refined to increase selectivity and specificity for a given target. Prior to synthesis of the lead compounds, properties such as binding affinity, selectivity, physicochemical properties, and absorption, distribution, metabolism, excretion (ADME),

TABLE 1.9
Tools Used for Quantitative Structure-Activity Relationship (QSAR) Modeling in Drug Design.

Tool             Description                                                   License Type
AutoQSAR         Fully automated creation and application of QSAR models       Commercial
                 https://www.schrodinger.com/autoqsar
QSARpro          Used for QSAR modeling and activity prediction                Commercial
                 https://www.vlifesciences.com/products/QSARPro/Product_QSARpro.php
PharmQSAR        Software package for automated QSAR model development         Commercial
                 https://new.pharmacelera.com/pharmqsar/
eTOXlab          Automated QSAR model development and validation               Free
                 http://phi.imim.es/envoy/
OCHEM            A web-based platform for fully automated QSAR modeling        Free
                 https://www.eyesopen.com/molecular-modeling
DELPHOS          It is used for development of QSAR models                     Free
                 http://lidecc.cs.uns.edu.ar/index.php/sw/delphos
AutoWeka         Software used for data mining for QSAR and model development  Free
                 https://www.cs.ubc.ca/labs/beta/Projects/autoweka/
3D-QSAR          Used for the development of QSAR models                       Free
                 https://www.3d-qsar.com/
AZOrange         It is a QSAR modeling package based on machine learning       Free
                 http://github.com/AZcompTox/AZOrange
GUSAR            Web-based platform for QSAR modeling                          Free
                 http://www.way2drug.com/gusar/
Taverna          A chemoinformatics workflow                                   Free
                 http://cdk.sourceforge.net/cdk-taverna/
Pipeline Pilot   Tool for workflow automation                                  Commercial
                 https://www.3ds.com/products-services/biovia/products/data-science/pipeline-pilot/
Galaxy           Tool for workflow automation                                  Free
                 https://galaxyproject.org/
KNIME            Tool for workflow automation                                  Free
                 https://www.knime.org/

and toxicity (T) are optimized (Cheng et al., 2011). The approaches used in VS, such as QSAR and pharmacophore modeling, are widely used in lead optimization (John et al., 2011; Pirhadi et al., 2013). Hopfinger et al. applied 4D-QSAR modeling to develop a virtual screen for glycogen phosphorylase inhibitors (Hopfinger et al., 1999). Singh et al. combined 3D-QSAR with 3D pharmacophore searching for screening and optimizing specific integrin antagonists (Singh et al., 2002). ML tools, such as ANN, hidden Markov models (HMM), SVM, decision tree learning, RF, Naive Bayes, and belief networks, are also employed in lead optimization (Byvatov et al., 2003; Olivecrona et al., 2017). Finally, the analysis of ADMET properties is carried out in the lead optimization phase (Macalino et al., 2015). Several computational tools as well as web servers are available for the prediction of ADMET properties (Table 1.10).
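Section 4.3 expresses QSAR as biological activity = f(molecular descriptors) and lists multiple linear regression among the model-building techniques that are reused in lead optimization. As a rough illustration only, the sketch below fits such a regression by ordinary least squares in plain Python. The two descriptors, the activity values, and the function names `fit_mlr` and `predict` are hypothetical; the data are synthetic and noise-free, constructed from activity = 1.0 + 0.5·x1 − 0.2·x2 so that the recovered coefficients can be checked by eye.

```python
def fit_mlr(descriptors, activities):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(d) for d in descriptors]    # leading 1.0 = intercept
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, activities))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(weights, descriptor):
    """Apply the fitted linear model to one descriptor vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], descriptor))


# Synthetic training set: two hypothetical descriptors per compound.
descriptors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (0.5, 1.5)]
activities = [1.1, 1.8, 1.7, 2.4, 0.95]

weights = fit_mlr(descriptors, activities)
print("intercept and coefficients:", [round(w, 3) for w in weights])
print("predicted activity for (2.0, 2.0):", round(predict(weights, (2.0, 2.0)), 3))
```

The sign of each fitted coefficient indicates whether the corresponding descriptor raises or lowers the predicted activity, which is the interpretability advantage the chapter attributes to QSAR. Real QSAR work would add descriptor selection and validation on held-out compounds, which this sketch omits.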

TABLE 1.10
Software and Web Resources for ADMET Prediction.

Software/Web server               License Type   Web Address
ADMET Predictor                   Commercial     http://www.simulations-plus.com/software/admet-property-prediction-qsar/
QikProp                           Commercial     https://www.schrodinger.com/
ADMET and predictive toxicology   Commercial     https://www.3ds.com/products-services/biovia/products/molecular-modeling-simulation/biovia-discovery-studio/qsar-admet-and-predictive-toxicology/
PreADMET PC version 2.0           Commercial     https://preadmet.bmdrc.kr/preadmet-pc-version-2-0/
PreADMET                          Free           https://preadmet.bmdrc.kr/
SwissADME                         Free           http://www.swissadme.ch/
ADMETlab                          Free           http://admet.scbdd.com/calcpre/index/
admetSAR                          Free           http://lmmd.ecust.edu.cn/admetsar1/
FAF-Drugs4                        Free           http://fafdrugs4.mti.univ-paris-diderot.fr/
ALOGPS                            Free           http://www.vcclab.org/lab/alogps/

5.3 Scaffold Hopping
The drug discovery pipeline is intended to identify novel chemical entities that are not only unique in their mechanism of action against a particular target protein but also novel in their chemical core, i.e., not present in the known drug chemical space. A high-throughput screening approach can identify initial lead molecules with the potential to inhibit the function of the target protein; computational approaches can then be used to assess chemical diversity and to support lead optimization. Among the different computational approaches, scaffold hopping (or lead hopping) is used to design “isofunctional” chemical entities, i.e., two compounds that have nearly the same activity but carry different chemical scaffolds (Nakano et al., 2020). In a general sense, scaffold hopping is viewed as finding diverse chemical entities through computational screening (Chen et al., 2014); a more systematic application, however, is scaffold replacement by step-by-step modification of the core scaffold of a compound series (Fig. 1.6). Modifications commonly employed in this case include heterocyclic replacements, ring closure or opening, peptidomimetics, and chemical topology-based modifications. Among the different reasons for scaffold hopping, replacing a chemically complex natural product with a synthetically accessible molecule and improving the pharmacological properties of known actives are the main applications for a medicinal chemist (Hu et al., 2017).

FIG. 1.6 Scaffold hopping to optimize the initial lead identified from the high-throughput screening against human vasopressin 1a receptor (Ratni et al., 2015).

Depending upon the case study and the availability of background information, different methodologies are adopted for scaffold hopping (Bajorath, 2017; Sun et al., 2012). In general, the available scaffold hopping methodologies can be classified as ligand-based hopping and protein structure-based hopping. The former category includes pharmacophore-based (Lauri & Bartlett, 1994), topological pharmacophore graph-based (Nakano et al., 2020), lead molecule shape-based (Rush et al., 2005), and selected candidate compound-based hopping (Vogt et al., 2010). The Bajorath group thoroughly analyzed scaffold hopping by 2D fingerprints, pointed out the possible limitations and applicability of these fingerprints, and, based on the findings, reported guidelines for scaffold hopping using 2D fingerprints (Vogt et al., 2010). In the latter category, protein structural information is integrated in different ways to achieve scaffold hopping. In one case study, a predocked fragment database was used to analyze the receptor binding site by scoring different sets of fragments at a particular site and then proposing new ligand molecules with the maximum interaction score (Lin & Tseng, 2011). In another study along the same lines, Silverman and co-workers reported that improved selectivity and drug-like properties can be achieved using fragment-based scaffold hopping (Ji et al., 2009). Calculation of protein-ligand binding FEs is a routine way to estimate the strength of the interaction between two molecules (protein and bound ligand) and can further be used to estimate the change in FE upon changing particular R-groups; however, it is challenging to estimate the FE change for a scaffold hopping modification. Wang et al. developed an FEP-based method that handles scaffold hopping modifications in a computationally feasible manner (Wang, Deng, et al., 2017). They applied the proposed method to six pharmaceutically important proteins and showed that the predicted binding affinities for each modification correlate well with the experimental affinities.

5.4 Multitargeted Approaches/Polypharmacology
Over the years, the drug discovery pipeline has incorporated many new features, ranging from in vivo models to drugs directed at a single protein target for a single mechanism. This single target-based drug design process is primarily supported by the lock-and-key model proposed more than a century ago by E. Fischer (1894). Over many decades, multidimensional biological data have accumulated that help explain the underlying mechanisms of particular drug targets; however, drug discovery efforts have continued to seek selective drugs (keys) that inhibit a single mechanism-based target (lock). This approach to drug design is also supported mainly by the reductionist view of systems biology (Maggiora, 2011) and by an ever-growing number of crystal structures that provide a mechanistic view at the molecular level (e.g., understanding the interaction pattern critical for selective ligand design). Conventional single target-based drug discovery is blind to other processes that are directly or indirectly connected through complex metabolic/signaling networks (Maggiora, 2011). Drug target-based analysis has shown that some drugs can simultaneously bind to many protein targets, thereby eliciting either biological activities or adverse effects (Frantz, 2005). One example of such a drug is aspirin, which acts through several mechanisms in addition to cyclooxygenase inhibition (Koeberle & Werz, 2014). Large-scale multidimensional experimental biological data have demonstrated that biological processes are organized hierarchically and that a single perturbation affects the whole complex network. This introduced a multitargeted drug design paradigm (Hopkins, 2008). Polypharmacology (the interaction of a single drug with multiple targets) (Paolini et al., 2006), a relatively recent approach, is now shifting the central dogma of the drug discovery process, as it has produced many encouraging results compared with former drug discovery approaches (Hopkins, 2008). It is regarded as a promising solution for the treatment of complex diseases such as cancer (Raghavendra et al., 2018), neurological disease (Stephenson et al., 2005; Więckowska et al., 2016), inflammation (Hwang et al., 2013), and infectious diseases (Li et al., 2014).
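The notion of a single drug interacting with several targets can be made concrete with a small drug-target map. The following minimal Python sketch is not taken from any of the cited studies; the drug and target labels, the interaction list, and the function names are hypothetical and serve only to illustrate how multitarget (promiscuous) drugs and shared targets can be read off such a map.

```python
from collections import defaultdict

# Hypothetical drug-target pairs, invented purely for illustration.
INTERACTIONS = [
    ("drug_A", "target_1"),
    ("drug_A", "target_2"),
    ("drug_A", "target_3"),
    ("drug_B", "target_2"),
    ("drug_C", "target_4"),
    ("drug_C", "target_1"),
]


def targets_per_drug(pairs):
    """Group the (drug, target) pairs into a drug -> set-of-targets map."""
    mapping = defaultdict(set)
    for drug, target in pairs:
        mapping[drug].add(target)
    return dict(mapping)


def multitarget_drugs(pairs, min_targets=2):
    """Return, in alphabetical order, the drugs with at least min_targets targets."""
    mapping = targets_per_drug(pairs)
    return sorted(drug for drug, targets in mapping.items()
                  if len(targets) >= min_targets)


def shared_targets(pairs, drug_x, drug_y):
    """Return the set of targets that two drugs have in common."""
    mapping = targets_per_drug(pairs)
    return mapping.get(drug_x, set()) & mapping.get(drug_y, set())
```

With the toy data above, multitarget_drugs(INTERACTIONS) lists drug_A and drug_C, and shared_targets(INTERACTIONS, "drug_A", "drug_B") contains only target_2. In a real analysis the pairs would come from curated bioactivity data, and the threshold for calling a drug multitargeted is a design choice.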

TABLE 1.11
List of Databases of Repurposed Drugs.

Database                                  Drugs     Web Address
Repurposed Drug Database                  300       http://www.drugrepurposingportal.com/repurposed-drug-database.php
RepoDb                                    268       http://apps.chiragjpgroup.org/repoDB/
Drug Repurposing Hub                      13,553    http://www.broadinstitute.org/repurposing
ReDO (Repurposing Drugs in Oncology)      ~270      http://www.redo-project.org/db/
COVID-19 Drug Repurposing Database        128       https://www.excelra.com/covid-19-drug-repurposing-database/

Drug repurposing is a direct way of exploiting polypharmacology, i.e., of finding indications other than the one a drug was initially directed at. Among the in silico methods, docking-based protocols are well known for polypharmacological inhibitor design. Pinzi et al. (2018) discussed the role of selecting proper protein conformations in a docking-based protocol for achieving a polypharmacological profile. The authors suggested that, for proteins that are dynamic and adopt different states (active/inactive), conformation assessment is required, as a multitarget drug is accommodated by diverse binding pockets (Pinzi et al., 2018). Along the same lines, many articles have reported chemoinformatic/in silico methods that can be utilized for multitargeted drug design (Koutsoukas et al., 2011; Lavecchia & Cerchia, 2016; Zhang, Pei, & Lai, 2017). Using a chemical data mining approach, the Bajorath group analyzed the crystal complexes deposited in the PDB to find templates for multitarget ligand design. A systematic search of the available protein complexes identified 702 ligands that bound to different protein families, the so-called multitargeted ligands. From these multitargeted ligands, analog-based scaffolds were isolated as templates for further multitargeted ligand design (Gilberg et al., 2018).

5.5 Drug Repurposing/Reprofiling
One approach that has recently become increasingly popular in drug discovery and development is to identify novel uses of already approved drugs, known as “drug repurposing” or repositioning. The term gained particular prominence as the COVID-19 pandemic spread throughout the world, putting drug repurposing techniques on the fast track. Several US FDA-approved drugs have shown activity against multiple targets (also called promiscuity), implying that they can be repurposed for therapeutic benefit in other diseases. Compared with the discovery of a novel medicine, the advantages of drug repurposing are that the drug can move to trials faster, at reduced cost, and with less risk of unfavorable results. Drug repurposing has also turned failures in drug discovery into breakthroughs by discovering new therapeutic uses for failed candidates. Presently, several hundred drugs are used for the disease they were originally developed for as well as for repurposed indications. Acetylsalicylic acid (aspirin) was launched in 1897 as a nonsteroidal anti-inflammatory drug and was later, in 1956, used as an antithrombotic drug (Desborough & Keeling, 2017). Several freely accessible repositories of repurposed drugs are available (Table 1.11). CADD approaches greatly help in fast-tracking the repositioning process by revealing unknown targets of approved or failed drugs. Several drugs have been repurposed in the past; one of the classical examples is sildenafil. It was originally developed for treating high blood pressure and angina, was later developed as a treatment for erectile dysfunction, and has since been repurposed for pulmonary hypertension. The novel coronavirus (nCoV-2019), the causative agent of COVID-19, quickly caused a pandemic. Despite the severity of the disease, pathogen-specific antivirals are missing. Hence, as a short-term response to nCoV-2019, computational drug repurposing techniques stand out as a promising approach. Several approved drugs have been identified that show anti-nCoV-2019 activity; some of these compounds, including chloroquine, tetrandrine, umifenovir, carrimycin, danoprevir, and lopinavir, are in phase 4 clinical trials (Lima et al., 2020).

Computational drug repurposing approaches can be broadly classified into phenotypic or blind screening methods, knowledge-based methods, and data-based methods. Knowledge- and data-based drug repurposing methods are further classified into target-based, signature-based, network-based, and targeted mechanism-based approaches. The phenotypic or blind screening method does not use biological activity, structural, or pharmaceutical information about the mode of action of the drug; rather, it depends on unexpected identification from experiments performed for certain drugs and diseases (Ma et al., 2013). This method is useful when little or no structural information about the target is available. As it does not require prior structural information about the target, it is applicable to numerous diseases. The target-based drug repurposing method employs either the target or the ligand structure for HTVS of compound libraries through molecular docking/pharmacophore modeling. Here the tools and techniques described for molecular docking and VS are applied. This is one of the most commonly used drug repurposing techniques (Jin & Wong, 2014; Li et al., 2016). Jin et al. (2020) used a target-based approach to screen more than 10,000 molecules against the target protease (Mpro) of the COVID-19 virus and identified N3 and ebselen as possible candidates to treat COVID-19. The knowledge- and data-based drug repurposing method uses bioinformatics or cheminformatics approaches to bring existing knowledge about drugs, such as the chemical structures of the drug and target, drug-target networks, FDA approval labels, pathway information, clinical trial information, and adverse effects, into the repurposing process, improving the accuracy of prediction. Zhou et al. (2020) performed network proximity analyses of drug targets and human coronavirus (HCoV)-host interactions in the human interactome and identified 16 repurposable drugs against HCoV; the study also provides a powerful network-based approach for quickly identifying repurposable drugs or drug combinations against the novel coronavirus.

The genome-wide association study (GWAS)-based method is used to identify single-nucleotide polymorphisms (SNPs) linked to a specific disease in order to identify genes that could be potential drug targets. Thousands of SNPs can be detected simultaneously with the help of GWAS; these data are then used to distinguish genes related to the specific disease and to understand the drug response to these variations (Sanseau et al., 2012). ML-based approaches require a considerably large amount of data for building predictive models. Currently, an abundance of data on protein targets, structures of small molecules, side effect profiles, gene expression, etc., is available, which helps in studying the mechanism of a disease or the mode of action of a drug as well as in identifying new indications for existing drugs. ML-based methods have advanced greatly in the last few years, and several approaches based on them have been suggested (Li et al., 2016). The ML approaches applied in drug repurposing include logistic regression, DL, RF, SVM, and neural networks. For efficient drug repurposing, any of the in silico drug repurposing methods, or a combination of them, can be applied depending upon the objectives and the information available.

6 CONCLUSIONS
CADD is a powerful tool in modern drug discovery for the search for potential therapeutic compounds. It has become the most suitable alternative to high-throughput screening, which is used routinely in drug discovery and development. The techniques/tools used in CADD are applied in almost all stages of the drug discovery pipeline, as they can fast-track hit identification, hit-to-lead, and lead optimization (binding affinity, ADME, toxicity, etc.). In the last two decades, advancements in computational drug design protocols at the level of techniques/tools, coupled with progress in computational power, have enabled the scientific community to generate quick and reliable disease-oriented solutions at low cost and in a manageable time. The current coronavirus pandemic is the best example of this: within 2–3 months, many reports appeared across the globe targeting different coronavirus proteins. Modern computational techniques have helped in achieving a molecular-level understanding and in predicting promising inhibitors. Apart from the advancements in rational drug design approaches, the large-scale generation of multidimensional biological data has spurred the development of ML/DL/AI-based models, which have shown improvements in prediction accuracy and in time complexity compared with traditional approaches. These data-driven models are highly dependent upon the quality and quantity of the data; as more data become available, prediction accuracy is expected to improve. At the same time, there is a continuous need for further improvements in prediction algorithms to discover promising new drugs and predict new indications for existing drugs.
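As a concrete illustration of the network-based repurposing strategy described in Section 5.5 (Zhou et al., 2020), network proximity can be computed on a toy interaction network. The sketch below assumes one commonly used "closest" distance, d(T, S) = (1/|T|) Σ_{t∈T} min_{s∈S} d(t, s), where T is the set of drug targets, S the set of disease-associated proteins, and d the shortest-path length; the exact measure and randomization scheme of the cited study may differ. The graph, the node names, and the function names are hypothetical, and the reference distribution here uses simple random sampling of nodes rather than degree-preserving sampling.

```python
import random
import statistics
from collections import deque

# Hypothetical toy interactome (undirected), written as an adjacency list.
TOY_INTERACTOME = {
    "A": ["B"],
    "B": ["A", "C", "F"],
    "C": ["B", "D"],
    "D": ["C", "E", "G"],
    "E": ["D"],
    "F": ["B"],
    "G": ["D"],
}


def hop_distances(graph, source):
    """Breadth-first search: shortest-path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distance(graph, drug_targets, disease_proteins):
    """Mean, over drug targets, of the distance to the nearest disease protein.

    Assumes every drug target can reach at least one disease protein.
    """
    total = 0
    for target in drug_targets:
        dist = hop_distances(graph, target)
        total += min(dist[p] for p in disease_proteins if p in dist)
    return total / len(drug_targets)


def proximity_z_score(graph, drug_targets, disease_proteins, n_random=500, seed=0):
    """Compare the observed distance with randomly drawn target sets of equal size."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    observed = closest_distance(graph, drug_targets, disease_proteins)
    null = [
        closest_distance(graph, rng.sample(nodes, len(drug_targets)), disease_proteins)
        for _ in range(n_random)
    ]
    spread = statistics.pstdev(null)
    return (observed - statistics.mean(null)) / spread if spread > 0 else 0.0


DISEASE_PROTEINS = {"A", "F"}
near = closest_distance(TOY_INTERACTOME, {"B"}, DISEASE_PROTEINS)
far = closest_distance(TOY_INTERACTOME, {"E", "G"}, DISEASE_PROTEINS)
```

In this toy network the single target B sits one step from both disease proteins, whereas targets E and G are four steps away, so the first set is the more proximal one. A more negative z-score indicates that the drug targets lie closer to the disease proteins than expected for randomly chosen nodes.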

REFERENCES

Abagyan, R., Totrov, M., & Kuznetsov, D. (1994). ICM - a new method for protein modeling and design: Applications to docking and structure prediction from the distorted native conformation. Journal of Computational Chemistry. https://doi.org/10.1002/jcc.540150503
Acharya, C., Coop, A., Polli, J. E., & MacKerell, A. D. (2010). Recent advances in ligand-based drug design: Relevance and utility of the conformationally sampled pharmacophore approach. Current Computer-Aided Drug Design. https://doi.org/10.2174/157340911793743547
Afifi, K., & Al-Sadek, A. F. (2018). Improving classical scoring functions using random forest: The non-additivity of free energy terms’ contributions in binding. Chemical Biology and Drug Design. https://doi.org/10.1111/cbdd.13206
Ahmed, A., Saeed, F., Salim, N., & Abdo, A. (2014). Condorcet and borda count fusion method for ligand-based virtual screening. Journal of Cheminformatics. https://doi.org/10.1186/1758-2946-6-19
Ain, Q. U., Aleksandrova, A., Roessler, F. D., & Ballester, P. J. (2015). Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdisciplinary Reviews: Computational Molecular Science. https://doi.org/10.1002/wcms.1225
Aqvist, J., Luzhkov, V. B., & Brandsdal, B. O. (2002). Ligand binding affinities from MD simulations. Accounts of Chemical Research. https://doi.org/10.1021/ar010014p
Arulsudar, N., Subramanian, N., & Murthy, R. S. R. (2005). Comparison of artificial neural network and multiple linear regression in the optimization of formulation parameters of leuprolide acetate loaded liposomes. Journal of Pharmacy and Pharmaceutical Sciences, 8(2), 243–258.
Ashtawy, H. M., & Mahapatra, N. R. (2018). Boosted neural networks scoring functions for accurate ligand docking and ranking. Journal of Bioinformatics and Computational Biology. https://doi.org/10.1142/S021972001850004X
Bajorath, J. (2017). Computational scaffold hopping: Cornerstone for the future of drug design? Future Medicinal Chemistry, 9(7), 629–631. https://doi.org/10.4155/fmc-2017-0043
Bajusz, D., Rácz, A., & Héberger, K. (2015). Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? Journal of Cheminformatics. https://doi.org/10.1186/s13321-015-0069-3
Ballester, P. J., & Mitchell, J. B. O. (2010). A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics. https://doi.org/10.1093/bioinformatics/btq112
Ballester, P. J., Schreyer, A., & Blundell, T. L. (2014). Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity? Journal of Chemical Information and Modeling. https://doi.org/10.1021/ci500091r
Baroni, M., Cruciani, G., Sciabola, S., Perruccio, F., & Mason, J. S. (2007). A common reference framework for analyzing/comparing proteins and ligands. Fingerprints for ligands and proteins (FLAP): Theory and application. Journal of Chemical Information and Modeling, 47(2), 279–294. https://doi.org/10.1021/ci600253e
Beccari, A. R., Cavazzoni, C., Beato, C., & Costantino, G. (2013). LiGen: A high performance workflow for chemistry driven de Novo design. Journal of Chemical Information and Modeling, 53(6), 1518–1527. https://doi.org/10.1021/ci400078g
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2013.50
Bento, A. P., Gaulton, A., Hersey, A., Bellis, L. J., Chambers, J., Davies, M., Krüger, F. A., Light, Y., Mak, L., McGlinchey, S., Nowotka, M., Papadatos, G., Santos, R., & Overington, J. P. (2014). The ChEMBL bioactivity database: An update. Nucleic Acids Research. https://doi.org/10.1093/nar/gkt1031
Berman, H. M., Battistuz, T., Bhat, T. N., Bluhm, W. F., Bourne, P. E., Burkhardt, K., Feng, Z., Gilliland, G. L., Iype, L., Jain, S., Fagan, P., Marvin, J., Padilla, D., Ravichandran, V., Schneider, B., Thanki, N., Weissig, H., Westbrook, J. D., & Zardecki, C. (2002). The protein data bank. Acta Crystallographica Section D Biological Crystallography. https://doi.org/10.1107/S0907444902003451
BIOVIA Discovery Studio | Pharmacophore and Ligand-Based Design. (n.d.). Retrieved June 17, 2020, from https://www.3dsbiovia.com/products/collaborative-science/biovia-discovery-studio/pharmacophore-and-ligand-based-design.html.
Böhm, H. J. (1992a). Ludi: Rule-based automatic design of new substituents for enzyme inhibitor leads. Journal of Computer-Aided Molecular Design. https://doi.org/10.1007/BF00126217
Böhm, H. J. (1992b). The computer program Ludi: A new method for the de novo design of enzyme inhibitors. Journal of Computer-Aided Molecular Design, 6(1), 61–78. https://doi.org/10.1007/BF00124387
Broccatelli, F., & Brown, N. (2014). Best of both worlds: On the complementarity of ligand-based and structure-based virtual screening. Journal of Chemical Information and Modeling. https://doi.org/10.1021/ci5001604
Brylinski, M. (2013). Nonlinear scoring functions for similarity-based ligand docking and binding affinity prediction. Journal of Chemical Information and Modeling. https://doi.org/10.1021/ci400510e
Bucher, D., Stouten, P., & Triballeau, N. (2018). Shedding light on important waters for drug design: Simulations versus grid-based methods. Journal of Chemical Information and Modeling. https://doi.org/10.1021/acs.jcim.7b00642
Byvatov, E., Fechner, U., Sadowski, J., & Schneider, G. (2003). Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. Journal of Chemical Information and Computer Sciences. https://doi.org/10.1021/ci0341161
Cang, Z., Mu, L., & Wei, G. W. (2018). Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS Computational Biology. https://doi.org/10.1371/journal.pcbi.1005929
Cappel, D., Sherman, W., & Beuming, T. (2017). Calculating water thermodynamics in the binding site of proteins - applications of WaterMap to drug discovery. Current Topics in Medicinal Chemistry. https://doi.org/10.2174/1568026617666170414141452
Carlson, H. A., Masukawa, K. M., Rubins, K., Bushman, F. D., Jorgensen, W. L., Lins, R. D., Briggs, J. M., & McCammon, J. A. (2000). Developing a dynamic pharmacophore model for HIV-1 integrase. Journal of Medicinal Chemistry. https://doi.org/10.1021/jm990322h
Cereto-Massagué, A., Ojeda, M. J., Valls, C., Mulero, M., Garcia-Vallvé, S., & Pujadas, G. (2015). Molecular fingerprint similarity search in virtual screening. Methods. https://doi.org/10.1016/j.ymeth.2014.08.005
Cheng, K., Korfmacher, W., White, R., & Njoroge, F. (2011). Lead optimization in discovery drug metabolism and pharmacokinetics/case study: The Hepatitis C virus (HCV) protease inhibitor SCH 503034. Dyes and Drugs. https://doi.org/10.1201/b13128-15
Chen, Y.-C., Totrov, M., & Abagyan, R. (2014). Docking to multiple pockets or ligand fields for screening, activity prediction and scaffold hopping. Future Medicinal Chemistry, 6(16), 1741–1755. https://doi.org/10.4155/fmc.14.113
Clark, D. E. (2006). What has computer-aided molecular design ever done for drug discovery? Expert Opinion on Drug Discovery. https://doi.org/10.1517/17460441.1.2.103
Clark, D. E. (2008). What has virtual screening ever done for drug discovery? Expert Opinion on Drug Discovery, 3(8), 841–851.
Clarke, C., Woods, R. J., Gluska, J., Cooper, A., Nutley, M. A., & Boons, G. J. (2001). Involvement of water in carbohydrate-protein binding. Journal of the American Chemical Society. https://doi.org/10.1021/ja004315q
Cooper, D. R., Porebski, P. J., Chruszcz, M., & Minor, W. (2011). X-ray crystallography: Assessment and validation of protein-small molecule complexes for drug discovery. Expert Opinion on Drug Discovery. https://doi.org/10.1517/17460441.2011.585154
Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., Merz, K. M., Ferguson, D. M., Spellmeyer, D. C., Fox, T., Caldwell, J. W., & Kollman, P. A. (1995). A 2nd Generation Force-Field for the Simulation of Proteins, Nucleic-Acids, and Organic-Molecules. Journal of the American Chemical Society, 117(19), 5179–5197.
Deo, R. C. (2015). Machine learning in medicine. Circulation. https://doi.org/10.1161/CIRCULATIONAHA.115.001593
Desborough, M. J. R., & Keeling, D. M. (2017). The aspirin story - from willow to wonder drug. British Journal of Haematology. https://doi.org/10.1111/bjh.14520
Devi, R. V., Sathya, S. S., & Coumar, M. S. (2015). Evolutionary algorithms for de novo drug design - a survey. Applied Soft Computing Journal. https://doi.org/10.1016/j.asoc.2014.09.042
Ding, B., Wang, J., Li, N., & Wang, W. (2013). Characterization of small molecule binding. I. Accurate identification of strong inhibitors in virtual screening. Journal of Chemical Information and Modeling. https://doi.org/10.1021/ci300508m
Dixon, S. L., Smondyrev, A. M., Knoll, E. H., Rao, S. N., Shaw, D. E., & Friesner, R. A. (2006). Phase: A new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. Journal of Computer-Aided Molecular Design, 20(10–11), 647–671. https://doi.org/10.1007/s10822-006-9087-6
Drie, J. H. (2007). Computer-aided drug design: The next 20 years. Journal of Computer-Aided Molecular Design. https://doi.org/10.1007/s10822-007-9142-y
Duan, L., Feng, G., Wang, X., Wang, L., & Zhang, Q. (2017). Effect of electrostatic polarization and bridging water on CDK2-ligand binding affinities calculated using a highly efficient interaction entropy method. Physical Chemistry Chemical Physics. https://doi.org/10.1039/c7cp00841d
Dudek, A., Arodz, T., & Galvez, J. (2006). Computational methods in developing quantitative structure-activity relationships (QSAR): A review. Combinatorial Chemistry and High Throughput Screening. https://doi.org/10.2174/138620706776055539
Dunitz, J. D. (1994). The entropic cost of bound water in crystals and biomolecules. Science. https://doi.org/10.1126/science.264.5159.670
Durrant, J. D., Friedman, A. J., Rogers, K. E., & McCammon, J. A. (2013). Comparing neural-network scoring functions and the state of the art: Applications to common library screening. Journal of Chemical Information and Modeling. https://doi.org/10.1021/ci400042y
Durrant, J. D., & McCammon, J. A. (2011). NNScore 2.0: A neural-network receptor-ligand scoring function. Journal of Chemical Information and Modeling. https://doi.org/10.1021/ci2003889
Du, X., Sun, S., Hu, C., Yao, Y., Yan, Y., & Zhang, Y. (2017). DeepPPI: Boosting prediction of protein-protein interactions with deep neural networks. Journal of Chemical Information and Modeling. https://doi.org/10.1021/acs.jcim.7b00028
Eisen, M. B., Wiley, D. C., Karplus, M., & Hubbard, R. E. (1994). HOOK: A program for finding novel molecular architectures that satisfy the chemical and steric requirements of a macromolecule binding site. Proteins: Structure, Function, and Bioinformatics. https://doi.org/10.1002/prot.340190305
Ericksen, S. S., Wu, H., Zhang, H., Michael, L. A., Newton, M. A., Hoffmann, F. M., & Wildman, S. A. (2017). Machine learning consensus scoring improves performance across targets in structure-based virtual screening. Journal of Chemical Information and Modeling. https://doi.org/10.1021/acs.jcim.7b00153
Ewing, T. J. A., Makino, S., Skillman, A. G., & Kuntz, I. D. (2001). Dock 4.0: Search strategies for automated molecular docking of flexible molecule databases. Journal of Computer-Aided Molecular Design. https://doi.org/10.1023/A:1011115820450
Fauman, E. B., Rai, B. K., & Huang, E. S. (2011). Structure-based druggability assessment - identifying suitable targets for small molecule therapeutics. Current Opinion in Chemical Biology. https://doi.org/10.1016/j.cbpa.2011.05.020
Fischer, E. (1894). Einfluss der Configuration auf die Wirkung der Enzyme. Berichte Der Deutschen Chemischen Gesellschaft. https://doi.org/10.1002/cber.18940270364

Fischer, J., & Robin Ganellin, C. (2006). Analogue-based drug Guedes, I. A., Pereira, F. S. S., & Dardenne, L. E. (2018). Empir-
discovery. Analogue-based Drug Discovery. https://doi.org/ ical scoring functions for structure-based virtual screening:
10.1002/3527608001 Applications, critical aspects, and challenges. Frontiers in
Floresta, G., Cilibrizzi, A., Abbate, V., Spampinato, A., Pharmacology. https://doi.org/10.3389/fphar.2018.01089
Zagni, C., & Rescifina, A. (2019). 3D-QSAR assisted Hemmateenejad, B., Safarpour, M. A., Miri, R., & Taghavi, F.
identification of FABP4 inhibitors: An effective scaffold (2004). Application of ab initio theory to QSAR study of
hopping analysis/QSAR evaluation. Bioorganic Chemistry. 1,4-dihydropyridine-based calcium channel blockers using
https://doi.org/10.1016/j.bioorg.2018.11.045 GA-MLR and PC-GA-ANN procedures. Journal of Computa-
Forli, S. (2015). Charting a path to success in virtual screening. tional Chemistry. https://doi.org/10.1002/jcc.20066
Molecules. https://doi.org/10.3390/molecules201018732 Honma, T., Hayashi, K., Aoyama, T., Hashimoto, N.,
Forli, S., Huey, R., Pique, M. E., Sanner, M. F., Goodsell, D. S., & Machida, T., Fukasawa, K., Iwama, T., Ikeura, C., Ikuta, M.,
Olson, A. J. (2016). Computational protein-ligand docking Suzuki, T. I., Iwasawa, Y., Hayama, T., Nishimura, S., &
and virtual drug screening with the AutoDock suite. Nature Morishima, H. (2001). Structure-based generation of a new
Protocols. https://doi.org/10.1038/nprot.2016.051 class of potent Cdk4 inhibitors: New de Novo design strategy
Frantz, S. (2005). Playing dirty. Nature. https://doi.org/ and library design. Journal of Medicinal Chemistry, 44(26),
10.1038/437942a 4615e4627. https://doi.org/10.1021/jm0103256
Friesner, R. A., Banks, J. L., Murphy, R. B., Halgren, T. A., Hopfinger, A. J., Reaka, A., Venkatarangan, P., Duca, J. S., &
Klicic, J. J., Mainz, D. T., Repasky, M. P., Knoll, E. H., Wang, S. (1999). Construction of a virtual high throughput
Shelley, M., Perry, J. K., Shaw, D. E., Francis, P., & screen by 4D-QSAR analysis: Application to a combinato-
Shenkin, P. S. (2004). Glide: A new approach for rapid, ac- rial library of glucose inhibitors of glycogen phosphorylase
curate docking and scoring. 1. Method and assessment of b. Journal of Chemical Information and Computer Sciences.
docking accuracy. Journal of Medicinal Chemistry. https:// https://doi.org/10.1021/ci990032þ
doi.org/10.1021/jm0306430 Hopkins, A. L. (2008). Network pharmacology: The next para-
Gilberg, E., Stumpfe, D., & Bajorath, J. (2018). X-ray-structure- digm in drug discovery. Nature Chemical Biology. https://
based identification of compounds with activity against doi.org/10.1038/nchembio.118
targets from different families and generation of templates Hornak, V., & Simmerling, C. (2007). Targeting structural flexi-
for multitarget ligand design. ACS Omega. https://doi.org/ bility in HIV-1 protease inhibitor binding. Drug Discovery
10.1021/acsomega.7b01849 Today, 12(3e4), 132e138. https://doi.org/10.1016/j.drudis.
Glen, R. C., Martin, G. R., Hill, A. P., Hyde, R. M., 2006.12.011
Woollard, P. M., Salmon, J. A., Buckingham, J., & Huang, S. Y., Grinter, S. Z., & Zou, X. (2010). Scoring functions
Robertson, A. D. (1995). Computer-aided design and syn- and their evaluation methods for protein-ligand docking:
thesis of 5-substituted tryptamines and their pharmacology Recent advances and future directions. Physical Chemistry
at the 5-HT1d receptor: Discovery of compounds with po- Chemical Physics. https://doi.org/10.1039/c0cp00151a
tential anti-migraine properties. Journal of Medicinal Hu, B., & Lill, M. A. (2012). Protein pharmacophore selection
Chemistry. https://doi.org/10.1021/jm00018a016 using hydration-site analysis. Journal of Chemical Information
Gomes, M. N., Braga, R. C., Grzelak, E. M., Neves, B. J., and Modeling. https://doi.org/10.1021/ci200620h
Muratov, E., Ma, R., Klein, L. L., Cho, S., Oliveira, G. R., Hu, Y., Stumpfe, D., & Bajorath, J. (2017). Recent advances in
Franzblau, S. G., & Andrade, C. H. (2017). QSAR-driven scaffold hopping. Journal of Medicinal Chemistry, 60(4),
design, synthesis and discovery of potent chalcone derivatives 1238–1246. https://doi.org/10.1021/acs.jmedchem.6b01437
with antitubercular activity. European Journal of Medicinal Hwang, S. H., Wecksler, A. T., Wagner, K., & Hammock, B. D.
Chemistry. https://doi.org/10.1016/j.ejmech.2017.05.026 (2013). Rationally designed multitarget agents against
Gomes, M. N., Muratov, E. N., Pereira, M., Peixoto, J. C., inflammation and pain. Current Medicinal Chemistry.
Rosseto, L. P., Cravo, P. V. L., Andrade, C. H., & https://doi.org/10.2174/0929867311320130013
Neves, B. J. (2017). Chalcone derivatives: Promising start- Inc, C. C. G. (2015). Molecular Operating Environment
ing points for drug design. Molecules. https://doi.org/ (MOE), 2015.01. 1010 Sherbrooke St. West, suite #910, Mon-
10.3390/molecules22081210 treal, QC, Canada, H3A 2R7.
Gomes, J., Ramsundar, B., Feinberg, E. N., & Pande, V. S. Irwin, J. J., & Shoichet, B. K. (2005). Zinc - a free database of
(2017). Atomic convolutional networks for predicting protein- commercially available compounds for virtual screening.
ligand binding affinity (pp. 1–17). Retrieved from http:// Journal of Chemical Information and Modeling. https://
arxiv.org/abs/1703.10603. doi.org/10.1021/ci049714+
Goodsell, D. S., & Olson, A. J. (1990). Automated docking of Jain, A. N. (2003). Surflex: Fully automatic flexible molecular
substrates to proteins by simulated annealing. Proteins: docking using a molecular similarity-based search engine.
Structure, Function, and Bioinformatics. https://doi.org/ Journal of Medicinal Chemistry. https://doi.org/10.1021/
10.1002/prot.340080302 jm020406h
Grover, S., Apushkin, M. A., & Fishman, G. A. (2006). Topical Jain, A. N. (2007). Surflex-Dock 2.1: Robust performance from
dorzolamide for the treatment of cystoid macular edema in ligand energetic modeling, ring flexibility, and knowledge-
patients with retinitis pigmentosa. American Journal of based search. Journal of Computer-Aided Molecular Design.
Ophthalmology. https://doi.org/10.1016/j.ajo.2005.12.030 https://doi.org/10.1007/s10822-007-9114-2

James, C., Weininger, D., & Delaney, J. (2011). Daylight theory Kirchmair, J., Markt, P., Distinto, S., Schuster, D., Spitzer, G. M.,
manual version 4.9. Laguna Niguel, CA: Daylight Chemical Liedl, K. R., Langer, T., & Wolber, G. (2008). The Protein
Information Systems, Inc. Data Bank (PDB), its related services and software tools as
Ji, H., Li, H., Martásek, P., Roman, L. J., Poulos, T. L., & key components for in silico guided drug discovery. Journal
Silverman, R. B. (2009). Discovery of highly potent and of Medicinal Chemistry. https://doi.org/10.1021/jm8005977
selective inhibitors of neuronal nitric oxide synthase by Koeberle, A., & Werz, O. (2014). Multi-target approach for nat-
fragment hopping. Journal of Medicinal Chemistry, 52(3), ural products in inflammation. Drug Discovery Today.
779–797. https://doi.org/10.1021/jm801220a https://doi.org/10.1016/j.drudis.2014.08.006
Jin, Z., Du, X., Xu, Y., Deng, Y., Liu, M., Zhao, Y., Zhang, B., Koes, D. R., & Camacho, C. J. (2011). Pharmer: Efficient and
Li, X., Zhang, L., Peng, C., Duan, Y., Yu, J., Wang, L., exact pharmacophore search. Journal of Chemical Information
Yang, K., Liu, F., Jiang, R., Yang, X., You, T., Liu, X., … and Modeling. https://doi.org/10.1021/ci200097m
(2020). Structure of Mpro from COVID-19 virus and dis- Kollman, P. (1993). Free energy calculations: Applications to
covery of its inhibitors. Nature. https://doi.org/10.1038/ chemical and biochemical phenomena. Chemical Reviews.
s41586-020-2223-y https://doi.org/10.1021/cr00023a004
Jin, G., & Wong, S. T. C. (2014). Toward better drug reposition- Koutsoukas, A., Simms, B., Kirchmair, J., Bond, P. J.,
ing: Prioritizing and integrating existing methods into effi- Whitmore, A. V., Zimmer, S., Young, M. P., Jenkins, J. L.,
cient pipelines. Drug Discovery Today. https://doi.org/ Glick, M., Glen, R. C., & Bender, A. (2011). From in silico
10.1016/j.drudis.2013.11.005 target prediction to multi-target drug design: Current data-
John, S., Thangapandian, S., Sakkiah, S., & Lee, K. W. (2011). bases, methods and applications. Journal of Proteomics.
Discovery of potential pancreatic cholesterol esterase inhibi- https://doi.org/10.1016/j.jprot.2011.05.011
tors using pharmacophore modelling, virtual screening, and Kramer, B., Rarey, M., & Lengauer, T. (1999). Evaluation of the
optimization studies. Journal of Enzyme Inhibition and FLEXX incremental construction algorithm for protein–
Medicinal Chemistry. https://doi.org/10.3109/14756366. ligand docking. Proteins: Structure, Function, and Bioinformat-
2010.535795 ics, 37(2), 228–241. https://doi.org/10.1002/(SICI)1097-
Jung, S. W., Kim, M., Ramsey, S., Kurtzman, T., & Cho, A. E. 0134(19991101)37:2<228::AID-PROT8>3.0.CO;2-8
(2018). Water pharmacophore: Designing ligands using Kumar, P., Kaalia, R., Srinivasan, A., & Ghosh, I. (2017). Mul-
molecular dynamics simulations with water. Scientific tiple target based pharmacophore designing from active site
Reports. https://doi.org/10.1038/s41598-018-28546-z structures. SAR and QSAR in Environmental Research. https://
Kaalia, R., Srinivasan, A., Kumar, A., & Ghosh, I. (2016). ILP- doi.org/10.1080/1062936X.2017.1401555
assisted de novo drug design. Machine Learning, 103(3), Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R., &
309–341. https://doi.org/10.1007/s10994-016-5556-x Ferrin, T. E. (1982). A geometric approach to
Kadirvelraj, R., Foley, B. L., Dyekjær, J. D., & Woods, R. J. macromolecule-ligand interactions. Journal of Molecular
(2008). Involvement of water in carbohydrate-protein Biology. https://doi.org/10.1016/0022-2836(82)90153-X
binding: Concanavalin A revisited. Journal of the American Lameijer, E.-W., Tromp, R. A., Spanjersberg, R. F., Brussee, J., &
Chemical Society. https://doi.org/10.1021/ja8039663 Ijzerman, A. P. (2007). Designing active template mole-
Kadurin, A., Nikolenko, S., Khrabrov, K., Aliper, A., & cules by combining computational de novo design and hu-
Zhavoronkov, A. (2017). DruGAN: An advanced generative man chemist’s expertise. Journal of Medicinal Chemistry,
adversarial autoencoder model for de novo generation of 50(8), 1925–1932. https://doi.org/10.1021/jm061356+
new molecules with desired molecular properties Lauri, G., & Bartlett, P. A. (1994). Caveat: A program to facili-
in silico. Molecular Pharmaceutics. https://doi.org/10.1021/ tate the design of organic molecules. Journal of Computer-
acs.molpharmaceut.7b00346 Aided Molecular Design, 8(1), 51–66. https://doi.org/
Kaushik, A. C., Mehmood, A., Dai, X., & Wei, D. Q. (2020). 10.1007/BF00124349
A comparative chemogenic analysis for predicting drug- Lavecchia, A., & Cerchia, C. (2016). In silico methods to
target pair via machine learning approaches. Scientific address polypharmacology: Current status, applications
Reports. https://doi.org/10.1038/s41598-020-63842-7 and future perspectives. Drug Discovery Today. https://
Khamis, M. A., & Gomaa, W. (2015). Comparative assessment doi.org/10.1016/j.drudis.2015.12.007
of machine-learning scoring functions on PDBbind 2013. Lavecchia, A., & Giovanni, C. (2013). Virtual screening strategies
Engineering Applications of Artificial Intelligence. https:// in drug discovery: A critical review. Current Medicinal
doi.org/10.1016/j.engappai.2015.06.021 Chemistry. https://doi.org/10.2174/09298673113209990001
Khamis, M. A., Gomaa, W., & Ahmed, W. F. (2015). Machine Leelananda, S. P., & Lindert, S. (2016). Computational
learning in computational docking. Artificial Intelligence in methods in drug discovery. Beilstein Journal of Organic
Medicine. https://doi.org/10.1016/j.artmed.2015.02.002 Chemistry. https://doi.org/10.3762/bjoc.12.267
Kinnings, S. L., Liu, N., Tonge, P. J., Jackson, R. M., Xie, L., & Lerner, M. G., Bowman, A. L., & Carlson, H. A. (2007). Incor-
Bourne, P. E. (2011). A machine learning-based method porating dynamics in E. coli dihydrofolate reductase
to improve docking scoring functions and its application enhances structure-based drug discovery. Journal of
to drug repurposing. Journal of Chemical Information and Chemical Information and Modeling, 47(6), 2358–2365.
Modeling. https://doi.org/10.1021/ci100369f https://doi.org/10.1021/ci700167n

Lexa, K. W., & Carlson, H. A. (2011). Full protein flexibility is using docking, pharmacophore and 3D-QSAR. SAR and
essential for proper hot-spot mapping. Journal of the QSAR in Environmental Research. https://doi.org/10.1080/
American Chemical Society, 133(2), 200–202. https:// 1062936X.2019.1573377
doi.org/10.1021/ja1079332 Martin, Y. C., Kofron, J. L., & Traphagen, L. M. (2002). Do struc-
Lima, W. G., Brito, J. C. M., Overhage, J., & Nizer, W. S. da C. turally similar molecules have similar biological activity?
(2020). The potential of drug repositioning as a short- Journal of Medicinal Chemistry. https://doi.org/10.1021/
term strategy for the control and treatment of COVID-19 jm020155c
(SARS-CoV-2): A systematic review. Archives of Virology. Mauser, H., & Guba, W. (2008). Recent developments in de
https://doi.org/10.1007/s00705-020-04693-5 novo design and scaffold hopping. Current Opinion in
Lin, F.-Y., & Tseng, Y. J. (2011). Structure-based fragment Drug Discovery and Development, 11(3), 365–374.
hopping for lead optimization using predocked fragment McGann, M. (2011). FRED pose prediction and virtual
database. Journal of Chemical Information and Modeling, screening accuracy. Journal of Chemical Information and
51(7), 1703–1715. https://doi.org/10.1021/ci200136j Modeling. https://doi.org/10.1021/ci100436p
Li, K., Schurig-Briccio, L. A., Feng, X., Upadhyay, A., Pujari, V., McGann, M. R., Almond, H. R., Nicholls, A., Grant, J. A., &
Lechartier, B., Fontes, L., Yang, H., Rao, G., Zhu, W., Brown, F. K. (2003). Gaussian docking functions.
Gulati, A., No, J. H., Cintra, Gi., Bogue, S., Liu, Y. L., Biopolymers. https://doi.org/10.1002/bip.10207
Molohon, K., Orlean, P., Mitchell, D. A., Freitas-Junior, L., Meng, X.-Y., Zhang, H.-X., Mezei, M., & Cui, M. (2012). Molec-
… (2014). Multitarget drug discovery for tuberculosis ular docking: A powerful approach for structure-based drug
and other infectious diseases. Journal of Medicinal discovery. Current Computer-Aided Drug Design. https://
Chemistry. https://doi.org/10.1021/jm500131s doi.org/10.2174/157340911795677602
Li, H., Sze, K. H., Lu, G., & Ballester, P. J. (2020). Machine- Moitessier, N., Englebienne, P., Lee, D., Lawandi, J., &
learning scoring functions for structure-based drug lead Corbeil, C. (2008). Towards the development of universal,
optimization. Wiley Interdisciplinary Reviews: Computational fast and highly accurate docking/scoring methods: A long
Molecular Science. https://doi.org/10.1002/wcms.1465 way to go. British Journal of Pharmacology, 153(1), S7–S26.
Li, L., Wang, B., & Meroueh, S. O. (2011). Support vector Morris, G. M., Goodsell, D. S., Halliday, R. S., Huey, R.,
regression scoring of receptor-ligand complexes for rank- Hart, W. E., Belew, R. K., & Olson, A. J. (1998). Automated
ordering and virtual screening of chemical libraries. Journal docking using a Lamarckian genetic algorithm and an
of Chemical Information and Modeling. https://doi.org/ empirical binding free energy function. Journal of Computa-
10.1021/ci200078f tional Chemistry. https://doi.org/10.1002/(SICI)1096-987
Li, J., Zheng, S., Chen, B., Butte, A. J., Swamidass, S. J., & Lu, Z. X(19981115)19:14<1639::AID-JCC10>3.0.CO;2-B
(2016). A survey of current trends in computational drug Nakano, H., Miyao, T., & Funatsu, K. (2020). Exploring topo-
repositioning. Briefings in Bioinformatics. https://doi.org/ logical pharmacophore graphs for scaffold hopping. Journal
10.1093/bib/bbv020 of Chemical Information and Modeling, 60(4), 2073–2081.
Lu, Y., Wang, R., Yang, C. Y., & Wang, S. (2007). Analysis of https://doi.org/10.1021/acs.jcim.0c00098
ligand-bound water molecules in high-resolution crystal Neves, M. A. C., Dinis, T. C. P., Colombo, G., & Sá E Melo, M. L.
structures of protein-ligand complexes. Journal of Chemical (2009). Fast three dimensional pharmacophore virtual
Information and Modeling. https://doi.org/10.1021/ screening of new potent non-steroid aromatase inhibitors.
ci6003527 Journal of Medicinal Chemistry. https://doi.org/10.1021/
Lyne, P. D. (2002). Structure-based virtual screening: An jm800945c
overview. Drug Discovery Today. https://doi.org/10.1016/ Neves, M. A. C., Totrov, M., & Abagyan, R. (2012). Docking and
S1359-6446(02)02483-2 scoring with ICM: The benchmarking results and strategies
Macalino, S. J. Y., Gosu, V., Hong, S., & Choi, S. (2015). Role of for improvement. Journal of Computer-Aided Molecular
computer-aided drug design in modern drug discovery. Ar- Design. https://doi.org/10.1007/s10822-012-9547-0
chives of Pharmacal Research. https://doi.org/10.1007/ Nicolaou, C., Kannas, C., & Loizidou, E. (2012). Multi-
s12272-015-0640-5 objective optimization methods in de novo drug design.
Ma, D. L., Chan, D. S. H., & Leung, C. H. (2013). Drug reposi- Mini Reviews in Medicinal Chemistry, 12(10), 979–987.
tioning by structure-based virtual screening. Chemical https://doi.org/10.2174/138955712802762284
Society Reviews. https://doi.org/10.1039/c2cs35357a Olivecrona, M., Blaschke, T., Engkvist, O., & Chen, H. (2017).
Maggiora, G. M. (2011). The reductionist paradox: Are the laws Molecular de-novo design through deep reinforcement
of chemistry and physics sufficient for the discovery of new learning. Journal of Cheminformatics. https://doi.org/
drugs? Journal of Computer-Aided Molecular Design. https:// 10.1186/s13321-017-0235-x
doi.org/10.1007/s10822-011-9447-8 Oprea, T. (2003). 3D QSAR modeling in drug design. Compu-
Mak, K. K., & Pichika, M. R. (2019). Artificial intelligence in tational Medicinal Chemistry for drug Discovery. https://
drug development: Present status and future prospects. doi.org/10.1201/9780203913390.ch22
Drug Discovery Today. https://doi.org/10.1016/j.drudis. Paolini, G. V., Shapland, R. H. B., Van Hoorn, W. P.,
2018.11.014 Mason, J. S., & Hopkins, A. L. (2006). Global mapping of
Mali, S. N., & Chaudhari, H. K. (2019). Molecular modelling pharmacological space. Nature Biotechnology. https://
studies on adamantane-based Ebola virus GP-1 inhibitors doi.org/10.1038/nbt1228

Pence, H. E., & Williams, A. (2010). Chemspider: An online Sanseau, P., Agarwal, P., Barnes, M. R., Pastinen, T.,
chemical information resource. Journal of Chemical Richards, J. B., Cardon, L. R., & Mooser, V. (2012). Use of
Education. https://doi.org/10.1021/ed100697w genome-wide association studies for drug repositioning.
Pinzi, L., Caporuscio, F., & Rastelli, G. (2018). Selection of pro- Nature Biotechnology. https://doi.org/10.1038/nbt.2151
tein conformations for structure-based polypharmacology Sastry, M., Lowrie, J. F., Dixon, S. L., & Sherman, W. (2010).
studies. Drug Discovery Today. https://doi.org/10.1016/ Large-scale systematic analysis of 2D fingerprint methods
j.drudis.2018.08.007 and parameters to improve virtual screening enrichments.
Pirhadi, S., Shiri, F., & Ghasemi, J. B. (2013). Methods and Journal of Chemical Information and Modeling. https://
applications of structure based pharmacophores in drug doi.org/10.1021/ci100062n
discovery. Current Topics in Medicinal Chemistry. https:// Schaller, D., Sribar, D., Noonan, T., Deng, L., Nguyen, T. N.,
doi.org/10.2174/1568026611313090006 Pach, S., Machalz, D., Bermudez, M., & Wolber, G. (2020).
Raghavendra, N. M., Pingili, D., Kadasi, S., Mettu, A., & Next generation 3D pharmacophore modeling. Wiley Inter-
Prasad, S. V. U. M. (2018). Dual or multi-targeting inhibi- disciplinary Reviews: Computational Molecular Science. https://
tors: The next generation anticancer agents. European Journal doi.org/10.1002/wcms.1468
of Medicinal Chemistry. https://doi.org/10.1016/j.ejmech. Schneider, G. (2010). Virtual screening: An endless staircase?
2017.10.021 Nature Reviews Drug Discovery. https://doi.org/10.1038/
Ragoza, M., Hochuli, J., Idrobo, E., Sunseri, J., & Koes, D. R. nrd3139
(2017). Protein-ligand scoring with convolutional neural Schneider, G., & Fechner, U. (2005). Computer-based de novo
networks. Journal of Chemical Information and Modeling. design of drug-like molecules. Nature Reviews Drug Discov-
https://doi.org/10.1021/acs.jcim.6b00740 ery, 4(8), 649–663. https://doi.org/10.1038/nrd1799
Rajamani, R., & Good, A. C. (2007). Ranking poses in struc- Schneider, G., Hartenfeller, M., Reutlinger, M., Tanrikulu, Y.,
ture-based lead discovery and optimization: Current trends Proschak, E., & Schneider, P. (2009). Voyages to the (un)
in scoring function development. Current Opinion in Drug known: Adaptive design of bioactive compounds. Trends
Discovery and Development, 10(3), 308–315. 17554857. in Biotechnology, 27(1), 18–26. https://doi.org/10.1016/
Rarey, M., Kramer, B., Lengauer, T., & Klebe, G. (1996). A fast j.tibtech.2008.09.005
flexible docking method using an incremental construction Schneidman-Duhovny, D., Dror, O., Inbar, Y., Nussinov, R., &
algorithm. Journal of Molecular Biology. https://doi.org/ Wolfson, H. J. (2008). PharmaGist: A webserver for ligand-
10.1006/jmbi.1996.0477 based pharmacophore detection. Nucleic Acids Research,
Ratni, H., Rogers-Evans, M., Bissantz, C., Grundschober, C., 36(Web Server), W223–W228. https://doi.org/10.1093/
Moreau, J., Schuler, F., Fischer, H., Alvarez Sanchez, R., & nar/gkn187
Schnider, P. (2015). Discovery of highly selective brain- Schuster, D., Nashev, L. G., Kirchmair, J., Laggner, C.,
penetrant vasopressin 1a antagonists for the potential treat- Wolber, G., Langer, T., & Odermatt, A. (2008). Discovery
ment of autism via a chemogenomic and scaffold hopping of nonsteroidal 17β-hydroxysteroid dehydrogenase 1 in-
approach. Journal of Medicinal Chemistry, 58(5), hibitors by pharmacophore-based screening of virtual com-
2275–2289. https://doi.org/10.1021/jm501745f pound libraries. Journal of Medicinal Chemistry. https://
Roy, K. (2015). Quantitative structure-activity relationships in doi.org/10.1021/jm800054h
drug design, predictive toxicology, and risk assessment. Shin, W. J., & Seong, B. L. (2013). Recent advances in pharma-
Quantitative Structure-Activity Relationships in Drug Design, cophore modeling and its application to anti-influenza
Predictive Toxicology, and Risk Assessment. https://doi.org/ drug discovery. Expert Opinion on Drug Discovery. https://
10.4018/978-1-4666-8136-1 doi.org/10.1517/17460441.2013.767795
Royal Society of Chemistry. (2015). ChemSpider. Search and Singh, J., Abraham, W. M., Adams, S. P., Van Vlijmen, H.,
Share Chemistry. Royal Society of Chemistry. Liao, Y., Lee, W. C., Cornebise, M., Harris, M., Shu, I. H.,
Rueda, M., Bottegoni, G., & Abagyan, R. (2010). Recipes for the Gill, A., & Cuervo, J. H. (2002). Identification of potent
selection of experimental protein conformations for virtual and novel α4β1 antagonists using in silico screening. Jour-
screening. Journal of Chemical Information and Modeling, nal of Medicinal Chemistry. https://doi.org/10.1021/
50(1), 186–193. jm020054e
Rush, T. S., Grant, J. A., Mosyak, L., & Nicholls, A. (2005). Smart, O. S., Horský, V., Gore, S., Vareková, R. S., Bendová, V.,
A shape-based 3-D scaffold hopping method and its appli- Kleywegt, G. J., & Velankar, S. (2018). Validation of ligands
cation to a bacterial protein–protein interaction. Journal of in macromolecular structures determined by X-ray
Medicinal Chemistry, 48(5), 1489–1495. https://doi.org/ crystallography. Acta Crystallographica Section D: Structural
10.1021/jm040163o Biology. https://doi.org/10.1107/S2059798318002541
Salam, N. K., Nuti, R., & Sherman, W. (2009). Novel method Sotriffer, C. A., Sanschagrin, P., Matter, H., & Klebe, G. (2008).
for generating structure-based pharmacophores using SFCscore: Scoring functions for affinity prediction of
energetic analysis. Journal of Chemical Information and protein-ligand complexes. Proteins: Structure, Function and
Modeling. https://doi.org/10.1021/ci900212v Genetics. https://doi.org/10.1002/prot.22058

Sali, A. (1993). MODELLER A program for protein structure Speck-Planche, A., Kleandrova, V. V., & Cordeiro, M. N. D. S.
modeling. In Comparative protein modelling by satisfaction of (2013). Chemoinformatics for rational discovery of safe
spatial restraints. antibacterial drugs: Simultaneous predictions of biological

activity against streptococci and toxicological profiles in Vogt, M., & Bajorath, J. (2011). Predicting the performance of
laboratory animals. Bioorganic and Medicinal Chemistry. fingerprint similarity searching. Methods in Molecular Biology
https://doi.org/10.1016/j.bmc.2013.03.015 (Clifton, N.J.). https://doi.org/10.1007/978-1-60761-839-3_6
Spyrakis, F., Ahmed, M. H., Bayden, A. S., Cozzini, P., Vogt, M., Stumpfe, D., Geppert, H., & Bajorath, J. (2010).
Mozzarelli, A., & Kellogg, G. E. (2017). The roles of water Scaffold hopping using two-dimensional fingerprints:
in the protein matrix: A largely untapped resource for True potential, black magic, or a hopeless endeavor?
drug discovery. Journal of Medicinal Chemistry. https:// Guidelines for virtual screening. Journal of Medicinal Chem-
doi.org/10.1021/acs.jmedchem.7b00057 istry, 53(15), 5707–5715. https://doi.org/10.1021/
Stephenson, V. C., Heyding, R. A., & Weaver, D. F. (2005). The jm100492z
“promiscuous drug concept” with applications to Alz- Wallach, I., Dzamba, M., & Heifets, A. (2015). AtomNet: A deep
heimer’s disease. FEBS Letters. https://doi.org/10.1016/ convolutional neural network for bioactivity prediction in
j.febslet.2005.01.019 structure-based drug discovery (pp. 1–11). Retrieved from
Stepniewska-Dziubinska, M. M., Zielenkiewicz, P., & http://arxiv.org/abs/1510.02855.
Siedlecki, P. (2018). Development and evaluation of a Wang, Y., Bryant, S. H., Cheng, T., Wang, J., Gindulyte, A.,
deep learning model for protein–ligand binding affinity Shoemaker, B. A., Thiessen, P. A., He, S., & Zhang, J.
prediction. Bioinformatics. https://doi.org/10.1093/bioin- (2017). PubChem BioAssay: 2017 update. Nucleic Acids
formatics/bty374 Research. https://doi.org/10.1093/nar/gkw1118
Su, M., Feng, G., Liu, Z., Li, Y., & Wang, R. (2020). Tapping on Wang, L., Deng, Y., Wu, Y., Kim, B., LeBard, D. N.,
the black box: How is the scoring power of a machine- Wandschneider, D., Beachy, M., Friesner, R. A., & Abel, R.
learning scoring function dependent on the training set? (2017). Accurate modeling of scaffold hopping transforma-
Journal of Chemical Information and Modeling. https:// tions in drug discovery. Journal of Chemical Theory and
doi.org/10.1021/acs.jcim.9b00714 Computation, 13(1), 42–54. https://doi.org/10.1021/
Sun, H., Tawa, G., & Wallqvist, A. (2012). Classification of acs.jctc.6b00991
scaffold-hopping approaches. Drug Discovery Today, Wang, R., Gao, Y., & Lai, L. (2000). LigBuilder: A multi-purpose
17(7–8), 310–324. https://doi.org/10.1016/j.drudis. program for structure-based drug design. Journal of Molecu-
2011.10.024 lar Modeling, 6(7–8), 498–516. https://doi.org/10.1007/
Taminau, J., Thijs, G., & De Winter, H. (2008). Pharao: Phar- s0089400060498
macophore alignment and optimization. Journal of Molecu- Wang, C., & Zhang, Y. (2017). Improving scoring-docking-
lar Graphics and Modelling. https://doi.org/10.1016/j.jmgm. screening powers of protein–ligand scoring functions using
2008.04.003 random forest. Journal of Computational Chemistry. https://
Teague, S. J. (2003). Implications of protein flexibility for drug doi.org/10.1002/jcc.24667
discovery. Nature Reviews Drug Discovery, 2(7), 527–541. Weiner, S. J., Kollman, P. A., Singh, U. C., Case, D. A., Ghio, C.,
https://doi.org/10.1038/nrd1129 Alagona, G., Profeta, S., & Weiner, P. (1984). A new force
Teodoro, M., & Muegge, I. (2011). BIBuilder: Exhaustive field for molecular mechanical simulation of nucleic acids
searching for de novo ligands. Molecular Informatics, and proteins. Journal of the American Chemical Society.
30(1), 63–75. https://doi.org/10.1002/minf.201000122 https://doi.org/10.1021/ja00315a051
Tetko, I. V., Gasteiger, J., Todeschini, R., Mauri, A., Livingstone, D., Westbrook, J. D., & Burley, S. K. (2019). How structural biolo-
Ertl, P., Palyulin, V. A., Radchenko, E. V., Zefirov, N. S., gists and the Protein Data Bank contributed to recent FDA
Makarenko, A. S., Tanchuk, V. Y., & Prokopenko, V. V. new drug approvals. Structure. https://doi.org/10.1016/
(2005). Virtual computational chemistry laboratory - design j.str.2018.11.007
and description. Journal of Computer-Aided Molecular Design. Więckowska, A., Kołaczkowski, M., Bucki, A., Godyń, J.,
https://doi.org/10.1007/s10822-005-8694-y Marcinkowska, M., Więckowski, K., Zaręba, P., Siwek, A.,
Todeschini, R., & Consonni, V. (2010). Molecular descriptors for Kazek, G., Głuch-Lutwin, M., Mierzejewski, P.,
chemoinformatics. Molecular Descriptors for Chemoinformatics. Bieńkowski, P., Sienkiewicz-Jarosz, H., Knez, D.,
https://doi.org/10.1002/9783527628766 Wichur, T., Gobec, S., & Malawska, B. (2016). Novel
Torrisi, M., Pollastri, G., & Le, Q. (2020). Deep learning multi-target-directed ligands for Alzheimer’s disease:
methods in protein structure prediction. Computational Combining cholinesterase inhibitors and 5-HT6 receptor
and Structural Biotechnology Journal. https://doi.org/ antagonists. Design, synthesis and biological evaluation.
10.1016/j.csbj.2019.12.011 European Journal of Medicinal Chemistry. https://doi.org/
Vamathevan, J., Clark, D., Czodrowski, P., Dunham, I., 10.1016/j.ejmech.2016.08.016
Ferran, E., Lee, G., Li, B., Madabhushi, A., Shah, P., Wieder, M., Perricone, U., Seidel, T., Boresch, S., & Langer, T.
Spitzer, M., & Zhao, S. (2019). Applications of machine (2016). Comparing pharmacophore models derived from
learning in drug discovery and development. Nature Reviews crystal structures and from molecular dynamics
Drug Discovery. https://doi.org/10.1038/s41573-019-0024-5 simulations. Monatshefte für Chemie, 147(3), 553–563.
Verdonk, M. L., Cole, J. C., Hartshorn, M. J., Murray, C. W., & https://doi.org/10.1007/s00706-016-1674-1
Taylor, R. D. (2003). Improved protein-ligand docking us- Willett, P. (2013). Fusing similarity rankings in ligand-based
ing GOLD. Proteins: Structure, Function and Genetics. https:// virtual screening. Computational and Structural Biotechnology
doi.org/10.1002/prot.10465 Journal. https://doi.org/10.5936/csbj.201302002

Wlodawer, A., & Vondrasek, J. (1998). Inhibitors of HIV-1 pro- Guranovic, V., Hendrickx, P. M. S., … (2017). OneDep: Uni-
tease: A major success of structure-assisted drug design. fied wwPDB system for deposition, biocuration, and valida-
Annual Review of Biophysics and Biomolecular Structure. tion of macromolecular structures in the PDB archive.
https://doi.org/10.1146/annurev.biophys.27.1.249 Structure. https://doi.org/10.1016/j.str.2017.01.004
Wójcikowski, M., Ballester, P. J., & Siedlecki, P. (2017). Perfor- Yu, W., Lakkaraju, S. K., Raman, E. P., Fang, L., &
mance of machine-learning scoring functions in structure- Mackerell, A. D. (2015). Pharmacophore modeling using
based virtual screening. Scientific Reports. https://doi.org/ site-identification by ligand competitive saturation (SILCS)
10.1038/srep46710 with multiple probe molecules. Journal of Chemical Informa-
Wolber, G., & Langer, T. (2005). LigandScout: 3-D pharmaco- tion and Modeling. https://doi.org/10.1021/ci500691p
phores derived from protein-bound ligands and their use as Yu, W., & Mackerell, A. D. (2017). Computer-aided drug design
virtual screening filters. Journal of Chemical Information and methods. Methods in Molecular Biology. https://doi.org/
Modeling. https://doi.org/10.1021/ci049885e 10.1007/978-1-4939-6634-9_5
Wong, S. E., & Lightstone, F. C. (2011). Accounting for water Zhang, L., Ai, H. X., Li, S. M., Qi, M. Y., Zhao, J., Zhao, Q., &
molecules in drug design. Expert Opinion on Drug Liu, H. S. (2017). Virtual screening approach to identifying
Discovery. https://doi.org/10.1517/17460441.2011.534452 influenza virus neuraminidase inhibitors using molecular
Xiang, M., Cao, Y., Fan, W., Chen, L., & Mo, Y. (2012). docking combined with machine-learning-based scoring
Computer-aided drug design: Lead discovery and function. Oncotarget. https://doi.org/10.18632/oncotarget.
optimization. Combinatorial Chemistry and High Throughput 20915
Screening. https://doi.org/10.2174/138620712799361825 Zhang, L., Fourches, D., Sedykh, A., Zhu, H., Golbraikh, A.,
Xue, L., Godden, J. W., Stahura, F. L., & Bajorath, J. (2003). Ekins, S., Clark, J., Connelly, M. C., Sigal, M., Hodges, D.,
Design and evaluation of a molecular fingerprint involving Guiguemde, A., Guy, R. K., & Tropsha, A. (2013). Discovery
the transformation of property descriptor values into a bi- of novel antimalarial compounds enabled by QSAR-based
nary classification scheme. Journal of Chemical Information virtual screening. Journal of Chemical Information and
and Computer Sciences. https://doi.org/10.1021/ci030285+ Modeling. https://doi.org/10.1021/ci300421n
Yang, S.-Y. (2010). Pharmacophore modeling and applications Zhang, W., Pei, J., & Lai, L. (2017). Computational multitarget
in drug discovery: Challenges and recent advances. Drug drug design. Journal of Chemical Information and Modeling.
Discovery Today, 15(11–12), 444–450. https://doi.org/ https://doi.org/10.1021/acs.jcim.6b00491
10.1016/j.drudis.2010.03.013 Zheng, S., Li, Y., Chen, S., Xu, J., & Yang, Y. (2020). Predicting
Yang, H., Shen, Y., Chen, J., Jiang, Q., Leng, Y., & Shen, J. drug–protein interaction using quasi-visual question
(2009). Structure-based virtual screening for identification answering system. Nature Machine Intelligence. https://
of novel 11β-HSD1 inhibitors. European Journal of Medicinal doi.org/10.1038/s42256-020-0152-y
Chemistry. https://doi.org/10.1016/j.ejmech.2008.06.005 Zhou, Y., Hou, Y., Shen, J., Huang, Y., Martin, W., & Cheng, F.
Yap, C. W. (2011). PaDEL-descriptor: An open source software (2020). Network-based drug repurposing for novel corona-
to calculate molecular descriptors and fingerprints. Journal virus 2019-nCoV/SARS-CoV-2. Cell Discovery. https://
of Computational Chemistry. https://doi.org/10.1002/ doi.org/10.1038/s41421-020-0153-3
jcc.21707 Zilian, D., & Sotriffer, C. A. (2013). SFCscoreRF: A random
Young, J. Y., Westbrook, J. D., Feng, Z., Sala, R., Peisach, E., forest-based scoring function for improved affinity predic-
Oldfield, T. J., Sen, S., Gutmanas, A., Armstrong, D. R., tion of protein-ligand complexes. Journal of Chemical Infor-
Berrisford, J. M., Chen, L., Chen, M., Di Costanzo, L., mation and Modeling. https://doi.org/10.1021/ci400120b
Dimitropoulos, D., Gao, G., Ghosh, S., Gore, S.,