
Review

For reprint orders, please contact: reprints@future-science.com

Artificial intelligence in drug design:


algorithms, applications, challenges and
ethics
Alya A Arabi*,1,2
1Department of Biochemistry, College of Medicine & Health Sciences, United Arab Emirates University, Al Ain, PO Box 17666, UAE
2Centre for Computational Science, University College London, 20 Gordon St, London, UK
*Author for correspondence: Alya.arabi@dal.ca

The drug discovery paradigm is rapidly evolving due to advances in machine learning (ML) and artificial
intelligence (AI). This review covers myriad faces of AI and ML in drug design. There is a plethora of AI
algorithms, the most common of which are summarized in this review. In addition, AI is fraught with
challenges that are highlighted along with plausible solutions to them. Examples are provided to illustrate
the use of AI and ML in drug discovery and in predicting drug properties such as binding affinities and
interactions, solubility, toxicology, blood–brain barrier permeability and chemical properties. The review
also includes examples depicting the implementation of AI and ML in tackling intractable diseases such
as COVID-19, cancer and Alzheimer’s disease. Ethical considerations and future perspectives of AI are also
covered in this review.

Graphical abstract: In the middle, there is a human-shaped word cloud about drug design and artificial
intelligence. The human shape represents the idea of personalized/precision medicine. The enzyme on the right is related to COVID-19 (PDB code: 6LU7). On the left is my painting of a molecule from a
QTAIM study related to generating features for AI (DOI:10.4155/fmc-2017-0136). The background shows
connected neurons to represent the networks in machine learning.

First draft submitted: 10 November 2020; Accepted for publication: 10 March 2021; Published online:
29 April 2021

Keywords: algorithms • Alzheimer’s disease • artificial intelligence • cancer • challenges in AI • COVID-19 • drug
design and discovery • ethics • machine learning • neural networks • QSAR


A brief historical perspective about drug design


Historically, drug discovery relied primarily on the extraction of medicine from natural products [1,2]. Serendipity
also played a key role in drug discovery. The discovery of penicillin by Sir Alexander Fleming, during the first half
of the twentieth century, was an accidental one. Fleming said: "One sometimes finds what one is not looking for. When
I woke up just after dawn on September 28, 1928, I certainly didn’t plan to revolutionize all medicine by discovering
the world’s first antibiotic, or bacteria killer. But I guess that was exactly what I did." [3] The erectile dysfunction
drug Viagra® was also discovered by chance during a Phase I clinical trial of its active ingredient, sildenafil.
Sildenafil was originally designed for dilating the blood vessels in the heart to treat cardiovascular diseases [4]. The
discovery paradigm started to progress with advancements in, among others, the synthesis of organic molecules [5]
and the use of identification techniques such as nuclear magnetic resonance, mass spectrometry, GC-MS and
HPLC. In the early 1980s the development of computers and crystallography assisted in advancing the field of
drug design in such a way that structure-based drug discovery [6] and rational drug design [7–9] approaches gained
popularity over the ‘random searches’ approach. However, this did not last long given that, in the first quarter of
the 1990s, random high-throughput screening dominated the business of drug design. High-throughput screening
is based on scanning datasets as large as 10^6–10^7 molecules [10] for testing their binding with a given protein. Up to 10^7–10^10 scans would be performed for DNA-related datasets [11]. However, the hit rate was 0–0.01% [12].
The drug industry eventually settled on the combined efforts of rational design and the automated screening of
massive industrial databases and libraries, as is the case with the ultra-high-throughput screening. Due to the advent
of high-performance computing with exascale [13] and zettascale [14] computing power and advanced graphics processing units for parallel processing, and the ever-growing technology in genome sequencing, bioinformatics,
computational sciences, computational compound profiling and accurate protein structure databases, drug design
hit the era of artificial intelligence (AI) and personalized/precision medicine. At the economic level, AI is expected
to save the US medical and pharmaceutical sectors up to $100bn per year [15].

AI & machine learning


AI trains machines and/or computers to perform human tasks and to make decisions while solving a given problem. Problem solving is based on the learning experience attained through memory and adaptation, and on the
generalization skill acquired by training the machines to handle updated challenges [12]. The most primitive seed
of AI is believed to go back to 1936, when Alan Turing introduced the Turing machine. Between the 1950s and 1980s, AI was limited to symbolic AI, which is concerned with solving logical problems such as playing chess. In the
1990s AI became more sophisticated, involving the use of algorithms to analyze data, learn from them, predict new properties and make decisions. In a nutshell, AI today encompasses reasoning, pattern recognition, natural language processing, planning and machine learning (ML) [16,17]. ML is teaching the machine to complete a certain task, based on what
it learns from the relationships between raw data input and the inferences it generates, without writing explicit
codes that command expert instructions. In other words, ML is the process of using algorithms to train comput-
ers to learn from parsed data – for example, electronic health records (including patient, laboratory and billing
information), clinical data, medical imaging and ‘omics’. The training is preferably completed over multiple itera-
tions to permit accurate determinations of targeted properties and reliable predictions. ML involves computational
intelligence [18] tools such as pattern classification [19], statistical pattern recognition [20], probability theories and
statistical learning [21], clustering and neural networks. The difference between AI and computational intelligence
is that the latter is inspired by biological processes like evolution, transmitting information through neurons and
genetic algorithms. For building models with good accuracy and precision, big data and good ML techniques are
desired. ‘Big data’ is big in volume (TB up to ZB), varies very quickly over short periods of time and is highly
diverse (graphs, vectors, words, symbols, pictures etc.) [22]. In AI, a good ML model is a model that can, without
overfitting [21], generalize well to new cases that have not been implemented in the training database; in other words,
predictions from extrapolated data are reasonable, and predictions within the dataset range, or interpolations, are
usually easily achieved. One way to avoid overfitting is weight regularization (L1 or L2 versions); that is, limiting
the weights in the neural network to small values. Another way is to mitigate the vanishing gradient problem [23]
for the gradient-based learning methods. The vanishing gradient problem can be avoided through the pretraining
method where the training is for each layer at a time, or through rectified linear activation. Overfitting can also be
minimized through the dropout method where, during training, some of the randomly selected neurons are set to zero.


Figure 1. Cluster diagram of the machine learning types included in this summary: supervised learning (regression: GLM, SVR, GPR, logistic regression, ensemble methods and decision trees; classification: SVM, discriminant analysis, naive Bayes and k-NN), unsupervised learning (K-means, fuzzy C-means, hierarchical, Gaussian mixture, neural networks and Bayesian networks), semisupervised learning, reinforcement learning (Markov decision process) and deep learning (ANN, DNN, FFNN, MLP, CNN, RNN, LSTM, GAN, RBM, DBN and autoencoders such as SAE, DAE, CAE and VAE).

For images in particular, overfitting can be avoided by data augmentation, whereby transformations (rotations,
cropping, addition of noise, translations, resizing etc.) are applied on the database to increase the number of data
points. In ML applications in sciences, it is a common practice to keep parts of the training set to be used for
validation purposes. The performance of models can be assessed through metrics such as, among others, accuracy,
precision, F1 scores, AUC, correlation, kappa, normalized discounted cumulative gain, perplexity, inception score,
mean absolute errors, bilingual evaluation understudy score, receiver operating characteristic curve, peak signal-to-
noise ratio, mean reciprocal rank and mean average precision [21,24,25]. There exist several types of ML [26–28], as
will be discussed below (Figure 1).
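
As a minimal illustration of two of the countermeasures above, the sketch below (assuming the TensorFlow/Keras and NumPy libraries, which are not named in this review, and purely synthetic data) combines L2 weight regularization, dropout and an 80:20 training:validation split.

```python
# Sketch only: L2 weight regularization + dropout in a small network, with a
# validation split and standard metrics, on synthetic stand-in data.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50)).astype("float32")        # e.g., 50 molecular descriptors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype("float32")      # e.g., active/inactive label

model = keras.Sequential([
    layers.Input(shape=(50,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # keep weights small (L2)
    layers.Dropout(0.3),                                      # randomly zero 30% of neurons while training
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(), "accuracy"])

# Hold out 20% of the data for validation, as described in the text.
model.fit(X, y, validation_split=0.2, epochs=20, batch_size=32, verbose=0)
```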

Supervised learning (labeled data)


Supervised learning [29] is a method in which both the inputs and the corresponding outputs are known. It involves having some prior knowledge about the classification of the data in the dataset. It learns from existing data (relationships between
inputs and outputs of a training dataset) to predict the outcomes for new cases. Supervised learning is thus a set
of predictors used to predict a certain targeted outcome. Examples of this type of ML are ‘regression’ models,
including the generalized linear model or linear regression, support vector regression, Gaussian process regression,
logistic regression, ensemble methods and decision trees. Other examples in supervised learning are ‘classification’
models, including support vector machines (SVMs), discriminant analysis, naive Bayes and nearest neighbor like K-nearest neighbor (k-NN). Supervised learning models also include backpropagation neural network and random
forest.
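
A minimal supervised-learning sketch follows (assuming the scikit-learn library, which is not named in this review, and synthetic data standing in for descriptor/property pairs): a random forest regressor is trained on labeled input-output pairs and assessed on held-out data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                                  # inputs, e.g., molecular descriptors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=500)     # known (labeled) target property

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("MAE on held-out data:", mean_absolute_error(y_test, model.predict(X_test)))
```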

Unsupervised learning (unlabeled data)


Unsupervised learning [30] is a method where the correct outcomes are not known. It is an iterative process for
classifying, without setting a specific target, a large amount of unlabeled data based on detecting hidden patterns
and clustering. Examples of this type of ML include K-means and the Apriori algorithm, K-medoids, fuzzy c-means,
hierarchical, Gaussian mixture, neural networks and Bayesian networks.
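
For contrast, a minimal unsupervised-learning sketch (again assuming scikit-learn and synthetic, unlabeled data): K-means has to discover the two hidden groups without ever seeing labels.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two hidden groups of points; no labels are provided to the algorithm.
X = np.vstack([rng.normal(loc=0.0, size=(100, 5)),
               rng.normal(loc=3.0, size=(100, 5))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5], kmeans.cluster_centers_.shape)   # cluster assignments and centers
```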

Semisupervised learning (a combination of labeled & unlabeled data)


Semisupervised learning [31] is a mixture of supervised and unsupervised learning, where both labeled and unlabeled
data are used in the input.

Reinforcement learning (learning from mistakes)


Reinforcement learning [32] is a method that aims to maximize the accuracy of algorithms through learning the
associations between stimuli and actions depending on rewards (pleasant events) and penalties (unpleasant events).
These associations are strengthened/reinforced by positive and negative reinforcers for rewards and punishments,
respectively. It is a similar approach to supervised learning, with the major difference being the training. In
supervised learning the system is trained on a predefined dataset, while reinforcement learning trains itself based
on trial and error to make decisions under uncertainty, for example via a Markov decision process [33].
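
A minimal reinforcement-learning sketch (tabular Q-learning on a toy five-state chain, written with NumPy only; the environment and parameters are invented for illustration): the agent learns, by trial and error, that moving right leads to the rewarded terminal state.

```python
import numpy as np

n_states, n_actions = 5, 2                    # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))           # state-action value table
alpha, gamma, epsilon = 0.1, 0.9, 0.1         # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:                  # episode ends at the rewarded state
        # Epsilon-greedy choice between exploring and exploiting
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0       # reward (positive reinforcer)
        # Q-learning update: reinforce the stimulus-action association
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))    # the learned values favor moving right in every state
```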

Deep learning
Deep learning (DL) [34–41] is used for complex nonlinear relationships. Unlike the traditional artificial neural
networks (ANNs), which are rather shallow networks with a maximum of three layers, DL is composed of
multilayer neural networks. ANNs were developed in 1943, inspired by the biological neural networks, and
are used in nonlinear classifications. The backbone of DL is a deep neural network (DNN) which is basically
an ANN but with many hidden layers. In DNNs, the number of hidden layers exceeds three layers of nonlinear
functions, and the input data do not need to be engineered features; that is, in DNN the data could be raw data
that get transformed to learned features, through the process of feeding the output of one layer as the input of
the following layer without the intervention of experts, a concept known as ‘automatic feature extraction’. Thus in
DNNs, the input layers are fed with input features which undergo nonlinear transformations through layers to
produce the predicted class at the output node. Given the numerous layers in the network, data can take many
different paths, causing the expressive power of DNNs to grow exponentially with the number of layers. Among
the distinguished applications of DL is a DL model with tree search that was used, in combination with reinforcement learning, by Google DeepMind to develop AlphaGo, the program that defeated the best human Go player (deepmind.com/research/case-studies/alphago-the-story-so-far). DL was also used in the winning approach of the Tox21 toxicity-prediction challenge (ncats.nih.gov/news/releases/2015/tox21-challenge-2014-winners).
DL involves learning from data iteratively through neural networks that build the so-called ‘high-level’ features
through successive layers. It tries to minimize or even omit the reliance on feature engineering. DL is meant to detect
features from extensive sets of unlabeled (or even labeled) training sets. However, DL requires large computational
power and big databases for training the numerous hidden layers. Because the hidden layers are trained by fitting,
they are viewed as a ‘black box’ that generates the output without knowing how it was generated; among the major
concerns in DL, therefore, are overfitting and difficulties in interpretability [37].
In DL, ‘transfer learning’ [42] is the process of passing the data learned from one database to a different one with
a new type of data. This allows the DL models to be reused, creating the so-called ‘pretrained models’ which are
useful for fine-tuning and feature extraction and for reducing the vanishing gradient problem. This reusability is
especially successful when the training is completed on a large dataset (as in the convolutional neural network,
discussed below) and the reuse is on a small dataset. Precomputed embedding is a transfer learning process applied
to categorical data [43]. In cases where computational resources are limited, it is possible to use the weight pruning
method, which helps to decrease the complexity of the data by using a smaller training set while maintaining the
same accuracy.
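
A minimal transfer-learning sketch (assuming TensorFlow/Keras and synthetic data; the datasets and layer names are invented for illustration): a network pretrained on a large dataset is reused on a small one by freezing its feature layer and retraining only a new output head.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
X_big, y_big = rng.normal(size=(5000, 30)), rng.integers(0, 2, 5000)      # large source dataset
X_small, y_small = rng.normal(size=(200, 30)), rng.integers(0, 2, 200)    # small target dataset

# Pretrain a base model on the large dataset.
base = keras.Sequential([
    layers.Input(shape=(30,)),
    layers.Dense(64, activation="relu", name="feature_layer"),
    layers.Dense(1, activation="sigmoid"),
])
base.compile(optimizer="adam", loss="binary_crossentropy")
base.fit(X_big, y_big, epochs=5, verbose=0)

# Reuse the learned feature layer, frozen, and fine-tune only a new head on the small dataset.
feature_layer = base.get_layer("feature_layer")
feature_layer.trainable = False
transfer = keras.Sequential([layers.Input(shape=(30,)), feature_layer,
                             layers.Dense(1, activation="sigmoid")])
transfer.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
transfer.fit(X_small, y_small, epochs=10, verbose=0)
```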
Examples of DL algorithms that are used in widespread applications are: feed-forward neural networks (FFNN)
or multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory networks (LSTMs), fully connected FFNNs, autoencoders (sparse autoencoder, denoising
autoencoder, contractive autoencoder, variational autoencoder), generative adversarial networks (GANs), restricted
Boltzmann machines and deep belief networks (DBN).
In the simplest DNN – the FFNN – the data travel unidirectionally from input to output, there are no feedback
connections and data cannot go in cycles.
The fully connected FFNN is based on the FFNN. It is often used for systems with multiple features and
a few samples. Each input neuron is fully connected to all neurons of the following layer in such a way that
backpropagation is possible. Backpropagation is an algorithm through which weights can be modified to minimize
errors in predictions [44]. While fully connected neural networks have their advantages, it is possible to scale the
training of ANNs using adaptive sparse connectivity.
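
To make the weight-update step concrete, the following sketch implements backpropagation for a tiny fully connected feed-forward network in plain NumPy (synthetic data; illustrative only, not the network described in any cited work).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                           # 200 samples, 4 input features
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)    # binary target

W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros(8)    # hidden-layer weights
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)    # output-layer weights
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(500):
    # Forward pass through the fully connected layers
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of the cross-entropy loss with respect to each weight
    dlogits = (p - y) / len(X)
    dW2, db2 = h.T @ dlogits, dlogits.sum(axis=0)
    dh = (dlogits @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    # Weight updates that reduce the prediction error
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```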
The CNN [45] is inspired by the design of the visual cortex. A CNN is an FFNN with multiple hidden layers, including convolutional, subsampling, normalization, pooling and fully connected layers. CNNs account for shift- and translation-invariance, and they use supervised learning to analyze data. In a CNN, not all the hidden layers are necessarily globally connected; some of the connections are local between selected hidden layers. CNNs work efficiently in image/video/pattern recognition and are easily trainable given the small number of training parameters. When coupled with backpropagation, CNNs are scalable.
An RNN is also an FFNN, but with dynamic behavior that makes it highly competitive for handling data that
vary over time. The neurons are associated with time steps where predictions are made. The RNN is like a polymer
chain that is made of monomer units put together in a sequence. In other words, an RNN is a chain of units of layers
(or artificial neurons) that are sequentially connected with a delay to form a single recurrent layer. After receiving
the input, the RNN updates the hidden state information that depends on the preceding calculation. That is, in
RNNs, the hidden layers keep the information from the previous steps using the same weights and bias, to allow
persistence in information for meaningful predictions using patterns. RNNs are good for speech recognition [46].
LSTMs and gated recurrent units are refined versions of RNNs that avoid the vanishing gradient problem.
An LSTM is basically an RNN with stable gradients for updating weights. Adjustments of the weights (or
coefficients of each neuron in the network) are used in the training phase to minimize errors. The LSTM has the
capacity to store patterns in memory for longer periods while selectively controlling (through the gated memory
blocks) what to discard/remove and what information to recall/add in each step. LSTMs can handle complex
problems such as language translation and stock market predictions. They can also be used for classification
purposes [47].
Autoencoders encode data and represent them in a reduced dimensionality. This is done through unsupervised
learning to keep important data while filtering out the noise.
A GAN is based on unsupervised learning with a set of two networks, often FFNN with CNN, which keep
repeating in cycles until the desired robustness is achieved. In a GAN, one network generates data through self-
learning from input patterns and regularities, and the other one classifies the generated real versus fake data. This
is accomplished through evaluating the probability of the data being in one category given certain features, p(category|feature). GANs are often used in cybersecurity and health diagnostics [48,49].
The restricted Boltzmann machine is a probabilistic graphical model. It can complete a binary factor analysis
while limiting the communication between layers. It is used in filtering and feature learning.
The DBN is an unsupervised probabilistic model for generative learning. In a DBN the connection is only
between the layers; the units of each layer are not connected to each other. It is used in face recognition and
motion-capture data [50].
One of the challenges in DL is interpretability. It is not understood how the machine generates certain outputs.
Repeatability/reproducibility is also a concern given that the output of neural networks depends on the input in
such a way that it is difficult for two scientists to reproduce the same results.
A plethora of methods and algorithms exists. However, there are some comprehensive mind maps and summaries
to guide the use of these algorithms depending on the research question [51,52]. Some methods may be based on
Bayesian networks instead of neural networks, and each method has its own advantages and disadvantages. In
general, Bayesian networks have the advantage of accuracy, model interpretability and multiway model evaluation.
The neural networks have the advantage of model evaluation speed and representation power. Both networks are
strong in domain knowledge incorporation [53].


Figure 2. List of challenges in artificial intelligence: data-related challenges (data quality/characteristics, data normalization, data collection, heterogeneity/mismatches, data dimensionality, data representation), multiobjective optimization, uncertainties, reproducibility, confounders, bias, model appropriateness, catastrophic forgetting, language and AI adoption.

Challenges faced in AI
There are a number of challenges faced in AI. This section offers an elaboration on each of the following challenges:
data-related challenges, multiobjective optimization (MOO), reproducibility, confounders, model appropriateness,
catastrophic forgetting, language and AI adoption (Figure 2).

Data-related challenges
Scientists are working on developing one-shot learning models that mimic human intelligence; that is, the model
learns from a few examples without the need for massive databases to improve the model’s accuracy. Until the full
elaboration of such models is achieved, scientists currently still rely on collecting big databases for building accurate
ML models. However, the accuracy of ML models does not solely depend on the size of the database used. It is
well understood that big data need big theory [54] and that the data quality (e.g., primary vs derived or high-level,
experimental vs observational) matters. In ML the output of any step is often dependent on the quality of the
input. Ideally, the dataset should be built from computed or measured values that are associated with minimal
experimental errors. Also, upon combination of different forms of primary data, normalization across different
platforms is mandatory to avoid systematic errors [55].
In ML, the data should be balanced and FAIR (findable, accessible, interoperable, reusable; or sometimes ‘fully AI ready’) [56,57] and ALCOA (attributable, legible, contemporaneous, original and accurate) [58]. It is a known
dilemma that almost 80% of ML research time is spent on data collection, cleaning and processing, while the
actual use of algorithms accounts for 20% of the process [59]. In drug design, to save time in the data collection
process, one can use tools that facilitate data extraction, such as (among many others) the Kernel-based approaches
for drug–drug interactions [60], CHEMDNER for recognizing the chemical compound and drug name [61,62],
OSCAR for chemical text mining [63], tmChem and ChemSpot for recognizing chemicals in patents [64] or ML for
recognizing similarities in sentences which can be useful in evidence-based medicine [65].
Extracting value from data pooled together from various experiments creates the challenge of updating meta-
analyses to concur with the novelty of the platforms and datasets. It also creates the difficulty of controlling the
complex noise arising from combined data. The levels of complications arising from mismatches and heterogeneities in the databases can easily amplify. For example, there will be technical issues like format/syntax/semantics
misalignments or vocabulary mismatches, making it difficult to build, for example, datasets with ontology coherence.
This challenge is a result of the quick turnover in terminology and the frequent introduction of new terms/acronyms
to the literature [66,67]. In addition, data dimensionality, evaluated by its arity, could be bulky enough to cause
challenges, especially as the connectivity of the data increases and complicates the networks [68]. In fact, the
number of possible combinations of data points increases exponentially with the level of integration, at a rate of 2^n − n − 1, with n(n − 1)/2 possible pairwise connections for n points.
There are also scientific challenges where numbers from different methods may not be reproducible, or may not
be comparable due to the difference in time scales spanning several orders of magnitude (e.g., from nanoseconds in molecular dynamics simulations in silico to hours for metabolic activities in vivo). Datasets could also have
uncertainties; they may pertain to in vitro measurements that are not well connected with in vivo assays. There
could be a lack of translatability whereby experimental results may not be consistent with in vivo evaluations. This
is why obtaining appropriate datasets could be a serious challenge. This highlights the need for creating centralized
systems with systematic data records that could be easily manageable. The Swiss Institute for Bioinformatics
offers a list of publicly available datasets (www.click2drug.org/index.php#Databases). Some useful, yet partially confidential, databases can be found through the SALT Knowledge Share Consortium (www.medchemica.com/the-salt-knowledge-share-consortium), the EU Innovative Medicines Initiative (www.imi.europa.eu) and the ATOM Consortium (https://atomscience.org). It is worth noting that, as confirmed by the EU-funded project ExCAPE,
the ML methods trained on public data are often transferrable to the industrial databases [69].
The variety of data representation is yet another challenge. Data representation is a key factor that may affect the
accuracy of the model even more than the algorithm itself. Evolutionary sciences, with their tree representations,
images, text, assays and biometrics, may not be as suitable for a principal component analysis as omics and
quantitative structure–activity relationship (QSAR) platforms would be, given their representation in matrices.
Common vector space representation is popular for pattern recognition and classification and it helps sort the
biological information into ML for drug design [70]. However, it is not always useful, especially in applications with
small diverse molecules where vectors do not have the same size. Thus the representation of small molecules, as
in drug design applications, remains a challenge. Small molecules could be represented using various approaches:
simplified molecular-input line-entry system (SMILES), which shows the kinds of atoms and their connectivity (see
reference [71] for SMILES canonicalization); extended-connectivity fingerprints, which reflect the topology of the molecules, making them useful for structure–activity studies [72]; the Coulomb matrix, which presents the nuclear charges and their coordinates; the grid featurizer, which shows structural details of the drug and its receptor and intermolecular
forces, making it useful for predicting binding affinities; symmetry functions, which also show structural features,
namely distances and angles; graph convolution, which represents (using vectors) the kind, valence and hybridization
of atoms surrounding each center (kGCN is a graph convolution neural network for chemical structures [73]); and
weave featurization, which also summarizes (in vector form) information about pairs of atoms – the distance
between them, ring structures and so on [74]. Molecular structures can be better represented as labeled graphs,
kernel functions, trees, fingerprints and/or molecular holograms. Fingerprints are 100- to 1000-bit strings that
encode small molecules as a binary vector (1/0). In other words, they encrypt yes/no options for the availability of
various features, for example, the presence of a side chain in the molecules.
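
As an illustration of the fingerprint representation described above, the sketch below (assuming the open-source RDKit package, which is not named in this review) converts SMILES strings into extended-connectivity (Morgan) fingerprints, i.e., binary bit vectors ready to be fed into an ML model.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"]   # ethanol, phenol, aspirin
mols = [Chem.MolFromSmiles(s) for s in smiles]

# 1024-bit extended-connectivity (Morgan, radius 2) fingerprints: each bit encodes
# the presence (1) or absence (0) of a substructural feature.
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=1024) for m in mols]
X = np.array([[int(b) for b in fp.ToBitString()] for fp in fps])
print(X.shape, X.sum(axis=1))   # matrix of shape (3, 1024) and number of "on" bits per molecule
```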
A critical step in data processing in ML is the selection of the training set. Sieg et al. highlighted cases where
ML can be unnoticeably biased because of the choice of the training database, especially in structure-based virtual
screening applications [75]. A common procedure to address this issue is the cross-validation approach, whereby a big database is split, over multiple folds, into training and testing subsets. An acceptable splitting ratio is usually 80:20 for
training:testing. Overfitting should be avoided as it negatively affects the predictive power of the model, especially
for the cases that were not included in the training set.
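
A minimal sketch of this procedure (assuming scikit-learn and synthetic data): an 80:20 training:testing split, with fivefold cross-validation on the training portion before a final assessment on the held-out test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))                       # e.g., fingerprints or descriptors
y = (X[:, 0] + X[:, 3] > 0).astype(int)              # e.g., active vs inactive label

# 80:20 training:testing split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fivefold cross-validation on the training portion to check generalization
clf = GradientBoostingClassifier(random_state=0)
print("CV accuracy:", cross_val_score(clf, X_train, y_train, cv=5).mean().round(3))

# Final, unbiased assessment on the held-out 20%
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test).round(3))
```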

Multiobjective optimization
MOO is a serious challenge in drug design [76,77]. MOO could be looked at as an inverse QSAR problem
where the activity is known and the reverse engineering is completed to match the optimal structure for it.
Pharmaceutically, not only is the drug potency important, but also many other factors should ideally be optimized
simultaneously. These include Lipinski’s rule (the rule of five), combined ligand efficiency, selectivity, solubility,
hydrophobicity/hydrophilicity, toxicity, metabolic stability, permeability and other properties related to absorption,
distribution, metabolism, excretion and toxicology. This makes the task of the AI searching networks a rather complex one, especially since improving one property often comes at the cost of compromising the quality
of the others. The lack of obedience to Lipinski’s rule is often among the major reasons for the failure of drug
development at the clinical level. The failure rate reaches 40% for improper pharmacokinetics, while it is 7% for
inadequate absorption, distribution, metabolism and excretion properties [78].

Reproducibility
One of the main pillars in research is reproducibility; however, reproducibility remains a major defect in AI thus
far. Outcomes predicted by AI are often not reproducible, and this is a notable issue that must be addressed with high priority [79,80].

Confounders
If some hidden variables are not accounted for, the model will likely predict untargeted properties driven by confounders. The solution to this problem is randomization [81] or building one ML platform for the target property with as many ML models as possible to detect the effect of the confounders.

Model appropriateness
The challenge in ML is finding appropriate models for feature selection, deciding on the model to be used for
predictions and the optimization of hyperparameters. For non-experts in AI, automated ML protocols exist to
automatically determine a pipeline of best learning practices (i.e., data preparation, feature selection, splitting of data into training, validation and holdout sets, training the model, result analysis, evaluation and deployment integration into decision-making processes) without explicitly involving the optimization of each step.

Catastrophic forgetting
Artificial neural networks can easily and quickly forget what they learn. A derived solution to this issue is the
generative replay, which mimics the concept of reactivating patterns in the neuron activity. However, this solution is
useful only for simple tasks and in incremental learning. A more advanced solution is a new version of replay, which
is inspired by brain functioning [82]. This model has a good memory for complex tasks like continual learning, even
when the model does not save information.

Language
Python is commonly used in AI for its simplicity and because it is widely supported. In addition, there is a wide range of libraries available for numerous tasks. However, it cannot be as quick as C or C++, nor does it
have strong support for parallel processing. This is why DL is sometimes coded with C while providing Python
wrappers. Julia is a new, emerging language that is easy and fast. It also has good libraries for DL, for example, Flux
(https://fluxml.ai/Flux.jl/stable/).

AI adoption
Knowledge management [83] is key for bridging between sciences and business. In order to convince stakeholders
and decision makers in the business sectors to adopt the technologies and discoveries, it is important to fill the skill
gap for business employees, to facilitate access to the infrastructure needed to implement AI, and to highlight the
value of AI and the potential return on investment it may offer. McKinsey & Company reported, as of November
2019, a year-on-year increase of 25% in AI adoption in various businesses. As a result of AI adoption, the reported
decrease in costs exceeded 20% and the reported increase in revenue exceeded 10% (www.mckinsey.com/featured-insights/artificial-intelligence/global-ai-survey-ai-proves-its-worth-but-few-scale-impact). For further discussions
on the challenges and considerations of AI implementation especially in the health sector, the reader is referred to
the 2019 paper by He et al. [84].
AI has reached advanced levels beyond theoretical investigations. It has been employed in myriad applications,
particularly in the health sector, medicinal sciences and pharmacology, including: using virtual in silico screening
for ligand–receptor complementarity and bioactivity [85]; developing new drug molecules or new molecules with
potential biological activity [6,85–87]; investigating new biomarkers [88]; improving chemical syntheses [89]; associating
certain diseases with specific genes/receptors; multidimensional analyses for structure–activity relationships and
druggability [89,90]; connecting gene expressions with clinical trial results [91]; and understanding the mechanisms
of many biochemical phenomena, all of which are significant enough to involve companies such as Google and IBM in this business. Other companies that have implemented AI in drug design include Atomwise, Recursion,
BenevolentAI, Insilico Medicine, Exscientia, twoXAR, Data2Discovery, Insitro and Collaborations Pharmaceuticals. There are also 3D printing companies that engage AI in their projects, for example, the ZipDose technology at Aprecia
Pharmaceuticals. AI also feeds into the creation of imaging for augmented reality and virtual reality, which are used
in visualizing 3D interactions and drug–receptor interactions for drug design. The implementation of AI in drug
design is a constructive tool given that, from Phase I clinical trials to the final approval of the drug, the success rate
without AI does not exceed 6% [92].

AI & ML in discovering drugs & predicting drug properties


Binding affinities & interactions
In many medicinal and targeted drug-design applications, the binding site of the receptor would be known.
However, in some applications, the binding site remains to be determined. In the early 2000s a Canadian/German
team published a neural network algorithm that is capable of identifying binding sites in macromolecules by
classifying fragmented overlapping patches into four categories of protein binding: protein (this class includes
proteins and peptides with more than eight amino acids); DNA (this class includes DNA and RNA); ligand (this
class includes anything that is not in the other classes); or nonbinding [93–95]. This algorithm involves multiple
stages: molecular features are described based on fuzzy set theory and molecular surfaces, the algorithm determines
the orientation of the substrate in its receptor, and the receptor is analyzed in fragments to predict the kind of
substrate it can bind to. A total of 1,255,853 surface domains were considered for both training (13,994 surface
domains) and testing purposes. The network was fed with six molecular properties as input: local lipophilicity,
H-bond donor density, H-bond acceptor density, electrostatic potential, surface topography index and cavity depth.
The accuracy rate ranged from 76 to 90%. One drawback of this method is that the initial steps involve visual
inspection to identify potential binding sites. Another drawback is that only large binding sites are considered,
with the assumption that small binding sites do not form stable complexes with binding molecules. There also
exist other ML models for recognizing protein allosteric states and residues [96].
Efforts have been also put toward identifying binding sites of ions based on amino acid sequence information [97].
Using the position weight scoring matrix algorithm, the accuracy of finding binding sites for some metals (Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+) exceeded 80%.
HS-pharm, Pharm-IF and DeepSite implemented ML technologies to predict 3D pharmacophore properties [98].
HS-pharm aimed at reducing the number of features in the ML algorithms built for predicting cavity atoms that
are key for binding ligands [99]. Pharm-IF implemented ML to rank the docking poses of small molecules [100].
DeepSite used AI to anticipate, based on images, the druggability at protein binding sites [101].
Coveney’s group built two models [102,103] for accurate and reproducible binding affinities that are implemented
in ML [102]. The models are thermodynamic integration with enhanced sampling (TIES) [104,105] and enhanced
sampling of molecular dynamics with approximation of continuum solvent (ESMACS) [104–106]. The strength of
these models is the use of ensemble molecular dynamics (MD). One-trajectory MD simulations, even if run over
an extended period of time, cannot yield the accuracy achieved by running multiple short ones [104,106–108]. This
is because macroscopic properties such as the free energy will be calculated as an ensemble average of microscopic
states. TIES is an alchemical protocol of an ensemble of MD simulations. It has the advantage of quantifying
uncertainties and therefore controlling errors. The repeated simulations are performed on intermediate states
determined through alchemical transformation between two ligands. The first simulation will be on the first ligand
(coupling parameter = 0) and the last one will be on the second ligand (coupling parameter = 1). This method was
tested on 30 ligand–protein systems, with a mean absolute error of only 0.7 kcal/mol [104]. There exist two versions:
the one-trajectory and the three-trajectory simulations. In the former, only the protein–ligand complex is optimized,
in the latter the free ligand and free protein are optimized, in addition to optimizing the protein–ligand complex. The
three-trajectory simulations account for the so-called deformation energy that results from complexing the protein
and ligand together. TIES is more accurate, but almost three-times more computationally expensive than ESMACS.
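
For reference, the free-energy difference that thermodynamic integration estimates takes the standard textbook form (written here in general terms, not reproduced from the TIES papers), with the ensemble average evaluated at fixed values of the coupling parameter λ that interpolates between the two ligands:

```latex
\Delta G \;=\; \int_{0}^{1} \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_{\lambda} \, \mathrm{d}\lambda
```

where H(λ) is the hybrid Hamiltonian connecting the initial (λ = 0) and final (λ = 1) ligand; in TIES, the average is additionally taken over an ensemble of replica simulations at each λ.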
The ESMACS is an end point protocol based on an ensemble of short molecular mechanics/Poisson–Boltzmann
surface area simulations. Despite its accuracy over the molecular mechanics/Poisson–Boltzmann method, which
itself is more accurate than docking, ESMACS is not as accurate as the more computationally expensive exact
alchemical approaches such as the free energy perturbation or thermodynamic integration methods [109,110]. The
adaptive ensemble algorithms were an improvement to ESMACS and TIES that led to a 2.5-times reduction
in the core hours needed to complete the simulations [103,111,112]. The integrated and scalable prediction of resistance (INSPIRE) project targets personalized medicine for precise cancer treatment using ML in combination with molecular dynamics protocols like ESMACS and TIES (www.compbiomed-conference.org/wp-content/uploads/2019/07/CBMC19 paper 42.pdf).
eSimDock is yet another tool to predict binding affinities after the step of molecular docking. The accuracy of
ligand ranking is improved through the integration of scoring functions using nonlinear ML. Testing this method on
the Cambridge Crystallographic Data Centre dataset resulted in a root mean square deviation (RMSD) of <3 Å for
67.9% of the predictions in binding poses. It was also found that eSimDock is insensitive to distortions/deformations
in the structure of the target, making it a flexible tool which can be used irrespective of the quality of the protein
structure. This, however, comes at the cost of compromising the quality of the selectivity option. eSimDock is ideal
for screening similar scaffolds with different R groups [113].
3D-QSAR is a QSAR model where conformational descriptors are provided; it is widely used in medicinal
chemistry to predict binding affinities. Examples of linear versions of this model include the widely used comparative molecular field analysis model and the autocorrelation molecular electrostatic potential (autoMEP) vectors coupled with partial least-squares (PLS), or the autoMEP/PLS method [114,115]. Examples of nonlinear versions of this model
include the autoMEP coupled with the response surface analysis (RSA), or the autoMEP/RSA method, the
ANNs and the support vector machines. The linearity/nonlinearity refers to the kind of correlations between
the descriptors and the response space. The autoMEP/PLS and autoMEP/RSA methods were tested on Gly/N-
methyl-D-aspartate receptor antagonists [116]. With a training set of 62 molecules and a test set of six molecules
(and four additional molecules), it was found that the cross-validated r, after leave-one-out analysis, was 0.81 with
PLS and 0.99 with RSA.
It is beyond the scope of this review to discuss the plentiful models available for evaluating binding affinities
(including, but not limited to, free energy perturbation, quantum mechanics/molecular mechanics (QM/MM)
free energy perturbation, thermodynamic integration and slow growth). There are a number of studies that highlight
the advances in modeling binding affinities and reviews about the methods available for affinity predictions [117–123].
Several ML and DL models have been developed to evaluate various kinds of interactions. In one, a random
projection ensemble classifier was used to determine new protein–protein interactions according to amino acid sequences. The method was then tested on three databases, using fivefold cross-validation. The accuracies reported
ranged from 88 to 97% depending on the dataset used [124].
In an application on cell signaling allosteric proteins (PDZ2), a decision tree model resulted in a prediction
accuracy of 75%, while that of an ANN model was 80%. These models could distinguish between the bound and
unbound states based on the Cα distances and the torsion angles in the backbone, although the 3D shapes of these
two states are rather similar as evaluated through experimental data and MD simulations [96].
Another study focused on the prediction of P-glycoprotein inhibitors using classification models based on ligand
and structure. The classification was based on IC50 values and the multidrug resistance ratio, which is equal to the ED50 in the absence of adriamycin divided by the ED50 in the presence of adriamycin, where ED50 is the effective dose that has therapeutic effects on 50% of the sample. A dataset of 2548 compounds was collected from the literature. The dataset was split into training and test sets using D-optimal onion design. A total of 62 descriptors, 166 molecular access system (MACCS) keys and 307 substructure fingerprints were used. SVMs and random forest (RF) performed better than k-NN and
QSAR. The SVM accuracies were 73 and 75% for inhibitors and noninhibitors, respectively; the accuracies were
61% based on scoring functions from docking (using MMFF94x force field) [125].
RPITER is a hierarchical DL framework that includes CNN and stacked autoencoder. Using RPITER, protein
interactions with noncoding RNA can be predicted fairly accurately, with AUC ranging from 0.82 to 0.99 [126].
Probabilistic matrix factorization was employed in the prediction of drug–target interactions [127]. This is
a method that needs large amounts of data input for accurate predictions and is therefore potentially good for
applications with a plethora of data (e.g., enzymes and ion channels). However, neither G-protein-coupled receptors
nor nuclear receptors are good systems to be studied using this method given the scarcity of sufficient data to feed
the prediction platform.
In another study, compound–protein interactions were targeted [128]. In this study, multichannel pairwise input
neural networks were developed to benefit from sparse data in the literature. The authors found that using protein
features results in better accuracies than using compound features.
ML models have benefited from the advances in structure-based drug design [129–131] that involve 3D structural
information of small molecules as inhibitors. The challenge of the interpretability of the model based on feature
relevance has been addressed by training the model on selected protein–ligand interaction fingerprints and by evaluating the importance of the features using Shapley values. The features used included interatomic distances,
topological features from graphs and substrate–receptor interactions. Given its abundance of available data, HIV-1
protease was chosen for building the model. This, however, limits the applicability of the model to closely related proteins. The dataset was built from nine common core structures including those of the US FDA-approved
inhibitors amprenavir and darunavir. Four algorithms were benchmarked, and although the accuracy levels were
similar, the gradient boosting decision trees were the best compared with SVM, gradient boosting and random
forest. Unlike hydrogen bonding, which seems to be prominently reported as an important feature [132,133], the van der Waals interactions were among the more important features in this model [134]. It is worth noting that
solvation, which plays a key role in H-bonding interactions, is usually not included in most models. There exists another model, namely an SVM model, that was developed to anticipate the weakest bonds to be broken by HIV-1
protease in oligopeptides. The model has a 100% self-consistency and a good prediction rate of 87% [135].
In another study focusing on the prediction of drug specificity and drug–target interactions,
chemogenomics/proteochemometrics were used to predict off-target proteins to which candidate drugs could
bind. Deep (with data augmentation techniques like multiview and transfer learning) and shallow (based mainly on
expert knowledge) learning methods for chemogenomics were explored. Sequence encoders were used to learn com-
binatorial compound and protein representations that could be fed into the built chemogenomics neural network
as an FFNN. Shallow learning outperformed DL on a small dataset like DBColi with a maximum of a few thousand
interactions. For extensive datasets like DBHuman, with more than 10,000 interactions, DL outperformed shallow
learning [136].

Solubility
In medicinal chemistry, solubility is one of the key properties of a drug. Boobier et al. found that human predictions
of solubility for drug-like organic molecules are comparable to the ML predictions [137]. Such results give credibility
to the use of ML, especially if the amount of time and effort required for this use is reasonable.
Using ab initio electronic structure theory, such as density functional theory (DFT), with the conductor-like
screening model (COSMO), Klamt et al. developed COSMO-RSol [138]. COSMO-RSol is meant to predict the
equilibrium constants not only for liquid–liquid and liquid–vapor (as in COSMO-RS [139]), but also for solid states
by considering the energy of fusion. The model was trained on the aqueous solubilities of 150 drug-like neutral
molecules. The RMSD was only 0.66 log-units despite the small number of fitting parameters, namely three. The
model was then tested for the aqueous solubilities of 107 highly diverse structures of neutral pesticides; the RMSD
was only 0.61 log-units, with indications that the error came mainly from experimental errors. The method has
the advantage of predicting solubility in any solvent or mixture of solvents. Another advantage is the ability to
calculate, from the results of the same simulations, other properties like Henry constants, vapor pressures and
partition coefficients. The main disadvantage, slowness, which prohibits its use in high-throughput screening, was addressed using COSMOfrag [140].
Many other studies have investigated the prediction of solubility using quantitative structure–property rela-
tionships (QSPR) [141–143]. QSPR and QSAR are similar in concept, with the focus being on properties in the
former and activity in the latter. For example, one study was about predicting solubilities with respect to changes in
temperature using QSPR [141]. This model may perform slightly better than COSMO-RS. However, the latter has
the advantage of being more robust given its greater dependence on theory (both thermodynamics and quantum
chemistry) rather than parametrization. This model is based on, first, the parametrization of a linear equation that correlates the logarithm of the solubility with temperature. Then the model uses random forest to develop a twofold QSPR
model. This approach was tested on 421 organic compounds with an R2 value of 0.97 on the test set and a root
mean square error (RMSE) of 0.35.
In the context of predicting solubility and classifying aqueous solubilities, Cheng et al. developed an SVM coupled
with a reduction and recombination method for selecting features [144]. The model was trained on 41,501 molecules
from the PubChem BioAssay database. It was tested on an internal set of 4510 compounds (accuracy = 83%) and
an external set of 32 drug-like molecules with a maximum accuracy of 61% [145].
Another method was developed to predict the solubility of molecules in dimethyl sulfoxide, with positive
predictive values of 99% [146].


Toxicology
Although in vivo toxicology tests are ultimately still inevitable, in silico evaluations of toxicity help to bypass
the excessive cost and labor time and the ethical issues of testing on animals. Toxicity may be evaluated through
measuring the drug dose that kills half of the population tested; that is, the LD50 . The toxicity can be tested
through acute (one-time) or multiple exposures, at various doses, via different intake routes. The current databases
for toxicity include TOXNET, ToxCast, Tox21, PubChem, DrugBank, ToxBank Data Warehouse, ECOTOX and
SuperToxic [147]. One of the noteworthy open platforms for in silico toxicity prediction is OpenTox (opentox.net) [148]. The OpenTox Framework aims at transparency, interoperability and extensibility, and it involves state-of-
the art subunits and models that follow regulatory guidelines and Organisation for Economic Co-operation and
Development principles. This tool offers means of generating features for ML in QSAR and other models for
toxicity.
In the effort to improve in silico predictions of drug toxicity, multiple computational models have been de-
veloped [147,149–152]. These include structural alerts and rule-based models, read-across, dose–response and time–
response models, pharmacokinetic and pharmacodynamic models, uncertainty factors models and QSAR models.
Raies and Bajic have extensively reviewed these models [149]; therefore the studies reported in this section will focus
on content that was not covered in their review. In a more recent review [147], it was shown that shallow architectures
(with AUC up to 0.95) could outperform DL (with AUC up to 0.87). This is likely because of the lack of databases
that are adequately large for the rather demanding multilayered DL methods. This suggests the need to leverage
more data to build better ML models.
The tissue metabolism simulator (TIMES) model is a sophisticated model that involves more ‘theory’ of toxicity.
It considers the metabolic pathways that may detoxify drug molecules after certain metabolic transformations. About
30 metabolic transformations were considered, including dehalogenation, hydrolysis, tautomerization, isomeriza-
tion, oxidation, dealkylation, hydroxylation, acetylation, epoxidation, glucuronidation, sulfation and conjugation
reactions [153]. This simulator is not the first of its kind. There are five variants of computational tools that involve
biochemical transformations and transformation rates – group contribution, hybrid, explicit, rule-based and statis-
tical [153] – and TIMES belongs to the hybrid category. In addition to abiotic reactions along with probabilities of
the occurrence of the transformations under specific conditions, this simulator has a library of potential biological
metabolic transformations. The model transformation uses rate data or transformation probabilities translated from
metabolic maps, and then it measures the toxic effects. The TIMES model was tested for its ability to simulate
activated metabolites through mutagenicity end point. A subset of 860 compounds was collected from the Procter
& Gamble genotox/carcinogenicity database (T100 model). The sensitivity was not too high at 77%, but the
majority of the false positive predictions are actually positive in similar tests for genotoxicity or carcinogenicity.
In another study, QSAR models were developed to predict the toxicity of 221 phenols in the ciliated protozoan
Tetrahymena pyriformis [154]. ML methods such as k-NN, SVM and classification trees were used to classify the
molecules in one of the four modes of toxic action: polar narcotics, weak acid respiratory uncouplers, proelectrophiles
and soft electrophiles. The accuracy of the tested methods exceeded 90%, which highlights the benefit of using ML
for prediction models. The only exceptions were the linear and logistic regressions, which resulted in accuracies of
79 and 87%, respectively. The highest accuracy achieved in this work is 97%, while the best accuracy reported in
other similar studies does not exceed 93%.
To predict hepatotoxicity or drug-induced liver injury, various methods (logistic regression, SVM, gradient boosting, AdaBoost, XGBoost, ExtraTrees, LightGBM and CatBoost) were built. A dataset of 450 molecules from
the DILIrank was split 400:50 for training:testing, and fivefold cross-validation was implemented. The whole
process was replicated 100 times. Using FDA data, the most relevant eight out of 12 fingerprints were selected.
Seven molecular descriptors and the top five out of nine base classifiers were used. Ensemble vote classifier was
used to calculate the weight ratio of molecular fingerprints and molecular descriptors, with a threshold of 0.5. An
accuracy of 82% was reported, with an AUC of 0.80 [155]. Ancuceanu et al. used different ML algorithms with
different descriptors to develop 267 QSAR models that predict hepatotoxicity or drug-induced liver injury from
the structure of the drug [156]. After excluding the drugs labeled as ambiguous, they used a total of 791 molecules
from the DILIrank database [157]. A total of 3839 molecular descriptors and 19 blocks of the same descriptors were
computed. For feature selection, 17 different tools were used. For classification, 11 algorithms were used. Nine
features were commonly selected. A nested cross-validation approach was used, with ten repeated cycles. The range
of sensitivity was 80–95%, but the specificity was closer to 50%, and the accuracy in predictions was roughly 75%. The top models were applied on the molecules in the ZINC database to find that 20% of its molecules do not
cause hepatotoxicity. This work may serve as an illustration of the fact that big data is only a tool, and big theory is needed along with it to reach distinguished results [54].
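
As an illustration of the ensemble vote classifier mentioned above, the following sketch (assuming scikit-learn and synthetic data; not the published DILIrank model) combines several base classifiers by soft voting, calling a compound positive when the averaged probability exceeds the 0.5 threshold.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(450, 30))                     # e.g., fingerprints plus molecular descriptors
y = (X[:, 0] + X[:, 7] > 0).astype(int)            # e.g., hepatotoxic vs non-hepatotoxic label

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    voting="soft")                                 # average predicted probabilities; >0.5 => positive call

print("Fivefold CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean().round(3))
```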

Blood–brain barrier permeability


The penetration of the blood–brain barrier (BBB) can be predicted computationally with 75–97% accuracy (for
penetrating molecules and nonpenetrating molecules) using 67–199 descriptors [158]. Zhao et al. reported that
although the accuracy for the nonpenetrating molecules is smaller (60–80%), this bias in the statistical learning
methods was resolved by using recursive feature elimination to choose the features through only 19 molecular
descriptors which included polarizability, polarity-related properties, hydrogen bond properties, volume, weight,
surface area, bond rotations and pKa. The molecules in their training set were classified with an accuracy that
exceeded 90% [158]. Their model predicted the BBB penetration of the molecules in a test set with an accuracy that
exceeded 95%.
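
A minimal sketch of recursive feature elimination of the kind Zhao et al. describe (assuming scikit-learn and synthetic data; the descriptor counts simply mirror those quoted above): the selector repeatedly drops the least important descriptors until 19 remain.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 67))                      # e.g., 67 candidate molecular descriptors
y = (X[:, 0] - X[:, 5] + X[:, 10] > 0).astype(int)  # e.g., BBB-penetrating or not

# Recursively eliminate features until only 19 descriptors are left.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=19).fit(X, y)
print("Selected descriptor indices:", np.where(selector.support_)[0])
```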
In another study, a decision tree induction ML method was built to predict BBB permeability [159]. This model
had a good accuracy, with a successful classification rate of 90%.
In another study, using 7–37 descriptors selected through automatic relevance determination and expert knowledge,
a Bayesian regularized neural net was built to predict partition coefficients on an external test set with an R2
range of 0.59–0.65 [160].
Garg and Verma built a multilayer perceptron ANN that uses seven descriptors, the most important of which
are molecular weight and the topological polar surface area. This model predicted, on a test set, the BBB
ratios with a correlation coefficient of 0.89 [161]. A slightly better accuracy was achieved using a Kohonen self-
organizing map ANN on a database of 1336 penetrating molecules and 360 nonpenetrating molecules [162]. With
five parameters (including charges and electronegativities) out of 55 descriptors, the accuracy in prediction reached
97.2 and 90.3% for the penetrating and nonpenetrating molecules, respectively. Using another MLP ANN model,
the BBB penetration was predicted with a correlation coefficient of 0.87 [163].
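For illustration only, a small multilayer perceptron regressor in the spirit of the Garg and Verma model, trained on synthetic data in place of the seven published descriptors:

```python
# Illustrative sketch (assumed data): a small multilayer perceptron regressing
# log BB (blood-brain partition) from seven molecular descriptors such as
# molecular weight and topological polar surface area. The data are synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.random((500, 7))                                  # seven descriptors per molecule
y = X @ rng.random(7) + 0.1 * rng.standard_normal(500)   # synthetic log BB values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
mlp.fit(X_tr, y_tr)
print("test R^2:", round(mlp.score(X_te, y_te), 2))
```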

Chemical properties
Nonparametric statistical RF regression, namely the Pfizer charge assignment method, was used to predict the
charges of the elements H, C, N, O, F, S and Cl at the quality of ab initio methods [164]. The charges were fitted to
those generated from the electrostatic potential of B3LYP/6-31G* simulations. The predicted charges were refined
using a least square fit approach that includes a constraining harmonic penalty function. The training was performed
on 80,000 molecules from the Pfizer and ZINC databases, while the test set comprised 5000 molecules from the
same databases. The model was used to predict the hydration free energy of 210 molecules that were not part of the
training dataset [165,166]. The reported mean unsigned error of the predicted hydration energies was 1 kcal/mol with
an R2 of 0.90. The reported RMSE was 0.0052 atomic units and 1.2 D for the electrostatic potential and dipole
moment, respectively. This result outperforms the charges estimated from empirical or semi-empirical methods,
although the latter have the advantage of the charges being automatically reported in the output files of practically
any simulation.
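A hedged sketch of the general idea behind such charge models, random-forest regression from per-atom environment features to reference ESP-fitted charges, using synthetic placeholders rather than the Pfizer descriptor set:

```python
# Sketch under assumptions: a random-forest regressor mapping local atomic
# environment features to DFT-derived (ESP-fitted) partial charges. The feature
# construction and reference charges here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(3)
X = rng.random((5000, 30))       # per-atom environment features
q = X @ rng.random(30) - 0.5     # synthetic reference charges (atomic units)

X_tr, X_te, q_tr, q_te = train_test_split(X, q, test_size=0.1, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(X_tr, q_tr)
print("MAE on held-out atoms (a.u.):",
      round(mean_absolute_error(q_te, rf.predict(X_te)), 4))
```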
Pereira et al. used ML to predict the dipole moments evaluated by DFT, using 10,071 molecules from the ZINC
and GDB-13 databases [167]. The best ML model (among RF, SVM, MLP and Gaussian radial basis function)
was the RF regression model. Testing it on 3368 external molecules that were randomly selected from the original
dataset prior to training gave a mean absolute error of 0.44 D, a RMSE of 0.68 D and R2 of 0.87.
There are also ML models for predicting the strengths of hydrogen bonds for hydrogen bond donors and
hydrogen bond acceptors. The quantity to be predicted is the quantum chemical free energy of the hydrogen
bonding of the molecules with acetone (in a 1:1 solution) and 4-fluorophenol (also in a 1:1 solution) [168]. A total
of 3326 hydrogen bond acceptors (filtered from 276,004 molecules) and 1088 hydrogen bond donors (filtered
from 50,268 molecules) were obtained from the pKBHX database and Varnek [169]. The ML, with tenfold cross-
validations, predicted free energies with RMSEs (compared with experimental values) of 3.8 and 2.3 kJ/mol for
hydrogen bond acceptors and donors, respectively.
Litsa et al. developed an automated ML guided by atom mapping to evaluate the bond stability based on its
environment [170]. It subsequently predicts the mechanism by determining the bonds that will make and break based
on their strengths. It was trained and tested on 7782 reactions and was found to be accurate even on unbalanced
reactions. The percentage error in predictions ranged from 0.24 to 21.0% (with the second-largest percentage being
14.3%) for the six classes of reactions classified based on the enzyme catalyzing the reaction.
QSAR is one of the most heavily used tools to predict chemical properties. QSAR correlates chemical properties
to the biological activity of medicines. The correlation is particularly between the structural and chemical properties
or descriptors of smaller submolecular moieties of drugs and their biological activity. Gilles Klopman was among
the first scientists who studied, back in 1984, AI and structure–activity relationships [171]. The descriptors in
QSAR [71,116,172–193] include steric bulkiness, hydrophobicity/hydrophilicity, electronic and quantum properties
such as electron densities, volumes, charges, electron populations and bond/ring/cage critical points. The atomic
descriptors can be obtained from quantum partitioning schemes such as the quantum theory of atoms in molecules
(QTAIM) [194]. QTAIM uses the wavefunction of the system to extract information from it. Computing or predicting
the wavefunction of a system is equivalent to predicting ‘everything’, because myriad ground-state properties can be
derived from it. Schütt et al. used the SchNOrb DL framework to predict the quantum mechanical wavefunction
of molecules at the computational cost of much cheaper methods (compared with ab initio methods) such as force
fields [195]. SchNOrb is SchNet for orbital, where SchNet is a deep tensor neural network [196]. This work is a
substantial improvement in computational chemistry, because the bottleneck in this field is obtaining accurate
quantum evaluations of a system at affordable costs.
Many ML methods have been employed to build QSAR models and then predict various properties: P2X7 antagonist
activity [197], mutagenicity and carcinogenicity [179,198,199], inhibition of Plasmodium falciparum Dd2 and inhibition
of hepatitis C virus [199], Free–Wilson analysis [184] and more [182].
Bioisosteric substitutions in drug design [200,201] have been studied using QTAIM. Bioisosteres are groups of drug
molecules that can be interchangeably used without affecting the biological activity of the drug, while enhancing
the pharmacokinetic and pharmacodynamic properties of the medicine. Quantum properties of several bioisosteres
of carboxylic acid, including tetrazole, tetrazol-5-one, methylsquarate, isoxazole, oxadiazole, oxazolidinedione,
thiazolidinedione and sulfonamide, have been reported [202–205]. The aim of these studies was to show that
bioisosteres share the same average electron densities, which are deemed useful as a new feature in ML to develop
new bioisosteres. Another atomic property, which could be useful as a feature in AI, is the energy contribution
of each atom in a molecule to the dissociation energy of certain bonds in the molecule. Previous studies have
highlighted the contribution of all atoms in a molecule to the bond dissociation energy of energy-rich compounds,
for example, the homolytic cleavage of an inorganic phosphate and the heterolytic hydrolysis of ATP [206–208].
QSAR applications use, among others, random forests, linear regression models, Bayesian neural networks, DNN
and SVM. Among the best QSAR learning methods is random forest [176]. There are also one to six dimension-based
QSAR methods (1D–6D). The 1D models are used for predictions of global molecular properties (e.g., pKa and
log P), and 2D-QSAR models are used to predict 2D pharmacophores. The 2D- and 3D-QSAR models correlate
activity with structural patterns and noncovalent interactions, respectively. The dimension of ensemble ligand
configurations is implemented in 4D-QSAR. 5D-QSAR represents various induced-fit models in 4D-QSAR. In
6D-QSAR the dimension of solvation is implemented [175]. A simple model was developed to predict Mayr's
experimental electrophilicities of molecules using regression tools and decision-tree models. The input features
included basic and composite global conceptual DFT descriptors, basic and composite atomic conceptual DFT
descriptors and QTAIM descriptors at the atomic level (for example, the localization index and atomic energies), as well as
topological features of the molecular graph obtained using graph theory (extracted treelets) [209]. Testing the gradient boosting
decision tree on 11 electrophiles, the mean absolute error reported was 0.72, with a coefficient of determination
equal to 0.98.
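As a rough illustration of the gradient-boosted decision-tree regression described above, with synthetic stand-ins for the conceptual-DFT and QTAIM descriptors:

```python
# Minimal sketch (assumed features): gradient-boosted decision-tree regression
# of experimental electrophilicity from conceptual-DFT and QTAIM descriptors.
# Descriptor values and electrophilicities below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(7)
X = rng.random((120, 15))   # e.g., global/atomic conceptual-DFT and QTAIM descriptors
y = X @ rng.random(15)      # synthetic electrophilicity values

gbr = GradientBoostingRegressor(random_state=0).fit(X[:-11], y[:-11])
pred = gbr.predict(X[-11:])  # hold out 11 electrophiles, echoing the study design
print("MAE on the held-out electrophiles:", round(mean_absolute_error(y[-11:], pred), 2))
```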
In another study, three open-source QSAR models were built to predict the pKa of acids and bases: SVM
in conjunction with k-NN, extreme gradient boosting and DNN [186]. PaDEL was used to produce molecular
descriptors, fingerprints and fragment counts. KNIME was used for curating and standardizing the structures.
DataWarrior was used to obtain the experimental pKa for 7912 chemicals. The RMSE and R2 of the predictions
from the QSAR models with respect to the experimental data were approximately 1.5 and 0.80, respectively. These
predictions are similar to the results obtained by the commercial packages ACD/Labs and ChemAxon.
Olier et al. published a comprehensive study on meta-QSAR in drug discovery [176]. A total of 2764 QSAR
targets were represented with three different datasets each. From each dataset, 11 meta-features were extracted using
random forest methods, and 450 meta-features were extracted based on the target represented by the database.
A total of 2394 meta-features, along with the 2764 targets, comprised the final meta-dataset. The most relevant
meta-features were related to information theory. A total of 18 regression algorithms were applied. Using k-NN and
a multitarget regression method, QSAR methods were classified according to kinds of predictions. In each category,
the QSAR methods and QSAR combinations were ranked according to their performance based on the reported
RMSE. This was done using a multitarget regression implemented through a multivariate random forest regression
with 500 trees. A tenfold cross-validation was used in the assessments of all implementations. The authors compared
the performance of meta-learning versus base-learning QSAR and reported that the former outperformed the latter
by up to 13%.
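The multitarget (multivariate) random-forest regression at the core of this meta-QSAR ranking can be sketched as follows; the meta-features and per-method RMSEs are synthetic placeholders:

```python
# A minimal multitarget random-forest regression sketch in the spirit of
# meta-QSAR ranking: each row is a QSAR task described by meta-features, and
# the targets are the RMSEs of several candidate QSAR algorithms on that task.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
X_meta = rng.random((200, 25))   # meta-features describing each QSAR target
Y_rmse = rng.random((200, 5))    # RMSE of five hypothetical QSAR methods per target

forest = RandomForestRegressor(n_estimators=500, random_state=0)
Y_pred = cross_val_predict(forest, X_meta, Y_rmse, cv=10)   # tenfold CV predictions
best_method = Y_pred.argmin(axis=1)   # recommend the method with the lowest predicted RMSE
print("recommended method index for the first 10 targets:", best_method[:10])
```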
Kausar and Falcao developed an automated framework that can build QSAR models without prior parametrization
or proficiency in ML programming. By specifying a target, the framework will use curated online databases
(or even separately implemented databases) to develop, without bias, an accurate QSAR model for even complex
cases [185].

Drug discovery
Although targeting genomes (or ‘druggable genomes’) was once thought of as very promising in the field of
therapeutics, this has turned out not to be the case [210], and targeting proteins and proteomes remains the predominant
route of drug development [78]. Thus it remains crucial to focus on the properties of proteins.
To get more information about the 3D folding of proteins, neural networks were used to classify the secondary
structures of proteins in three categories: helix, beta strand and coil. The success rate of this classification was
70% [211]. The accuracy rate was later improved to 80% [212].
Zou et al. focused mainly on the folding of the supersecondary structures (SSS) in proteins [213]. These are the
mini-sequences of amino acids that nucleate the 3D folding of a full protein. The authors used an SVM and
increment of diversity combined with quadratic discriminant analysis (IDQD). Each SSS was represented through
a 36-dimensional vector. They created a new feature representation of the SSS based on the pseudo amino acid
composition model developed by Chou [214]. The representation they used could have been improved by including
information about the hydropathicity of the amino acids in the sequences. The SVM and IDQD models were
trained and tested on a set of 3088 proteins (that include α-α, α-β, β-α and β-β motifs) from the ArchDB40
database and produced accuracies of 77.7 and 69.4% for the training and prediction sets, respectively. The IDQD
slightly outperformed the SVM model.
Bhowmik et al. also used DL to improve the clustering of molecular dynamics simulations on biomolecular
systems [215]. They used a convolutional variational autoencoder (CVAE), with contact matrices as input, and
Bayesian hyperparameter optimization to complete deep clustering of protein folding simulations into three
categories, namely potentially folded, partially folded and misfolded states. The three protein folding prototypes
tested are Fs-peptide, villin headpiece and ββα proteins. The prediction accuracy using CVAE reached 89%.
This work was completed to help reduce the high dimensionality of datasets to low dimensional embedding that is
manageable for the extraction of useful biophysical information. This model was also used to learn new interpretable
latent features related to protein folding. The CVAE classification into the three categories of folding/misfolding
could have been replaced with other tools such as torsion angles or 3D co-ordinates.
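A minimal convolutional variational autoencoder over contact maps, in the spirit of the CVAE used by Bhowmik et al.; the 24 × 24 map size, layer widths and three-dimensional latent space are illustrative assumptions, not the published architecture:

```python
# Hedged sketch of a convolutional variational autoencoder (CVAE) over protein
# contact maps, as used for deep clustering of MD trajectories; sizes below are
# illustrative only, and the "contact maps" are random tensors.
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, latent_dim=3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 24 -> 12
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 12 -> 6
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(32 * 6 * 6, latent_dim)
        self.fc_logvar = nn.Linear(32 * 6 * 6, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 32 * 6 * 6)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        recon = self.dec(self.fc_dec(z).view(-1, 32, 6, 6))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    bce = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

# One illustrative training step on random 24 x 24 "contact maps".
model = CVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(8, 1, 24, 24)
recon, mu, logvar = model(x)
loss = vae_loss(recon, x, mu, logvar)
opt.zero_grad(); loss.backward(); opt.step()
print("loss:", float(loss))
```

The low-dimensional latent vectors (mu) are what would then be clustered into folded, partially folded and misfolded states.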
DL was used for designing small bioactive compounds. Merk et al. developed a DL model to generate new
bioactive drug-like molecules [216]. The model is based on two steps: a learning model using RNN and LSTM based
on the SMILES of an extensive database of 541,555 bioactive molecules from ChEMBL 22 [217], and a second
step on a focused dataset fine-tuned through transfer learning to recognize retinoid X and peroxisome proliferator-
activated receptor agonists. Five de novo structures were proposed, four of which were active experimentally. The
strength of this AI tool is that none of the de novo suggested molecules was identical to the molecules in the training
sets. In addition, five of the top-ranked designed drugs, despite the availability of their building blocks, were not
found in any of five very common databases such as PubChem. Another advantage of this model is its capability of
implicitly accounting for synthetic accessibility: it was confirmed that all five selected molecules can be
synthesized in the lab.
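A hedged sketch of a character-level LSTM over SMILES strings of the kind used in such generative models; the vocabulary size, layer sizes and random token batch are assumptions for illustration:

```python
# Sketch (assumed vocabulary and sizes): a character-level LSTM over tokenized
# SMILES strings, trained to predict the next character; pretraining on a large
# set and fine-tuning on a focused set (transfer learning) would reuse the model.
import torch
import torch.nn as nn

class SmilesLSTM(nn.Module):
    def __init__(self, vocab_size=40, embed=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, state=None):
        h, state = self.lstm(self.embed(tokens), state)
        return self.out(h), state        # logits over the next character

model = SmilesLSTM()
tokens = torch.randint(0, 40, (4, 60))   # a batch of 4 tokenized SMILES of length 60
logits, _ = model(tokens)
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 40),
                                   tokens[:, 1:].reshape(-1))
print("next-character loss:", float(loss))
```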
Ghasemi et al. applied DBN using different sampling methods (contrastive divergence, persistent contrastive
divergence and fast persistent contrastive divergence) to initialize DNNs for predicting biological activities of
Kaggle targets [218]. Although not highly accurate (R2 of the DBN [with contrastive divergence sampling]–DNN
model = 0.61), this model outperformed DNN models whose parameters were selected randomly (R2 = 0.47).
It also tackled the issue of local minima and overfitting. This method has other advantages: because it is coupled
with DBN, the DNN model need not be complex to generate accurate predictions. It is actually simple, and
therefore it can be run on personal computers instead of high-performance machines.
Bayesian belief network for classification was used to rapidly predict the activities of bioactive molecules [219].
This method had a high accuracy of 99% for homogeneous datasets with low diversity, and a lower accuracy of
79% for heterogeneous datasets with high diversity.
An SVM with a polynomial kernel, with fourfold cross-validation, was used to determine ‘drug-likeness’ with an
error of 7% on an external dataset [220]. It was trained and tested on 207,001 molecules from the WDI database
for drug-like molecules and the ACD database for non-drug-like molecules. This is an improvement by over 60%
compared with the errors reported elsewhere [221,222].
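For illustration, an SVM with a polynomial kernel evaluated by fourfold cross-validation, echoing the setup above but on placeholder descriptors and labels:

```python
# Sketch with placeholder data: a polynomial-kernel SVM with fourfold
# cross-validation for drug-likeness classification; the descriptors and
# labels are synthetic, not the WDI/ACD molecules.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.random((2000, 100))         # placeholder molecular descriptors
y = rng.integers(0, 2, size=2000)   # 1 = drug-like, 0 = non-drug-like

svc = SVC(kernel="poly", degree=3, C=1.0)
err = 1 - cross_val_score(svc, X, y, cv=4, scoring="accuracy")
print("fourfold CV error: %.1f%%" % (100 * err.mean()))
```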
To bridge AI in drug design with experimental techniques, there have been advances in the NP-structure
predictor (source code available at npstructurepredictor.cmdm.tw/NPSP.rar). This model can generate lists of
molecules that carry the same mass/charge ratio (m/z) [223]. The m/z of a mixture of chemicals can be obtained
experimentally using LC–MS. The predictor then generates the list of unknown molecules in plants, with statistical
probabilities of existence and ranking, based on comparisons to rich databases. Based on the m/z and prior
knowledge of possible core scaffolds, related scaffolds are generated (out of 83,242 available scaffolds) and combined
with side chains from the Natural Products Database. The algorithm then finds the structures from the database and
even suggests new structures that have the exact matching m/z originally fed as input. The accuracy of the method
was validated on four mixtures of Chinese herbs.
One of the methods to generate de novo bioactive molecules is LatentGAN, which is a heteroencoder with a
GAN. Prykhodko et al. used LatentGAN to generate random drug-like and target-biased molecules with accuracies
ranging from 0.85 to 0.99 [224].

AI & ML applications in intractable diseases


The COVID-19 and Alzheimer’s disease subsections below include a few AI models used in the diagnosis of each
of these diseases. Although these examples are not directly related to drug design, they have been included given
their relevance and their contribution to the progress in the field of AI impacting human health.

COVID-19
COVID-19 is a pandemic that started in Wuhan, China in December 2019. It is challenging to kill the causative
SARS-CoV-2 virus given its stability compared with its counterparts SARS-CoV and Middle East respiratory syndrome
coronavirus (MERS-CoV). AI was used to help speed up the diagnosis of COVID-19 because the traditional RT-PCR-
based test may take up to 48 h to produce results.
AI models have been built successfully to diagnose COVID-19 cases. In one study a neural network was built to
predict, from CT scans in combination with clinical data, whether or not a patient tested positive for COVID-19.
This neural network combined two separate ones: a deep CNN used to detect certain peculiarities in the CT scans,
and SVM, MLP and random forest used in the classification of the patients according to the collected clinical data.
Using the imaging and clinical data of 905 patients, the model was trained on 60%, fine-tuned on 10% and tested
on 30% of the data. The joint AI model (AUC = 0.92) outperformed a senior radiologist (AUC = 0.84) [225].
In another study, AI produced marked improvements in the diagnosis of pneumonia caused by COVID-19 in
patients from different nationalities [226]. Using chest CT scan images of 1280 patients, the accuracy of analysis
of CT scanning for detecting pneumonia caused by COVID-19 reached 91%. Another group also worked on
the diagnosis of COVID-19 pneumonia through AI [227]. Using CT scans from 4154 patients, they built an AI
that has accuracies of 91.4% in detecting COVID-19 pneumonia, 83.3% in detecting common pneumonia and
94.1% for normal controls. A few publications have discussed using AI in drug repurposing to find a cure for
COVID-19 [228,229], in predicting rates of recovery and mortality and severity of the illness [230], in uncovering
molecular mechanisms and in selecting targets to discover COVID-19 vaccines [231].
ML was also used to propose potential drugs to treat COVID-19 patients. Kowalewski and Ray developed a
ML method (SVM in conjunction with the radial basis function kernel; and SVM in conjunction with regularized
random forest algorithm) to discover new drugs for COVID-19 [232]. They gathered experimental data about 65
human proteins that respond to SARS-CoV-2 proteins, then ML algorithms were trained to detect the inhibition of
proteins. They used the ZINC database and ChEMBL 25 for bioassay data. The ML algorithms were then used to
predict the inhibitors from a database of approximately 100,000 drug molecules approved by the FDA (from UNII,
DrugBank and Therapeutic Targets databases) and another database of about 14 million commercially available
molecules. Based on toxicity levels (excluding molecules with LD50 for rats <500 mg/kg) and vapor pressure
(according to data in the EPI Suite retrieved from the Environmental Protection Agency), the list of candidates
was then shortlisted. Bootstrap analysis and cross-validations were used to avoid the bias in selections. The bioassay
quantitative measures included Ki (inhibitory constant), IC50 , AC50 (the concentration that activates 50% of the
enzymes) and Log LD50 . The authors reported that among the top approved drugs that could be used to inhibit
SARS-CoV-2 targets are phenazopyridine, abemaciclib and promazine.
Gao et al. suggest that, based on the 96.1% similarity they reported, the chemotherapies applicable to SARS-CoV
should be useful in treating SARS-CoV-2 [233]. They also developed a generative network complex ML to generate
new drugs for COVID-19 using 115 SARS-CoV protease inhibitors from the ChEMBL database and another
training set for binding affinities. Using 3D DL to refine the list of potential drugs, they selected 15 drugs that
were tested to have reasonable partition coefficients and solubilities and were synthesizable. They also completed
tests to confirm that, for COVID-19 treatment, these 15 drugs are better inhibitors than the Kaletra and Norvir
anti-HIV drugs.
Using AI, Ke et al. found that the drugs bedaquiline, brequinar, celecoxib, clofazimine, conivaptan, gemcitabine,
tolcapone and vismodegib can be repurposed for blocking the proliferation of feline infectious peritonitis virus.
They validated this result using in silico and in vitro studies [234].
The information below is included to give an idea of how the European CompBiomed consortium is currently
implementing AI in the discovery of a drug for COVID-19 (www.compbiomed.eu/compbiomed-webinar-13).
However, few results and findings are included here because they are still unpublished. This project focuses
on the proteins of SARS-CoV-2. Initially, a database of 10 billion molecules was used. In summary, featurization
is completed, followed by docking and then reinforcement learning. Using ensemble MD simulations as described in
the Binding Affinities and Interactions subsection, thousands of molecules are then ranked according to the
strength of their binding to determine their potential candidacy. The refined set is passed on for experimental
assays. The cycle of the study includes an active learning approach for virtual screening [235]. This step is very
critical because virtual screening itself has its own challenges [78]; the machine may therefore learn from
models contaminated with inaccuracies, for example in differentiating between tautomers, neglecting the stereochemistry,
or inaccurately counting the number of hydrogen-bond donors and acceptors, as is the case with amides [78].
DeepDriveMD is used in the ML part. The training data are generated by fitting crystal structures from the Advanced
Photon Source (APS) to specified targets. The nine target proteins considered in this project are: main protease
(3CLPro), papain-like protease (PLPro), orf7a (replication), RNA-dependent polymerase, spike protein, nsp15,
nsp3 (ADRP), nsp9 and nsp10-nsp16 complex. A few more details are provided to further elaborate on this
project. The AI part is initiated by preparing machine-readable drug datasets of small molecules collected from [236]:
Enamine, DrugBank, ZINC15, GDB, eMolecules, BindingDB, cureFFI, molecular sets (MOSES), LINCS and
PubChem, CAS COVID-19 Antiviral Candidate Compounds, CheMBL db of bioactive mols with drug-like
properties, DrugCentral Online Drug Compendium, DUDE database of useful decoys, 15.5M-molecule subset
of ENA, CureFFI FDA-approved drugs and CNS drugs, GDB-13 small organic molecules up to 13 atoms, GDB-
17-Set up to 17 atom extension of GDB-13, Harvard Organic Photovoltaic Dataset, COVID-relevant small mols
extracted from literature, MOSES, MCULE compound database, QM9 subset of GDB-17, Repurposing-related
drug/tool compounds, Synthetically Accessible Virtual Inventory (SAVI) and SureChEMBL dataset of molecules
from patents. Data is then standardized in format using the canonical SMILES, Mordred descriptors, molecular
fingerprints and 2D images. For filtration, a deep learning machine is fed with the standardized data. The code
used for training is available on Github.com/globus-labs/covid-nlp. While ensemble MD simulations are used to
predict accurate binding energies, graph-based methods are used to detect similarities in compound structures in
such a way that side chains that may not substantially affect the binding affinities are ignored. Reinforcement learning is also
used to explore different moieties of a molecule using, for example, graph traversal-based inference. Isolated fragments
or moieties of the molecule are bound to the receptor; then combined moieties are bound together to the receptor.
The target is to predict the ranking of the binding of moieties in a molecule to the receptor. The ranking indices
then facilitate the task of suggesting combinations of moieties that can form a molecule with a high binding affinity
to the receptor. To minimize the computing overhead, in situ analysis is completed through DeepDriveMD, which
uses ESMACS results to account for conformational changes. In this analysis, data storage on disks (at the steps
post-simulation and pre-analytics) is bypassed in such a way that the problem of storing data at the exascale level is
eliminated. DeepDriveMD speeds up the simulations by almost an order of magnitude. The deep clustering
ML technique is applied to protein folding simulations using CVAE [215]. For the Fs-peptide, 840 simulations were
run to report a structural RMSD of 0.29 Å, indicating that the optimized and native proteins have similar structures.
To account for all the peptides in the spikes of the virus (i.e., ca. 1.5 million atoms), 1.14 billion parameters are
needed. To optimize the performance in clustering, layer-wise adaptive rescaling was implemented. Unfortunately,
the results of this megaproject have not yet been released.
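As an aside to the standardization step mentioned above (canonical SMILES plus molecular fingerprints), a minimal featurization sketch with RDKit follows; the input SMILES are arbitrary examples and this is not the consortium's pipeline:

```python
# Hedged, minimal sketch of SMILES standardization and fingerprint featurization
# with RDKit; illustrative only, not the CompBiomed workflow.
from rdkit import Chem
from rdkit.Chem import AllChem

raw_smiles = ["C1=CC=CC=C1O", "CC(=O)Oc1ccccc1C(=O)O"]   # phenol, aspirin (example inputs)
records = []
for s in raw_smiles:
    mol = Chem.MolFromSmiles(s)
    if mol is None:                      # skip unparsable entries
        continue
    canonical = Chem.MolToSmiles(mol)    # canonical SMILES for deduplication
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
    records.append((canonical, list(fp.GetOnBits())))

for canonical, bits in records:
    print(canonical, "-> %d fingerprint bits set" % len(bits))
```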
For studies at a smaller scale, the paper by Park et al. [237] highlights: the use of AI by the company BenevolentAI
to screen medical databases in the literature and identify approved drugs that may be used to treat COVID-19; the
use of AlphaFold to release predicted protein structures connected with COVID-19, on the basis of which
blocking molecules are proposed; and the use of genome sequencing and AI to track viral mutations as the virus
passes from one person to another.

Alzheimer’s disease
Columbia University, Yonsei University College of Medicine, National Health Insurance Service Ilsan Hospital,
Korea and Brookhaven National Laboratory worked collaboratively on training algorithms with clinical data to diagnose cases
of Alzheimer’s disease. They constructed and validated their ML model using up to 4894 features on 40,736 South
Korean patients over an 8-year period, from 2002 to 2010. The methods tested included random forest, SVM and
logistic regression. The reported AUCs of the used models for predicting Alzheimer’s disease were around 0.7 [238].
Schizophrenia, Parkinson’s and Alzheimer’s diseases can be treated by antagonists and agonists of serotonin
receptors. A few QSAR models were developed to discover drugs that treat neurological diseases by targeting a
particular type of serotonin receptor, namely 5-hydroxytryptamine subtype 6 receptors (5-HT6 R). The efficiency
in predictions, measured by R2 , ranged from 0.68 to 0.97 [239].
Scientists from China and Singapore also used random forest and SVM to explore gamma-secretase
inhibitors [240,241], which are targeted for the treatment of Alzheimer’s disease [242]. A total of 758 noninhibitors (refined
from 150,000 compounds from the MDDR database [www.akosgmbh.de/accelrys/databases/mddr.htm]) and 675
inhibitors from more than ten subgroups of molecular structures were used for both training and testing the models.
Three-dimensional structures were used to calculate 189 molecular descriptors, including constitutional, quantum
chemical, topological and structural/geometrical descriptors. The models were refined by monitoring the receiver
operating characteristics, the fivefold cross-validation for feature selection and the out-of-bag indicators until the
predictive accuracy of both models for inhibitors and noninhibitors ranged between 96.5 and
99.3% on a validation dataset. Applying the random forest method to the ZINC database, 368 molecules were
selected as potential inhibitors of gamma-secretase.
In another study that focused on the classification of serotonin subtype 1 receptor (5-HT1A ) ligands, an Adaboost-
SVM model containing seven descriptors was reported with 100% prediction accuracy. In the same study, other
SVM models were developed with prediction accuracies ranging from 77.5 to 99.1% [243].
Potent 5-HT2B R ligands were identified and ligand selectivity for 5-HT2B R versus 5-HT1B R was achieved using
a hierarchical combination of ML, docking and scoring. The first filtration step, based on a neighboring-substructures
fingerprint with activity classifiers, reduced the database from 4.8 million molecules to 1,327,851 compounds. The
filtered molecules were then further filtered to 24,849 compounds based on a neighboring-substructures fingerprint
using selectivity classifiers. Docking was then performed with and without water, followed by a ‘PubChem
novelty’ check and selection of compounds for in vitro testing. This hierarchical protocol led to three hits with significant
affinity for 5-HT2B R, in the subnanomolar range for one of them, which also had a 10,000-fold selectivity for
5-HT2B R [244].

Cancer
The European INSPIRE project focuses on personalized medicine for precise cancer treatment using ML. Another
similar mega-project is the exascale DL and simulation enabled precision medicine for cancer (CANDLE) project
supported by the US Department of Energy and the National Cancer Institute. ML is key in drug design, especially
for personalized treatment of cancer patients. This is because there are 10,000–100,000 mutations that can be
simulated, and it is important to maximize the efficiency of the treatment on an individualized basis. A useful way
of addressing cancer treatment is to correlate the disease with the gene; that is, to use ‘cancer genomics’. For these
purposes, the Catalogue of Somatic Mutations in Cancer [245] is a good database of (as of 2019) approximately
9.7 million gene coding records related to cancer from approximately 27,000 papers. In cancer genomics, DL with
multiple layers is advantageous because it can handle the complexity of cancer-related studies, both single- and
multi-task learning are applicable (in the latter, the model learns from different shared parts) and it can be used
for multimodal learning in which various types of data can be combined. In cancer genomics, one-hot encoding
is used to create a binary map from sequence data, then one-dimensional convolutional neural networks filter the
data to numerical vectors. The neural network then iterates, updating the weights to minimize the global loss function
and to predict the output. It is vital not to fall into the trap of setting the wrong
objective when specifying the loss function.
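A minimal sketch of one-hot encoding a nucleotide sequence followed by a one-dimensional convolutional layer, as used in such cancer-genomics pipelines; the sequence, filter counts and single-output head are illustrative:

```python
# Hedged sketch: one-hot encoding of a nucleotide sequence followed by a 1D
# convolutional layer producing a single numerical output per sequence;
# all sizes and the input sequence are illustrative assumptions.
import torch
import torch.nn as nn

def one_hot(seq, alphabet="ACGT"):
    idx = {c: i for i, c in enumerate(alphabet)}
    t = torch.zeros(len(alphabet), len(seq))   # binary map: 4 channels x sequence length
    for j, c in enumerate(seq):
        t[idx[c], j] = 1.0
    return t

seq = "ACGTACGGTTAACCGG"
x = one_hot(seq).unsqueeze(0)                  # shape (1, 4, L)
net = nn.Sequential(nn.Conv1d(4, 8, kernel_size=5, padding=2), nn.ReLU(),
                    nn.AdaptiveMaxPool1d(1), nn.Flatten(), nn.Linear(8, 1))
print("predicted score:", float(net(x)))
```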
Huang et al. targeted the advancement of personalized medicine. They coupled SVM with standard recursive
feature elimination to estimate, through gene expression profiles, the personal response of each patient to drugs [246].
This code is open source (other AI tools related to drug design are available on Github; see Zhong et al. [247]).
The model was built using the gene expression and drug sensitivity profile of the National Cancer Institute panel
of 60 human cancer cell lines (NCI-60). The features selected for testing seven drugs used in the treatment of
ovarian cancer ranged from 10 to 32. Using the leave-one-out cross-validation method, the accuracies of the seven
models for the seven drugs (carboplatin, cisplatin, paclitaxel, docetaxel, gemcitabine, doxorubicin and gefitinib)
were between 75 and 85%. While filtering the sets reduced their complexity, there was a compromise in accuracy
from 90 to 60% in predicting the sensitivity of doxorubicin. This could imply that the genes related to drug
response may not necessarily be those related to cancer onset. When the models were built to include nine cancer
types compared with only two, the accuracy of predicting the sensitivity of carboplatin increased from 75 to 87.5%.
The model was then tested on predicting the responses of 273 patients with ovarian cancer to the seven drugs.
It was predicted that 75, 58 and 56% of the patients would respond positively to carboplatin, gemcitabine and
cisplatin, respectively.
Thanks to DL frameworks based on medical CT imaging, the doses of radiation used in treating cancer patients
can now be personalized per case to maximize the curative process [248]. With the iGray unit developed by Lou
et al., which was tested on 944 patients, the probability of unsuccessful treatment is less than 5%.
Yanagisawa et al. also managed to build a CNN that examines circulating tumor cells, which are indicative
of malignancy. They could evaluate, with up to 80% accuracy, the efficacy of anticancer drugs in single cancer
cells [249]. This is among the first ML methods to accurately identify cell characteristics at the single-cell level, which
is much more challenging than the identification of properties at the cell population level. Their developed CNN
model is useful in personalized medicine as it can evaluate treatment effects.
AI is also applied in the advancement of cancer treatment [250] through network modeling and pairwise combination
predictive algorithms. For example, the quadratic phenotypic optimization platform has been used to put
forward some new treatments using combinations of existing drugs [250,251]. This model uses parabolic relation-
ships to connect inputs (e.g., different molecules at various doses) with outputs (e.g., reduced levels of toxicity and
potency in minimizing tumor growth). From a list of 114 FDA-approved drug molecules, the quadratic phenotypic
optimization platform resulted in a few drug combinations that optimized the efficacy of myeloma treatment by
reversing DNA methylation and reactivating tumor suppressor genes.
The antitumor activity of benzo[c]phenanthridine derivatives as topoisomerase-I inhibitors can be predicted using
SVM [252]. The classification in this method was based on the relative effective concentration compared with
topotecan which has a relative effective concentration set to unity. The dataset had 73 analogs collected from
Lavoie’s group [252]. It was split 80:20 for training:testing, and the cycle was repeated five times. Out of 2031
descriptors, seven were determined using the random forest method. The accuracy of this model was 80–89% on
two datasets of nine and ten molecules.
Jiang et al. studied the physicochemical properties of 1098 BCRP inhibitors and 1701 noninhibitors. They
developed QSAR models that distinguished inhibitors from noninhibitors, knowing that the former have higher
levels of aromaticity and hydrophobicity [253]. The classification algorithms used were based on seven ML methods
and a feature-selecting tool built from coupling a simulated annealing algorithm with random forest, based on
which 144 molecular features were selected. Among the ML models used, SVM performed the best with an AUC of
0.96. DNN and extreme gradient boosting gave better results than stochastic gradient boosting, naive Bayes, k-NN
and regularized logistic regression. The authors then used the information gain method with frequency analysis
to determine the particular moieties of the molecules that were responsible for the inhibitory or noninhibitory
activities. Most of the fragments identified in noninhibitors contain nitrogen heterocycles (with one to four nitrogen
atoms per ring).

Ethical aspects in AI & ML


The world of AI leads to numerous questions related to ethical matters [254,255]. Will AI cause inflation in
the unemployment rate? Who does the intellectual property belong to if an ML algorithm suggests a drug (peptide,
small molecules, antibodies or RNAs) for a given disease? Would the discovery be patentable? How would physicians
deal with the information predicted by AI about the health profiles of their patients? Should they, in the hope of
acting proactively, inform the patients that they may likely develop a disease like cancer in a certain number of
years? Or should they keep this information confidential and hidden from patients so as not to disturb their mental
health and emotional well-being? Will the health insurance companies take advantage of such technology and
advancements? Will patients reach a point where it is mandatory to submit a report about potential future diseases
prior to getting private health insurance? Will ML end up replacing the diagnosis of physicians, especially given that
successful models have the potential to outperform humans? Will patients trust the diagnosis of machines at all?
To what extent will we be able to trust the results of a machine if we have no control over what it ends up learning
from the data input, especially if there is no guarantee that the fed information is actually systematic and maintains
all the rich historical ‘experience’? How will we be able to control building unbiased machines? Would AI amplify
the bias and the disparities already observed in the health sector [256] or would it actually fix medical records [257]?
According to the principle of double effect, or the utilitarian ethics, how would a machine be able to decide on
whether or not to complete a certain action? How can we teach the machine to exclude or reject inhumane solutions
for fighting a disease? Will we get to a point where the machine becomes smarter than humans and it ends up
controlling us? What would be the legal consequence for crossing the norms? How can an autonomous machine be
held liable for its decisions with respect to normative ethics, applied ethics and meta-ethics [258]? Will the existing
policies still be adequate for the AI-immersed society [259]?
In an effort to understand the behavior of society toward ethical considerations of AI, a moral machine –
basically an online platform – was deployed, with millions of participants accessing it from 233 countries
and territories. This game served as a demo through which 40 million decisions were collected with respect to
solving moral dilemmas committed by autonomous vehicles. The results show that, in order to reach a consensus
for the ethical considerations regarding AI, there are already some general common grounds that people agreed to
worldwide despite the notable variations in opinions from one culture to another [260].
Milena Pribić from IBM said, ‘Globally, we are starting to see the repercussions that come when companies
do not prioritize AI ethics in their solutions.’ [261]. It is therefore important to proceed with legislation covering all
ethical considerations related to AI, including ethics by design, ethics for design and ethics in design [261], before it
is too late [262]. It has been proposed to update the Belmont Report of principles in human research where big data
and AI are used for the benefit of society without introducing harm [263]. It is also recommended to have coherence
and lack of social disruption [264]. A model called the multiobjective maximum expected utility was developed
for this objective [265]. Among the other suggested proactive solutions, scientists are requesting standards for fair,
transparent and accountable algorithms [266]. Although minimal, there have been some guidelines that address
clinical trial protocols warranting transparency in assessing AI interventions in algorithms for diagnostic accuracy.
The referral to such guidelines is becoming more crucial as the number of clinical trials with AI interventions is
rapidly increasing; as of 2020, there are 267 reported cases in clinicaltrials.gov [267,268]. There are other guidelines
such as the Consolidated Standards of Reporting Trials – AI (CONSORT-AI) which is a revised version of
CONSORT [269], the Enhancing Quality and Transparency of Health Research (EQUATOR) (equator-network.
org), the Standard Protocol Items: Recommendations for Interventional Trials – AI (SPIRIT-AI) which is a revised
version of SPIRIT [270], the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or
Diagnosis – ML (TRIPOD-ML) and others reported by the Institute of Electrical and Electronics Engineers and
found in go.nature.com/2vt6ngr [271]. McCradden et al. pinpointed the gaps where more validation steps in stages
1, 2 and 3 need to be integrated in the guidelines for the healthcare ML system [272]. Such guidelines will promote
optimizing the benefits of AI. It is thus sensible to integrate AI technology, which can complete tasks that are
intractable for a human brain, while retaining human experience as an intervention to fill the gaps in deontological
ethics, intuition and emotion. This solution has already been partially reached through a reinforcement
learning algorithm, combined with signaling mechanisms, that is co-operative with humans [273].

Conclusion & future perspective


Despite all the drawbacks associated with AI, and despite all the challenges faced in AI and ML, this technology is
offering invaluable benefits to our society in many fields. It is our responsibility to selectively take full advantage of
AI while minimizing the threats to the humane aspects of our lives.
In the future, AI is expected to boost the field of personalized/precision medicine to the extent that it becomes
common practice even in the treatment of simple diseases. While physicians have the advantage of intuition over
AI, the latter has the potential to link data in a way that is not commonly achievable by humans. Once fed with
proper and accurate data, AI will be able to diagnose cases that clinicians would not routinely think of. This would
help patients fight the disease more efficiently at earlier stages. Just as AI assisted in revolutionizing the concept
of vaccines, shifting to mRNA instead of attenuated or killed viruses, it may revolutionize the concept of medications
and drug design altogether.

Executive summary
A brief historical perspective about drug design
• In this section, the history of the development of drug design is introduced starting from its early stages, through
the advancements of in vitro and in silico tools, to the ultimate contemporary tools such as machine learning
(ML) and artificial intelligence (AI).
AI & ML
• AI and ML are defined in this section. In addition, the following ML algorithms are introduced and discussed:
supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, deep learning.
Challenges faced in AI
• This section covers comprehensively the most critical challenges faced in AI. It touches upon the following
themes: data-related challenges (collection, representation, normalization, characterization, heterogeneity,
dimensionality, uncertainties and bias), multiobjective optimization difficulty, reproducibility, confounders, model
appropriateness, catastrophic forgetting, language and AI adoption. Reproducibility is among the most persistent
challenges that need to be urgently addressed.
AI & ML in discovering drugs & predicting drug properties
• This section covers the use of ML to predict binding affinities of a substrate to its receptor and various kinds of
interactions (e.g., protein–protein interactions). It also includes a comparison of ML models developed to predict
drug properties such as solubility, toxicology, blood–brain barrier permeability and chemical properties predicted
mainly through quantitative structure–activity relationship models. This is followed by a discussion about AI and
ML in drug discovery approaches.
AI & ML applications in intractable diseases
• This part of the review provides examples of ML and AI models used to diagnose and cure COVID-19, Alzheimer’s
disease and cancer.
Ethical aspects in AI & ML
• The last part highlights ethical issues related to intellectual property, liability, lack of intuition and rules and
regulations.
Conclusion & future perspective
• The review concludes with remarks about wisely using AI to maximize its benefits to society while minimizing
the risks to humanity. Future perspectives about the evolution of the field of AI in the next decade are
provided.

Financial & competing interests disclosure


The author has no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial
conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock
ownership or options, expert testimony, grants or patents received or pending or royalties.
No writing assistance was utilized in the production of this manuscript.

Open access
This work is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 Unported License. To view a copy of this license,
visit http://creativecommons.org/licenses/by-nc-nd/4.0/

References
Papers of special note have been highlighted as: •• of considerable interest
1. Dias DA, Urban S, Roessner U. A historical overview of natural products in drug discovery. Metabolites 2(2), 303–336 (2012).
2. Mishra BB, Tiwari VK. Natural products: an evolving role in future drug discovery. Eur. J. Med. Chem. 46(10), 4769–4807 (2011).
3. Tan S, Tatsumura Y. Alexander Fleming (1881–1955): discoverer of penicillin. Singapore Med. J. 56(07), 366–367 (2015).
4. Goldstein I, Burnett AL, Rosen RC, Park PW, Stecher VJ. The serendipitous story of sildenafil: an unexpected oral therapy for erectile
dysfunction. Sex. Med. Rev. 7(1), 115–128 (2019).
5. Baylon JL, Cilfone NA, Gulcher JR, Chittenden TW. Enhancing retrosynthetic reaction prediction with deep learning using multiscale
reaction classification. J. Chem. Inf. Model. 59(2), 673–688 (2019).
6. Batool M, Ahmad B, Choi S. A structure-based drug discovery paradigm. Int. J. Mol. Sci. 20(11), 2783 (2019).
7. Duch W, Swaminathan K, Meller J. Artificial intelligence approaches for rational drug design and discovery. Curr. Pharm. Des. 13(14),
1497–508 (2007).
8. Mandal S, Moudgil M, Mandal SK. Rational drug design. Eur. J. Pharmacol. 625(1–3), 90–100 (2009).
9. Kellici T, Ntountaniotis D, Vrontaki E et al. Rational drug design paradigms: the odyssey for designing better drugs. Comb. Chem. High
Throughput Screen. 18(3), 238–256 (2015).
10. Gong Z, Hu G, Li Q et al. Compound libraries: recent advances and their applications in drug discovery. Curr. Drug Discov. Technol.
14(4), 216–228 (2017).
11. Franzini RM, Randolph C. Chemical space of DNA-encoded libraries. J. Med. Chem. 59(14), 6629–6644 (2016).
12. Schneider P, Patrick W, Listgarten J et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19(5), 353–364
(2020).
13. Itakura K, Loogen R, Philippe B et al. Exascale computing. In: Encyclopedia of Parallel Computing. Springer, NY, USA, 638–644 (2011).
14. Liao X, Lu K, Yang C et al. Moving from exascale to zettascale computing: challenges and techniques. Front. Inf. Technol. Electron. Eng.
19(10), 1236–1244 (2018).
15. Marr B. The 9 biggest technology trends that will transform medicine and healthcare in 2020 (2019). www.forbes.com/sites/bernardmarr/2019/11/01/the-9-biggest-technology-trends-that-will-transform-medicine-and-healthcare-in-2020/?sh=7ff5842f72cd
16. Russell S, Norvig P. Artificial Intelligence: A Modern Approach (Global Edition). Addison Wesley, MA, USA (2018).
17. Neapolitan RE, Jiang X. Artificial Intelligence (2nd Edition). Taylor & Francis Ltd, FL, USA (2018). https://doi.org/10.1201/b22400
18. Konar A. Computational Intelligence. (1st Edition). Springer, Berlin/Heidelberg, Germany (2005).
19. Duda RO, Hart PE, Stork DG. Pattern Classification (2nd Edition). John Wiley & Sons, NJ, USA (2000).
20. Andrew R, Webb KDC. Statistical Pattern Recognition (3rd Edition). John Wiley & Sons, NJ, USA (2011).
21. The Elements of Statistical Learning. Hastie T, Tibshirani R, Friedman J (Eds). Springer, NY, USA (2009).
22. Krishnan K. Data Warehousing in the Age of Big Data (1st Edition). Morgan Kaufmann, Inc., MA, USA (2013).
23. Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertainty Fuzziness
Knowledge Based Syst. 06(02), 107–116 (1998).
24. Pattern Recognition and Machine Learning. Bishop CM (Ed.). Springer-Verlag, Inc., NY, USA (2006).
25. Strutz T. Data Fitting and Uncertainty. (2nd Edition). Springer Gabler, Weisbaden, Germany (2016).
26. Machine Learning. Murphy KP (Ed.). MIT Press, MA, USA (2012).
27. Bonaccorso G. Mastering Machine Learning Algorithms (2nd Edition). Packt Publishing, Birmingham, UK (2020).
28. The Hundred-Page Machine Learning Book. Burkov A (Ed.). Andriy Burkov, Quebec City, Canada (2019).
•• Covers ML basics in a comprehensive yet concise way.
29. de Mello RF, Ponti MA. Machine Learning Springer-Verlag GmbH, Berlin/Heidelberg, Germany (2018).
30. Unsupervised Learning: Foundations of Neural Computation Illustrated. Hinton G (Ed.). MIT Press, MA, USA (1999).
•• Although relatively old, this is an excellent textbook on unsupervised learning.
31. Chapelle O. Semi-Supervised Learning (1st Edition). MIT Press, MA, USA (2010).
32. Sutton RS, Barto AG. Reinforcement Learning (2nd Edition). MIT Press, MA, USA (2018).
33. Sigaud O, Buffet O. Markov Decision Processes in Artificial Intelligence: MDPs, Beyond MDPs and Applications (1st Edition). ISTE Ltd,
London, UK (2010).
34. Mater AC, Coote ML. Deep learning in chemistry. J. Chem. Inform. Model. 59(6), 2545–2559 (2019).
35. Deep Learning. Goodfellow I, Bengio Y, Courville A (Eds). MIT Press, MA, USA (2016).
36. Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006).
37. Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).
38. Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol. Inf. 35(1), 3–14 (2015).
39. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst.
25, 1097–1105 (2012).
40. Kelleher JD. Deep Learning Illustrated. MIT Press, MA, USA (2019).
41. Aggarwal CC. Neural Networks and Deep Learning (1st Edition). Springer-Verlag GmbH, Berlin/Heidelberg, Germany (2018).
42. Dana D, Gadhiya S, St Surin L et al. Deep learning in drug discovery and medicine; scratching the surface. Molecules 23(9), 2384 (2018).
43. Bote-Curiel L, Muñoz-Romero S, Gerrero-Curieses A, Rojo-Álvarez JL. Deep learning and big data in healthcare: a double review for
critical beginners. Appl. Sci. 9(11), 2331 (2019).
44. Lillicrap TP, Santoro A, Marris L, Akerman CJ, Hinton G. Backpropagation and the brain. Nat. Rev. Neurosci. 21(6), 335–346 (2020).
45. Dhillon A, Verma GK. Convolutional neural network: a review of models, methodologies and applications to object detection. Prog.
Artif. Intell. 9(2), 85–112 (2019).
46. Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. Proceedings of the IEEE International
Conference on Acoustics, Speech and Signal Processing. Vancouver, BC, Canada, 26–31 May 2013.
47. Balouji E, Gu IYH, Bollen MHJ, Bagheri A, Nazari M. A LSTM-based deep learning method with application to voltage dip
classification. Proceedings of the 2018 18th International Conference on Harmonics and Quality of Power (ICHQP), Ljubljana, Slovenia,
13–16 May 2018.
48. Yinka-Banjo C, Ugot O-A. A review of generative adversarial networks and its application in cybersecurity. Artif. Intell. Rev. 53(3),
1721–1736 (2019).
49. Lan L, You L, Zhang Z et al. Generative adversarial networks and its applications in biomedical informatics. Front. Public Health. 8, 164
(2020).
50. Susskind JM, Anderson AK, Hinton GE, Movellan JR. Generating facial expressions with deep belief nets. In: Affective Computing. I-Tech Education and Publishing (2008). www.intechopen.com/books/affective_computing/generating_facial_expressions_with_deep_belief_nets
51. Microsoft. Machine learning algorithm cheat sheet for Azure machine learning designer. (2020).
https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet
52. Kojouharov S. Cheat sheets for AI, neural networks, machine learning, deep learning & big data. (2017).
https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
53. Zhang R, Bivens AJ. Comparing the use of Bayesian networks and neural networks in response time modeling for service-oriented systems. Proceedings of the 2007 Workshop on Service-Oriented Computing Performance: Aspects, Issues, and Approaches (SOCP ’07). ACM Press, NY, USA, 67–74 (2007).
54. Coveney PV, Dougherty ER, Highfield RR. Big data need big theory too. Philos. Trans. R. Soc. London A Math. Phys. Eng. Sci.
374(2080), 20160153 (2016).
55. Quackenbush J. Microarray data normalization and transformation. Nat. Genet. 32(S4), 496–501 (2002).
56. Mons B. Invest 5% of research funds in ensuring data are reusable. Nature 578(7796), 491–491 (2020).
57. Wilkinson MD, Dumontier M, Aalbersberg IJ et al. The FAIR guiding principles for scientific data management and stewardship. Sci.
Data 3, 160018 (2016).
58. Rattan AK. Data integrity: history, issues, and remediation of issues. PDA J. Pharm. Sci. Technol. 72(2), 105–116 (2017).
59. Snyder J. Data cleansing: an omission from data analytics coursework. Info. Sys. Edu. J. 17(6), 22–29 (2019).
60. Zhang Y, Lin H, Yang Z, Wang J, Li Y. A single kernel-based approach to extract drug-drug interactions from biomedical literature.
PLoS ONE 7(11), e48901 (2012).
61. Krallinger M, Rabal O, Leitner F et al. The CHEMDNER corpus of chemicals and drugs and its annotation principles. J. Cheminform.
7(S1), 1–17 (2015).
62. Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A. CHEMDNER: the drugs and chemical names extraction
challenge. J. Cheminform. 7(S1), 1–11 (2015).
63. Jessop DM, Adams SE, Willighagen EL, Hawizy L, Murray-Rust P. OSCAR4: a flexible architecture for chemical text-mining. J.
Cheminform. 3(1), 41 (2011).
64. Habibi M, Wiegandt DL, Schmedding F, Leser U. Recognizing chemicals in patents: a comparative analysis. J. Cheminform. 8, 59
(2016).
65. Hassanzadeh H, Groza T, Nguyen A, Hunter J. A supervised approach to quantifying sentence similarity: with application to evidence
based medicine. PLoS ONE 10(6), e0129392 (2015).
66. Santos R, Ursu O, Gaulton A et al. A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov. 16(1), 19–34 (2016).
67. Lin Y, Mehta S, Küçük-McGinty H et al. Drug target ontology to classify and integrate drug discovery data. J. Biomed. Semant. 8(1), 50
(2017).
68. Li S. A map of the interactome network of the metazoan C. elegans. Science 303(5657), 540–543 (2004).
69. Sturm N, Mayr A, Le Van T et al. Industry-scale application and evaluation of deep learning for drug target prediction. J. Cheminform.
12(1), 26 (2020).
70. Duran-Frigola M, Fernández-Torras A, Bertoni M, Aloy P. Formatting biological big data for modern machine learning in drug
discovery. Wiley Interdiscip. Rev. Comput. Mol. Sci. 9, e1408 (2018).
71. Karpov P, Godin G, Tetko IV. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J. Cheminform. 12(1), 17 (2020).
72. Rogers D, Hahn M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50(5), 742–754 (2010).
73. Kojima R, Ishida S, Ohta M, Iwata H, Honma T, Okuno Y. kGCN: a graph-based deep learning framework for chemical structures. J.
Cheminform. 12(1), 32 (2020).
74. Nikolova N, Jaworska J. Approaches to measure chemical similarity – a review. QSAR Comb. Sci. 22(9–10), 1006–1026 (2003).
75. Sieg J, Flachsenberg F, Rarey M. In need of bias control: evaluating chemical data for machine learning in structure-based virtual
screening. J. Chem. Inf. Model. 59(3), 947–961 (2019).
76. Nicolotti O, Giangreco I, Introcaso A, Leonetti F, Stefanachi A, Carotti A. Strategies of multi-objective optimization in drug discovery
and development. Expert. Opin. Drug Discov. 6(9), 871–884 (2011).
77. Ekins S, Honeycutt JD, Metz JT. Evolving molecules using multi-objective optimization: applying to ADME/Tox. Drug Discov. Today 15(11–12), 451–460 (2010).
78. Kubinyi H. Opinion: drug research: myths, hype and reality. Nat. Rev. Drug Discov. 2(8), 665–668 (2003).
79. Haibe-Kains B, George Alexandru A, Hosny A et al. Transparency and reproducibility in artificial intelligence. Nature 586(7829),
E14–E16 (2020).
80. McKinney SM, Karthikesalingam A, Tse D et al. Reply to: Transparency and reproducibility in artificial intelligence. Nature 586(7829),
E17–E18 (2020).
•• Reply to a paper criticizing the transparency and reproducibility of an AI model used in the health sector. Reproducibility is one of the major challenges in AI.
81. Gallicchio C, Martín-Guerrero J, Micheli A, Olivas E. Randomized machine learning approaches: recent developments and challenges. Proceedings of the 25th European Symposium on Artificial Neural Networks. Bruges, Belgium (2017).
82. van de Ven GM, Siegelmann HT, Tolias AS. Brain-inspired replay for continual learning with artificial neural networks. Nat. Commun.
11, 4069 (2020).
83. Turban E. Decision Support Systems and Intelligent Systems. 6th Edition. Prentice Hall, NJ, USA (2001).
84. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat.
Med. 25(1), 30–36 (2019).
85. Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Dogan T. Recent applications of deep learning and machine intelligence on
in silico drug discovery: methods, tools and databases. Briefings Bioinf. 20(5), 1878–1912 (2018).
86. Olivecrona M, Blaschke T, Engkvist O, Chen H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9(1), 48
(2017).
87. Kadurin A, Nikolenko S, Khrabrov K, Aliper A, Zhavoronkov A. druGAN: an advanced generative adversarial autoencoder model for de
novo generation of new molecules with desired molecular properties in silico. Mol. Pharm. 14(9), 3098–3104 (2017).
88. Zhan F, Barlogie B, Mulligan G, Shaughnessy JD, Bryant B. High-risk myeloma: a gene expression–based risk-stratification model for
newly diagnosed multiple myeloma treated with high-dose therapy is predictive of outcome in relapsed disease treated with single-agent
bortezomib or high-dose dexamethasone. Blood 111(2), 968–969 (2008).
89. Ramsundar B, Liu B, Wu Z et al. Is multitask deep learning practical for pharma? J. Chem. Inf. Model. 57(8), 2068–2076 (2017).
90. Costa PR, Acencio ML, Lemke N. A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data. BMC Genomics 11(Suppl. 5), S9 (2010).
91. Rouillard AD, Hurle MR, Agarwal P. Systematic interrogation of diverse omic data reveals interpretable, robust, and generalizable
transcriptomic features of clinically successful therapeutic targets. PLoS Comput. Biol. 14(5), e1006142 (2018).
92. Wong CH, Siah KW, Lo AW. Estimation of clinical trial success rates and related parameters. Biostatistics 20(2), 273–286 (2018).
93. Exner TE, Keil M, Brickmann J. Pattern recognition strategies for molecular surfaces. I. Pattern generation using fuzzy set theory. J.
Comput. Chem. 23(12), 1176–1187 (2002).
94. Exner TE, Keil M, Brickmann J. Pattern recognition strategies for molecular surfaces. II. Surface complementarity. J. Comput. Chem.
23(12), 1188–1197 (2002).
95. Keil M, Exner TE, Brickmann J. Pattern recognition strategies for molecular surfaces. III. Binding site prediction with a neural network. J. Comput. Chem. 25(6), 779–789 (2004).
96. Zhou H, Dong Z, Tao P. Recognition of protein allosteric states and residues: machine learning approaches. J. Comput. Chem. 39(20),
1481–1490 (2018).
97. Cao X, Hu X, Zhang X et al. Identification of metal ion binding sites based on amino acid sequences. PLoS ONE 12(8), e0183756
(2017).
98. Schaller D, Šribar D, Noonan T et al. Next generation 3D pharmacophore modeling. Wiley Interdiscip. Rev. Comput. Mol. Sci. 10, e1468
(2020).
99. Barillari C, Marcou G, Rognan D. Hot-spots-guided receptor-based pharmacophores (HS-Pharm): a knowledge-based approach to
identify ligand-anchoring atoms in protein cavities and prioritize structure-based pharmacophores. J. Chem. Inf. Model. 48(7),
1396–1410 (2008).


100. Sato T, Honma T, Yokoyama S. Combining machine learning and pharmacophore-based interaction fingerprint for in silico screening. J.
Chem. Inf. Model. 50(1), 170–185 (2009).
101. Jiménez J, Doerr S, Martínez-Rosell G, Rose AS, Fabritiis GD. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 33(19), 3036–3042 (2017).
102. Dakka J, Turilli M, Wright DW et al. High-throughput binding affinity calculations at extreme scales. BMC Bioinformatics 19(Suppl.
18), 482 (2018).
103. Wan S, Bhati AP, Zasada SJ et al. Rapid and reliable binding affinity prediction of bromodomain inhibitors: a computational study. J.
Chem. Theory Comput. 13(2), 784–795 (2017).
104. Bhati AP, Wan S, Wright DW, Coveney PV. Rapid, accurate, precise, and reliable relative free energy prediction using ensemble based
thermodynamic integration. J. Chem. Theory Comput. 13(1), 210–222 (2016).
105. Bunney TD, Wan S, Thiyagarajan N et al. The effect of mutations on drug sensitivity and kinase activity of fibroblast growth factor
receptors: a combined experimental and theoretical study. EBioMedicine 2(3), 194–204 (2015).
106. Wan S, Knapp B, Wright DW, Deane CM, Coveney PV. Rapid, precise, and reproducible prediction of peptide–MHC binding affinities
from molecular dynamics that correlate well with experiment. J. Chem. Theory Comput. 11(7), 3346–3356 (2015).
107. Genheden S, Ryde U. How to obtain statistically converged MM/GBSA results. J. Comput. Chem. 31, 837–846 (2010).
108. Genheden S, Ryde U. A comparison of different initialization protocols to obtain statistically independent molecular dynamics
simulations. J. Comput. Chem. 32, 187–195 (2011).
109. Aldeghi M, Heifetz A, Bodkin MJ, Knapp S, Biggin PC. Accurate calculation of the absolute free energy of binding for drug molecules.
Chem. Sci. 7, 207–218 (2016).
110. Mobley DL, Klimovich PV. Perspective: alchemical free energy calculations for drug discovery. J. Chem. Phys. 137(23), 230901 (2012).
111. Dakka J, Farkas-Pall K, Balasubramanian V et al. Enabling trade-offs between accuracy and computational cost: adaptive algorithms to
reduce time to clinical insight. Proceedings of the 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
(CCGRID). Washington, DC, USA, (2018).
112. Chodera JD, Noé F. Markov state models of biomolecular conformational dynamics. Curr. Opin. Struct. Biol. 25, 135–144 (2014).
113. Brylinski M. Nonlinear scoring functions for similarity-based ligand docking and binding affinity prediction. J. Chem. Inf. Model.
53(11), 3097–3112 (2013).
114. Cramer RD, Patterson DE, Bunce JD. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J. Am. Chem. Soc. 110(18), 5959–5967 (1988).
115. Moro S, Bacilieri M, Cacciari B, Klotz KN, Spalluto G. The application of a 3D-QSAR (autoMEP/PLS) approach as an efficient
pharmacodynamic-driven filtering method for small-size virtual library: application to a lead optimization of a human A3 adenosine
receptor antagonist. Bioorg. Med. Chem. 14, 4923–4932 (2006).
116. Bacilieri M, Varano F, Deflorian F et al. Tandem 3D-QSARs approach as a valuable tool to predict binding affinity data: design of new
gly/NMDA receptor antagonists as a key study. J. Chem. Inf. Model. 47(5), 1913–1922 (2007).
117. Gallicchio E, Levy RM. Recent theoretical and computational advances for modeling protein–ligand binding affinities. Adv. Prot. Chem.
Struct. Biol. 85, 27–80 (2011).
118. Genheden S, Ryde U. The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert. Opin. Drug Discov. 10(5),
449–461 (2015).
119. Reddy M, Reddy C, Rathore R et al. Free energy calculations to estimate ligand-binding affinities in structure-based drug design. Curr.
Pharm. Des. 20(20), 3323–3337 (2014).
120. Fratev F, Sirimulla S. An improved free energy perturbation FEP+ sampling protocol for flexible ligand-binding domains. Sci. Rep. 9(1), 16829 (2019).
121. Erdas-Cicek O, Atac AO, Gurkan-Alp AS, Buyukbingol E, Alpaslan FN. Three-dimensional analysis of binding sites for predicting
binding affinities in drug design. J. Chem. Inf. Model. 59(11), 4654–4662 (2019).
122. Kairys V, Baranauskiene L, Kazlauskiene M, Matulis D, Kazlauskas E. Binding affinity in drug design: experimental and computational
techniques. Expert. Opin. Drug Discov. 14(8), 755–768 (2019).
123. Pham TT, Shirts MR. Identifying low variance pathways for free energy calculations of molecular transformations in solution phase. J.
Chem. Phys. 135(3), 034114 (2011).
124. Song X-Y, Chen Z-H, Sun X-Y, You Z-H, Li L-P, Zhao Y. An ensemble classifier with random projection for predicting protein–protein
interactions using sequence and evolutionary information. Appl. Sci. 8(1), 89 (2018).
125. Klepsch F, Vasanthanathan P, Ecker GF. Ligand and structure-based classification models for prediction of P-glycoprotein inhibitors. J.
Chem. Inf. Model. 54(1), 218–229 (2014).
126. Peng C, Han S, Zhang H, Li Y. RPITER: a hierarchical deep learning framework for ncRNA–protein interaction prediction. Int. J. Mol.
Sci. 20(5), 1070 (2019).


127. Cobanoglu MC, Liu C, Hu F, Oltvai ZN, Bahar I. Predicting drug–target interactions using probabilistic matrix factorization. J. Chem.
Inf. Model. 53(12), 3399–3409 (2013).
128. Lee M, Kim H, Joe H, Kim H-G. Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery. J.
Cheminform. 11(1), 46 (2019).
129. Kuhn B, Guba W, Hert J et al. A real-world perspective on molecular design: miniperspective. J. Med. Chem. 59, 4087–4102 (2016).
130. de Ruyck J, Brysbaert G, Blossey R, Lensink MF. Molecular docking as a popular tool in drug design, an in silico travel. Adv. Appl.
Bioinform. Chem. 9, 1–11 (2016).
131. Moroy G, Martiny VY, Vayer P, Villoutreix BO, Miteva MA. Toward in silico structure-based ADMET prediction in drug discovery. Drug Discov. Today 17, 44–55 (2012).
132. Agniswamy J, Shen C-H, Aniana A, Sayer JM, Louis JM, Weber IT. HIV-1 protease with 20 mutations exhibits extreme resistance to clinical inhibitors through coordinated structural rearrangements. Biochemistry 51, 2819–2828 (2012).
133. Agniswamy J, Louis JM, Shen C-H, Yashchuk S, Ghosh AK, Weber IT. Substituted bis-THF protease inhibitors with improved potency
against highly resistant mature HIV-1 protease PR20. J. Med. Chem. 58, 5088–5095 (2015).
134. Yasuo N, Sekijima M. An improved method of structure-based virtual screening via interaction-energy-based learning. J. Chem. Inf.
Model. 59, 1050–1061 (2019).
135. Cai Y-D, Liu X-J, Xu X-B, Chou K-C. Support vector machines for predicting HIV protease cleavage sites in protein. J. Comput. Chem.
23(2), 267–274 (2002).
136. Playe B, Stoven V. Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity. J.
Cheminform. 12(1), 11 (2020).
137. Boobier S, Osbourn A, Mitchell JBO. Can human experts predict solubility better than computers? J. Cheminform. 9(1), 63 (2017).
138. Klamt A, Eckert F, Hornig M, Beck ME, Bürger T. Prediction of aqueous solubility of drugs and pesticides with COSMO-RS. J.
Comput. Chem. 23(2), 275–281 (2002).
139. Klamt A, Eckert F. Erratum to ‘COSMO-RS: a novel and efficient method for the a priori prediction of thermophysical data of liquids’
[Fluid Phase Equilib. 172 (2000) 43–72]. Fluid Phase Equilib. 205(2), 357 (2003).
140. Hornig M, Klamt A. COSMOfrag: a novel tool for high-throughput ADME property prediction and similarity screening based on quantum chemistry. J. Chem. Inf. Model. 45(5), 1169–1177 (2005).
141. Klimenko K, Kuz’min V, Ognichenko L et al. Novel enhanced applications of QSPR models: temperature dependence of aqueous
solubility. J. Comput. Chem. 37(22), 2045–2051 (2016).
142. Kovdienko NA, Polishchuk PG, Muratov EN et al. Application of random forest and multiple linear regression techniques to QSPR
prediction of an aqueous solubility for military compounds. Mol. Inf. 29(5), 394–406 (2010).
143. Kholod YA, Muratov EN, Gorb LG et al. Application of quantum chemical approximations to environmental problems: prediction of
water solubility for nitro compounds. Environ. Sci. Technol. 43(24), 9208–9215 (2009).
144. Cheng T, Li Q, Wang Y, Bryant SH. Binary classification of aqueous solubility using support vector machines with reduction and
recombination feature selection. J. Chem. Inf. Model. 51(2), 229–236 (2011).
145. Llinàs A, Glen RC, Goodman JM. Solubility challenge: can you predict solubilities of 32 molecules using a database of 100 reliable
measurements? J. Chem. Inf. Model. 48(7), 1289–1303 (2008).
146. Tetko IV, Novotarskyi S, Sushko I et al. Development of dimethyl sulfoxide solubility models using 163,000 molecules: using a domain
applicability metric to select more reliable predictions. J. Chem. Inf. Model. 53(8), 1990–2000 (2013).
147. Wu Y, Wang G. Machine learning based toxicity prediction: from chemical structural description to transcriptome analysis. Int. J. Mol.
Sci. 19(8), 2358 (2018).
148. Hardy B, Douglas N, Helma C et al. Collaborative development of predictive toxicology applications. J. Cheminform. 2(1), 7 (2010).
149. Raies AB, Bajic VB. In silico toxicology: computational methods for the prediction of chemical toxicity. Wiley Interdiscip. Rev. Comput. Mol. Sci. 6(2), 147–172 (2016).
150. Hemmerich J, Ecker GF. In silico toxicology: from structure–activity relationships towards deep learning and adverse outcome pathways. Wiley Interdiscip. Rev. Comput. Mol. Sci. 10(4), e1475 (2020).
151. Ekins S. Computational Toxicology (1st Edition). John Wiley & Sons, NJ, USA (2018).
152. Isayev O, Rasulev B, Gorb L, Leszczynski J. Structure–toxicity relationships of nitroaromatic compounds. Mol. Diversity 10(2), 233–245
(2006).
153. Mekenyan O, Dimitrov S, Pavlov T, Veith G. A systematic approach to simulating metabolism in computational toxicology. I. The TIMES heuristic modelling framework. Curr. Pharm. Des. 10(11), 1273–1293 (2004).
154. Brito-Sánchez Y, Castillo-Garit J, Le-Thi-Thu H et al. Comparative study to predict toxic modes of action of phenols from molecular
structures. SAR QSAR Environ. Res. 24(3), 235–251 (2013).


155. Wang Y, Xiao Q, Chen P, Wang B. In silico prediction of drug-induced liver injury based on ensemble classifier method. Int. J. Mol. Sci. 20(17), 4106 (2019).
156. Ancuceanu R, Hovanet M, Anghel A et al. Computational models using multiple machine learning algorithms for predicting drug
hepatotoxicity with the DILIrank dataset. Int. J. Mol. Sci. 21(6), 2114 (2020).
157. Chen M, Suzuki A, Thakkar S, Yu K, Hu C, Tong W. DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans. Drug Discov. Today 21(4), 648–653 (2016).
158. Zhao YH, Abraham MH, Ibrahim A et al. Predicting penetration across the blood–brain barrier from simple descriptors and
fragmentation schemes. J. Chem. Inf. Model. 47(1), 170–175 (2007).
159. Suenderhauf C, Hammann F, Huwyler J. Computational prediction of blood–brain barrier permeability using decision tree induction.
Molecules 17(9), 10429–10445 (2012).
160. Winkler DA, Burden FR. Modelling blood–brain barrier partitioning using bayesian neural nets. J. Mol. Graphics Model. 22(6), 499–505
(2004).
161. Garg P, Verma J. In silico prediction of blood–brain barrier permeability: an artificial neural network model. J. Chem. Inf. Model. 46(1), 289–297 (2006).
162. Wang Z, Yan A, Yuan Q. Classification of blood–brain barrier permeation by Kohonen's Self-Organizing Neural Network (KohNN) and Support Vector Machine (SVM). QSAR Comb. Sci. 28(9), 989–994 (2009).
163. Karelson M, Dobchev D. Using artificial neural networks to predict cell-penetrating compounds. Expert. Opin. Drug Discov. 6(8),
783–796 (2011).
164. Rai BK, Bakken GA. Fast and accurate generation of ab initio quality atomic charges using nonparametric statistical regression. J.
Comput. Chem. 34(19), 1661–1671 (2013).
165. Shivakumar D, Deng Y, Roux B. Computations of absolute solvation free energies of small molecules using explicit and implicit solvent
model. J. Chem. Theory Comput. 5(4), 919–930 (2009).
166. Shivakumar D, Williams J, Wu Y, Damm W, Shelley J, Sherman W. Prediction of absolute solvation free energies using molecular
dynamics free energy perturbation and the OPLS force field. J. Chem. Theory Comput. 6(5), 1509–1519 (2010).
167. Pereira F, Aires-de-Sousa J. Machine learning for the prediction of molecular dipole moments obtained by density functional theory. J.
Cheminform. 10(1), 43 (2018).
168. Bauer CA, Schneider G, Göller AH. Machine learning models for hydrogen bond donor and acceptor strengths using large and diverse
training data generated by first-principles interaction free energies. J. Cheminform. 11(1), 59 (2019).
169. Glavatskikh M, Madzhidov T, Solov'ev V, Marcou G, Horvath D, Varnek A. Predictive models for the free energy of hydrogen bonded complexes with single and cooperative hydrogen bonds. Mol. Inf. 35(11–12), 629–638 (2016).
170. Litsa EE, Peña MI, Moll M, Giannakopoulos G, Bennett GN, Kavraki LE. Machine learning guided atom mapping of metabolic
reactions. J. Chem. Inf. Model. 59(3), 1121–1135 (2019).
171. Klopman G. Artificial intelligence approach to structure–activity studies. Computer automated structure evaluation of biological activity
of organic molecules. J. Am. Chem. Soc. 106(24), 7315–7321 (1984).
172. Li Z, Nie K, Wang Z, Luo D. Quantitative structure–activity relationship models for the antioxidant activity of polysaccharides. PLoS
ONE 11(9), e0163536 (2016).
173. Shoombuatong W, Schaduangrat N, Nantasenamat C. Towards understanding aromatase inhibitory activity via QSAR modeling.
EXCLI J. 17, 688–708 (2018).
174. García I, Fall Y, Gómez G. Using topological indices to predict anti-Alzheimer and anti-parasitic GSK-3 inhibitors by multi-target QSAR in silico screening. Molecules 15(8), 5408–5422 (2010).
175. Patel H, Noolvi M, Sharma P et al. Quantitative structure–activity relationship (QSAR) studies as strategic approach in drug discovery.
Med. Chem. Res. 23(12), 4991–5007 (2014).
176. Olier I, Sadawi N, Bickerton G et al. Meta-QSAR: a large-scale application of meta-learning to drug design and discovery. Mach. Learn. 107(1), 285–311 (2018).
177. Galvez J. Computational methods in developing quantitative structure–activity relationships (QSAR): a review. Comb. Chem. High
Throughput Screen. 9(3), 213–228 (2006).
178. Vu O, Mendenhall J, Altarawy D, Meiler J. BCL::Mol2D – a robust atom environment descriptor for QSAR modeling and lead optimization. J. Comput. Aided Mol. Des. 33(5), 477–486 (2019).
179. Benigni R, Bossa C. Predictivity and reliability of QSAR models: the case of mutagens and carcinogens. Toxicol. Mech. Methods 18(2–3),
137–147 (2008).
180. Dobchev D, Karelson M. Have artificial neural networks met expectations in drug discovery as implemented in QSAR
framework? Expert. Opin. Drug Discov. 11(7), 627–639 (2016).
181. Simões RS, Maltarollo VG, Oliveira PR, Honorio KM. Transfer and multi-task learning in QSAR modeling: advances and challenges.
Front. Pharmacol. 9, 74 (2018).


182. Winkler D. Sparse QSAR modelling methods for therapeutic and regenerative medicine. J. Comput. Aided Mol. Des. 32(4), 497–509
(2018).
•• Delivered as the 2017 ACS Herman Skolnik lecture. Summarizes two decades of work on improving the ability of QSAR to optimize bioactive molecules and shows its application in omics technologies and regenerative medicine.
183. Wei Y, Li W, Du T, Hong Z, Lin J. Targeting HIV/HCV coinfection using a machine learning-based multiple quantitative
structure-activity relationships (multiple QSAR) method. Int. J. Mol. Sci. 20(14), 3572 (2019).
184. Chen H, Carlsson L, Eriksson M, Varkonyi P, Norinder U, Nilsson I. Beyond the scope of Free-Wilson analysis: building interpretable
QSAR models with machine learning algorithms. J. Chem. Inf. Model. 53(6), 1324–1336 (2013).
185. Kausar S, Falcao AO. An automated framework for QSAR model building. J. Cheminform. 10(1), 1 (2018).
186. Mansouri K, Cariello NF, Korotcov A et al. Open-source QSAR models for pKa prediction using multiple machine learning approaches.
J. Cheminform. 11(1), 60 (2019).
187. Aniceto N, Freitas AA, Bender A, Ghafourian T. A novel applicability domain technique for mapping predictive reliability across the
chemical space of a QSAR: reliability-density neighbourhood. J. Cheminform. 8(1), 69 (2016).
188. Osoda T, Miyano S. 2D-QSAR for 450 types of amino acid induction peptides with a novel substructure pair descriptor having wider
scope. J. Cheminform. 3(1), 50 (2011).
189. Martínez MJ, Ponzoni I, Díaz MF, Vazquez GE, Soto AJ. Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods. J. Cheminform. 7, 39 (2015).
190. Stålring JC, Carlsson LA, Almeida P, Boyer S. AZOrange – high performance open source machine learning for QSAR modeling in a
graphical programming environment. J. Cheminform. 3, 28 (2011).
191. Patra JC, Singh O. Artificial neural networks-based approach to design ARIs using QSAR for diabetes mellitus. J. Comput. Chem.
30(15), 2494–2508 (2009).
192. Bitam S, Hamadache M, Salah H. 2D QSAR studies on a series of
(4S,5R)-5-[3,5-bis(trifluoromethyl)phenyl]-4-methyl-1,3-oxazolidin-2-one as CETP inhibitors. SAR QSAR Environ. Res. 31(6),
423–438 (2020).
193. Matta CF, Arabi AA. Electron-density descriptors as predictors in quantitative structure–activity/property relationships and drug design.
Future Med. Chem. 3(8), 969–994 (2011).
194. Bader RFW. Atoms in Molecules: A Quantum Theory. Oxford University Press, Oxford, UK (1990).
195. Schütt KT, Gastegger M, Tkatchenko A, Müller K-R, Maurer RJ. Unifying machine learning and quantum chemistry with a deep
neural network for molecular wavefunctions. Nat. Commun. 10(1), 5024 (2019).
196. Schütt K, Kindermans P-J, Sauceda HE, Chmiela S, Tkatchenko A, Müller K-R. SchNet: a continuous-filter convolutional neural
network for modeling quantum interactions. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Curran Associates, Inc., CA, USA (2017).
197. Ahmadi M, Shahlaei M. Quantitative structure–activity relationship study of P2X7 receptor inhibitors using combination of principal component analysis and artificial intelligence methods. Res. Pharm. Sci. 10(4), 307–325 (2015).
198. Benfenati E, Benigni R, Demarini DM et al. Predictive models for carcinogenicity and mutagenicity: frameworks, state-of-the-art, and
perspectives. J. Environ. Sci. Health C 27(2), 57–90 (2009).
199. Chakravarti SK, Alla SRM. Descriptor free QSAR modeling using deep learning with long short-term memory neural networks. Front.
Artif. Intell. 2, 17 (2019).
200. Yuan H, Silverman RB. New substrates and inhibitors of γ-aminobutyric acid aminotransferase containing bioisosteres of the carboxylic
acid group: design, synthesis, and biological activity. Bioorg. Med. Chem. 14(5), 1331–1338 (2006).
201. Silverman RB, Holladay MW. The Organic Chemistry of Drug Design and Drug Action (3rd Edition). Elsevier, Oxford, UK (2014).
202. Arabi AA. Routes to drug design via bioisosterism of carboxyl and sulfonamide groups. Future Med. Chem. 9(18), 2167–2180 (2017).
203. Matta CF, Arabi AA, Weaver DF. The bioisosteric similarity of the tetrazole and carboxylate anions: clues from the topologies of the
electrostatic potential and of the electron density. Eur. J. Med. Chem. 45(5), 1868–1872 (2010).
204. Arabi AA, Matta CF. Electrostatic potentials and average electron densities of bioisosteres in methylsquarate and acetic acid. Future Med.
Chem. 8(4), 361–371 (2016).
205. Arabi AA. Atomic and molecular properties of nonclassical bioisosteric replacements of the carboxylic acid group. Future Med. Chem.
12(12), 1111–1120 (2020).
206. Matta CF, Arabi AA. Energy richness of ATP in terms of atomic energies: a first step. Quantum Biochemistry. 473–498 (2010).
https://onlinelibrary.wiley.com/doi/10.1002/9783527629213.ch15
207. Arabi AA, Matta CF. Where is electronic energy stored in adenosine triphosphate? J. Phys. Chem. A 113(14), 3360–3368 (2009).
208. Matta CF, Arabi AA, Keith TA. Atomic partitioning of the dissociation energy of the P–O(H) bond in hydrogen phosphate anion
(HPO42-): disentangling the effect of Mg2+. J. Phys. Chem. A 111(36), 8864–8872 (2007).


209. Gaüzère B, Brun L, Villemin D. Two new graphs kernels in chemoinformatics. Pattern Recognit. Lett. 33(15), 2038–2047 (2012).
210. Drews J. Strategic trends in the drug industry. Drug Discov. Today 8(9), 411–420 (2003).
211. Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232(2), 584–599 (1993).
212. Przybylski D, Rost B. Alignments grow, secondary structure prediction improves. Proteins 46(2), 197–205 (2002).
213. Zou D, He Z, He J, Xia Y. Supersecondary structure prediction using Chou’s pseudo amino acid composition. J. Comput. Chem. 32(2),
271–278 (2011).
214. Chou K-C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins 43(3), 246–255 (2001).
215. Bhowmik D, Gao S, Young MT, Ramanathan A. Deep clustering of protein folding simulations. BMC Bioinformatics 19(Suppl. 18), 484
(2018).
216. Merk D, Friedrich L, Grisoni F, Schneider G. De novo design of bioactive small molecules by artificial intelligence. Mol. Inf. 37(1–2),
1700153 (2018).
217. Bento AP, Gaulton A, Hersey A et al. The ChEMBL bioactivity database: an update. Nucleic Acids Res. 42(D1), D1083–D1090 (2013).
218. Ghasemi F, Fassihi A, Pérez-Sánchez H, Mehri Dehnavi A. The role of different sampling methods in improving biological activity
prediction using deep belief network. J. Comput. Chem. 38(4), 195–203 (2017).
219. Abdo A, Leclère V, Jacques P, Salim N, Pupin M. Prediction of new bioactive molecules using a Bayesian belief network. J. Chem. Inf.
Model. 54(1), 30–36 (2014).
220. Müller K-R, Rätsch G, Sonnenburg S, Mika S, Grimm M, Heinrich N. Classifying ‘drug-likeness’ with kernel-based learning methods. J.
Chem. Inf. Model. 45(2), 249–253 (2005).
221. Sadowski J, Kubinyi H. A scoring scheme for discriminating between drugs and nondrugs. J. Med. Chem. 41(18), 3325–3329 (1998).
222. Byvatov E, Fechner U, Sadowski J, Schneider G. Comparison of support vector machine and artificial neural network systems for
drug/nondrug classification. J. Chem. Inf. Comput. Sci. 43(6), 1882–1889 (2003).
223. Harn Y-C, Su B-H, Ku Y-L, Lin OA, Chou C-F, Tseng YJ. NP-StructurePredictor: prediction of unknown natural products in plant
mixtures. J. Chem. Inf. Model. 57(12), 3138–3148 (2017).
224. Prykhodko O, Johansson SV, Kotsias P-C, Bjerrum EJ, Engkvist O, Chen H. A de novo molecular generation method using latent vector
based generative adversarial network. J. Cheminform. 11(1), 74 (2019).
225. Mei X, Lee H-C, Diao K et al. Artificial intelligence–enabled rapid diagnosis of patients with COVID-19. Nat. Med. 26(8), 1224–1228
(2020).
226. Harmon SA, Sanford TH, Xu S et al. Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational
datasets. Nat. Commun. 11(1), 4080 (2020).
227. Zhang K, Liu X, Shen J et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of
COVID-19 pneumonia using computed tomography. Cell 181(6), 1423–1433.e11 (2020).
228. Zhou Y, Wang F, Tang J, Nussinov R, Cheng F. Artificial intelligence in COVID-19 drug repurposing. Lancet Digit. Health 2(12),
E667–E676 (2020).
229. Sanders JM, Monogue ML, Jodlowski TZ, Cutrell JB. Pharmacologic treatments for coronavirus disease 2019 (COVID-19). JAMA
323(18), 1824–1836 (2020).
230. Tayarani-N M-H. Applications of artificial intelligence in battling against COVID-19: a literature review. Chaos Solitons Fractals 142, 110338 (2020).
231. Arshadi AK, Webb J, Salem M et al. Artificial intelligence for COVID-19 drug discovery and vaccine development. Front. Artif. Intell. 3,
65 (2020).
232. Kowalewski J, Ray A. Predicting novel drugs for SARS-CoV-2 using machine learning from a >10 million chemical space. Heliyon 6(8),
e04639 (2020).
•• Discusses the use of ML to find candidate treatments for COVID-19; the authors narrowed millions of screened compounds down to a short list of potential drugs.
233. Gao K, Nguyen DD, Wang R, Wei G-W. Machine intelligence design of 2019-nCoV drugs. bioRxiv. doi:10.1101/2020.01.30.927889
(2020) (Epub ahead of print).
234. Ke Y-Y, Peng T-T, Yeh T-K et al. Artificial intelligence approach fighting COVID-19 with repurposing drugs. Biomed. J. 43(4), 355–362
(2020).
235. Plewczynski D, Spieser S, Koch U. Performance of machine learning methods for ligand-based virtual screening. Comb. Chem. High
Throughput Screen. 12(4), 358–368 (2009).
236. Babuji Y, Blaiszik B, Chard K et al. Lit - A collection of literature extracted small molecules to speed identification of COVID-19
therapeutics. doi:10.26311/LIT (2020).
237. Park Y, Casey D, Joshi I, Zhu J, Cheng F. Emergence of new disease: how can artificial intelligence help? Trends Mol. Med. 26(7),
627–629 (2020).


238. Park JH, Cho HE, Kim JH et al. Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health
data. NPJ Digit. Med. 3, 46 (2020).
239. Ivanenkov YA, Majouga AG, Veselov MS, Chufarova NV, Baranovsky SS, Filkov GI. Computational approaches to the design of novel 5-HT6R ligands. Rev. Neurosci. 25(3), 451–467 (2014).
240. Czerwinski A, Valenzuela F, Afonine P, Dauter M, Dauter Z. N-[N-[2-(3,5-difluorophenyl)acetyl]-(S)-alanyl]-(S)-phenylglycine tert-butyl ester (DAPT): an inhibitor of γ-secretase, revealing fine electronic and hydrogen-bonding features. Acta Crystallogr. C 66(12), o585–o588 (2010).
241. Mullard A. Inhibiting γ-secretase activity. Nat. Rev. Mol. Cell Biol. 8(4), 272–273 (2007).
242. Yang X, Lv W, Chen Y, Xue Y. In silico prediction and screening of γ-secretase inhibitors by molecular descriptors and machine learning methods. J. Comput. Chem. 31(6), 1249–1258 (2010).
•• The authors used ML methods to screen for γ-secretase inhibitors as potential therapeutics for Alzheimer's disease.
243. Cheng Z, Zhang Y, Zhou C, Zhang W, Gao S. Classification of 5-HT(1A) receptor ligands on the basis of their binding affinities by using PSO-Adaboost-SVM. Int. J. Mol. Sci. 10(8), 3316–3337 (2009).
244. Rataj K, Kelemen AA, Brea J, Loza MI, Bojarski AJ, Keserű GM. Fingerprint-based machine learning approach to identify potent and selective 5-HT2BR ligands. Molecules 23(5), 1137 (2018).
245. Forbes SA, Beare D, Boutselakis H et al. COSMIC: somatic cancer genetics at high resolution. Nucleic Acids Res. 45(D1), D777–D783
(2016).
246. Huang C, Mezencev R, McDonald JF, Vannberg F. Open source machine-learning algorithms for the prediction of optimal cancer drug
therapies. PLoS ONE 12(10), e0186906 (2017).
247. Zhong F, Xing J, Li X et al. Artificial intelligence in drug design. Sci. China Life Sci. 61(10), 1191–1204 (2018).
248. Lou B, Doken S, Zhuang T et al. An image-based deep learning framework for individualising radiotherapy dose: a retrospective analysis
of outcome prediction. Lancet Digit. Health 1(3), e136–e147 (2019).
249. Yanagisawa K, Toratani M, Asai A et al. Convolutional neural network can recognize drug resistance of single cancer cells. Int. J. Mol. Sci.
21(9), 3166 (2020).
250. Ho D. Artificial intelligence in cancer therapy. Science 367(6481), 982–983 (2020).
251. Rashid MBMA, Toh TB, Hooi L et al. Optimizing drug combinations against multiple myeloma using a quadratic phenotypic
optimization platform (QPOP). Sci. Transl. Med. 10(453), eaan0941 (2018).
252. Thai K-M, Nguyen T-Q, Ngo T-D, Tran T-D, Huynh T-N-P. A support vector machine classification model for benzo[c]phenanthridine analogues with topoisomerase-I inhibitory activity. Molecules 17(4), 4560–4582 (2012).
253. Jiang D, Lei T, Wang Z, Shen C, Cao D, Hou T. ADMET evaluation in drug discovery. 20. Prediction of breast cancer resistance
protein inhibition through machine learning. J. Cheminform. 12, 16 (2020).
254. Yu H, Shen Z, Miao C, Leung C, Lesser VR, Yang Q. Building Ethics into Artificial Intelligence. Proceedings of the 27th International
Joint Conference on Artificial Intelligence, Stockholm, Sweden, (2018).
255. Calvo RA, Peters D. AI surveillance studies need ethics review. Nature 557(7703), 31 (2018).
256. Chen IY, Joshi S, Ghassemi M. Treating health disparities with artificial intelligence. Nat. Med. 26(1), 16–17 (2020).
257. Willyard C. Can AI fix medical records? Nature 576(7787), S59–S62 (2019).
258. Bonnemains V, Saurel C, Tessier C. Embedded ethics: some technical and ethical challenges. Ethics Inf. Technol. 20, 41–58 (2018).
259. Smallman M. Policies designed for drugs won’t work for AI. Nature 567(7746), 7 (2019).
•• Highlights ethical issues that arise because policies designed for drugs may not apply to discovery methods that involve AI.
260. Awad E, Dsouza S, Kim R et al. The moral machine experiment. Nature 563(7729), 59–64 (2018).
261. Dignum V. Ethics in artificial intelligence: introduction to the special issue. Ethics Inf. Technol. 20, 1–3 (2018).
262. Arnold T, Scheutz M. The ‘big red button’ is too late: an alternative model for the ethical evaluation of AI systems. Ethics Inf. Technol.
20, 59–69 (2018).
263. Raymond N. Safeguards for human studies can’t cope with big data. Nature 568(7752), 277 (2019).
264. Bryson JJ. Patiency is not a virtue: the design of intelligent systems and systems of ethics. Ethics Inf. Technol. 20, 15–26 (2018).
265. Vamplew P, Dazeley R, Foale C, Firmin S, Mummery J. Human-aligned artificial intelligence is a multiobjective problem. Ethics Inf.
Technol. 20(1), 27–40 (2017).
266. Rahwan I. Society-in-the-loop: programming the algorithmic social contract. Ethics Inf. Technol. 20, 5–14 (2017).
267. Liu X, Samantha Cruz R, Moher D, Calvert MJ, Denniston AK. Reporting guidelines for clinical trial reports for interventions involving
artificial intelligence: the CONSORT-AI extension. Nat. Med. 26(9), 1364–1374 (2020).
268. Reporting guidelines for clinical trials evaluating artificial intelligence interventions are needed. Nat. Med. 25(10), 1467–1468 (2019).


269. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. Int.
J. Surg. 9(8), 672–677 (2011).
270. Chan A-W, Tetzlaff JM, Altman DG, Dickersin K, Moher D. SPIRIT 2013: new guidance for content of clinical trial protocols. Lancet
381(9861), 91–92 (2013).
271. Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet 393(10181), 1577–1579 (2019).
272. McCradden MD, Stephenson EA, Anderson JA. Clinical research underlies ethical integration of healthcare artificial intelligence. Nat.
Med. 26(9), 1325–1326 (2020).
273. Crandall JW, Oudah M, Tennom et al. Cooperating with machines. Nat. Commun. 9(1), 233 (2018).
