Small-molecule drug discovery (SMDD) is a multidimensional challenge that is both expensive
and time-consuming. The average time from invention to market is about 14 years, at a cost
of around US$2.8 billion. The two major causes of failure in discovery are lack of efficacy
and toxicity. Although drug discovery is a slow process requiring high investment,
pharmaceutical companies and academic institutes continue to fund it because of its high
commercial potential and its benefit to society. A huge amount of experimental data has
accumulated over the past decades, including in vitro (biochemical) assays, in vivo assays
and clinical trials. This data is a valuable source for learning and understanding why
compounds succeed or fail across the entire discovery process. The acquired knowledge can
be used to predict future drug candidates in discovery experiments. Knowledge and
hypotheses can be generated from the known data using machine learning (ML) and deep
learning (DL) methods. Since the accumulated data is huge (Big Data), proper curation,
efficient mining and hypothesis building (ML/DL models) need to be implemented in the drug
discovery and development pipeline of the pharmaceutical industry to increase its success
rate. Integrated as Artificial Intelligence (AI), these tools can serve end-to-end drug
discovery and development. Thus, combining the drug discovery process with AI can transform
the paradigm of drug discovery.

Millions of experimental data points (known data) are available in the public domain
(PubChem, ChEMBL, BindingDB, PDB, etc.). The experimental data includes in vitro and in vivo
data for each disease, ADME/Tox data and much more. Only properly curated data should be
used to generate precise ML/DL models. Two types of machine learning methods are widely
used in small-molecule drug discovery: unsupervised and supervised. Unsupervised methods
are used to cluster molecules based on chemical similarity. As the data is large, faster
clustering methods such as k-means, k-median, mean shift and Gaussian mixture models can
be applied to yield better results. Clustering methods are useful for identifying nearest
neighbors and have important applications in drug repurposing and off-target prediction.
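
As a minimal sketch of the clustering idea described above, the following pure-Python
k-means groups a handful of molecules by two descriptors. The descriptor values, the
function name `kmeans` and the use of Euclidean distance are illustrative assumptions, not
taken from any particular data set or toolkit.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on equal-length numeric vectors (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical (molecular weight / 100, logP) descriptors for six molecules.
descriptors = [(1.8, 0.5), (1.9, 0.7), (2.0, 0.6), (4.5, 3.9), (4.7, 4.1), (4.6, 4.0)]
labels, centroids = kmeans(descriptors, k=2)
```

Molecules that share a label are candidates for nearest-neighbor analysis. In practice the
descriptor vectors would be much longer and would be scaled before clustering.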
Supervised machine learning methods are used to build models for data sets that have
experimental activity values. These are predictive methods, either quantitative
(continuous) or qualitative (categorical), depending on the experimental activity of the
training data. Models generated for each protein or disease type can, when applied,
classify unexplored data more precisely to identify new hits. However, the precision
depends mainly on the quality of the input data. Combining predictive methods with
molecular modeling will accelerate the discovery process from hit identification to lead
optimization. Supervised learning methods also play a major role in identifying druggable
compounds based on ADMET. Building highly predictive models for absorption, distribution,
metabolism, excretion and toxicity from adequate samples, and using them to filter
compounds during screening, will furnish the most druggable compounds for biological
studies.
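
The qualitative (categorical) case described above can be sketched with a small
nearest-neighbor classifier. This is only an illustration: the bit-set representation of
each molecule, the Tanimoto similarity measure, the training labels and the function names
are all assumptions made for the example, not a method prescribed by the text.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two molecules stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def predict_active(query, training, k=3):
    """Majority vote over the k training molecules most similar to `query`."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = [label for _, label in ranked[:k]]
    return sum(votes) * 2 > len(votes)

# Hypothetical training set: (on-bits, 1 = active / 0 = inactive).
training = [
    ({1, 4, 7, 9}, 1),
    ({1, 4, 7, 12}, 1),
    ({1, 4, 8, 9}, 1),
    ({2, 3, 5, 20}, 0),
    ({2, 3, 6, 20}, 0),
    ({2, 5, 6, 21}, 0),
]

looks_active = predict_active({1, 4, 7, 10}, training)    # resembles the actives
looks_inactive = predict_active({2, 3, 5, 21}, training)  # resembles the inactives
```

The same pattern applies to an ADMET endpoint: replace the active/inactive labels with, for
example, toxic/non-toxic and use the prediction as a screening filter.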

Physicochemical properties, quantum mechanical properties, 2D descriptors, 3D descriptors,
molecular patterns and molecular fingerprints of the training data sets are used to generate
the machine learning models. Methods such as PCR, PLS, support vector machines (SVMs),
naïve Bayes, random forest, neural networks and recursive partitioning are quite often used
to generate ML models by correlating descriptors with experimental activity.
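
To make "correlating descriptors with experimental activity" concrete, here is a minimal
sketch that fits a straight line between one descriptor and a measured activity. Ordinary
least squares is used as a deliberately simpler stand-in for the methods listed above, and
the descriptor (logP), the activity values (pIC50) and the function name `fit_line` are
hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: one descriptor (logP) and a measured activity (pIC50).
logp = [1.0, 2.0, 3.0, 4.0]
pic50 = [5.1, 5.9, 7.1, 7.9]
slope, intercept = fit_line(logp, pic50)

# Predict the activity of an untested compound with logP = 2.5.
predicted = slope * 2.5 + intercept
```

Real models use many descriptors at once, which is where methods such as PLS or random
forest come in.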

Recently, deep neural networks (DNNs) have gained importance not only in drug discovery but
also in other areas of science and business. A DNN is a neural network that builds
hierarchical internal representations of the input data with the help of multiple hidden
layers. Four major DNN types, convolutional neural networks (CNNs), recurrent neural
networks (RNNs), deep autoencoders (AEs) and deep belief networks (DBNs), each have their
own advantages for model generation. These methods are applied to the prediction of
biological activity, ADMET properties and physicochemical parameters. For small training
sets, conventional ML methods perform as well as or better than DNNs, but with large data
sets DNNs will outperform them. Overfitting is a major challenge during model generation;
recent developments such as Dropout and DropConnect are available to overcome it. There has
also been significant development of these methods in the areas of de novo design,
ligand-receptor binding energy prediction, chemical synthesis, nanoparticles, formulations
and many more.
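
As a rough illustration of how dropout counters overfitting, the sketch below randomly
silences hidden-layer activations during training and rescales the survivors (the
"inverted dropout" variant). The activation values, the rate and the function name
`dropout` are assumptions chosen for the example.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors."""
    if not training or rate == 0.0:
        return list(activations)  # at inference time every unit is kept
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
hidden = [0.5, 1.2, 0.0, 2.0, 0.7, 1.1]  # hypothetical hidden-layer activations

dropped = dropout(hidden, rate=0.5, rng=rng)                      # training pass
unchanged = dropout(hidden, rate=0.5, rng=rng, training=False)    # inference pass
```

Because a different random subset of units is removed on every training pass, the network
cannot rely on any single unit, which discourages it from memorizing the training data.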

ML/DL methods provide an end-to-end service in drug discovery, development and beyond. The
new rational pipeline will accelerate discovery and reduce failures. In the future,
knowledge-based innovation will generate new medicines offering cost-effective treatment
for chronic diseases.

*IDS has in-house ML/DL models and provides services for its clients.

While at the University of Toronto, we saw early on how machine learning
and convolutional neural networks could be applied for image recognition.
We founded Atomwise to prove the concept that the same technology could
be applied to drug discovery by performing computational screens to find
molecules worth testing in the lab.

Building on this early success, we've expanded the robustness of our
models through early collaborations with academic researchers and
biopharma partnerships. AtomNet® technology now scales to screening
billions of molecules for multiple projects simultaneously. We now use AI as
a robust engine for drug discovery resulting in joint ventures, jointly held
assets, and our wholly owned pipeline to address previously undrugged
diseases.

First Application of AI/ML to Molecular Recognition

We leverage similar technology to that used for image recognition to help medicinal
chemists to discover better medicines, faster.

Since then, our award-winning AtomNet® technology has been used to find small
molecule hits for more undruggable targets than any other AI drug discovery platform.
AtomNet® technology teaches itself college chemistry
AtomNet® technology is the first drug discovery algorithm to use a deep convolutional
neural network. This type of network came to prominence only a few years ago and has a
unique property: it excels at understanding complex concepts as a combination of smaller and
smaller pieces of information. This property is a key reason why convolutional networks have
produced the world’s best results for image classification, speech recognition, and other
longstanding problems. For example, a convolutional model can learn to recognize faces by
first learning a set of basic features in an image, such as edges. Then, the model can learn to
identify parts such as noses, ears, and eyes by combining the edges. Finally, the model can
learn to recognize faces by combining those parts.
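
The first step in that hierarchy, detecting edges, can be sketched in a few lines. The
example below slides a hand-written vertical-edge filter over a toy image; in a trained
convolutional network the filter weights would be learned rather than written by hand, and
the image, kernel and function name `convolve2d` here are illustrative assumptions.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation, the operation used in convolutional layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# A 4x6 toy image: dark on the left half, bright on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-written vertical-edge kernel (a trained network learns its own filters).
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

response = convolve2d(image, vertical_edge)
```

The response is zero over the uniform regions and largest where the window straddles the
dark-to-bright boundary. Deeper layers combine many such responses into larger patterns.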

Similarly, AtomNet® technology might learn that proteins and ligands are made up of a
variety of specialized chemical structures. This would be an exciting result because it would
suggest that the AtomNet® model was learning fundamental concepts in organic chemistry.
Intriguingly, this is what the AtomNet® platform does. When we examine different neurons in
the network we see something new: the AtomNet® platform has learned to recognize essential
chemical features like hydrogen bonding, aromaticity, and single-bonded carbons.

AtomNet® model learning to recognize sulfonyl groups – a structure often found in
antibiotics.

Critically, no human ever taught our AtomNet® technology the building blocks of organic
chemistry. Our AtomNet® model discovered them itself by studying vast quantities of target
and ligand data. The patterns it independently observed are so foundational that medicinal
chemists often think about them, and they are studied in academic courses. Put simply,
AtomNet® technology is teaching itself college chemistry.
AtomNet® technology can reproduce hundreds of historical experiments
Another way to test AtomNet® technology is to see if it could have predicted what happened
in physical experiments done in the past. For this purpose, a group at the University of
California, San Francisco developed a challenging benchmark, called DUD-E. This
benchmark asks systems like our AtomNet® model to make over 1 million predictions, and
compares the answers to the historical results. It is a hard and well-respected test, and our
AtomNet® model achieves the best results of any structure-based algorithm we know of:

AtomNet® technology accuracy compared to previous technologies (DOCK and Autodock-smina) –
on the DUD-E benchmark developed at UCSF.

Put in real-world terms, AtomNet® technology's benchmark results suggest it could save
something on the order of half of early stage drug screening experiments and greatly improve
the success rate of many more.
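
Comparing a model's predictions with historical results, as described above, usually comes
down to checking whether known active molecules are ranked above inactive "decoys". The text
does not say which metric was used, so the sketch below assumes the area under the ROC
curve, a common choice for this kind of benchmark. The scores and labels are made up.

```python
def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count half)."""
    actives = [s for s, lab in zip(scores, labels) if lab == 1]
    decoys = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(
        1.0 if a > d else 0.5 if a == d else 0.0
        for a in actives
        for d in decoys
    )
    return wins / (len(actives) * len(decoys))

# Hypothetical model scores; 1 = known active, 0 = decoy.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

A value of 1.0 means every active is ranked above every decoy, and 0.5 is what random
ranking would give.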
