Deep Learning For Drug Discovery: Wengong Jin Massachusetts Institute of Technology

Deep learning for drug
discovery
Wengong Jin
Massachusetts Institute of Technology

Drug discovery is a time-consuming process
Average time/cost for designing one drug = 10 years + $2.6B
Figure source: Pharmaceutical Research and Manufacturers of America 2

Obviously, we can’t wait for 10 years…
3
Data source: PhRMA.org
Drug discovery is a challenging search problem
Search in the chemical space for a good drug (e.g., kills virus)
• Number of possible drug-like molecules ≈ 10^60 (Kirkpatrick et al., 2004)
• Experimental facilities in industry can only test 10^5 compounds/day
Figure: Koch et al., PNAS 2005
Kirkpatrick et al., Nature, 2004
Automate drug discovery with computation
Let AI find
good drugs!
Figure source: Andrii Buvailo
6
Computational drug discovery: three schemes
Simulation | Virtual screening | De novo drug design
Figure source: Sanchez-Lengeling et al., Science 361, 360–365 (2018)
Simulation is often too slow
Simulation | Virtual screening | De novo drug design
• Simulation takes days for one compound
Figure source: Sanchez-Lengeling et al., Science 361, 360–365 (2018)
Virtual screening
• Virtual screening: assess whether a compound is a good drug using computational models (Walters et al., 1998; McGregor et al., 2007; …)
Compound → Virtual screening model → Prediction: good! → Experiments
• Virtual screening is much faster than experimental screening in wet labs. It can test 10^8 compounds within a day, while experimental screening takes years
• It is also much cheaper than experimental screening
Virtual screening: inherent trade-off
• Virtual screening is restricted to commercially available compounds (e.g., ZINC library)
• Advantage: no need to synthesize any compounds (faster testing)
• Limitation 1: it loses coverage; at best, we can screen 10^9 compounds
• Limitation 2: traditional techniques are based on hand-crafted features
Figure: ease of synthesis plotted against coverage, locating virtual screening
De novo drug design
• De novo drug design: directly generate a compound with desired properties (Moon et al., 1991; Clark et al., 1995; Schneider & Fechner, 2005; …)
Property criteria (potency, safety, …) → Drug design model → A good drug → Experiments
We need to solve an inverse problem
De novo drug design: inherent trade-off
• Virtual screening is restricted to commercially available compounds (e.g., ZINC library)
• Advantage: can explore the entire chemical space efficiently
• Limitation 1: we need to synthesize new compounds, which can be hard
• Limitation 2: traditional techniques explore the space based on hand-designed rules (e.g., genetic algorithms)
Figure: ease of synthesis plotted against coverage, locating virtual screening and de novo drug design
Deep learning: a promising direction
• Deep learning has achieved human-level accuracy in computer vision (He et al., 2016)
The key to success: automatic feature learning
• Virtual screening: traditional methods are based on hand-crafted features
Use deep learning to learn features automatically
Compound → Features designed by experts → Model → Prediction: good!
He et al., “Deep residual learning for image recognition.” CVPR 2016
Deep learning: a promising direction
• Deep generative models can generate realistic text and images with desired properties
“Generate an image of an armchair in the shape of avocado” → Deep generative models (Ramesh et al., 2020)
• De novo drug design: generate a compound with desired properties
Property criteria (potency, safety, …) → Use deep generative models → A good drug
Silver et al., “Mastering the game of Go with deep neural networks and tree search”, Nature (2016).
Ramesh et al., “DALL-E: creating images from text”, OpenAI blog
Main technique: graph neural networks
Graph encoding: virtual screening / molecular property prediction (Duvenaud et al., 2015; Kearnes et al., 2016; Jin et al., 2017; Gilmer et al., 2017; …)
Graphs → Property (numerical attributes)
Graph generation: de novo drug design (Olivecrona et al., 2018; Gomez-Bombarelli et al., 2018; Jin et al., 2018; Popova et al., 2018; …)
Property (numerical attributes) → Graphs
Example: discovery of new antibiotics
Stokes, Yang, Swanson, Jin et al, Cell 2020 16

Outline of today’s lecture
• Part 1: graph neural networks for antibiotic discovery
[ICML’17, NeurIPS’17, JCIM’19, Cell’20]
• Part 2: Incorporate biological knowledge into graph neural networks:

application to COVID-19 drug combination discovery
[PNAS (In submission)]
• Part 3: Generative models for de novo drug design

[ICML’18, ICLR’19, ICML’20a,b,c]
17
Part 1: antibiotic discovery
History of antibiotic discovery
• Since the 1990s, we have struggled to discover novel antibiotic classes (Silver et al., 2011; Brown et al., 2014; Shore & Coukell, 2016)
• We need novel antibiotic classes due to antibiotic resistance
Figure source: ReAct group FDA = U.S. Food and Drug Administration 18
Virtual screening for antibiotic discovery
• Through collaboration with the Broad Institute, we collected 2,560 molecules with measured growth inhibition against E. coli (BW25113)
Training data:
Drug         Antibacterial
Nitrocefin   Yes
Reserpine    No
Penicillin   Yes
IQ-1S        No
……           ……
Training data → Graph neural network → Predict antibacterial properties
Why graph neural networks?
19
Traditional approach: hand-crafted features
• Traditional methods are based on fixed, hand-engineered molecular features.
• Molecular weight, number of heavy atoms
• More sophisticated features: Morgan fingerprint (Rogers & Hahn 2010)
• Exhaustive enumeration of all possible substructures, up to radius 3
• Result: high dimensional features (2048), different substructures merged by hash
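The enumerate-then-hash idea can be illustrated with a toy, pure-Python sketch. It is not the Morgan algorithm as implemented in any cheminformatics toolkit: the function names, the SHA-1 based hash, and the atom/bond input format are assumptions made for illustration. Each round grows every atom's substructure by one bond, and every substructure identifier is folded into one of 2048 bit positions, so distinct substructures can collide.

```python
import hashlib


def stable_hash(obj) -> int:
    """Deterministic integer hash (the built-in hash() is salted for strings)."""
    return int.from_bytes(hashlib.sha1(repr(obj).encode()).digest()[:8], "big")


def toy_circular_fingerprint(atoms, bonds, radius=3, n_bits=2048):
    """atoms: list of element symbols; bonds: list of (i, j) atom index pairs.

    Returns the set of bit positions that are switched on.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: an atom's identifier depends only on its element.
    ids = [stable_hash(symbol) for symbol in atoms]
    bits = {h % n_bits for h in ids}

    # Each round folds in the neighbours' identifiers, growing the
    # substructure around every atom by one bond.
    for _ in range(radius):
        ids = [
            stable_hash((ids[i], tuple(sorted(ids[j] for j in neighbors[i]))))
            for i in range(len(atoms))
        ]
        bits.update(h % n_bits for h in ids)
    return bits


# Heavy atoms of ethanol, C-C-O (hydrogens omitted in this toy format).
ethanol = toy_circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
print(sorted(ethanol))
```

Because neighbour identifiers are sorted before hashing, the result does not depend on how the atoms are numbered. The modulo step is where different substructures get merged.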
20
Problem of traditional features
• Traditional methods are based on fixed, hand-engineered molecular features.
• Molecular weight, number of heavy atoms, etc.
• Problem: we don’t know all the antibacterial patterns
• So these hand-engineered features can miss some of the unknown patterns
• Graph neural networks automatically learn a feature representation from data
Compound → Features are learned automatically (instead of features designed by experts) → Model → Prediction: good!
Graph neural network (GNN)
• Rich history of GNNs (Gori et al., 2005; Scarselli et al., 2009; Duvenaud et al., 2015; Kearnes et al., 2016; Jin et al., 2017; Gilmer et al., 2017; Zitnik et al., 2018; etc.)
• A molecule is represented as a graph
• Each atom is a node in the graph
• Each bond is an edge in the graph
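A minimal sketch of this representation, assuming a hypothetical `MolGraph` class (not taken from the lecture or from any library): atoms are stored as node labels and bonds as edges keyed by the pair of atom indices.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Hypothetical container: atoms are nodes, bonds are edges."""
    atoms: list = field(default_factory=list)   # node labels (element symbols)
    bonds: dict = field(default_factory=dict)   # (i, j) with i < j -> bond order

    def add_atom(self, symbol: str) -> int:
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i: int) -> list:
        return sorted(b if a == i else a for a, b in self.bonds if i in (a, b))


# Heavy atoms of acetic acid: CH3-C(=O)-OH
mol = MolGraph()
c1, c2, o1, o2 = (mol.add_atom(s) for s in ["C", "C", "O", "O"])
mol.add_bond(c1, c2)
mol.add_bond(c2, o1, order=2)
mol.add_bond(c2, o2)
print(mol.neighbors(c2))
```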
22
Graph convolution → Pooling
Atom type → This vector encodes a local subgraph → It encodes a larger subgraph → Graph feature representation
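The convolution-then-pooling pipeline can be sketched in plain Python. This is a deliberately simplified illustration with no learned weights or nonlinearities, which a real graph neural network would have; the function names are hypothetical. Each round adds the neighbours' vectors to every atom's vector, so after k rounds an atom's vector summarizes its k-bond neighbourhood, and sum pooling collapses all atom vectors into one graph-level vector.

```python
def one_hot(symbol, vocabulary):
    """Initial atom feature: the atom type as a one-hot vector."""
    return [1.0 if symbol == v else 0.0 for v in vocabulary]


def graph_convolution(h, neighbors):
    """One round: every atom adds up its own vector and its neighbours' vectors."""
    return [
        [x + sum(h[j][k] for j in neighbors[i]) for k, x in enumerate(h[i])]
        for i in range(len(h))
    ]


def sum_pool(h):
    """Pooling: sum atom vectors into a single graph feature vector."""
    return [sum(column) for column in zip(*h)]


def graph_features(atoms, bonds, vocabulary, rounds=2):
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    h = [one_hot(a, vocabulary) for a in atoms]
    for _ in range(rounds):
        h = graph_convolution(h, neighbors)
    return sum_pool(h)


# Heavy atoms of ethanol, C-C-O
print(graph_features(["C", "C", "O"], [(0, 1), (1, 2)], ["C", "O"], rounds=2))
```

In a trained model the plain sum would be replaced by learned transformations, and the pooled vector would feed a feed-forward network that predicts the property.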
23
Hand-crafted features: Fixed features → Learned model → Antibacterial property
Deep learned features: Graph convolution (learned) → Feature representation → Feed-forward network (learned) → Antibacterial property
24
Use GNN for virtual screening
• We virtually screened 10^4 compounds in the Broad drug repurposing hub
• We experimentally tested the top 99 compounds at the Broad Institute
• 51 of them are indeed antibacterial (hit rate = 51.5%)
51 drugs → filter by structural novelty and low toxicity → Compound SU3327 (renamed as Halicin)
Halicin is a novel and potent antibiotic
• Halicin shows potent growth inhibition against E. coli in vitro
• It is also structurally different from known antibiotics
Figure panels: growth inhibition (OD at 600 nm vs. [halicin], μg/ml); low Tanimoto similarity to existing antibiotics (training set, Broad library)
Figure 2. Initial Model Training and the Identification of Halicin
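Structural novelty in slides like this one is usually quantified with the Tanimoto similarity between fingerprints. A minimal sketch, assuming fingerprints are given as sets of "on" bit positions; the bit sets below are made up for illustration and are not halicin's actual fingerprint.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def nearest_neighbor_similarity(query: set, training_set: list) -> float:
    """Similarity of a query to its closest molecule in the training set."""
    return max(tanimoto(query, fp) for fp in training_set)


# Hypothetical fingerprints (sets of on-bits), for illustration only.
query = {3, 17, 42, 99}
training = [{3, 17, 200}, {5, 6, 7}, {42, 99, 100, 101}]
print(nearest_neighbor_similarity(query, training))
```

A low nearest-neighbour similarity means the query looks unlike anything the model was trained on, which is the sense in which a hit is called structurally novel.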
Halicin is potent against resistant bacteria in mice
• Strong in vivo inhibition against pan-resistant A. baumannii: infection is gone with halicin
• Strong in vivo inhibition of resistant C. difficile: bacteria still alive with metronidazole; infection is gone with halicin
Figure 5. Halicin Displays Efficacy in Murine Models of Infection
(A) Growth inhibition of pan-resistant A. baumannii CDC 288 by halicin. Shown is the mean of two biological replicates. Bars denote absolute error.
(B) Killing of A. baumannii CDC 288 in PBS in the presence of varying concentrations of halicin after 2 h (blue), 4 h (cyan), 6 h (green), and 8 h (red). The initial cell density is ~10^8 CFU/mL. Shown is the mean of two biological replicates. Bars denote absolute error.
(C) In a wound infection model, mice were infected with A. baumannii CDC 288 for 1 h and treated with either vehicle (green; 0.5% DMSO; n = 6) or halicin (blue; 0.5% w/v; n = 6) over 24 h. Bacterial load from wound tissue after treatment was determined by selective plating. Black lines represent geometric mean of the bacterial load for each treatment group.
(D) Growth inhibition of C. difficile 630 by halicin. Shown is the mean of two biological replicates. Bars denote absolute error.
(E) Experimental design for C. difficile infection and treatment.
(F) Bacterial load of C. difficile 630 in feces of infected mice. Metronidazole (red; 50 mg/kg; n = 6) did not result in enhanced rates of clearance relative to vehicle controls (green; 10% PEG 300; n = 7). Halicin-treated mice (blue; 15 mg/kg; n = 4) displayed sterilization beginning at 72 h after treatment, with 100% of mice being free of infection at 96 h after treatment. Lines represent geometric mean of the bacterial load for each treatment group.
See also Figure S4.
Large-scale virtual screening
• Applied the same model to screen compounds in the ZINC library
• We identified 8 more compounds with inhibition against E. coli (EC), S. aureus (SA), K. pneumoniae (KP), A. baumannii (AB), or P. aeruginosa (PA) in vitro
Figure panels: prediction score, Tanimoto score, molecular weight (Daltons) and LogP of the hits; MIC (μg/ml) against EC, SA, KP, AB and PA for example hits ZINC000100032716 and ZINC000225434673; CFU/ml of E. coli BW25113 at time = 0 and time = 4 hrs
Compare GNN with other models
• Only GNN ranks Halicin among the top 100 compounds.
• Given our budget, Halicin would not be discovered by other models
Model                         Feature                      Rank of Halicin
Graph neural network          Learned                      61
Feed-forward neural network   RDKit features (fixed)       273
Feed-forward neural network   Morgan fingerprint (fixed)   1217
Random forest                 Morgan fingerprint (fixed)   2640
Support vector machine        Morgan fingerprint (fixed)   771
Learned features are better than hand-designed features
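The budget argument can be checked directly against the ranks in the table. The sketch below assumes the testing budget equals the top-100 cut-off mentioned in the first bullet; the dictionary layout is only illustrative.

```python
# Assumption: the experimental budget is the top-100 cut-off from the slide.
BUDGET = 100

# Rank of Halicin under each (model, feature) pair, copied from the table.
halicin_rank = {
    ("Graph neural network", "learned"): 61,
    ("Feed-forward neural network", "RDKit features"): 273,
    ("Feed-forward neural network", "Morgan fingerprint"): 1217,
    ("Random forest", "Morgan fingerprint"): 2640,
    ("Support vector machine", "Morgan fingerprint"): 771,
}

# A model "discovers" Halicin only if it ranks it within the budget.
found = [model for model, rank in halicin_rank.items() if rank <= BUDGET]
print(found)
```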
29
Part 2: infuse biological knowledge in GNNs


30
Motivation for biology-aware models
• Existing property prediction models only look at the chemical structure
• But properties may depend on additional biological information
Diagram labels: biology-aware representation, biology-aware property prediction
31
Case study: COVID-19 drug combinations
Mortality rate in a recent clinical trial (Beigel et al., 2020)
Remdesivir   11.4%   (still pretty high!)
Placebo      15.2%
• Most HIV treatments are drug combinations
• HIV: effect(A, B) >> effect(A) + effect(B)
• Can we find drug combinations for COVID?
32
Case study: COVID-19 drug combinations
• Two drugs A and B are synergistic if effect(A, B) >> effect(A) + effect(B)
• Goal: Train a model to predict whether a drug combination is synergistic
• Challenge: training data is limited (fewer than 200 drug combinations), but deep neural networks are very data hungry
Big data + Neural network   vs.   Data + Knowledge + Neural network
33
Biological knowledge of viral replication
How can a drug block COVID-19 infection?
1. Block viral entry by inhibiting ACE2 or TMPRSS2
2. Inhibit viral enzymes: the proteases 3CLpro and PLpro, and the polymerase RdRp
3. Inhibit host targets that interact with viral proteins (Gordon et al., 2020)
Figure source: Cevik et al., BMJ 2020
34
ComboNet incorporates biology & chemistry
• Synergy comes from inhibition of certain biological targets (e.g., proteins)
• Model biological interaction ⇒ additional data ⇒ better generalization
Chemical representation (to be learned) → Biological representation (to be learned) → Antiviral activity
35
ComboNet learns drug-target interaction
1. Predict drug-target interaction: whether drug A inhibits target B
Compound → Graph convolution → Representation (learned representation of the molecular structure) → predicted inhibition of the targets involved in COVID-19 infection (3CLpro, ACE2, ……, HDAC2)
Drug-target interaction data (ChEMBL and NCATS), compounds × targets, is too sparse to use as features. Instead, predict whether a drug inhibits a biological target.
Example values shown in the figure:
0.3 0.1 1   0.3 0.4 0.6 0.1 0   0.7 0
0.9 0.2 0.3 0   0.9 1   0.8 0.4 0.1 0.7
0.1 0   0.5 0.1 1   0.1 0.9 0.4 1   0.3
0   0.3 0.2 0.7 0.1 0.2 0   0.8 0.1 0.5
36
ComboNet learns antiviral activity
2. Single-agent antiviral activity prediction
Compound → Graph convolution → Representation (3CLpro, ACE2, ……, HDAC2) → Feed-forward network → Antiviral activity pA
Single-drug antiviral activity data (NCATS):
Drug         Reserpine   Remdesivir   Penicillin   Halicin
Antiviral?   Yes         Yes          No           No
37
ComboNet learns antiviral synergy
3. Predict synergy for drug combinations
Compound A → zA (3CLpro, ACE2, ……, HDAC2) → Single-drug antiviral activity pA
Compound B → zB (3CLpro, ACE2, ……, HDAC2) → Single-drug antiviral activity pB
Feature representation of drug combination (A, B), Bliss: zAB = zA + zB − zA ⋅ zB
zAB → Combination antiviral activity pAB → Synergy sAB
Training data: combination synergy data (NCATS)
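The combination rule zAB = zA + zB − zA ⋅ zB has the form of Bliss independence applied per target: if zA and zB are read as probabilities that each drug inhibits a target, zAB is the probability that at least one of them does. The sketch below illustrates only that rule. The `bliss_excess` synergy score is one common definition and is an assumption here; the slide does not spell out how sAB is computed. All numbers are made up.

```python
def bliss_combine(z_a, z_b):
    """Element-wise z_ab = z_a + z_b - z_a * z_b (probability of 'A or B')."""
    return [a + b - a * b for a, b in zip(z_a, z_b)]


def bliss_excess(p_ab, p_a, p_b):
    """Assumed synergy score: observed combined effect minus the Bliss
    expectation for two independently acting drugs."""
    return p_ab - (p_a + p_b - p_a * p_b)


# Hypothetical per-target inhibition probabilities for two compounds.
z_a = [0.9, 0.1, 0.0]
z_b = [0.5, 0.1, 0.3]
print(bliss_combine(z_a, z_b))

# Hypothetical activities: the combination does better than independence predicts.
print(bliss_excess(p_ab=0.9, p_a=0.3, p_b=0.4))
```

Because the rule is symmetric in A and B, the combined representation does not depend on the order in which the two drugs are presented.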
38
ComboNet performance
• Training set (88 drug combinations); Test set (71 drug combinations)
• ComboNet ROC-AUC is 0.8 on average
• Removing chemical or biological information hurts
• Standard models cannot generalize
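For readers unfamiliar with the metric: ROC-AUC is the probability that a randomly chosen positive example (here, a synergistic combination) is scored above a randomly chosen negative one. A minimal pure-Python sketch of that pairwise definition, on made-up labels and scores:

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical test set: 1 = synergistic, 0 = not synergistic.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.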
39
Discover new drug combinations
• Collaboration with the National Center for Advancing Translational Sciences (NCATS)
• We experimentally tested top drug combinations in NCATS Vero E6 cell assays
• Further studying these combinations in human cell lines
Remdesivir + reserpine              Remdesivir + IQ-1S
Drug          Virus alive (%)       Drug          Virus alive (%)
Remdesivir    77.3%                 Remdesivir    81.7%
Reserpine     42.5%                 IQ-1S         65%
Combination   3.2%                  Combination   0%
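A rough way to read these numbers, under the assumption (not stated on the slide) that the "virus alive" percentages can be treated as independent survival fractions: if the two drugs acted independently, the surviving fraction under the combination would be the product of the single-drug fractions. The short sketch below computes that baseline for comparison with the measured values.

```python
def expected_alive_if_independent(alive_a: float, alive_b: float) -> float:
    """Surviving fraction expected if the two drugs act independently."""
    return alive_a * alive_b


# (single-drug fractions, measured combination fraction), from the table.
pairs = {
    "Remdesivir + reserpine": ((0.773, 0.425), 0.032),
    "Remdesivir + IQ-1S": ((0.817, 0.65), 0.0),
}

for name, ((a, b), observed) in pairs.items():
    expected = expected_alive_if_independent(a, b)
    print(f"{name}: expected {expected:.1%} alive, observed {observed:.1%}")
```

In both cases the measured survival is far below the independence baseline, which is what synergy means in this setting.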
40
Part 3: de novo drug design


41
Motivation for de novo drug design
• Deep learning can discover new antibiotics and COVID-19 drugs
• Simple approach: train a GNN to rank all the compounds in our library
• Reason: maximize the speed of experimental validation
• Problem: number of drug-like molecules ≈ 10^60. We can’t rank all of them.
Compound library (10^4 − 10^8 compounds) → Candidates
42
Graph generation for de novo drug design
• Learn a distribution whose mass is concentrated around “good” molecules
• Let’s train a generative model to directly generate “good” molecules
• It can efficiently explore the entire chemical space (10^60 molecules)
Generative model → Generate → A good molecule
How to generate molecular graphs?
43
Previous solution 1: sequence-based methods
• Prior work used recurrent neural networks to generate molecular graphs (Olivecrona et al., 2018; Gomez-Bombarelli et al., 2018; Popova et al., 2018; …)
• Convert a molecule into a SMILES string (a domain-specific language) (Weininger, 1988)
Molecule → convert it into a SMILES string → Cc1cn2c(CN(C)C(=O)c3ccc(F)cc3C)c(C)nc2s1 → Recurrent neural networks (RNNs)
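Before an RNN can model SMILES, the string has to be split into tokens. The sketch below uses a simplified regular expression that covers common SMILES syntax (bracket atoms, two-letter halogens, ring-closure digits, bond and branch symbols); it is an illustrative assumption, not the tokenizer used in the cited work, and it does not validate chemistry.

```python
import re

# Simplified SMILES token pattern (illustrative, not exhaustive).
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#$:()+\-/\\.@]")


def tokenize(smiles: str) -> list:
    """Split a SMILES string into tokens an RNN could consume one at a time."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


example = "Cc1cn2c(CN(C)C(=O)c3ccc(F)cc3C)c(C)nc2s1"
tokens = tokenize(example)
vocabulary = sorted(set(tokens))
print(tokens)
print(vocabulary)
```

A sequence model then predicts the next token given the tokens generated so far.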
Weininger, D. SMILES, a chemical language and information system. Journal of chemical information and computer sciences, 28(1):31–36, 1988. 44
Problems of sequence-based approach
• Prior work used sequence-based generative models for molecular graphs (Olivecrona et al., 2018; Gomez-Bombarelli et al., 2018; Popova et al., 2018; …)
• But this string representation is quite brittle…
Two almost identical graphs, quite different strings:
Cc1cn2c(CN(C)C(=O)c3ccc(F)cc3C)c(C)nc2s1
Cc1cc(F)ccc1C(=O)N(C)Cc1c(C)nc2scc(C)n12
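One way to put a number on "quite different strings" is the Levenshtein edit distance between the two SMILES. The sketch below is a standard dynamic-programming implementation; using edit distance as the measure of brittleness is an illustrative choice, not something the slide prescribes.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,               # delete cs
                current[j - 1] + 1,            # insert ct
                previous[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


a = "Cc1cn2c(CN(C)C(=O)c3ccc(F)cc3C)c(C)nc2s1"
b = "Cc1cc(F)ccc1C(=O)N(C)Cc1c(C)nc2scc(C)n12"
distance = levenshtein(a, b)
print(distance, distance / max(len(a), len(b)))
```

A large distance relative to the string length shows how strongly the sequence depends on traversal order rather than on the graph itself.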
45
Previous solution: node-by-node generation
• A straightforward approach: generate a graph node-by-node (Liu et al., 2018)
Add nodes one by one ……
• Molecules are typically sparse: N nodes, O(N) edges
• However, it needs to make O(N) edge predictions in each step
• In total: O(N^2) edge predictions
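The quadratic count follows from a short sum. Assuming that, when the t-th node is added, the model makes one edge prediction for each of the t − 1 existing nodes, the total is 0 + 1 + … + (N − 1) = N(N − 1)/2. The helper name below is hypothetical.

```python
def edge_predictions_node_by_node(n_nodes: int) -> int:
    """Total edge predictions when each new node is checked against all
    previously generated nodes (assumed one prediction per existing node)."""
    return sum(t - 1 for t in range(1, n_nodes + 1))


for n in (10, 50, 100):
    total = edge_predictions_node_by_node(n)
    # A sparse molecule has only O(N) real bonds, so most predictions are "no edge".
    print(n, total, n * (n - 1) // 2)
```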
46
Failure of node-by-node generation
• Node-by-node generation via a variational auto encoder (VAE) (Liu et al., 2018)
• Diagnostic test: can the decoder reconstruct an input molecule?
Diagram: encode the COVID-19 drug remdesivir, then decode; the input and the output should be the same
Plot: reconstruction accuracy (%) against molecule size (number of atoms, 20 to 100)
47
We need to leverage inductive bias
Complexity: Sequence (text, e.g., “What’s up?”) → Grid graph (images) → Molecular graphs (low treewidth) → Dense graphs
Inductive bias? (for molecular graphs)
All models leverage the inductive bias of the structure.
48
Junction tree variational autoencoder
Molecular graph → Tree decomposition → Junction tree of motifs
Motifs are small due to low treewidth
Motif vocabulary: 250K graphs ⇒ 638 motifs, 99.9% coverage (new graphs)
Inspired by the junction tree algorithm in graphical models.
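To make the idea of motifs concrete, here is a toy decomposition in plain Python. It assumes a simplified rule that the slide does not spell out: every bond that lies on a cycle belongs to a ring motif (connected ring bonds are merged into one ring system), and every other bond is its own two-atom motif. The function names are hypothetical, and the decomposition used in the cited work may differ in detail.

```python
from collections import deque


def _connected_without(adj, u, v):
    """True if v is still reachable from u after removing the bond u-v."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {u, v}:
                continue  # skip the removed bond
            if y == v:
                return True
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def decompose_into_motifs(n_atoms, bonds):
    """Return motifs as frozensets of atom indices."""
    adj = {i: set() for i in range(n_atoms)}
    for u, v in bonds:
        adj[u].add(v)
        adj[v].add(u)

    ring_bonds = [(u, v) for u, v in bonds if _connected_without(adj, u, v)]
    ring_bond_set = set(ring_bonds)
    motifs = [frozenset(b) for b in bonds if b not in ring_bond_set]

    # Union-find: merge ring bonds that share atoms into ring systems.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in ring_bonds:
        parent[find(u)] = find(v)
    systems = {}
    for u, v in ring_bonds:
        systems.setdefault(find(u), set()).update((u, v))
    return motifs + [frozenset(s) for s in systems.values()]


# Heavy atoms of toluene: a six-membered ring (atoms 0-5) plus a methyl carbon (6).
toluene_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
print(decompose_into_motifs(7, toluene_bonds))
```

Motifs that share an atom become neighbours in the junction tree, and collecting the distinct motifs over a training set yields the motif vocabulary.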

49
Details: hierarchical encoder & decoder
Hierarchical encoder (encode) → Molecular representation → Hierarchical decoder (decode)
50
Hierarchical graph encoder
Node vector: run graph convolution in the molecular graph
Propagate node vectors
Motif vector: run graph convolution in the junction tree
51
Hierarchical graph decoder
Motif vocabulary
Step 1: predict the next motif
Step 2: predict how to attach this motif to the current graph
52
Hierarchical graph decoder
Motif-by-motif generation
Attach this motif to this graph
53
Motif-by-motif versus node-by-node
• Training objective: minimize reconstruction loss
• Motif-by-motif generation is able to reconstruct large molecules!
Diagram: encode, then decode; the input and the output should be the same
Plot: reconstruction accuracy (%) against molecule size (number of atoms, 20 to 100), for motif-by-motif generation and node-by-node generation
54
Results: molecular optimization
• Task: learn to modify a non-drug-like molecule into a drug-like molecule
• Drug-likeness is measured by QED scores (Bickerton et al., 2012)
Low QED score → High QED score: a local modification significantly improves drug-likeness
Drug-likeness optimization, success rate (%):
Sequence         58.5
Node-by-node     73.6
Motif-by-motif   76.9
55
Part 3: de novo drug design


56
Deep learning for molecular sciences
Deep learning for:
Drug discovery (e.g., de novo drug design)
• Dahl et al., 2015; Stokes et al., 2020; Jin et al., 2018; ……
Chemistry (e.g., reaction prediction)
• Duvenaud et al., 2015; Coley et al., 2019; Jin et al., 2017; ……
Material Science (e.g., polymer design)
• Gomez-Bombarelli et al., 2018; Xie et al., 2019; Jin et al., 2020; ……
Biology (e.g., protein folding)
• Rao et al., 2019; Senior et al., 2020; Jin et al., 2020; ……
57
Thanks to my collaborators
Regina Barzilay Tommi Jaakkola Klavs Jensen William Green Phillip A. Sharp James Collins Caroline Uhler
Rafael Gomez-Bombarelli Connor Coley Camille Bilodeau Peter Sorger Rachel Wu Jonathan Stokes Kyle Swanson
David Alvarez-Melis Guang-He Lee Allison Tam Nienke Moret Anne Fischer Kevin Yang Tao Lei 58
