
Deep learning for drug discovery
Wengong Jin

Massachusetts Institute of Technology


Drug discovery is a time-consuming process
Average time/cost for designing one drug = 10 years + $2.6B

Figure source: Pharmaceutical Research and Manufacturers of America 2


Obviously, we can’t wait for 10 years…

Data source: PhRMA.org

Drug discovery is a challenging search problem
Search in the chemical space for a good drug (e.g., kills virus)
Number of possible drug-like molecules: 10^60 (Kirkpatrick et al., 2004)
• Experimental facilities in industry can only test 10^5 compounds/day

Figure: Koch et al., PNAS 2005
Kirkpatrick et al., Nature, 2004
Automate drug discovery with computation

Let AI find
good drugs!

Figure source: Andrii Buvailo

6
Computational drug discovery: three schemes
Simulation | Virtual screening | De novo drug design

Figure source: Sanchez-Lengeling et al., Science 361, 360–365 (2018)
Simulation is often too slow
Simulation | Virtual screening | De novo drug design
• Simulation takes days for one compound

Figure source: Sanchez-Lengeling et al., Science 361, 360–365 (2018)
Virtual screening
• Virtual screening: assess whether a compound is a good drug using computational models (Walters et al., 1998; McGregor et al., 2007; …)

Compound → Virtual screening model → Prediction: good! → Experiments

• Virtual screening is much faster than experimental screening in wet labs. It can test 10^8 compounds within a day, while experimental screening takes years
• It is also much cheaper than experimental screening
Virtual screening: inherent trade-off
• Virtual screening is restricted to commercially available compounds (e.g., ZINC library)
• Advantage: no need to synthesize any compounds (faster testing)
• Limitation 1: it loses coverage: at best, we can screen 10^9 compounds
• Limitation 2: traditional techniques are based on hand-crafted features

Figure axes: ease of synthesis versus coverage (virtual screening marked)
De novo drug design
• De novo drug design: directly generate a compound with desired properties (Moon et al., 1991; Clark et al., 1995; Schneider & Fechner, 2005; …)

Property criteria (potency, safety, …) → Drug design model → A good drug → Experiments

We need to solve an inverse problem
De novo drug design: inherent trade-off
• Virtual screening is restricted to commercially available compounds (e.g., ZINC library)
• Advantage: can explore the entire chemical space efficiently
• Limitation 1: we need to synthesize new compounds, which can be hard
• Limitation 2: traditional techniques explore the space based on hand-designed rules (e.g., genetic algorithms)

Figure axes: ease of synthesis versus coverage (virtual screening and de novo drug design marked)
Deep learning: a promising direction
• Deep learning has achieved human-level accuracy in computer vision (He et al., 2016)
The key to success: automatic feature learning

• Virtual screening: traditional methods are based on hand-crafted features

Compound → Features designed by experts → Model → Prediction: good!
Use deep learning to learn features automatically

He et al., "Deep residual learning for image recognition." CVPR 2016


Deep learning: a promising direction
• Deep generative models can generate realistic text and images with desired properties

"Generate an image of an armchair in the shape of avocado" → Deep generative models (Ramesh et al., 2020)

• De novo drug design: generate a compound with desired properties

Property criteria (potency, safety, …) → Use deep generative models → A good drug

Silver et al., "Mastering the game of Go with deep neural networks and tree search", Nature (2016).
Ramesh et al., "DALL-E: creating images from text", OpenAI blog
Main technique: graph neural networks

Virtual screening / molecular property prediction
(Duvenaud et al., 2015; Kearnes et al., 2016; Jin et al., 2017; Gilmer et al., 2017; …)
Graphs → Graph encoding → Property (numerical attributes)

De novo drug design
(Olivecrona et al., 2018; Gómez-Bombarelli et al., 2018; Jin et al., 2018; Popova et al., 2018; …)
Property (numerical attributes) → Graph generation → Graphs
Example: discovery of new antibiotics

Stokes, Yang, Swanson, Jin et al, Cell 2020 16


Outline of today’s lecture
• Part 1: graph neural networks for antibiotic discovery
[ICML’17, NeurIPS’17, JCIM’19, Cell’20]

• Part 2: Incorporate biological knowledge into graph neural networks: application to COVID-19 drug combination discovery
[PNAS (In submission)]

• Part 3: Generative models for de novo drug design
[ICML’18, ICLR’19, ICML’20a,b,c]

17
Part 1: antibiotic discovery
History of antibiotic discovery

• Since the 1990s, we have struggled to discover novel antibiotic classes (Silver et al., 2011; Brown et al., 2014; Shore & Coukell, 2016)

• We need novel antibiotic classes due to antibiotic resistance

Figure source: ReAct group FDA = U.S. Food and Drug Administration 18
Virtual screening for antibiotic discovery
• Through collaboration with the Broad Institute, we collected 2560
molecules with measured growth inhibition against E. coli (BW25113)

Drug         Antibacterial
Nitrocefin   Yes
Reserpine    No
Penicillin   Yes
IQ-1S        No
……           ……

Training data → Graph neural network → Predict antibacterial properties

Why graph neural networks?

19
Traditional approach: hand-crafted features
• Traditional methods are based on fixed, hand-engineered molecular features.
• Molecular weight, number of heavy atoms
• More sophisticated features: Morgan fingerprint (Rogers & Hahn, 2010)
• Exhaustive enumeration of all possible substructures, up to radius 3
• Result: high-dimensional features (2048), different substructures merged by hash
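To make the "enumerate substructures up to a radius, then merge them by hash" idea concrete, here is a minimal pure-Python sketch. It is a simplification and not the Morgan algorithm of Rogers & Hahn as implemented in cheminformatics toolkits: atoms are described only by their element symbol, bond orders are ignored, and the function name `circular_fingerprint` and the use of CRC32 as the hash are assumptions made for illustration.

```python
import zlib

def circular_fingerprint(atoms, bonds, radius=3, n_bits=2048):
    """Toy Morgan-style fingerprint: hash each atom's neighbourhood,
    growing it by one bond per round, and fold the hashes into n_bits."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Round 0: an atom's identifier is a hash of its element symbol only.
    ids = [zlib.crc32(sym.encode()) for sym in atoms]
    bits = {i % n_bits for i in ids}

    for _ in range(radius):
        new_ids = []
        for i in range(len(atoms)):
            # Combine the atom's identifier with the sorted identifiers of its
            # neighbours, so the result does not depend on atom numbering.
            env = (ids[i], tuple(sorted(ids[j] for j in neighbors[i])))
            new_ids.append(zlib.crc32(repr(env).encode()))
        ids = new_ids
        # Folding with "% n_bits" is where different substructures can collide.
        bits.update(i % n_bits for i in ids)
    return bits  # set of "on" bit positions

# Ethanol heavy atoms written in two different atom orders: C-C-O and O-C-C.
fp1 = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
fp2 = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(fp1 == fp2)  # the two orderings give the same fingerprint
```

The features are fixed in advance: nothing in this procedure adapts to the antibacterial data, which is the limitation the next slide discusses.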
Problem of traditional features
• Traditional methods are based on fixed, hand-engineered molecular features.
• Molecular weight, number of heavy atoms, etc.
• Problem: we don’t know all the antibacterial patterns
• So these hand-engineered features can miss some of the unknown patterns

Graph neural networks automatically learn a feature representation from data

Compound → Features are learned automatically (instead of features designed by experts) → Model → Prediction: good!
Graph neural network (GNN)
• Rich history of GNNs (Gori et al., 2005, Scarselli et al., 2009, Duvenaud et
al. 2015, Kearnes et al. 2016, Jin et al., 2017, Gilmer et al., 2017, Zitnik et
al., 2018, etc.)

• A molecule is represented as a graph

Each bond is an edge in the graph

Each atom is a node in the graph

22
Graph neural network (GNN)

Atom type → Graph convolution → Pooling → Graph feature representation

After graph convolution, each atom vector encodes a local subgraph.
After more convolutions, it encodes a larger subgraph.

23
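The pipeline on this slide (atom types, graph convolution, pooling) can be sketched in a few lines of plain Python. This is a deliberately stripped-down illustration: it uses sum aggregation with no learned weights or nonlinearities, whereas a trained GNN would apply learned transformations at each step. The names `graph_convolution`, `sum_pool`, and `one_hot` are hypothetical.

```python
def graph_convolution(node_vecs, bonds):
    """One round of message passing: each atom adds its neighbours' vectors to its own."""
    out = [list(v) for v in node_vecs]
    for a, b in bonds:
        for k in range(len(node_vecs[a])):
            out[a][k] += node_vecs[b][k]
            out[b][k] += node_vecs[a][k]
    return out

def sum_pool(node_vecs):
    """Pooling: sum all atom vectors into one graph-level vector."""
    return [sum(col) for col in zip(*node_vecs)]

ELEMENTS = ["C", "N", "O"]

def one_hot(symbol):
    """Initial atom feature: the atom type as a one-hot vector."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]

# Ethanol heavy atoms: C(0) - C(1) - O(2)
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

h0 = [one_hot(a) for a in atoms]
h1 = graph_convolution(h0, bonds)   # each vector now covers a radius-1 subgraph
h2 = graph_convolution(h1, bonds)   # each vector now covers a radius-2 subgraph
graph_vec = sum_pool(h2)

print(h1[0])      # [2.0, 0.0, 0.0]: atom 0 sees only carbons after one round
print(h2[0])      # [4.0, 0.0, 1.0]: after two rounds it also sees the oxygen
print(graph_vec)  # [12.0, 0.0, 5.0]
```

The point of the example is the growth of the receptive field: after one round atom 0 has no information about the oxygen two bonds away, and after two rounds it does.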
Graph neural network (GNN)

Graph convolution → Feature representation → Feed-forward network → Antibacterial property

Hand-crafted features: fixed features → learned predictor → Antibacterial property
Deep learned features: learned features → learned predictor

24
Use GNN for virtual screening
• We virtually screened 10^4 compounds in Broad drug repurposing hub
• We experimentally tested the top 99 compounds in the Broad Institute
• 51 of them are indeed antibacterial: hit rate = 51.5%

51 drugs → Low toxicity → Structural novelty → Compound SU3327 (renamed as Halicin)
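The screening loop on this slide (score every compound, test the top of the ranking, count confirmed hits) can be sketched as follows. The compound names and scores are made up for illustration, and `top_k` and `hit_rate` are hypothetical helper names; only the 51-of-99 figure comes from the slide.

```python
def top_k(scores, k):
    """Return the k compound names with the highest predicted scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def hit_rate(tested, confirmed):
    """Fraction of experimentally tested compounds that were confirmed active."""
    return sum(1 for c in tested if c in confirmed) / len(tested)

# Toy predictions (illustrative values, not real model outputs).
scores = {"cpd_a": 0.9, "cpd_b": 0.2, "cpd_c": 0.7, "cpd_d": 0.4}
tested = top_k(scores, 2)
print(tested)                       # ['cpd_a', 'cpd_c']
print(hit_rate(tested, {"cpd_a"}))  # 0.5

# The hit rate reported on the slide: 51 confirmed out of 99 tested.
print(round(51 / 99 * 100, 1))      # 51.5
```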
Halicin is a novel and potent antibiotic
• Halicin shows potent growth inhibition against E. coli in vitro
• It is also structurally different from known antibiotics

Figure panels: inhibition (OD 600nm versus [halicin] in μg/ml); low Tanimoto similarity to existing antibiotics (training set, Broad library)
Figure 2. Initial Model Training and the Identification of Halicin
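Structural novelty here is measured with Tanimoto similarity, which for two fingerprints viewed as sets of "on" bits is the size of their intersection divided by the size of their union. A minimal sketch follows; the function names and the toy fingerprints are illustrative, and returning 0.0 for two empty fingerprints is a convention assumed here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(union)

def nearest_similarity(query, reference_fingerprints):
    """Similarity of a query to its closest neighbour in a reference set."""
    return max(tanimoto(query, ref) for ref in reference_fingerprints)

known_antibiotics = [{1, 2, 3, 4}, {2, 3, 9}]   # toy fingerprints
candidate = {2, 3, 4, 5}

print(tanimoto({1, 2, 3}, {2, 3, 4}))                    # 0.5
print(nearest_similarity(candidate, known_antibiotics))  # 0.6
```

A low nearest-neighbour similarity to every known antibiotic is what "structurally different" means on this slide.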
Halicin is potent against resistant bacteria in mice

Strong in vivo inhibition of pan-resistant A. baumannii
• Infection is gone with halicin

Strong in vivo inhibition against resistant C. difficile
• Bacteria still alive with metronidazole
• Infection is gone with halicin

Figure 5. Halicin Displays Efficacy in Murine Models of Infection
Large-scale virtual screening
• Applied the same model to screen compounds in the ZINC library
• We identified 8 more compounds with inhibition against E. coli (EC), S. aureus (SA), K. pneumoniae (KP), A. baumannii (AB), or P. aeruginosa (PA) in vitro

Figure panels: prediction score and Tanimoto score distributions; molecular weight (Daltons) and LogP; MIC (μg/ml) against EC, SA, KP, AB, and PA for the identified compounds (e.g., ZINC000100032716, ZINC000225434673); CFU/ml over time
Compare GNN with other models
• Only GNN ranks Halicin among the top 100 compounds.

• Given our budget, Halicin would not be discovered by other models

Model                         Feature                      Rank of Halicin
Graph neural network          Learned                      61
Feed-forward neural network   RDKit features (fixed)       273
Feed-forward neural network   Morgan fingerprint (fixed)   1217
Random forest                 Morgan fingerprint (fixed)   2640
Support vector machine        Morgan fingerprint (fixed)   771

Learned features are better than hand-designed features

29
Part 2: infuse biological knowledge in GNNs
• Part 1: graph neural networks for antibiotic discovery
[ICML’17, NeurIPS’17, JCIM’19, Cell’20]

• Part 2: Incorporate biological knowledge into graph neural networks: application to COVID-19 drug combination discovery
[PNAS (In submission)]

• Part 3: Generative models for de novo drug design
[ICML’18, ICLR’19, ICML’20a,b,c]

30
Motivation for biology-aware models
Diagram labels: biology-aware representation, biology-aware property prediction

• Existing property prediction models only look at the chemical structure
• But properties may depend on additional biological information

31
Case study: COVID-19 drug combinations
Mortality rate in a recent clinical trial (Beigel et al., 2020):
Remdesivir   11.4%   (still pretty high!)
Placebo      15.2%

• Most HIV treatments are drug combinations
• For two drugs A and B: effect(A, B) >> effect(A) + effect(B)

• Can we find drug combinations for COVID?

32
Case study: COVID-19 drug combinations
• Two drugs A and B are synergistic if effect(A, B) >> effect(A) + effect(B)

• Goal: Train a model to predict whether a drug combination is synergistic

• Challenge: training data is limited (fewer than 200 drug combinations), but deep neural networks are very data hungry

Usual recipe: Big data + Neural network
Here: Data + Knowledge + Neural network

33
Biological knowledge of viral replication
How can a drug block COVID-19 infection?

1. Block viral entry by inhibiting ACE2 or TMPRSS2
2. Inhibit viral enzymes: the proteases 3CLpro and PLpro, and the polymerase RdRp
3. Inhibit host targets that interact with viral proteins (Gordon et al., 2020)
Figure source: Cevik et al., BMJ 2020

34
ComboNet incorporates biology & chemistry
• Synergy comes from inhibition of certain biological targets (e.g., proteins)

• Model biological interaction ⇒ additional data ⇒ better generalization

Biological representation (to be learned) + Chemical representation (to be learned) → Antiviral activity

35
ComboNet learns drug-target interaction
1. Predict drug-target interaction — whether drug A inhibits target B

Compound → Graph convolution → Representation (learned representation of the molecular structure) → Targets involved in COVID-19 infection (3CLpro, ACE2, ……, HDAC2)

Drug-target interaction data (ChEMBL and NCATS) is too sparse to use as features. Instead, predict whether a drug inhibits a biological target.

Drug-target interaction data (rows: compounds; columns: targets):
0.3  0.1  1    0.3  0.4  0.6  0.1  0    0.7  0
0.9  0.2  0.3  0    0.9  1    0.8  0.4  0.1  0.7
0.1  0    0.5  0.1  1    0.1  0.9  0.4  1    0.3
0    0.3  0.2  0.7  0.1  0.2  0    0.8  0.1  0.5

36
ComboNet learns antiviral activity
2. Single-agent antiviral activity prediction

Compound → Graph convolution → Representation (3CLpro, ACE2, ……, HDAC2) → Feed-forward network → Antiviral activity pA

Single-drug antiviral activity data (NCATS):
Drug         Antiviral?
Reserpine    Yes
Remdesivir   Yes
Penicillin   No
Halicin      No

37
ComboNet learns antiviral synergy
3. Predict synergy for drug combinations

Compound A → representation zA (3CLpro, ACE2, ……, HDAC2) → single-drug antiviral activity pA
Compound B → representation zB (3CLpro, ACE2, ……, HDAC2) → single-drug antiviral activity pB

Feature representation of the drug combination (A, B):
zAB = zA + zB − zA ⋅ zB

zAB → combination antiviral activity pAB and synergy sAB (Bliss)

Training data: combination synergy data (NCATS)

38
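The combination rule zAB = zA + zB − zA ⋅ zB has the form of Bliss independence: if zA and zB are the probabilities that each drug inhibits a given target, and the two act independently, this is the probability that at least one of them does. The sketch below applies it element-wise over targets. The target values are made up, and `bliss_excess` is only one plausible way to turn predicted activities into a synergy score; the slides do not spell out the exact formula ComboNet uses.

```python
def bliss(x, y):
    """Probability that at least one of two independent events occurs."""
    return x + y - x * y

def combine_targets(z_a, z_b):
    """Element-wise zAB = zA + zB - zA * zB over the target dimension."""
    return [bliss(a, b) for a, b in zip(z_a, z_b)]

def bliss_excess(p_ab, p_a, p_b):
    """Assumed synergy score: observed combination effect minus the Bliss expectation."""
    return p_ab - bliss(p_a, p_b)

# Toy inhibition probabilities for two targets, e.g. (3CLpro, ACE2).
z_a = [0.5, 0.0]
z_b = [0.5, 1.0]
print(combine_targets(z_a, z_b))    # [0.75, 1.0]

# Positive excess means the combination does better than independence predicts.
print(bliss_excess(0.9, 0.5, 0.5) > 0)
```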
ComboNet performance
• Training set (88 drug combinations); Test set (71 drug combinations)

Figure: ROC-AUC of ComboNet, ablations, and standard models
• ComboNet AUC is 0.8 on average
• Removing chemical or biological information hurts performance
• Standard models cannot generalize

39
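For readers unfamiliar with the metric, ROC-AUC can be computed directly as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half. The following sketch implements that pairwise definition; the labels and scores are toy values, not ComboNet outputs.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# 1 = synergistic combination, 0 = not synergistic (toy data).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the reported average of 0.8 in context.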
Discover new drug combinations
• Collaboration with National Center for Advancing Translational Science (NCATS)

• We experimentally tested top drug combinations in NCATS Vero E6 cell assays

• Further studying these combinations in human cell lines

Remdesivir + reserpine Remdesivir + IQ-1S

Drug Virus alive (%) Drug Virus alive (%)


Remdesivir 77.3% Remdesivir 81.7%
Reserpine 42.5% IQ-1S 65%
Combination 3.2% Combination 0%

40
Part 3: de novo drug design
• Part 1: graph neural networks for antibiotic discovery
[ICML’17, NeurIPS’17, JCIM’19, Cell’20]

• Part 2: Incorporate biological knowledge into graph neural networks: application to COVID-19 drug combination discovery
[PNAS (In submission)]

• Part 3: Generative models for de novo drug design
[ICML’18, ICLR’19, ICML’20a,b,c]

41
Motivation for de novo drug design
• Deep learning can discover new antibiotics and COVID-19 drugs

• Simple approach: train a GNN to rank all the compounds in our library

• Reason: maximize the speed of experimental validation

• Problem: number of drug-like molecules = 10^60. We can’t rank all of them.

Compound library (10^4 to 10^8 compounds) → Candidates

42
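A quick back-of-the-envelope calculation shows why ranking the whole space is out of reach. The throughput of 10^8 compounds per day is an assumed figure for fast virtual screening; the other numbers are the ones quoted on the slide.

```python
CHEMICAL_SPACE = 10**60    # drug-like molecules, as quoted on the slide
SCREENED_PER_DAY = 10**8   # assumed virtual screening throughput
LIBRARY_SIZE = 10**8       # upper end of the library sizes on the slide

days = CHEMICAL_SPACE // SCREENED_PER_DAY   # 10**52 days
years = days / 365
fraction_covered = LIBRARY_SIZE / CHEMICAL_SPACE

print(f"{days:.1e} days, about {years:.1e} years")
print(f"a library of 10^8 compounds covers {fraction_covered:.0e} of the space")
```

Even at this optimistic rate, exhaustive ranking would take on the order of 10^49 years, so the search has to be guided rather than exhaustive.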
Graph generation for de novo drug design
• Learn a distribution whose mass is concentrated around “good” molecules

• Let’s train a generative model to directly generate “good” molecules

• It can efficiently explore the entire chemical space (10^60 molecules)

Generative model → Generate → A good molecule

How to generate molecular graphs?

43
Previous solution 1: sequence-based methods
• Prior work used recurrent neural networks to generate molecular graphs (Olivecrona et al., 2018; Gómez-Bombarelli et al., 2018; Popova et al., 2018; …)

• Convert a molecule into a SMILES string (a domain-specific language) (Weininger, 1988)

Molecule → Convert it into a SMILES string → Cc1cn2c(CN(C)C(=O)c3ccc(F)cc3C)c(C)nc2s1 → Recurrent neural networks (RNNs)

Weininger, D. SMILES, a chemical language and information system. Journal of chemical information and computer sciences, 28(1):31–36, 1988. 44
Problems of sequence-based approach
• Prior work used sequence-based generative models for molecular graphs (Olivecrona et al., 2018; Gómez-Bombarelli et al., 2018; Popova et al., 2018; …)

• But this string representation is quite brittle…

Two almost identical graphs, quite different strings:
Cc1cn2c(CN(C)C(=O)c3ccc(F)cc3C)c(C)nc2s1
Cc1cc(F)ccc1C(=O)N(C)Cc1c(C)nc2scc(C)n12

45
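One way to see the brittleness is to measure how far apart the two SMILES strings from the slide are as plain text. The sketch below uses Levenshtein edit distance, which is a choice made here for illustration; the slides do not name a specific string metric.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# The two strings shown on the slide for two almost identical graphs.
smiles_a = "Cc1cn2c(CN(C)C(=O)c3ccc(F)cc3C)c(C)nc2s1"
smiles_b = "Cc1cc(F)ccc1C(=O)N(C)Cc1c(C)nc2scc(C)n12"

distance = levenshtein(smiles_a, smiles_b)
print(distance, "edits between strings of length", len(smiles_a), "and", len(smiles_b))
```

A sequence model has to learn that strings this far apart describe nearly the same molecule, which is part of the motivation for generating graphs directly.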
Previous solution: node-by-node generation
• A straightforward approach: generate a graph node-by-node (Liu et al., 2018)

Add nodes
one by one

……

• Molecules are typically sparse: N nodes, O(N) edges

• However, it needs to make O(N) edge predictions in each step

• In total: O(N^2) edge predictions

46
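The quadratic cost follows from a simple count. Assuming the generator considers one possible edge from each newly added node to every node already in the graph (a simplified model of the procedure), the total is 0 + 1 + ... + (N − 1) = N(N − 1)/2 predictions, while a sparse molecule has only about N actual bonds.

```python
def edge_predictions(n_atoms):
    """Edge predictions made when each new node is checked against all earlier nodes."""
    total = 0
    for existing_nodes in range(n_atoms):
        total += existing_nodes  # one prediction per node already in the graph
    return total

for n in (20, 50, 100):
    # Closed form N(N-1)/2 printed next to the counted value for comparison.
    print(n, edge_predictions(n), n * (n - 1) // 2)
```

For a 100-atom molecule this is 4950 edge decisions to place roughly 100 bonds, so almost every prediction must be "no edge", and a single mistake breaks the reconstruction.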
Failure of node-by-node generation
• Node-by-node generation via a variational auto encoder (VAE) (Liu et al., 2018)

• Diagnostic test: can the decoder reconstruct an input molecule?

Example: encode the COVID-19 drug remdesivir, then decode it. They should be the same.

Figure: reconstruction accuracy (axis from 0 to 80) versus molecule size (20 to 100 atoms)

47
We need to leverage inductive bias
Increasing complexity: Sequence (text, e.g., "What’s up?") → Grid graph (images) → Molecular graphs (low treewidth) → Dense graphs

All models leverage the inductive bias of the structure.
What is the inductive bias for molecular graphs?

48
Junction tree variational autoencoder
Molecular graph → Tree decomposition → Junction tree (each node is a motif)

Motifs are small due to low treewidth

Motif vocabulary: 250K graphs ⇒ 638 motifs, 99.9% coverage (new graphs)

Inspired by the junction tree algorithm in graphical models.


49
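The vocabulary statistics on this slide can be illustrated with a small sketch: collect every motif seen in a set of decomposed training molecules, then check how many unseen molecules can be built from that vocabulary alone. The motif strings below are placeholders, the tree decomposition itself is assumed to have been done already, and reading "coverage" as the fraction of fully covered new molecules is an interpretation rather than something the slide states.

```python
def build_vocabulary(decomposed_molecules):
    """Set of all motifs that occur in the training decompositions."""
    return {motif for motifs in decomposed_molecules for motif in motifs}

def coverage(vocabulary, new_molecules):
    """Fraction of new molecules whose motifs are all in the vocabulary."""
    covered = sum(
        1 for motifs in new_molecules
        if all(m in vocabulary for m in motifs)
    )
    return covered / len(new_molecules)

# Each molecule is given as the list of motifs from its (assumed) decomposition.
train = [["c1ccccc1", "C-O"], ["C-N", "C-O"]]
new = [["c1ccccc1", "C-N"], ["C-S"]]

vocab = build_vocabulary(train)
print(len(vocab))            # 3
print(coverage(vocab, new))  # 0.5
```

A small vocabulary with high coverage is what makes motif-by-motif generation practical: the decoder chooses among a few hundred building blocks instead of assembling rings atom by atom.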
Details: hierarchical encoder & decoder
Molecular graph → encode (Hierarchical Encoder) → Molecular representation → decode (Hierarchical Decoder) → Molecular graph

50
Hierarchical graph encoder
Bottom level: run graph convolution in the molecular graph → node vectors
Propagate node vectors
Top level: run graph convolution in the junction tree → motif vectors

51
Hierarchical graph decoder
Motif vocabulary

Step 1: predict the next motif

Step 2: predict how to attach this motif to the current graph

52
Hierarchical graph decoder
Motif-by-motif generation

Attach this motif to this graph

53
Motif-by-motif versus node-by-node
• Training objective: minimize reconstruction loss

• Motif-by-motif generation is able to reconstruct large molecules!

Figure: reconstruction accuracy (axis from 0 to 90) versus molecule size (20 to 100 atoms), comparing motif-by-motif generation with node-by-node generation. The encoded and decoded molecules should be the same.

54
Results: molecular optimization
• Task: learn to modify a non-drug-like molecule into a drug-like molecule

• Drug-likeness is measured by QED scores (Bickerton et al., 2012)

Low QED score → High QED score: a local modification significantly improves drug-likeness

Drug-likeness optimization (success rate):
Sequence         58.5
Node-by-node     73.6
Motif-by-motif   76.9

55
Part 3: de novo drug design
• Part 1: graph neural networks for antibiotic discovery
[ICML’17, NeurIPS’17, JCIM’19, Cell’20]

• Part 2: Incorporate biological knowledge into graph neural networks: application to COVID-19 drug combination discovery
[PNAS (In submission)]

• Part 3: Generative models for de novo drug design
[ICML’18, ICLR’19, ICML’20a,b,c]

56
Deep learning for molecular sciences
Deep learning applied to:

Drug discovery (e.g., de novo drug design)
• Dahl et al., 2015; Stokes et al., 2020; Jin et al., 2018; ……

Chemistry (e.g., reaction prediction)
• Duvenaud et al., 2015; Coley et al., 2019; Jin et al., 2017; ……

Material Science (e.g., polymer design)
• Gómez-Bombarelli et al., 2018; Xie et al., 2019; Jin et al., 2020; ……

Biology (e.g., protein folding)
• Rao et al., 2019; Senior et al., 2020; Jin et al., 2020; ……

57
Thanks to my collaborators

Regina Barzilay, Tommi Jaakkola, Klavs Jensen, William Green, Phillip A. Sharp, James Collins, Caroline Uhler

Rafael Gomez-Bombarelli, Connor Coley, Camille Bilodeau, Peter Sorger, Rachel Wu, Jonathan Stokes, Kyle Swanson

David Alvarez-Melis, Guang-He Lee, Allison Tam, Nienke Moret, Anne Fischer, Kevin Yang, Tao Lei