
nature computational science

Article https://doi.org/10.1038/s43588-023-00511-5

Efficient and accurate large library ligand docking with KarmaDock

Received: 20 March 2023
Accepted: 8 August 2023
Published online: 21 September 2023

Xujun Zhang1,6, Odin Zhang1,6, Chao Shen1, Wanglin Qu1, Shicheng Chen1, Hanqun Cao2, Yu Kang1, Zhe Wang1, Ercheng Wang3, Jintu Zhang1, Yafeng Deng4, Furui Liu3, Tianyue Wang1, Hongyan Du1, Langcheng Wang5, Peichen Pan1, Guangyong Chen3, Chang-Yu Hsieh1 & Tingjun Hou1

Ligand docking is one of the core technologies in structure-based virtual screening for drug discovery. However, conventional docking tools and existing deep learning tools may suffer from limited performance in terms of speed, pose quality and binding affinity accuracy. Here we propose KarmaDock, a deep learning approach for ligand docking that integrates the functions of docking acceleration, binding pose generation and correction, and binding strength estimation. The three-stage model consists of the following components: (1) encoders for the protein and ligand to learn the representations of intramolecular interactions; (2) E(n) equivariant graph neural networks with self-attention to update the ligand pose based on both protein–ligand and intramolecular interactions, followed by post-processing to ensure chemically plausible structures; (3) a mixture density network for scoring the binding strength. KarmaDock was validated on four benchmark datasets and tested in a real-world virtual screening project that successfully identified experiment-validated active inhibitors of leukocyte tyrosine kinase (LTK).

Ligand docking, one of the core tasks in structure-based virtual screening (VS), plays a key role in protein–ligand (PL) binding pose generation, PL binding affinity prediction, PL binding pose selection and VS1. In general, traditional docking programs such as AutoDock 42, AutoDock Vina3, LeDock4, Glide5 and GOLD6 utilize heuristic search algorithms to explore a range of possible ligand conformations and scoring functions (SFs) with simplified terms for ligand pose selection and PL binding strength estimation. The simplification enhances their efficiency in large-scale VS, but this comes at the cost of decreased accuracy. With the increasing size of compound libraries (for example, ZINC20 (ref. 7), with over 230 million purchasable compounds), the need for faster methods for ultra-large VS has led to the development of a range of acceleration methods, such as QuickVina 2 (ref. 8) and AutoDock GPU9, as well as deep learning (DL) technologies for predicting binding affinity and generating binding poses.

Following the remarkable achievement of AlphaFold2 (ref. 10) in predicting protein structures at the atomic level from protein sequences, an increasing number of researchers have started applying DL algorithms to predict ligand conformations11–13 and PL binding poses14–19. Unlike traditional ligand docking methods, predicting PL conformations with DL algorithms (especially graph neural networks, GNNs) can accelerate the docking process and improve docking accuracy. Instead of searching and scoring, DL-based pose generation models (PGMs) generate binding poses through one of three methods: (1)

1Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China. 2Department of Mathematics, Chinese University of Hong Kong, Hong Kong, China. 3Zhejiang Lab, Hangzhou, Zhejiang, China. 4Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, Zhejiang, China. 5Department of Pathology, New York University Medical Center, New York, NY, USA. 6These authors contributed equally: Xujun Zhang, Odin Zhang. e-mail: panpeichen@zju.edu.cn; gychen@zhejianglab.com; kimhsieh@zju.edu.cn; tingjunhou@zju.edu.cn

Nature Computational Science | Volume 3 | September 2023 | 789–804



predicting PL distance matrices and generating PL binding poses using gradient descent (for example, TankBind17), (2) implementing E(n) equivariant graph neural network20 (EGNN) layers to predict the movements and directions of ligand atoms in each message-passing iteration (for example, EquiBind15 and E3Bind14) and (3) utilizing denoising diffusion probabilistic models21 to predict the translation, rotation and torsion of the ligands, similar to traditional docking tools (for example, DiffDock19). Methods (1) and (3) are both capable of generating conformations with chemically plausible local structures, albeit with limited efficiency. Conversely, models based on an EGNN demonstrate superior speed, yet face challenges in producing conformations with chemically plausible bond lengths and bond angles. Although these DL models (except EquiBind) have achieved substantial improvements in docking pose generation, they are most effective in the scenario of blind docking, where the binding site (or pocket) of the PL complex is unknown. However, in VS practice, where the binding site is usually known or identified through experiments beforehand22,23, pocket-given ligand docking is more frequently used, and traditional docking tools are mainly designed for this purpose. To address this problem, LigPose18 was introduced as a pocket-guided method utilizing an EGNN with self-attention to generate PL binding poses and predict their binding affinities. LigPose has achieved an impressively high docking success rate and low scoring error, with a maximum 265-fold speedup compared to traditional tools. However, unlike traditional docking methods, the predicted ligand conformations may not conform to physical rules for bond lengths and angles, as geometric constraints are not imposed during atom movement prediction.

Although docking algorithms can produce acceptable docking conformations, the accuracy of binding affinity prediction (also known as compound–protein interaction, CPI, prediction) is often unsatisfactory, and this has led to the development of deep learning scoring function (DLSF) models using various DL algorithms. Among DLSFs, RTMScore24 uses mixture density networks (MDNs) to learn the distance distribution of each PL node pair and determine the binding strength as the sum of the probabilities of PL nodes, achieving impressive screening power and docking power. Although DLSFs can outperform traditional SFs in binding affinity prediction, their accuracy is largely determined by the quality of the ligand-binding poses.

In this Article we present a DL approach called KarmaDock for pocket-guided ligand docking (Fig. 1a) that can generate binding poses and predict binding strength with fast speed and high accuracy. The KarmaDock framework implements GNN architectures consisting of two encoders (graph transformer (GT)24 and geometric vector perceptron (GVP)25), an MDN block for scoring and an EGNN block for docking (Fig. 1b and Methods). The innovations of this approach are as follows: (1) the protein is characterized based on residues rather than atoms so as to encode geometric features and reduce computational cost; (2) the probability distribution of the minimum distance between each protein and ligand node learned by the MDN block can introduce a distance inductive bias to the shared encoders, thereby helping to guide the learning of pose generation; (3) two encoders are designed to receive the distance inductive bias and to learn intramolecular interactions for proteins and ligands, respectively; (4) fully connected interaction graphs combined with a self-attention-based EGNN are implemented to enable fast docking; and (5) two post-processing methods are employed to ensure that the generated conformations are rational in terms of bond lengths and angles.

KarmaDock was evaluated on four benchmark datasets for ligand docking (PDBBind version 2020 (ref. 26), APObind27, Comparative Assessment of Scoring Functions 2016 (CASF 2016) (ref. 28) and DEKOIS version 2.0 (ref. 29)) for the tasks of pose generation, pose selection and VS. It has also been applied to real-world VS based on the ChemDiv and Specs databases for identifying leukocyte tyrosine kinase (LTK) inhibitors. For pose generation, KarmaDock outperforms all the traditional docking tools in terms of accuracy and speed, generating ligand poses with a docking success rate of 89.1% and a speed of 0.017 s per complex, and reproducing most interaction modes. For pose selection, KarmaDock achieves a success rate of 95.6% and is ranked second among all the methods. In terms of VS, where docking accuracy, speed and binding strength are all important, KarmaDock shows average enrichment factors of 23.4 and 16.3 on the CASF 2016 and DEKOIS version 2.0 datasets, respectively, which is higher than those achieved by the traditional docking tools. Furthermore, KarmaDock is able to screen 1.77 million compounds in 8.4 h (57.1 days for AutoDock GPU) on a single Tesla V100, and successfully discovers an experiment-validated LTK inhibitor with sub-micromolar activity. These results demonstrate the superior performance of KarmaDock and show it to be a promising tool for ultra-large VS.

Results

Docking accuracy and speed as a PGM
In this section, five traditional docking tools (AutoDock GPU9, Glide SP5, Glide XP30, GOLD6 and LeDock4) were chosen for comparison. GOLD and LeDock were selected as they are considered to have the best docking power among commercial and non-commercial docking tools, respectively, according to Wang's31 assessment of ten classic docking tools. Glide SP and XP were chosen for their robust and accurate docking capabilities. AutoDock GPU was included for a fair comparison of speed on a single graphics processing unit (GPU) between traditional docking tools and PGMs. In addition, six PGMs (KarmaDock, LigPose, TankBind, E3Bind, EquiBind and DiffDock) were included in the performance comparison across various test sets. It should be noted that TankBind, E3Bind, EquiBind and DiffDock were primarily designed for blind docking and were trained and evaluated using the dataset-split method proposed by EquiBind. Accordingly, they cannot be directly compared with KarmaDock. To ensure a fair comparison between KarmaDock and PGMs on the EquiBind-split dataset, TankBind was selected as a representative model, and the true binding sites instead of the P2Rank-predicted binding sites were used. The docking performance was further evaluated under a pocket-given situation. Except for the pocket-guided TankBind and KarmaDock, the performance of the other PGMs was obtained from reports, as some models are not open-source or have not released their training scripts.

As shown in Fig. 2 and Supplementary Table 1, KarmaDock outperformed all traditional docking tools, with a minimum improvement of 14.9%/22.3% in success rate on the PDBBind refined/core sets and a speedup ranging from 163.06 to 8,182.70 times. When compared with PGMs, KarmaDock maintained its superiority in terms of speed, with an increase ranging from 2.38 to 2,352.94 times. It should be noted that the speed of LigPose could not be directly compared, as its source data are not publicly available and there is no information on its computational speed. Based on the maximum speedup of 265 times compared with traditional tools (including Glide XP) reported by LigPose, KarmaDock reached a maximum speedup of 8,182.70 times compared with Glide XP, suggesting that KarmaDock is faster than LigPose. As for the success rate evaluated on the PDBBind refined set, both KarmaDock and LigPose were trained on the PDBBind general set by excluding the refined set. LigPose (74.7%) slightly outperformed KarmaDock (71.7%).

Unlike LigPose and KarmaDock, which were trained on the PDBBind general set by excluding the refined set, the KarmaDock Raw model was trained on the PDBBind general set with more training samples. Post-processing was then applied in KarmaDock FF and KarmaDock Aligned by FF and RDkit-conformation-alignment, respectively, to fix any implausible local structures in the poses generated by KarmaDock Raw. The docking accuracy, as measured by success rate, on the PDBBind core set showed the following order: KarmaDock Raw (89.1%) > KarmaDock FF (88.4%) > KarmaDock Aligned (82.5%) = LigPose (82.5%) > KarmaDock (76.8%). Although LigPose cannot be trained on the PDBBind general set as the model is not open-sourced, the high success rate of KarmaDock Raw proved its power to generate accurate

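The EGNN-style pose update described above (atoms moved along learned directions at each message-passing iteration) can be sketched in a few lines. This is a minimal illustration in the spirit of the E(n) equivariant GNN, not KarmaDock's actual layer: `phi_x` is a placeholder for the learned message network, and a fully connected neighbourhood stands in for the interaction graph.

```python
def egnn_coord_update(coords, feats, phi_x):
    """One E(n)-equivariant coordinate update (illustrative sketch).

    coords: list of 3-vectors (ligand atom positions)
    feats:  per-atom features (stand-ins for learned embeddings)
    phi_x:  scalar function of (h_i, h_j, squared distance); in a real
            model this is a learned MLP, here it is a placeholder.
    Each atom is displaced along the relative vectors to its neighbours,
    weighted by phi_x, so the update commutes with rotations/translations.
    """
    n = len(coords)
    new_coords = []
    for i in range(n):
        delta = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            rel = [coords[i][k] - coords[j][k] for k in range(3)]
            d2 = sum(c * c for c in rel)
            w = phi_x(feats[i], feats[j], d2) / (n - 1)  # normalized message
            delta = [delta[k] + w * rel[k] for k in range(3)]
        new_coords.append([coords[i][k] + delta[k] for k in range(3)])
    return new_coords
```

Because the displacement is a weighted sum of relative position vectors, rotating or translating the input rotates or translates the output identically; this equivariance is what lets EGNN-based PGMs predict atom movements directly instead of searching conformational space, at the cost of the bond-length/angle issues discussed above.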



[Figure 1]

Fig. 1 | Workflow of KarmaDock. a, Visualization of the docking process of KarmaDock (from left to right). b, KarmaDock: graph representation, interaction graph construction and architecture. c, Architecture of the first multi-head attention layer of the graph transformer (GT). d, Architectures of the input layer and the GVP layer of geometric vector perceptrons (GVPs). e, Architecture of the mixture density network (MDN). f, Architecture of the first E(n) equivariant graph neural network (EGNN) layer. Wx and bx denote the learnable parameters from linear layers; ⊙ represents the Hadamard product; ⊕ denotes element-wise summation; ⊖ denotes element-wise subtraction; × denotes the dot product; ‖⋅‖2 represents the L2 norm; Concat denotes the concatenation operation; Softmax_j represents the softmax operation on all neighborhood nodes j of node i; Sum_j denotes the sum operation on all the neighborhood nodes j of node i. In b, Norm represents batch normalization, and SiLU is a type of activation function. In d, Embedding represents the embedding layer implemented in PyTorch, σ represents a typical activation function or None operation, which varies in the different layers, and arrows on the vector features represent their directions. In e, π represents the mixing coefficients, μ and σ denote the means and s.d. values of the distributions, respectively, and Aux represents the predicted bond types and atom types of auxiliary tasks. In f, Linear Block calculates as in equation (46), and Leaky ReLU is a type of activation function.

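The MDN read-out sketched in Fig. 1e can be illustrated as follows. This is a hedged sketch, not the published implementation: the per-pair mixture parameters are hard-coded placeholders for what the MDN block would predict, and the pose score is taken, as described in the text, as a sum over protein–ligand node pairs of the mixture likelihood of the observed distance.

```python
import math

def gaussian_pdf(d, mu, sigma):
    """Density of a 1D Gaussian at distance d."""
    return math.exp(-0.5 * ((d - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mdn_pose_score(pair_distances, mixture_params):
    """Score a binding pose with an MDN-style read-out (illustrative).

    pair_distances: observed distance for each protein-ligand node pair
    mixture_params: per-pair list of (pi_k, mu_k, sigma_k) components, as
                    an MDN block would emit (placeholders here).
    Poses whose distances match the learned distributions score higher.
    """
    score = 0.0
    for d, components in zip(pair_distances, mixture_params):
        # mixture likelihood of the observed distance for this pair
        score += sum(pi * gaussian_pdf(d, mu, sigma)
                     for pi, mu, sigma in components)
    return score
```

Distances falling near the modes of the learned distributions raise the score, which is how the same read-out can both bias the shared encoders during training and rank binding strength at inference.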



[Figure 2]

Fig. 2 | The accuracy and speed of KarmaDock. a,b, Plots of empirical cumulative distribution functions, which describe the proportion of observations falling below each unique r.m.s.d. value of the complexes docked by various models on the PDBBind refined set (a) and the PDBBind core set (b). KarmaDock Raw/FF/Aligned were trained on the PDBBind general set, and KarmaDock was trained on the PDBBind general set by excluding the refined set. c, Plot of log10 time cost distributions of the various models. d, Average time costs of the various models.

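The statistics behind Fig. 2 can be computed from coordinates alone. The sketch below is a simplified illustration: it applies no symmetry correction or alignment (poses are assumed to be in the pocket frame already), and the 2 Å success threshold is the common docking convention rather than a value stated in the figure.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two same-order coordinate sets."""
    assert len(coords_a) == len(coords_b)
    sq = sum((a[k] - b[k]) ** 2
             for a, b in zip(coords_a, coords_b) for k in range(3))
    return math.sqrt(sq / len(coords_a))

def success_rate(pred_poses, crystal_poses, threshold=2.0):
    """Fraction of predicted poses within `threshold` angstroms r.m.s.d.
    of the crystal pose -- the docking success rate reported in Fig. 2."""
    hits = sum(1 for p, c in zip(pred_poses, crystal_poses)
               if rmsd(p, c) <= threshold)
    return hits / len(pred_poses)
```

`success_rate` corresponds to the proportion read off the empirical CDF curves at the chosen r.m.s.d. cutoff.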
binding poses. As expected, although the post-processing improved the rationality of the poses, it also decreased the success rate, which can be attributed to the fact that the force-field optimization ignores the PL interactions, and occasionally occurring large torsion-angle errors in the raw predicted binding poses may greatly deviate the aligned RDkit conformations from the raw predicted binding poses as well as from the ground truth.

According to evaluations following the dataset-split approach proposed by EquiBind, KarmaDock (56.2%) outperformed TankBind (24.2%) with given pockets, and DiffDock demonstrated state-of-the-art blind-docking performance among TankBind, E3Bind, EquiBind and DiffDock. Furthermore, a comparative analysis was conducted to evaluate the performance of KarmaDock and conventional docking tools on the APObind core set. This set mimics real-world scenarios where only apo-protein structures are readily accessible. The data in Supplementary Table 2 show that KarmaDock surpasses all traditional docking tools in terms of both accuracy and speed. Notably, it exhibits a minimum success rate improvement of 56.7%.

Given that KarmaDock is capable of accurately generating PL binding poses, we sought to understand the interaction patterns that the model considers to be important in forming stable binding poses. We quantified the interaction patterns, composed of eight common non-bond interactions (Supplementary Section 3), of the crystal PL complexes and the binding conformations generated by KarmaDock in the PDBBind core set, and calculated the overlap rate. As shown in Supplementary Fig. 1, we found that KarmaDock considers metal interactions, π interactions and salt bridges as the most important for forming binding poses. We also wondered whether KarmaDock could be used to refine the binding poses generated by traditional tools (Supplementary Section 4). The docking poses of the complexes in the PDBBind core set docked by five traditional docking tools were fed into KarmaDock to replace the randomly initialized conformation for initialization. As shown in Supplementary Fig. 3, the refined binding poses generated by KarmaDock have lower root-mean-square deviations (r.m.s.d.) and higher success rates compared with those docked by traditional docking programs. However, the success rate for poses after refinement is lower than that for poses directly docked by KarmaDock, suggesting that KarmaDock is best used as a standalone docking model.

Impact of ligand flexibility and dataset-split
It is widely recognized that the number of heavy atoms and rotatable bonds in a ligand (that is, the ligand flexibility) greatly affects the speed and success rate of traditional docking programs31. In this section we explore the impact of these factors on the performance of both KarmaDock and traditional docking tools. As KarmaDock models the docking process without considering bond constraints, the impact of rotatable bonds can partially represent the impact of heavy-atom numbers. As shown in Fig. 3a,b, traditional docking tools took longer as the number of heavy atoms and rotatable bonds increased. Among the traditional tools, AutoDock GPU was the fastest, with time costs ranging from 0.139




[Figure 3]

Fig. 3 | The impacts of heavy atoms and rotatable bond numbers on docking speed and accuracy. a,b, Impact of heavy atoms (a) and rotatable bond number (b) on models' docking speed (seconds per complex). c,d, Impact of heavy atoms (c) and rotatable bond number (d) on models' docking accuracy, tested on the PDBBind refined set. e,f, Impact of heavy atoms (e) and rotatable bond number (f) on models' docking accuracy, tested on the PDBBind core set.

to 157.111 s, while Glide XP was the slowest program, with time costs ranging from 9.456 to 956.257 s. As for KarmaDock, the flexibility of ligands had a limited influence on docking speed, which was reasonable given the high efficiency of batch processing on GPUs.

As illustrated in Fig. 3c–f, the performance of traditional docking software on the PDBBind refined and core sets decreased as the number of heavy atoms and rotatable bonds in a molecule increased, which is understandable, as the search space for ligand conformation expands




dramatically with more heavy atoms and rotatable bonds. Surprisingly, KarmaDock demonstrated consistent docking accuracy, regardless of the numbers of heavy atoms and rotatable bonds, suggesting its insensitivity to the flexibility of ligands. This is in contrast to LigPose, which has been reported to struggle with highly flexible molecules18. One possible explanation for this discrepancy is that the MDN module in KarmaDock introduces an inductive bias about the inter-node distance distribution of the PL complex without regard to ligand flexibility, which enables the model to effectively dock highly flexible molecules into the pocket.

We also investigated the impacts of the different dataset-split methods listed in Supplementary Table 3 (details are provided in Supplementary Section 1) on model performance. MLSF_Split and EquiBind_Split split the dataset based on protein family similarity and ligand heavy-atom number, respectively, and LigPose_Split considers both factors. As shown in Supplementary Table 1, KarmaDock evaluated using MLSF_Split (that is, KarmaDock Raw) achieved a 12.3% improvement in success rate compared with KarmaDock trained on the LigPose_Split training set and tested on the PDBBind core set, suggesting that the larger the training set and the more similar the protein families between the training and test sets, the more accurate the model is. However, when KarmaDock was evaluated using MLSF_Split and EquiBind_Split, the success rate dropped substantially from 89.1% to 56.2%. Because KarmaDock is insensitive to the number of heavy atoms, the decrease can be attributed to the lower protein family similarity between the training and test sets. In summary, the size of the training set and the family similarity between the training and test sets can dramatically impact the success rate of KarmaDock, whereas the number of heavy atoms and rotatable bonds has a limited influence. We therefore retained the MLSF_Split method to train KarmaDock in the following studies.

Impact of post-processing on the rationality of the binding poses
A drawback of EGNN is that it ignores bond-angle and bond-length restrictions during the prediction of atom movements, which may result in chemically implausible bond lengths and angles. To address this issue, three strategies (Supplementary Section 2) and two types of post-processing (Post-processing section) are employed in KarmaDock. Furthermore, a metric called the rational rate is proposed to measure the ability of models to generate chemically plausible conformations by calculating the ratio of rational conformations among all the generated poses. A predicted conformation is considered illegal if the differences (errors) in bond length or angle between the predicted and ground-truth conformations exceed a certain threshold, where the angle is indirectly measured by the distance between the end atoms of two-hop bonds.

Figure 4a presents the distributions of the errors in the bonds and angles between the ground-truth conformations and the poses generated by KarmaDock, KarmaDock FF and KarmaDock Aligned. The 'max error' records the maximum difference in bonds and angles between a predicted pose and the ground truth, and the 'mean error' computes the average difference. As expected, the maximum error distribution is higher than the mean error distribution. The bond error distributions are lower than the angle error distributions, which is reasonable, as the angle information is hard for the model to encode and learn. Among the models, the errors decrease in the order KarmaDock > KarmaDock FF > KarmaDock Aligned, suggesting that post-processing is effective and that RDkit-conformation alignment is better than FF optimization. This is understandable, because the RDkit conformation used in the alignment is 100% rational, whereas the FF-optimized conformation can occasionally be illegal if the optimization does not converge. Figure 4b describes the rational rate (also referred to as success rate) under various combinations of bond/angle, mean/max error, and three distance thresholds (1.0, 1.5 and 2.0 Å). The conclusions in Fig. 4b–d are similar to those previously mentioned. It is surprising to find that KarmaDock reaches a very high rational rate without post-processing, indicating that the distance encoding and ligand–ligand (LL) interaction used in KarmaDock help generate chemically plausible conformations. We also provide a visualization of the ground-truth conformation, the directly generated conformation and the post-processed conformations (Fig. 4e). We can see that the bond lengths of the directly predicted conformation are almost correct, but the bond angles are not, which can be further improved by post-processing.

Docking power and screening power as a DLSF
As a ligand-docking tool, it is essential to not only generate binding poses but also score the binding strength between the protein and ligand based on the generated binding poses. KarmaDock utilizes the MDN scoring module, which has been shown to have satisfactory docking and screening power in RTMScore. Accordingly, we tested KarmaDock as an SF on poses provided by the classic CASF 2016 dataset. Compared with RTMScore, which has already demonstrated the capabilities of MDN as a scoring function, we focused on evaluating the performance in terms of docking power and screening power, which are crucial features for VS. The results presented in Supplementary Data 1 indicate that KarmaDock exhibits strong performance in both docking power and screening power, ranking first among all the tested ligand-docking tools. KarmaDock also outperformed the other rescoring methods except for RTMScore. The superior performance of KarmaDock suggests that it could be highly effective in pose selection and actives enrichment, thus demonstrating its potential as a promising docking tool.

Screening power as a ligand-docking tool
In a real-world VS, a ligand-docking tool is required to accurately predict both the PL binding poses and their binding strengths, which cannot be achieved solely by DLSFs or PGMs without a scoring module. KarmaDock, a DL model for ligand docking, has demonstrated exceptional performance as both a PGM and an SF. In this study we evaluated the VS performance of KarmaDock along with five traditional docking tools (Glide SP, GOLD, LeDock, Surflex32 and AutoDock Vina), a PGM with a scoring module (TankBind) and a sequence-based model (SBM, TransformerCPI version 1) for screening power assessment. An introduction to SBMs is provided in Supplementary Section 5. TransformerCPI was included to assess whether KarmaDock can attain a level of efficiency comparable to that of SBMs while also surpassing SBMs in terms of screening power. The performance of these models was evaluated on DEKOIS 2.0, a classic benchmark usually used for testing the VS power of SFs. The proteins and ligands were first preprocessed using Schrödinger33, and then the binding poses were generated by Glide SP. The influence of binding poses on VS power was explored, and Fig. 5a,b shows that the binding poses generated by KarmaDock generally result in a higher Boltzmann-enhanced discrimination of the receiver operating characteristic (BED_ROC)34 than those generated by Glide SP and then refined by KarmaDock, while the poses generated by Glide SP achieved the lowest BED_ROC. Considering that the SFs used in scoring the three groups of poses were the same, the screening power can reflect the binding pose accuracy. Therefore, it can be concluded that using KarmaDock to refine the binding conformations generated by traditional tools can improve the quality of the poses; however, these are less accurate than the binding poses generated directly by KarmaDock. From the perspective of models, we can observe from Fig. 5c,d that the screening power rank is as follows: KarmaDock FF > KarmaDock Raw > KarmaDock Aligned > Glide SP > Surflex > LeDock > GOLD ChemPLP > AutoDock Vina > TankBind > TransformerCPI. Four conclusions can be drawn from this rank: (1) the FF optimization can improve the screening power of KarmaDock, while the RDkit-conformation alignment decreases the screening power of KarmaDock; (2) KarmaDock outperformed all the selected traditional docking tools and TankBind and TransformerCPI,





Fig. 4 | Impact of post-processing on the rationality of the binding poses. a, Distributions of bond length (angle) mean (max) errors between crystallized conformations and conformations generated by KarmaDock with various post-processing (KarmaDock Raw, no post-processing; KarmaDock FF, conformations are post-processed by force-field optimization; KarmaDock Aligned, conformations are post-processed by RDkit-conformation alignment). b–d, Rational rates of conformations generated by KarmaDock Raw/FF/Aligned, defined by the ratio of the number of conformations whose bond length (or bond angle) mean errors (or max errors) are no bigger than 1.0 (b), 1.5 (c) and 2.0 Å (d), respectively, to the number of total conformations. e, An example (PDB 4JSZ) of fixing a bond angle error by post-processing, where the bond angle is evaluated indirectly by the two-hop distance, the yellow dashes denote the distance, and the numbers around the yellow dashes are the distance values. 'Raw', 'FF' and 'Aligned' denote conformations generated by KarmaDock Raw/FF/Aligned, respectively, and 'Crystallized' represents the ground-truth ligand-binding conformation with rational bond lengths and angles.
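The RDkit-conformation alignment compared above rests on rigid superposition of a chemically valid conformer onto the predicted pose (the Methods describe this as a Kabsch alignment after torsion assignment). A minimal numpy sketch of the Kabsch step alone — not the authors' implementation:

```python
import numpy as np

def kabsch_align(P, Q):
    """Rigidly superimpose point set P onto Q (both n x 3 arrays) using the
    Kabsch algorithm: center both sets, find the optimal rotation via SVD
    (rejecting reflections), then place P in Q's frame."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    H = Pc.T @ Qc                           # 3 x 3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return Pc @ R.T + Q.mean(axis=0)
```

With identical internal geometry the aligned conformer reproduces the target pose exactly; in practice the RDkit conformer differs in bond lengths and angles, so the alignment trades a small docking-accuracy loss for chemical rationality, consistent with the trend reported here.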

demonstrating its superiority and validity in VS; (3) TankBind and TransformerCPI ranked last, showing that models that predict binding strength without considering binding conformations cannot achieve good screening power; (4) KarmaDock outperformed TransformerCPI in terms of screening power with similar speed, suggesting that KarmaDock can replace the role of SBMs in ultra-large VS to rapidly filter probably inactive compounds and retain other compounds for further rescoring through accurate SFs or MM/PB(GB)SA.

Experimental validation in a real-world VS
In 2021, CLIP1-LTK fusion was identified as an oncogenic driver in non-small-cell lung cancer (NSCLC)35 and is regarded as a critical target for treating corresponding cancers. To demonstrate the practical significance of KarmaDock in VS, we performed a real-world VS on the LTK target. As no experimentally determined three-dimensional (3D) structure is available in PDB, SWISS-MODEL36 was employed for protein homology modeling and protein structure generation, and
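The screening-power comparisons above rely on early-enrichment metrics; the enrichment factor (EF) at a given fraction of the ranked library can be sketched as follows (pure Python, assuming the standard top-x% definition; the paper's exact metric implementations are described in its Methods):

```python
def enrichment_factor(scores, labels, fraction):
    """EF at the top `fraction` of a library ranked by descending score.

    scores: predicted binding strengths (higher = more likely active).
    labels: 1 for actives, 0 for decoys.
    EF_x = (hit rate in the top x%) / (hit rate in the whole library).
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_top = max(1, int(round(len(ranked) * fraction)))
    top_hits = sum(label for _, label in ranked[:n_top])
    overall_rate = sum(labels) / len(labels)
    return (top_hits / n_top) / overall_rate
```

For a DEKOIS-like target (40 actives, 1,200 decoys), a perfect ranking gives EF_0.5% = 31 under this definition: all 6 top-ranked compounds are active against a base hit rate of 40/1,240.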



Fig. 5 | Screening power on DEKOIS. a,b, Distributions (a) and average values (b) of six metrics (BED_ROC, ROC_AUC, PR_AUC, EF_0.5%, EF_1% and EF_5%; Methods) that KarmaDock achieves based on conformations generated under various modes, respectively. 'Score only' means the binding conformations are generated by Glide@SP. Refined Raw/FF/Aligned indicate that the binding poses generated by Glide@SP are redocked by KarmaDock Raw/FF/Aligned. KarmaDock Raw/FF/Aligned denote the binding poses generated by KarmaDock Raw/FF/Aligned. c,d, Distributions (c) and average values (d) of six metrics achieved by the various models (KarmaDock Raw/FF/Aligned, Glide@SP, Surflex, LeDock, GOLD@ChemPLP, AutoDock Vina, TankBind and TransformerCPI).

the anaplastic lymphoma kinase (ALK) structure (4CLI) with high sequence similarities to LTK was utilized as the template structure. The Specs and ChemDiv databases, which are widely used commercial compound libraries containing over 1.77 million molecules, were used for screening. KarmaDock efficiently generated the binding poses and scored binding strength in just 8.4 h. The ligands with top predicted scores were further analyzed by comparing their interaction modes with those of the crystallized ALK PL binding conformation, and 25 ligands forming binding conformations with good interaction modes were selected for experimental validation (VS workflow targeting LTK section and Supplementary Table 4). Among the 25 tested compounds, compound 23 (ChemDiv ID 8005-7327) showed a half-maximum inhibitory concentration (IC50) value of 765.6 nM, indicating its potential as a candidate for further drug design. The binding modes of the complexes


[Fig. 6 panels a–e appear here. The inhibition curves in d and e give IC50 = 2.36 nM for lorlatinib and IC50 = 765.6 nM for compound 23 (ChemDiv ID 8005-7327).]

Fig. 6 | VS with experimental validation targeting LTK. a, The binding pose and interaction mode of the target ALK and compound lorlatinib (PDB 4CTB). b, Binding pose and interaction mode of the target LTK and compound lorlatinib generated by KarmaDock (the protein structure of LTK is generated by AlphaFold2). c, Binding pose and interaction mode of the target LTK and compound 23 (ChemDiv ID 8005-7327) generated by KarmaDock. d,e, Inhibition curves of lorlatinib (d) and compound 23 (e) (ChemDiv ID 8005-7327). The inhibition of Ba/F3-CLIP1-LTK cells is shown as the mean ± s.d. of three independent experiments.

consisting of the protein and compounds (lorlatinib and ChemDiv ID 8005-7327) were visualized, along with similar crystallized ligand compounds (Fig. 6a–c). Analysis of the binding conformation of ALK and lorlatinib (Fig. 6a) revealed that lorlatinib forms hydrogen bonds with GLU-1197, MET-1199 and ARG-1120, and halogen bonds with GLY-1269 and ASP-1270. A similar interaction mode (Fig. 6b) was found in the binding of LTK to lorlatinib generated by KarmaDock (hydrogen bonds with GLU-591 and MET-593, and halogen bonds with GLY-663 and


ASN-648), suggesting KarmaDock can generate accurate binding poses of PL based on interactions. As shown in Fig. 6c, the selected compound 23 (ChemDiv ID 8005-7327) also formed a hydrogen bond with MET-593, similar to the former two complexes, which contributed to LTK inhibition and suggested that KarmaDock considered the hydrogen bonds between LTK and compounds to be essential. Furthermore, an aromatic hydrogen bond existed between the aromatic H on the compound and the O on HIS-518, further improving the binding strength. Further design and optimization based on the structure of this compound could be carried out to explore its potential as a drug candidate.

Discussion
In the present investigation we have introduced a DL model, coined KarmaDock, that is adept at generating binding conformations and predicting binding strength with remarkable accuracy and expeditious processing. Endeavors were made to implement post-processing techniques, such as FF optimization and RDkit-conformation alignment, as corrective measures for the inaccuracies in bond lengths and angles detected in the predicted ligand conformations.

Notably, these post-processing approaches did enhance conformation rationality, albeit at the cost of a decrease in docking accuracy, presenting a challenging task in striking a balance between the two. A pragmatic resolution appears to be a hierarchical screening of the compound library, where the screening process initially bypasses conformation rationality, followed by an FF optimization of the top-scoring complexes. Further experimentation with in-depth experimental validation can serve to confirm this solution and broaden the application scope of KarmaDock.

However, it is worth acknowledging the inherent limitations of KarmaDock, which presents as a semi-flexible docking tool that is blind to protein structure variation. This attribute perhaps explains the observable decrease in performance on APObind. Our forward-looking strategy, therefore, is to incorporate protein structure variability within the purview of KarmaDock, with an ultimate aim of transforming it into a fully flexible docking tool.

Methods
Architecture overview
As depicted in Fig. 1b, the ligand is characterized as a 2D molecular graph Gc, with atoms as nodes and covalent bonds as edges, which is a common approach in GNN-based DLSFs. The protein is represented as a 3D residue graph Gp, with residues as nodes and edges connecting the top 30 nearest-neighbor nodes, which captures long-range interactions better and has a lower computational cost than a graph with atoms as nodes.

Instead of directly being input to downstream tasks, the ligand graphs and protein graphs are encoded by a GT24 and GVPs25 to learn intramolecular interactions and update node embeddings:

$$H^{1}_{c,s} = \mathrm{GT}(H^{0}_{c,s},\, E^{0}_{c,s}) \tag{1}$$

$$H^{1}_{p,s} = \mathrm{GVP}(S^{0}_{p,s},\, H^{0}_{p,s},\, H^{0}_{p,v},\, E^{0}_{p,s},\, E^{0}_{p,v}) \tag{2}$$

where H and E represent node and edge features, respectively, S represents the residue type, and subscripts c, p, s and v denote compound, protein, scalar and vector, respectively.

The node embeddings (H^1_{c,s}, H^1_{p,s}) from the encoders and the node coordinates (X^1_{c,s}, X^1_{p,s}) of both the ligand and protein are combined to form an interaction graph [G_{p,c} = (H_{p,c}, E_{p,c}, X_{p,c})], which captures intermolecular interactions at both the residue–atom and atom–atom scales.

In the process of binding pose generation, the ligand conformation, represented as the coordinates of the ligand nodes, is initially generated by RDkit and randomly rotated and shifted around the pocket center. The node embeddings and edge features are initialized by a graph normalization and an MLP layer (equation (3)). The node embeddings, edge embeddings and ligand atom positions are then updated through the application of the EGNN block, which consists of eight EGNN layers with self-attention, taking into account both PL interactions and ligand–ligand interactions (equation (4)). Inspired by AlphaFold2 and LigPose, a recycling strategy is employed to enable the EGNN block to learn how to consistently refine the binding poses. At the start of each recycling, the updated embeddings and the raw embeddings are combined through a gate block (equation (5)). After the EGNN block, two types of post-processing can be implemented to ensure that the generated conformations are rational in terms of bond lengths and angles:

$$H^{0,0}_{p,c} = \mathrm{GraphNorm}(H^{1}_{c,s},\, H^{1}_{p,s}) \tag{3}$$

$$H^{r,l}_{p,c},\; E^{r,l}_{p,c},\; X^{r,l}_{p,c} = \mathrm{EGNN\_Layer}^{l}(H^{r,l-1}_{p,c},\, E^{r,l-1}_{p,c},\, X^{r,l-1}_{p,c}) \tag{4}$$

$$F^{r+1,0}_{p,c} = \mathrm{Gate\_Block}(F^{r,8}_{p,c},\, F^{0,0}_{p,c}) \tag{5}$$

where H and E represent the embeddings of a node and edge, respectively, F refers to embeddings in general, and p, c, r and l represent protein, compound, the recycling index and the EGNN layer index, respectively.

Given the node embeddings and positions, the binding strength can be predicted through the MDN block:

$$\mu_{p,c},\; \sigma_{p,c},\; \pi_{p,c},\; d_{p,c} = \mathrm{MDN\_Block}(H^{1}_{c,s},\, X^{1}_{c,s},\, H^{1}_{p,s},\, X^{1}_{p,s}) \tag{6}$$

$$\mathrm{Score} = \sum_{p=1}^{P} \sum_{c=1}^{C} \log P(d_{p,c} \mid h_{p},\, h_{c}) \tag{7}$$

where H and X represent the node embeddings and positions, respectively, p, c and s refer to protein node, compound node and scalar feature, respectively, μ_{p,c}, σ_{p,c}, π_{p,c} and d_{p,c} represent the means, s.d. values, mixing coefficients and node distances, respectively, and Score represents the predicted docking score.

Graph representation
In the present study, ligands are characterized as undirected compound graphs [Gc = (Hc, Ec, Xc)], with atoms as nodes and covalent bonds as edges, using torch_geometric. The node features Hc and edge features Ec are generated by RDkit and TorchDrug, as outlined in Supplementary Table 5, and the node positions Xc represent atom coordinates.

To predict PL binding poses based on non-bond interactions, it is necessary to form edges between protein nodes and ligand nodes for the model to learn those interactions. However, fully connecting protein nodes and ligand nodes can be computationally expensive, especially when considering the large number of protein atoms. To address this, we propose using residues as nodes in the protein graph, which can substantially reduce the numbers of nodes and edges in the graph. Furthermore, residue graphs can encode the geometric information of each residue and capture long-range interactions between both residue–residue pairs and PL nodes, which is beneficial for binding-pose prediction. We constructed a K-nearest-neighbor (KNN) graph [Gp = (Hp, Ep, Xp)] with K equal to 30, where the nodes are represented by residues and alpha carbon (CA) atom coordinates are used as node positions Xp. As shown in Supplementary Table 6, the node features and edge features contain not only scalar features (Hp_s, Ep_s), but also vector features (Hp_v, Ep_v), which can be utilized by GVP for intramolecular topology and geometric representation learning.
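The recycle-and-gate update of equations (3)–(5) (with the gate defined later in equations (60) and (61)) can be sketched as a loop. Here toy linear maps stand in for the learned EGNN layers, and GraphNorm/Dropout are omitted, so this is a shape-level illustration only, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, n_nodes, n_layers, n_recycles = 8, 5, 8, 3

# Hypothetical stand-ins for learned parameters.
layer_weights = [rng.normal(scale=0.3, size=(d_h, d_h)) for _ in range(n_layers)]
W_g = rng.normal(scale=0.3, size=(d_h, 3 * d_h))
b_g = np.zeros(d_h)

def gate_block(h_new, h_old):
    """Gated residual combination (equations (60)-(61), GraphNorm omitted):
    g = sigmoid(W_g [h_new; h_old; h_new - h_old] + b_g); out = g * h_new + h_old."""
    z = np.concatenate([h_new, h_old, h_new - h_old], axis=-1)
    g = 1.0 / (1.0 + np.exp(-(z @ W_g.T + b_g)))
    return g * h_new + h_old

h_raw = rng.normal(size=(n_nodes, d_h))   # F^{0,0}: raw embeddings from the encoders
h = h_raw
for r in range(n_recycles):
    for W in layer_weights:               # the eight "EGNN layers" (toy maps here)
        h = np.tanh(h @ W.T)
    h = gate_block(h, h_raw)              # F^{r+1,0} = Gate_Block(F^{r,8}, F^{0,0})
```

Combining the refined embeddings with the raw embeddings at every recycle keeps the block anchored to the encoder output while letting repeated passes refine the pose.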

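The residue-level KNN graph described under 'Graph representation' can be built from C-alpha coordinates alone. A minimal numpy sketch (the paper uses K = 30 via torch_geometric; a small toy K is used in the test):

```python
import numpy as np

def knn_edges(ca_coords, k):
    """Directed KNN edges for a protein residue graph: each residue node
    connects to its k nearest neighbors by C-alpha distance (no self-loops)."""
    n = len(ca_coords)
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self
    nbrs = np.argsort(dist, axis=1)[:, :k]    # k nearest per node
    return [(i, int(j)) for i in range(n) for j in nbrs[i]]
```

A fixed node degree of K keeps the edge count linear in the number of residues, which is what makes the residue graph cheaper than an all-atom protein graph.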

Graph transformer
As has been demonstrated to be effective for molecular graph representation, the GT implemented in RTMScore24 is used as the ligand encoder for intramolecular interaction learning (Fig. 1c). The node feature h_{c_i} ∈ R^{d_h×1} for the ith node and the edge feature e_{c_ij} ∈ R^{d_e×1} for the edge between node i and node j are first initialized to h^0_{c_i} and e^0_{c_ij} with d dimensions by two linear layers, respectively:

$$h^{0}_{c\_i} = W_{h0}\, h_{c\_i} + b^{0}_{h}, \qquad e^{0}_{c\_ij} = W_{e0}\, e_{c\_ij} + b^{0}_{e} \tag{8}$$

where W_{h0} ∈ R^{d×d_h}, W_{e0} ∈ R^{d×d_e} and b^0_h, b^0_e ∈ R^d. The initialized node and edge embeddings are then input to the graph transformer layer, stacked six times, to output the final embeddings of nodes and edges. The lth graph transformer layer updates node and edge embeddings using message passing and a modified multi-head self-attention (MHA) mechanism, as shown in the following equations:

$$q^{k,l}_{c\_i} = W^{k,l}_{Q}\, \mathrm{Norm}(h^{l}_{c\_i}) \tag{9}$$

$$k^{k,l}_{c\_j} = W^{k,l}_{K}\, \mathrm{Norm}(h^{l}_{c\_j}) \tag{10}$$

$$v^{k,l}_{c\_j} = W^{k,l}_{V}\, \mathrm{Norm}(h^{l}_{c\_j}) \tag{11}$$

$$e^{k,l}_{c\_ij} = W^{k,l}_{E}\, \mathrm{Norm}(e^{l}_{c\_ij}) \tag{12}$$

$$w^{k,l}_{c\_ij} = \mathrm{Softmax}_{j \in N(i)}\left(\left(\frac{q^{k,l}_{c\_i} \cdot k^{k,l}_{c\_j}}{\sqrt{d_k}}\right) \cdot e^{k,l}_{c\_ij}\right) \tag{13}$$

$$\hat{h}^{l+1}_{c\_i} = h^{l}_{c\_i} + W^{l}_{h0}\, \mathrm{Dropout}\left(\mathrm{Concat}_{k \in 1,\ldots,H}\left(\mathrm{Aggregation\_Sum}_{j \in N(i)}\left(w^{k,l}_{c\_ij}\, v^{k,l}_{c\_j}\right)\right)\right) \tag{14}$$

$$\hat{e}^{l+1}_{c\_ij} = e^{l}_{c\_ij} + W^{l}_{e0}\, \mathrm{Dropout}\left(\mathrm{Concat}_{k \in 1,\ldots,H}\left(w^{k,l}_{c\_ij}\right)\right) \tag{15}$$

$$h^{l+1}_{c\_i} = \hat{h}^{l+1}_{c\_i} + W^{l}_{h2}\, \mathrm{Dropout}\left(\mathrm{SiLU}\left(W^{l}_{h1}\, \mathrm{Norm}(\hat{h}^{l+1}_{c\_i})\right)\right) \tag{16}$$

$$e^{l+1}_{c\_ij} = \hat{e}^{l+1}_{c\_ij} + W^{l}_{e2}\, \mathrm{Dropout}\left(\mathrm{SiLU}\left(W^{l}_{e1}\, \mathrm{Norm}(\hat{e}^{l+1}_{c\_ij})\right)\right) \tag{17}$$

where W^{k,l}_Q, W^{k,l}_K, W^{k,l}_V, W^{k,l}_E ∈ R^{d_k×d}; W^l_{h0}, W^l_{e0} ∈ R^{d×d}; W^l_{h1}, W^l_{e1} ∈ R^{2d×d}; and W^l_{h2}, W^l_{e2} ∈ R^{d×2d} are learnable parameters from linear layers; k ∈ 1, …, H denotes the index of the attention head; d_k is the dimension of each head, which equals d divided by H; j ∈ N(i) represents the neighboring nodes of node i; Norm denotes batch normalization; Concat denotes the concatenation operation; Dropout denotes the dropout operation; SiLU represents a type of activation function; Aggregation_Sum_{j∈N(i)} represents aggregating the messages on the edges connecting node i and its neighboring nodes j by summation; and Softmax_{j∈N(i)} denotes the SoftMax operation over neighboring nodes j.

Geometric vector perceptrons
A GVP is implemented in KarmaDock as the protein encoder to update node embeddings based on topology connections and geometric features inside and between residues. The basic block of a GVP is the gvp layer, which receives both scalar features f_s ∈ R^d and vector features f_v ∈ R^{d×3} (Fig. 1d). The forward process of the lth layer is as follows:

$$f^{l}_{v1} = W^{l}_{v0}\, f^{l}_{v} \tag{18}$$

$$f^{l}_{v2} = W^{l}_{v1}\, f^{l}_{v1} \tag{19}$$

$$S^{l}_{v1} = \|f^{l}_{v1}\|_{2} \ \text{(row wise)} \tag{20}$$

$$S^{l}_{v2} = \|f^{l}_{v2}\|_{2} \ \text{(row wise)} \tag{21}$$

$$f^{l}_{sv} = \mathrm{Concat}(f^{l}_{s},\, S^{l}_{v1}) \tag{22}$$

$$\hat{f}^{l}_{s} = W^{l}_{sv}\, f^{l}_{sv} + b^{l}_{sv} \tag{23}$$

$$f^{l+1}_{s} = \sigma_{s}(\hat{f}^{l}_{s}) \tag{24}$$

$$f^{l+1}_{v} = \sigma_{v}(S^{l}_{v2}) \odot f^{l}_{v2} \ \text{(row-wise multiplication)} \tag{25}$$

where W^l_{v0} ∈ R^{d_{v1}×d_{v0}}, W^l_{v1} ∈ R^{d_{v2}×d_{v1}} and W^l_{sv} ∈ R^{d_{s1}×(d_{s0}+d_{v1})} are learnable parameters; f^l_{v1} ∈ R^{d_{v1}×3}, f^l_{v2} ∈ R^{d_{v2}×3}, S^l_{v1} ∈ R^{d_{v1}}, S^l_{v2} ∈ R^{d_{v2}}, f^l_{sv} ∈ R^{d_{s0}+d_{v1}}, b^l_{sv} ∈ R^{d_{s1}}, f̂^l_s ∈ R^{d_{s1}}, f^{l+1}_v ∈ R^{d_{v2}×3} and f^{l+1}_s ∈ R^{d_{s1}} are the results of the equations; σ_s and σ_v represent the activation functions. Before being input to gvp layers, the sequence information is embedded by the Embedding layer and concatenated with the other scalar node features:

$$h_{seq} = \mathrm{Embedding}(\mathrm{Sequence}) \tag{26}$$

$$h_{s} = \mathrm{Concat}(h_{s0},\, h_{seq}) \tag{27}$$

where the dimension of the word table of the Embedding layer is (d_seq, d_seq) and h_{s0} ∈ R^{d_{hs0}}. Then, the node features and edge features are input to the initialization block consisting of a LayerNorm and a gvp layer without activation functions, respectively:

$$(h_{s1},\, h_{v1}) = \mathrm{gvp}(\mathrm{LayerNorm}(h_{s},\, h_{v})) \tag{28}$$

$$(e_{s1},\, e_{v1}) = \mathrm{gvp}(\mathrm{LayerNorm}(e_{s},\, e_{v})) \tag{29}$$

where h_s ∈ R^{d_seq+d_{hs0}}, h_v ∈ R^{d_{hv0}×3}, h_{s1} ∈ R^{d_{hs1}}, h_{v1} ∈ R^{d_{hv1}×3}, e_s ∈ R^{d_{es0}}, e_v ∈ R^{d_{ev0}×3}, e_{s1} ∈ R^{d_{es1}} and e_{v1} ∈ R^{d_{ev1}×3}. After initialization, the node and edge features are input to the GVPConv layer, stacked twice, involving gvp layers in message passing. The equations of the GVPConv layer are as follows:

$$m^{l}_{s\_ij} = \mathrm{Concat}(h^{l}_{s\_i},\, e^{l}_{s\_ij},\, h^{l}_{s\_j}) \tag{30}$$

$$m^{l}_{v\_ij} = \mathrm{Concat}(h^{l}_{v\_i},\, e^{l}_{v\_ij},\, h^{l}_{v\_j}) \tag{31}$$

$$(m^{l}_{s\_ij\_1},\, m^{l}_{v\_ij\_1}) = \mathrm{gvp}(m^{l}_{s\_ij},\, m^{l}_{v\_ij}) \tag{32}$$

$$(m^{l}_{s\_ij\_2},\, m^{l}_{v\_ij\_2}) = \mathrm{gvp}(m^{l}_{s\_ij\_1},\, m^{l}_{v\_ij\_1}) \tag{33}$$

$$(m^{l}_{s\_ij\_3},\, m^{l}_{v\_ij\_3}) = \mathrm{gvp}(m^{l}_{s\_ij\_2},\, m^{l}_{v\_ij\_2}) \tag{34}$$

$$(\hat{h}^{l}_{s\_j},\, \hat{h}^{l}_{v\_j}) = \mathrm{Aggregation\_Mean}_{i \in N(j)}(m^{l}_{s\_ij\_3},\, m^{l}_{v\_ij\_3}) \tag{35}$$

$$(\hat{f}^{l}_{s\_j\_0},\, \hat{f}^{l}_{v\_j\_0}) = \mathrm{LayerNorm}\left(h^{l}_{s\_j} + \mathrm{Dropout}(\hat{h}^{l}_{s\_j}),\; h^{l}_{v\_j} + \mathrm{Dropout}(\hat{h}^{l}_{v\_j})\right) \tag{36}$$
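The defining property of the gvp layer in equations (18)–(25) is that scalar outputs are rotation-invariant (they see vectors only through row-wise norms) while vector outputs are rotation-equivariant (linear maps plus invariant gating). A minimal numpy sketch with hypothetical weights, using ReLU and Sigmoid as σ_s and σ_v:

```python
import numpy as np

rng = np.random.default_rng(0)
d_s0, d_v0, d_v1, d_v2, d_s1 = 6, 5, 4, 3, 7

# Hypothetical learned parameters (the real ones are trained).
W_v0 = rng.normal(size=(d_v1, d_v0))
W_v1 = rng.normal(size=(d_v2, d_v1))
W_sv = rng.normal(size=(d_s1, d_s0 + d_v1))
b_sv = rng.normal(size=d_s1)

def gvp_layer(f_s, f_v):
    """One gvp layer, following equations (18)-(25): scalar channels mix with
    vector norms; vector channels are gated by a sigmoid of their norms."""
    f_v1 = W_v0 @ f_v                    # eq (18): (d_v1, 3)
    f_v2 = W_v1 @ f_v1                   # eq (19): (d_v2, 3)
    s_v1 = np.linalg.norm(f_v1, axis=1)  # eq (20): row-wise norms (invariant)
    s_v2 = np.linalg.norm(f_v2, axis=1)  # eq (21)
    f_sv = np.concatenate([f_s, s_v1])   # eq (22)
    f_s_hat = W_sv @ f_sv + b_sv         # eq (23)
    f_s_out = np.maximum(f_s_hat, 0.0)   # eq (24): sigma_s (ReLU here)
    gate = 1.0 / (1.0 + np.exp(-s_v2))   # eq (25): sigma_v = Sigmoid
    return f_s_out, gate[:, None] * f_v2
```

Rotating the input vectors leaves the scalar output unchanged and rotates the vector output correspondingly, which is what lets the protein encoder consume raw residue geometry directly.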


$$(\hat{f}^{l}_{s\_j\_1},\, \hat{f}^{l}_{v\_j\_1}) = \mathrm{gvp}(\hat{f}^{l}_{s\_j\_0},\, \hat{f}^{l}_{v\_j\_0}) \tag{37}$$

$$(\hat{f}^{l}_{s\_j\_2},\, \hat{f}^{l}_{v\_j\_2}) = \mathrm{gvp}(\hat{f}^{l}_{s\_j\_1},\, \hat{f}^{l}_{v\_j\_1}) \tag{38}$$

$$(h^{l+1}_{s\_j},\, h^{l+1}_{v\_j}) = \mathrm{LayerNorm}\left(h^{l}_{s\_j} + \mathrm{Dropout}(\hat{f}^{l}_{s\_j\_2}),\; h^{l}_{v\_j} + \mathrm{Dropout}(\hat{f}^{l}_{v\_j\_2})\right) \tag{39}$$

where equations (32), (33) and (37) implement activation functions of ReLU and Sigmoid for the scalar features and vector features, respectively, and the other gvp layers use no activation functions; m^l_{s_ij} ∈ R^{2d_{hs1}+d_{es1}}; m^l_{v_ij} ∈ R^{(2d_{hv1}+d_{ev1})×3}; m^l_{s_ij_1}, m^l_{s_ij_2}, m^l_{s_ij_3}, ĥ^l_{s_j}, f̂^l_{s_j_0}, f̂^l_{s_j_2} and h^{l+1}_{s_j} ∈ R^{d_{hs1}}; m^l_{v_ij_1}, m^l_{v_ij_2}, m^l_{v_ij_3}, ĥ^l_{v_j}, f̂^l_{v_j_0}, f̂^l_{v_j_2} and h^{l+1}_{v_j} ∈ R^{d_{hv1}×3}; f̂^l_{s_j_1} ∈ R^{4d_{hs1}}; f̂^l_{v_j_1} ∈ R^{2d_{hv1}×3}; and Aggregation_Mean_{i∈N(j)} represents averaging the messages on the edges connecting node j and its neighboring nodes i.

Interaction graph construction
To predict the binding poses, we construct the interaction graph [G_{p,c} = (H_{p,c}, E_{p,c}, X_{p,c})] based on the protein graph Gp and ligand graph Gc, which can consider protein–protein interactions, LL interactions and PL interactions. Unlike common methods used in CPI models that only connect ligand and protein nodes within a certain distance threshold of the ligand atoms, we fully connect protein and ligand nodes considering the global PL interactions for accurate binding pose generation, because the PL binding poses are unknown in advance. Hence, we add edges connecting every PL node pair on the basis of the residue graph and molecular graph. Furthermore, to enable KarmaDock to learn suitable bond lengths and angles in the molecules, we connect every ligand node for LL-interaction learning and encode the actual distance between atoms in every fragment constructed by splitting the ligand based on rotatable bonds. The node features h^0 ∈ R^{d_{h0}} of ligands and proteins are inherited from the outputs of the GT encoder and GVP encoder, respectively. The edge features e^0 ∈ R^{d_{e0}} consist of one-hot encoding for edge types (single bond, double bond, triple bond, aromatic bond and non-bond interaction) and the distance between nodes (the actual distance for edges inside residue graphs and edges between ligand nodes inside the same fragment, while other edges are encoded as −1). Furthermore, the node coordinates x^0 ∈ R^{n×3} consist of protein coordinates inherited from the protein graphs and ligand conformations randomly rotated and translated around the pocket center.

E(n) equivariant graph neural network block
EGNN is an E(n) equivariant model capable of handling dynamical systems with high speed. It plays the role of an FF, which meets our requirement of both E(n) equivariance and fast processing speed. As the transformer architecture is popular for its outstanding performance on various tasks, merging self-attention and graph message passing (for example, RTMScore, LigPose and MedusaGraph) can substantially improve the representational ability of GNNs. Similar to LigPose and MedusaGraph, self-attention is also involved in the message-passing process of our EGNN. To increase the accuracy and capacity of the EGNN layer to predict atom movement based on the current binding poses, the recycling scheme inspired by AlphaFold2 is utilized, which increases the number of movements.

The scalar node embeddings h^0 of proteins and compounds, updated by GVP and GT, respectively, are first initialized by graph normalization, and the edge features e^0 are initialized by a linear layer:

$$h^{1} = \mathrm{GraphNorm}(h^{0}) \tag{40}$$

$$e^{1} = W_{e\_init}\, e^{0} + b_{e\_init} \tag{41}$$

where h^0, h^1 and e^1 ∈ R^{d_h} and e^0 ∈ R^{d_e}; W_{e_init} ∈ R^{d_h×d_e} and b_{e_init} ∈ R^{d_h} are learnable parameters in a linear layer.

The basic block used for updating ligand coordinates consists of eight EGNN layers stacked sequentially (Fig. 1f). The message-passing process of the lth EGNN layer is as follows:

$$(q^{k,l}_{i})_{k \in 1,\ldots,H} = W^{l}_{Q}\, h^{l}_{i\_1} + b^{l}_{Q} \tag{42}$$

$$(k^{k,l}_{j})_{k \in 1,\ldots,H} = W^{l}_{K}\, h^{l}_{j\_1} + b^{l}_{K} \tag{43}$$

$$(v^{k,l}_{j})_{k \in 1,\ldots,H} = W^{l}_{V}\, h^{l}_{j\_1} + b^{l}_{V} \tag{44}$$

$$e^{l}_{ij} = \mathrm{Concat}(e^{l}_{ij\_1},\, \|x^{l}_{i} - x^{l}_{j}\|_{2}) \tag{45}$$

$$m^{l}_{ij} = W^{l}_{m2}\left(\mathrm{LeakyReLU}(\mathrm{Dropout}(W^{l}_{m1}\, e^{l}_{ij} + b^{l}_{m1}))\right) + b^{l}_{m2} \tag{46}$$

$$(k^{k,l}_{ij})_{k \in 1,\ldots,H} = \mathrm{Concat}_{k \in 1,\ldots,H}(k^{k,l}_{j}) \odot m^{l}_{ij} \tag{47}$$

$$w^{k,l}_{ij} = (q^{k,l}_{i} \odot k^{k,l}_{ij})/\sqrt{d_k} \tag{48}$$

$$\alpha^{k,l}_{ij} = \mathrm{Softmax}_{j \in N(i)}(\|w^{k,l}_{ij}\|_{2}) \tag{49}$$

$$\hat{h}^{l}_{i\_1} = \mathrm{Dropout}\left(W^{l}_{h}\left(\mathrm{Concat}_{k \in 1,\ldots,H}\left(\mathrm{Aggregation\_Sum}_{j \in N(i)}(w^{k,l}_{ij} \odot v^{k,l}_{j})\right)\right) + b^{l}_{h}\right) \tag{50}$$

$$h^{l+1}_{i\_1} = \mathrm{Gate\_Block}(h^{l}_{i\_1},\, \hat{h}^{l}_{i\_1}) \tag{51}$$

$$e^{l+1}_{ij\_1} = W^{l}_{e}\, \mathrm{Concat}_{k \in 1,\ldots,H}(\alpha^{k,l}_{ij}) + b^{l}_{e} \tag{52}$$

$$x^{l+1}_{i} = \mathrm{Coords\_Update\_Block}\left((w^{k,l}_{ij})_{k \in 1,\ldots,H,\; j \in N(i)},\; x^{l}_{i},\; (x^{l}_{j})_{j \in N(i)}\right) \tag{53}$$

where α^{k,l}_{ij} ∈ R^1; h^l_{i_1}, h^l_{j_1}, e^l_{ij_1}, m^l_{ij}, ĥ^l_{i_1}, h^{l+1}_{i_1} and e^{l+1}_{ij_1} ∈ R^{d_h}; e^l_{ij} ∈ R^{d_h+1}; q^{k,l}_i, k^{k,l}_j, k^{k,l}_{ij}, v^{k,l}_j and w^{k,l}_{ij} ∈ R^{d_k}; W^l_Q, W^l_K, W^l_V, W^l_{m2}, W^l_h and W^l_e ∈ R^{d_h×d_h}; W^l_{m1} ∈ R^{d_h×(d_h+1)}; b^l_Q, b^l_K, b^l_V, b^l_{m1}, b^l_{m2}, b^l_h and b^l_e ∈ R^{d_h} are learnable parameters from linear layers; k ∈ 1, …, H denotes the index of the attention head; d_k is the dimension of each head, which equals d_h divided by H; j ∈ N(i) represents the neighboring nodes of node i; Concat denotes the concatenation operation; Dropout denotes the dropout operation; LeakyReLU is a type of activation function; Aggregation_Sum_{j∈N(i)} represents summing the messages on the edges connecting node i and its neighboring nodes j; and Softmax_{j∈N(i)} denotes the SoftMax operation over neighboring nodes j.

Coords_Update_Block is responsible for updating the coordinates of ligand nodes:

$$\Delta x^{l}_{ij} = x^{l}_{i} - x^{l}_{j} \tag{54}$$

$$\overrightarrow{\Delta x}^{l}_{ij} = \Delta x^{l}_{ij} / \|\Delta x^{l}_{ij}\|_{2} \tag{55}$$
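The coordinate update that Coords_Update_Block performs (equations (54)–(59), continued on the next page) is E(3)-equivariant by construction: atoms move along unit displacement vectors scaled by rotation-invariant coefficients. A minimal numpy sketch, with a toy coefficient function standing in for the learned attention-weight transformation:

```python
import numpy as np

def coords_update(x, neighbors, coeff_fn):
    """E(3)-equivariant coordinate update in the spirit of equations (54)-(59):
    each node i moves along the unit vectors to its neighbors j, each scaled by
    a rotation-invariant coefficient of the interatomic distance. `coeff_fn`
    is a toy stand-in for the learned per-head weight transformation."""
    x_new = x.copy()
    for i, js in neighbors.items():
        delta = np.zeros(3)
        for j in js:
            d = x[i] - x[j]                       # eq (54): displacement vector
            dist = np.linalg.norm(d)
            delta += (d / dist) * coeff_fn(dist)  # eqs (55)-(57): unit vector x scalar
        x_new[i] = x[i] + delta                   # eqs (58)-(59): summed move
    return x_new
```

Because the coefficients depend only on distances, rotating or translating the whole complex rotates or translates the updated coordinates identically, so the predicted pose does not depend on the input frame.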
$$\Delta x^{k,l}_{ij} = \overrightarrow{\Delta x}^{l}_{ij} \cdot \left(W^{l}_{x2}\,\mathrm{LeakyReLU}(\mathrm{Dropout}(W^{l}_{x1}\, w^{k,l}_{ij} + b^{l}_{x1})) + b^{l}_{x2}\right) \tag{56}$$

$$\Delta x^{l}_{ij} = W^{l}_{H}\, \mathrm{Concat}_{k \in 1,\ldots,H}(\Delta x^{k,l}_{ij}) \tag{57}$$

$$\Delta x^{l}_{i} = \mathrm{Aggregation\_Sum}_{j \in N(i)}(\Delta x^{l}_{ij}) \tag{58}$$

$$x^{l+1}_{i} = x^{l}_{i} + \Delta x^{l}_{i} \tag{59}$$

where x^l_i, x^l_j, Δx^l_{ij}, Δx^{k,l}_{ij}, Δx^l_i and x^{l+1}_i ∈ R^{1×3}; W^l_{x1} ∈ R^{(d_k/2)×d_k}, W^l_{x2} ∈ R^{1×(d_k/2)}, W^l_H ∈ R^{1×H}, b^l_{x1} ∈ R^{d_k/2} and b^l_{x2} ∈ R^1.

Gate_Block is a basic block for the residual connection used in the EGNN layers and at the beginning of each recycling:

$$g = \mathrm{Sigmoid}\left(\mathrm{Dropout}(W_{g}\, \mathrm{Concat}(h_{\mathrm{new}},\, h_{\mathrm{old}},\, h_{\mathrm{new}} - h_{\mathrm{old}}) + b_{g})\right) \tag{60}$$

$$\hat{h}_{\mathrm{new}} = \mathrm{GraphNorm}(g \odot h_{\mathrm{new}} + h_{\mathrm{old}}) \tag{61}$$

where h_old, ĥ_new, g and h_new ∈ R^{d_h}; W_g ∈ R^{d_h×3d_h} and b_g ∈ R^{d_h} are learnable parameters from linear layers.

Mixture density network block
The MDN block, which has been shown to be effective in binding pose selection and actives enrichment, is used as the scoring module in KarmaDock (Fig. 1e). The encoded node embeddings (h_p, h_c) of proteins and ligands by GVP and GT, respectively, are concatenated sequentially, followed by a linear layer, a batch normalization layer, an activation function (exponential linear unit, ELU) and a Dropout layer for PL interaction capturing. Then, three linear layers are used to determine a set of means μ_{p,c}, standard deviations σ_{p,c} and mixing coefficients π_{p,c} needed to parametrize a mixture density model encoding a mixture distribution of n distance distributions for each PL node pair:

$$h_{p,c} = \mathrm{Dropout}\left(\mathrm{ELU}(\mathrm{BatchNorm}(W_{p,c}\, \mathrm{Concat}(h_{p},\, h_{c}) + b_{p,c}))\right) \tag{62}$$

$$\mu_{p,c} = \mathrm{ELU}(W_{\mu}\, h_{p,c} + b_{\mu}) + 1 \tag{63}$$

$$\sigma_{p,c} = \mathrm{ELU}(W_{\sigma}\, h_{p,c} + b_{\sigma}) + 1.1 \tag{64}$$

$$\pi_{p,c} = \mathrm{Softmax}(W_{\pi}\, h_{p,c} + b_{\pi}) \tag{65}$$

where W_{p,c} ∈ R^{d_{p,c}×2d_h}; W_μ, W_σ, W_π ∈ R^{n×d_{p,c}}; b_{p,c} ∈ R^{d_{p,c}}; b_μ, b_σ and b_π ∈ R^n are learnable parameters of linear layers; h_p and h_c ∈ R^{d_h}; Concat, Softmax and Dropout denote the concatenation, softmax and dropout operations, respectively.

As we implement a residue-level coarse-grained representation of proteins, the minimum distance between a specific ligand atom and each residue atom is selected as the indicator of distance between PL node pairs, inspired by RTMScore. To help the model learn molecular structures more effectively, two auxiliary tasks are incorporated: predicting the atom type and bond type of protein and ligand nodes based on the previously learned node representations (h_p, h_c).

Post-processing
To ensure the validity of bond lengths and angles, FF optimization and RDkit-conformation alignment are studied in this Article. FF optimization can be easily implemented by an Application Programming Interface (API) of RDkit named MMFFOptimizeMolecule, which uses the MMFF94 FF to optimize the predicted ligand conformations with a maximum of ten steps. However, because the minimization process ignores the protein, it may destroy the binding poses predicted by KarmaDock. Another method aligns the chemically plausible RDkit conformations, which lack proper orientation and torsional angles, to the predicted binding poses and replaces the predicted poses with the aligned RDkit conformations. The torsional angles of ligands are first defined and then assigned to the RDkit conformations. Finally, the transformed RDkit conformations are aligned to the predicted binding conformations by the Kabsch algorithm.

Dataset
Most DLSFs and PGMs are trained on the PDBBind dataset26, which consists of high-quality PL complexes and their binding affinities extracted from the Protein Data Bank (PDB)37. In this study, the newest PDBBind version 2020, with 19,443 PL complexes, is used for KarmaDock training and benchmarking. Additionally, the APObind dataset27, a derivative of the 2019 version of PDBBind, was utilized. Unlike its predecessor, APObind features apo-protein structures in lieu of holo structures. It is important to clarify that only the core set of APObind, consisting of 229 PL complexes, was used in the testing phase. The CASF 2016 benchmark, containing 285 diverse PL complexes, is a gold standard for evaluating SFs under four various tasks (scoring power test, docking power test, ranking power test and screening power test). As KarmaDock is generally a VS tool, we primarily focus on assessing the docking power and screening power of its SF (that is, the MDN block), while ignoring the ranking power and scoring power. To further evaluate the screening power of KarmaDock, the DEKOIS 2.0 dataset, consisting of 81 targets from various protein families with 40 active ligands and 1,200 decoys per target, is used to simulate a VS test.

Furthermore, the screening libraries of ChemDiv 2022 and Specs 2021, with 1,561,007 and 208,780 compounds, respectively, were screened using KarmaDock to discover LTK inhibitors, followed by experiments to further validate the results.

Dataset preprocessing
All the protein files were downloaded from PDB and preprocessed by the Protein Preparation Wizard33 module in Schrödinger 2020, which involves assigning bond orders and re-adding hydrogens. Next, the protein structures were repaired by creating necessary bonds to proximal sulfurs, filling missing side chains and missing loops, and optimizing the hydrogen-bond network. Hydrogens were then minimized with the OPLS3 FF38. PROPKA was used for generating protonation states of residues at a pH of 7.0 and Epik was used to generate the ionized states of het atoms39,40. After preprocessing, we selected protein residues within 12 Å of the ligand atoms with the help of Prody as the binding pocket, which were used as the input for KarmaDock instead of the whole proteins. This selection allowed KarmaDock to focus on the specific region of the protein that was likely to interact with the ligand and reduced the complexity of the input for the network.

Training protocol
The training process was carefully designed. It is reasonable to assume that the distributions of the PL node pair distances fitted by the MDN block can help the model narrow the search space for binding poses and easily find the best binding conformations. To utilize the distributions of the PL node pair distances for docking, we first trained KarmaDock as an SF (that is, the MDN block), using the node embeddings updated by the encoders and the ground-truth binding conformations as inputs, with an MDN loss function L_MDN and two cross-entropy loss functions (L_atom, L_bond) for the two auxiliary tasks, respectively:

$$L = L_{\mathrm{MDN}} + 0.001 \times L_{\mathrm{atom}} + 0.001 \times L_{\mathrm{bond}} \tag{66}$$

$$L_{\mathrm{MDN}} = -\log P(d_{p,c} \mid h_{p},\, h_{c})$$
Nature Computational Science | Volume 3 | September 2023 | 789–804 801
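The mixture density head defined by equations (62)–(65) can be sketched as a compact PyTorch module. This is a minimal illustration rather than the authors' released implementation; the hidden sizes (d_h = 128 for the node embeddings, d_{p,c} = 64 for the pair embedding) are assumptions chosen for the example, while the 10 mixture components match equation (67).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNHead(nn.Module):
    """Sketch of the mixture density head (equations (62)-(65)): maps a
    protein-ligand node-pair embedding to the means, standard deviations
    and mixing coefficients of n Gaussian distance distributions."""

    def __init__(self, d_h=128, d_pair=64, n_mix=10, p_drop=0.1):
        super().__init__()
        # Equation (62): Linear -> BatchNorm -> ELU -> Dropout on Concat(h_p, h_c)
        self.pair_mlp = nn.Sequential(
            nn.Linear(2 * d_h, d_pair),
            nn.BatchNorm1d(d_pair),
            nn.ELU(),
            nn.Dropout(p_drop),
        )
        self.w_mu = nn.Linear(d_pair, n_mix)     # equation (63)
        self.w_sigma = nn.Linear(d_pair, n_mix)  # equation (64)
        self.w_pi = nn.Linear(d_pair, n_mix)     # equation (65)

    def forward(self, h_p, h_c):
        h_pc = self.pair_mlp(torch.cat([h_p, h_c], dim=-1))
        mu = F.elu(self.w_mu(h_pc)) + 1.0        # shift keeps the means positive
        sigma = F.elu(self.w_sigma(h_pc)) + 1.1  # shift keeps the s.d. above 0.1
        pi = torch.softmax(self.w_pi(h_pc), dim=-1)  # mixing weights sum to 1
        return mu, sigma, pi

head = MDNHead().eval()  # eval mode: BatchNorm uses its running statistics
mu, sigma, pi = head(torch.randn(5, 128), torch.randn(5, 128))
```

Plugging an observed pair distance into the resulting mixture gives the negative log-likelihood of equation (67), which is what the MDN block aggregates into a binding score.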



In equation (67), μ_{p,c,n}, σ_{p,c,n} and π_{p,c,n} represent the mean, s.d. and mixing coefficient of the nth distance distribution; h_p, h_c and d_{p,c} represent the node embeddings of the protein nodes, the node embeddings of the ligand nodes and the distance between PL nodes, respectively. The model was optimized using the Adam optimizer with a batch size of 64, a learning rate of 1 × 10⁻³ and a weight decay of 1 × 10⁻⁵. The training was stopped if the loss on the validation set continuously increased for 70 epochs. After that, the GT encoder and GVP encoder are believed to have partially learned the distance distributions. We then trained the docking module together with the trained scoring module, using the same dataset split method as used in the scoring module training process. The loss function of the docking module, calculated as the r.m.s.d. between the predicted ligand conformations and the ground-truth conformations, was added with a weight of 1 to give equation (68). The training hyperparameters were the same as before, except that the learning rate was set to 1 × 10⁻⁴ and the weight decay was canceled:

L = L_docking + L_MDN + 0.001 × L_atom + 0.001 × L_bond  (68)

L_docking = r.m.s.d.(x_l^{pred}, x_l^{label}) = √( Σ_{n=1}^{N} (x_{l,n}^{pred} − x_{l,n}^{label})² / N )  (69)

where N denotes the number of ligand nodes and n the index of the ligand nodes.

Evaluation protocol
KarmaDock is a versatile tool that can be used as an SF, a PGM and a ligand-docking tool. When used as an SF with given binding poses, KarmaDock calculates MDN scores based on the protein and ligand node embeddings from the encoders and the binding conformations. When used as a PGM, the ligand conformation should first be generated by RDKit and then initialized by random rotation and translation around the pocket center. The encoded node embeddings, along with the pocket and ligand conformations, are then input to the EGNN block to generate binding poses. When used as a ligand-docking tool, KarmaDock first predicts the binding poses and then scores them using the MDN block.

Metrics
Unlike DLSFs or PGMs, a docking tool must be judged on not one but multiple criteria, including docking speed, binding pose quality and binding strength prediction accuracy. In this study we assessed KarmaDock as a PGM in terms of its docking speed and accuracy. The docking speed was measured by the time it took to generate a single binding pose, while the docking accuracy was measured by the success rate of generating binding conformations with an r.m.s.d. of less than 2 Å from the ground-truth conformations. Furthermore, the performance of KarmaDock as an SF was also evaluated. In evaluating the performance of SFs, we prioritized the screening power and docking power over the scoring power. Here, the docking power was calculated as the success rate, and the screening power was mainly measured by BEDROC (α = 80.5) (ref. 34) and the enrichment factor calculated from the top x% of samples (EF_x%, x = 0.5, 1, 5). Two metrics evaluating the ranking ability of DL models, the area under the receiver operating characteristic curve (ROC_AUC)34 and the area under the precision–recall curve (PRAUC), are also included in this study as auxiliary metrics in VS.

VS workflow targeting LTK
As no experimentally determined 3D structure of LTK is available in the PDB, SWISS-MODEL36 was used for protein homology modeling and protein structure generation, where the ALK structure (PDB 4CLI), with high sequence similarity to LTK, was utilized as the template. The protein structure was prepared by the Protein Preparation Wizard33 module in Schrödinger 2020. The compounds from the ChemDiv and Specs databases were collected for screening, and KarmaDock was used to screen the compound library. The top 1,000 scored molecules were selected and subjected to drug-likeness filtering based on Lipinski's rule of five. This rule, widely accepted in pharmaceutical research, helps evaluate which compounds have properties compatible with good absorption and permeability in the human body. As a result, we were left with a refined set of molecules with greater potential to be active and drug-like inhibitors. Subsequently, the selected molecules were clustered based on their extended connectivity fingerprints, yielding 25 distinct clusters of structurally and chemically related molecules. We visually analyzed each cluster in terms of the interaction modes between the protein and the ligands, and from each cluster we selected the molecule with the most favorable interaction pattern, resulting in 25 candidates for further evaluation. These molecules were then purchased for biological activity testing using the MTT assay. With a threshold of 5 μM for the IC50, an approximate hit rate of 24.0% (6 out of 25) was obtained; with a criterion of 10 μM for the IC50, the hit rate was 72.0% (18 out of 25). The specific compound IDs, simplified molecular-input line-entry system (SMILES) strings and IC50 values are documented in Supplementary Table 4.

MTT procedure
This study used the MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide) assay to gauge compound activity in the engineered Ba/F3-CLIP1-LTK cell line, which grows only in the presence of LTK activity. The experiment began by preparing a suspension of the Ba/F3-CLIP1-LTK cells in RPMI-1640 medium, a nutrient-rich medium commonly used in cell culture, devoid of interleukin-3. This suspension was aliquoted into a 96-well plate, with each well receiving 90 μl, equating to ~10⁵ cells. After 4 h of undisturbed incubation in a controlled cell-culture environment, the test compounds (whose concentrations were varied in a gradient across wells) or the control compound (dimethylsulfoxide, DMSO) were added in a volume of 10 μl. The cells were then allowed to grow for 72 h, after which MTT reagent (at a concentration of 5 mg ml⁻¹) was added to each well and incubated for 4 h. The MTT was reduced by metabolically active cells to form insoluble formazan crystals. To solubilize these crystals, 100 μl of SDS-HCl-PBS tri-buffer solution was added and the plate was left overnight. Absorbance values at 570 nm, representing the amount of solubilized formazan and thus the cell metabolic activity, were recorded using a microplate reader. By fitting a dose–response curve, the IC50 value for each test compound could be obtained, providing a quantitative measure of the compound's ability to inhibit CLIP1-LTK activity in cells.

Other protocol
KarmaDock was trained using eight NVIDIA A100-SXM4-80GB GPUs and 64-core Intel(R) Xeon(R) Platinum 8358P CPUs @ 2.60 GHz. All the traditional docking tools were evaluated using 48-core Intel(R) Xeon(R) Gold 6240R CPUs @ 2.40 GHz, and the DL models were tested on a single Tesla V100S-PCIE-32GB GPU. The r.m.s.d. values between the generated binding poses and the crystallized binding poses were calculated by the obrms module of OpenBabel. Recognition of eight common non-bonded interactions, including acceptor–metal, π–metal, π–cation, π–stacking, salt bridges, halogen bonds, hydrogen bonds and hydrophobic contacts, was implemented through the APIs provided by ODDT41. For traditional ligand docking, the initial ligand conformations were generated with RDKit, as opposed to leveraging the crystal conformation. For AutoDock GPU, the grid size was (60 Å, 60 Å, 60 Å). All other docking parameters were maintained at their default settings.
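The evaluation quantities described above reduce to a few lines of NumPy: equation (69) is the per-pose r.m.s.d., the docking success rate is the fraction of poses under the 2 Å cutoff, and EF_x% compares the hit rate among the top-scored x% with the overall hit rate. This is a minimal sketch with toy inputs, not the evaluation scripts used in the Article (BEDROC and the AUC metrics are omitted).

```python
import numpy as np

def rmsd(x_pred, x_label):
    """Equation (69): root-mean-square deviation over the N ligand atoms.

    x_pred, x_label: (N, 3) arrays of predicted and ground-truth coordinates.
    """
    return float(np.sqrt(np.mean(np.sum((x_pred - x_label) ** 2, axis=-1))))

def success_rate(rmsds, cutoff=2.0):
    """Fraction of predicted poses within `cutoff` angstroms of the crystal pose."""
    return float((np.asarray(rmsds) < cutoff).mean())

def enrichment_factor(labels, scores, top_frac):
    """EF_x%: hit rate among the top-scored fraction divided by the overall hit rate."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(scores)[::-1]           # highest score first
    n_top = max(1, int(round(top_frac * len(labels))))
    return float(labels[order[:n_top]].mean() / labels.mean())

# Toy check: two actives among 100 compounds, ranked first by a perfect score
labels = np.array([1] * 2 + [0] * 98)
scores = -np.arange(100, dtype=float)          # strictly decreasing scores
ef1 = enrichment_factor(labels, scores, 0.01)  # ~50: (1/1) / (2/100)
```

A perfect ranking of a 2% active set therefore saturates EF_1% at its maximum of 1/0.02 = 50, which is why EF values are only comparable across libraries with similar active fractions.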


Statistics and reproducibility
We trained and evaluated KarmaDock following the conventions of molecular docking studies, where three widely used dataset split strategies were used and four classic benchmarks were involved in the assessment. The sample sizes were close to those of the benchmarks; some samples (compounds) were dropped because RDKit failed to process them. During the training process, the training set and validation set were randomly selected with a random seed of 42. The README file in the GitHub repository provides comprehensive instructions for reproduction.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this Article.

Data availability
The raw datasets26–29 are available at http://pdbbind.org.cn/index.php, https://github.com/devalab/Apobind and http://www.pharmchem.uni-tuebingen.de/dekois/data/DEKOIS2.0_library/DEKOIS2.0_library.rar. The prepared datasets42–44 are available at https://zenodo.org/record/7788083, https://zenodo.org/record/8211452 and https://zenodo.org/record/8131256. PDB IDs 1S38, 1SQA, 4JXS, 1PS3, 3DXG, 3D4Z, 4CLI, 4JSZ and 4CTB are available in the Protein Data Bank (https://www.rcsb.org/)37. Source data are available with this paper.

Code availability
The source code is available at Zenodo (https://zenodo.org/record/8211513)45 and GitHub (https://github.com/schrojunzhang/KarmaDock).

References
1. Shen, C. et al. From machine learning to deep learning: advances in scoring functions for protein-ligand docking. WIREs Comput. Mol. Sci. 10, e1429 (2020).
2. Morris, G. M. et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791 (2009).
3. Eberhardt, J., Santos-Martins, D., Tillack, A. F. & Forli, S. AutoDock Vina 1.2.0: new docking methods, expanded force field and Python bindings. J. Chem. Inf. Model. 61, 3891–3898 (2021).
4. Zhao, H. & Caflisch, A. Discovery of ZAP70 inhibitors by high-throughput docking into a conformation of its kinase domain generated by molecular dynamics. Bioorg. Med. Chem. Lett. 23, 5721–5726 (2013).
5. Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).
6. Jones, G., Willett, P., Glen, R. C., Leach, A. R. & Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 267, 727–748 (1997).
7. Irwin, J. J. et al. ZINC20—a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).
8. Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).
9. Santos-Martins, D. et al. Accelerating AutoDock4 with GPUs and gradient-based local search. J. Chem. Theory Comput. 17, 1060–1073 (2021).
10. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
11. Zhang, H. et al. SDEGen: learning to evolve molecular conformations from thermodynamic noise for conformation generation. Chem. Sci. 14, 1557–1568 (2023).
12. Jing, B., Corso, G., Chang, J., Barzilay, R. & Jaakkola, T. Torsional diffusion for molecular conformer generation. In Advances in Neural Information Processing Systems (eds Koyejo, S. et al.) Vol. 35, 24240–24253 (Curran Associates, Inc., 2022).
13. Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations (2022).
14. Zhang, Y., Cai, H., Shi, C. & Tang, J. E3Bind: an end-to-end equivariant network for protein-ligand docking. In International Conference on Learning Representations (2023).
15. Stärk, H., Ganea, O., Pattanaik, L., Barzilay, D. R. & Jaakkola, T. EquiBind: geometric deep learning for drug binding structure prediction. In Proceedings of the 39th International Conference on Machine Learning 20503–20521 (PMLR, 2022).
16. Qiao, Z., Nie, W., Vahdat, A., Miller III, T. F. & Anandkumar, A. State-specific protein-ligand complex structure prediction with a multi-scale deep generative model. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15171 (2023).
17. Lu, W. et al. TANKBind: trigonometry-aware neural networKs for drug-protein binding structure prediction. In Advances in Neural Information Processing Systems Vol. 35, 7236–7249 (2022).
18. Junfeng, Z., Kelei, H., Tiejun, D. & Wu, J. Accurate protein-ligand complex structure prediction using geometric deep learning. Res. Square https://doi.org/10.21203/rs.3.rs-1454132/v1 (2022).
19. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. S. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.01776 (2023).
20. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In Proceedings of the 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) Vol. 139, 9323–9332 (PMLR, 2021).
21. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (eds Larochelle, H. et al.) Vol. 33, 6840–6851 (Curran Associates, 2020).
22. Hu, X. et al. Discovery of novel non-steroidal selective glucocorticoid receptor modulators by structure- and IGN-based virtual screening, structural optimization and biological evaluation. Eur. J. Med. Chem. 237, 114382 (2022).
23. Hu, X. et al. Discovery of novel GR ligands toward druggable GR antagonist conformations identified by MD simulations and Markov state model analysis. Adv. Sci. 9, 2102435 (2022).
24. Shen, C. et al. Boosting protein-ligand binding pose prediction and virtual screening based on residue-atom distance likelihood potential and graph transformer. J. Med. Chem. 65, 10691–10706 (2022).
25. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations (2021).
26. Liu, Z. et al. Forging the basis for developing protein-ligand interaction scoring functions. Acc. Chem. Res. 50, 302–309 (2017).
27. Aggarwal, R., Gupta, A. & Priyakumar, U. D. APObind: a dataset of ligand unbound protein conformations for machine learning applications in de novo drug design. Preprint at arXiv https://doi.org/10.48550/arXiv.2108.09926 (2021).
28. Su, M. et al. Comparative assessment of scoring functions: the CASF-2016 update. J. Chem. Inf. Model. 59, 895–913 (2019).
29. Bauer, M. R., Ibrahim, T. M., Vogel, S. M. & Boeckler, F. M. Evaluation and optimization of virtual screening workflows with DEKOIS 2.0—a public library of challenging docking benchmark sets. J. Chem. Inf. Model. 53, 1447–1462 (2013).


30. Friesner, R. A. et al. Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 49, 6177–6196 (2006).
31. Wang, Z. et al. Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power. Phys. Chem. Chem. Phys. 18, 12964–12975 (2016).
32. Jain, A. N. Surflex-Dock 2.1: robust performance from ligand energetic modeling, ring flexibility and knowledge-based search. J. Comput. Aided Mol. Des. 21, 281–306 (2007).
33. Madhavi Sastry, G., Adzhigirey, M., Day, T., Annabhimoju, R. & Sherman, W. Protein and ligand preparation: parameters, protocols and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 27, 221–234 (2013).
34. Truchon, J.-F. & Bayly, C. I. Evaluating virtual screening methods: good and bad metrics for the 'early recognition' problem. J. Chem. Inf. Model. 47, 488–508 (2007).
35. Izumi, H. et al. The CLIP1-LTK fusion is an oncogenic driver in non-small-cell lung cancer. Nature 600, 319–323 (2021).
36. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296–W303 (2018).
37. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
38. Kaminski, G. A., Friesner, R. A., Tirado-Rives, J. & Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 105, 6474–6487 (2001).
39. Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 7, 525–537 (2011).
40. Shelley, J. C. et al. Epik: a software program for pK(a) prediction and protonation state generation for drug-like molecules. J. Comput. Aided Mol. Des. 21, 681–691 (2007).
41. Wójcikowski, M., Zielenkiewicz, P. & Siedlecki, P. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. J. Cheminform. 7, 26 (2015).
42. Zhang, X. J. APObind core set for KarmaDock (229 protein-ligand complexes). Zenodo https://doi.org/10.5281/zenodo.8211452 (2023).
43. Zhang, X. J. DEKOIS2.0 for KarmaDock. Zenodo https://doi.org/10.5281/zenodo.8131256 (2023).
44. Zhang, X. J. KarmaDock_PDBBind2020_coreset (1.0). Zenodo https://doi.org/10.5281/zenodo.7788083 (2023).
45. Zhang, X. J. schrojunzhang/KarmaDock: v1.0.0. Zenodo https://doi.org/10.5281/zenodo.8211513 (2023).

Acknowledgements
This work was financially supported by the National Key Research and Development Program of China (2022YFF1203000), the National Natural Science Foundation of China (22220102001, 82204279 and 22007082), the Natural Science Foundation of Zhejiang Province (LD22H300001 and LQ21B030013) and the Fundamental Research Funds for the Central Universities (226-2022-00220). We also thank L. Xu at Jiangsu University of Technology for preparing all the compounds used in this study based on the Glide module in Schrödinger software, which substantially contributed to our research.

Author contributions
X.Z., O.Z. and C.S. developed this method, analyzed the data and wrote the manuscript. W.Q. and S.C. bought the compounds and measured their IC50 values. H.C., Y.K., Z.W., E.W., J.Z., Y.D., F.L., T.W., H.D. and L.W. evaluated and interpreted the results and wrote the manuscript. P.P., G.C., C.-Y.H. and T.H. conceived and supervised the project, interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s43588-023-00511-5.

Correspondence and requests for materials should be addressed to Peichen Pan, Guangyong Chen, Chang-Yu Hsieh or Tingjun Hou.

Peer review information Nature Computational Science thanks Matthew Holcomb and Shina Kamerlin for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© The Author(s), under exclusive licence to Springer Nature America, Inc. 2023
