ARTICLE INFO
Keywords: Artificial intelligence; DPP-4; KNIME; Machine learning; QSAR; Virtual screening
ABSTRACT
Dipeptidyl peptidase-4 (DPP-4) inhibitors are becoming an essential drug class in the treatment of type 2 diabetes mellitus; however, some classes of these drugs exert side effects, including joint pain and pancreatitis. Studies suggest that these side effects might be related to secondary inhibition of DPP-8 and DPP-9. In this study, we identified DPP-4-inhibitor hit compounds selective against DPP-8 and DPP-9. We built a virtual screening workflow using a quantitative structure–activity relationship (QSAR) strategy based on artificial intelligence to allow faster screening of millions of molecules for the DPP-4 target relative to other screening methods. Five regression machine learning algorithms and four classification machine learning algorithms were applied to build virtual screening workflows, with the regression QSAR model using support vector regression (R²pred = 0.78) and the classification QSAR model using the random forest algorithm with 92.2% accuracy. Virtual screening of > 10 million molecules yielded 2 716 hit compounds with a pIC50 value of > 7.5. Additionally, molecular docking results of several potential hit compounds for DPP-4, DPP-8, and DPP-9 identified CH0002 as showing high inhibitory potential against DPP-4 and low inhibitory potential for DPP-8 and DPP-9 enzymes. These results demonstrated the effectiveness of this technique for identifying DPP-4-inhibitor hit compounds selective for DPP-4 and against DPP-8 and DPP-9 and suggest its potential efficacy for applications to discover hit compounds of other targets.
1. Introduction
Dipeptidyl peptidase-4 (DPP-4) (EC 3.4.14.5) inhibitors are important oral antidiabetic drugs for treating type 2 diabetes (T2DM). Sitagliptin was reported in 2006 as the first DPP-4 inhibitor agent, and since then, this class of drugs has increasingly shifted the role of sulfonylurea in T2DM treatment according to national and international guidelines. The drugs work differently from most other antidiabetic drugs. DPP-4 inhibition stimulates the pancreas to produce and release insulin while reducing or normalizing body weight without causing hypoglycemia (Alam et al., 2018; Chylewska et al., 2018; Gallwitz, 2019; Popovic-Djordjevic et al., 2018; Sesti et al., 2019).
Some commercially available DPP-4 inhibitors are well-tolerated, whereas others have side effects, ranging from joint pain to pancreatitis, and are related to the secondary inhibition of enzymes with high-sequence homology to DPP-4 (e.g., DPP-8 and DPP-9) (Huan et al., 2015; Patel and Ghate, 2014). Therefore, there is a need to develop novel DPP-4 inhibitors selective against DPP-8 and DPP-9 enzymes.
Novel DPP-4 inhibitors can be developed through high-throughput screening, which is generally performed by pharmaceutical companies. An alternative is the computer-aided drug design through virtual screening (Hughes et al., 2011; Pei et al., 2020; Shamsara, 2019; Wang et al., 2019) of large databases containing millions of compounds, such as ChEMBL (Gaulton et al., 2012) and PubChem (Kim et al., 2016). A virtual screening method developed using a quantitative structure–activity relationship (QSAR) strategy (regression or classification) capable of predicting the selectivity of a molecule for DPP-4 can reveal the relationship between molecular structures represented by
Abbreviations: DL, Deep Learning; XGBoost, XGBoost Tree Ensemble; RF, Random Forest; MLR, Multiple Linear Regression; SVR, Support Vector Regression; SVM,
Support Vector Machine.
* Correspondence to: Faculty of Pharmacy, Universitas Indonesia, Depok 16424, Indonesia.
E-mail address: arry.yanuar@ui.ac.id (A. Yanuar).
https://doi.org/10.1016/j.compbiolchem.2021.107597
Received 17 April 2021; Received in revised form 25 October 2021; Accepted 26 October 2021
Available online 30 October 2021
1476-9271/© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
O. Hermansyah et al. Computational Biology and Chemistry 95 (2021) 107597
2. Methods
2.1. Dataset
The dataset was downloaded from the ChEMBL website, filtered for the human DPP-4 target and the IC50 activity type (https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL284/). The dataset contained 4 661 compounds after removing empty activity values, salt ions, and small fragments, with activity units presented as molar values. The molecular structures were normalized, and duplicate compounds were identified and removed (Cherkasov et al., 2014; Kausar and Falcao, 2018). The remaining 3 933 compounds were used as the regression modeling dataset.
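Because the regression target throughout this paper is pIC50, a minimal sketch of the unit conversion may help. It assumes the usual definition pIC50 = −log10(IC50 in mol/L); the function name `pic50` and the unit table are illustrative and are not taken from the authors' KNIME workflow.

```python
import math

# Multipliers that convert common concentration units to mol/L.
# The unit labels are illustrative; real ChEMBL exports may spell them differently.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pic50(ic50_value: float, unit: str = "nM") -> float:
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    if ic50_value <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_value * UNIT_TO_MOLAR[unit])


if __name__ == "__main__":
    # Activity extremes quoted for the datasets in this paper.
    for value, unit in [(0.012, "nM"), (9100, "uM"), (1000, "uM")]:
        print(f"{value} {unit} -> pIC50 = {pic50(value, unit):.2f}")
```

Applied to the activity extremes reported for the datasets (0.012 nM, 9 100 µM, and 1 000 µM), this reproduces the quoted pIC50 values of 10.92, 2.04, and 3.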
For the QSAR classification models, we used 4 355 compounds from the ChEMBL database obtained using a scientific literature filter. The missing values and duplicates were corrected, salt ions and small fragments were removed, and the molecular structure was normalized (Cherkasov et al., 2014; Kausar and Falcao, 2018), leaving 3 740 compounds. The data were classified as active and inactive compounds, with pIC50 activities > 7.5 designating active compounds and pIC50 < 6 designating inactive compounds (those with pIC50 between 6.0 and 7.5 were removed) (Cai et al., 2017). The remaining 2 307 compounds were used for the model development.
Fig. 1. QSAR workflow for modeling DPP-4 inhibitors.
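The labeling rule for the classification dataset (active above pIC50 7.5, inactive below 6, intermediate values discarded) can be sketched as follows. The function names are hypothetical, and treating the boundary values 6.0 and 7.5 as part of the discarded band is an assumption; the text does not say how exact boundary values were handled.

```python
from typing import Iterable, List, Optional, Tuple


def label_activity(pic50: float,
                   active_above: float = 7.5,
                   inactive_below: float = 6.0) -> Optional[str]:
    """Map a pIC50 value to 'active', 'inactive', or None (discarded)."""
    if pic50 > active_above:
        return "active"
    if pic50 < inactive_below:
        return "inactive"
    # Assumption: values in [6.0, 7.5], boundaries included, are dropped.
    return None


def build_classification_set(
        records: Iterable[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Keep only compounds that receive a label; drop the intermediate band."""
    labeled = []
    for compound_id, value in records:
        label = label_activity(value)
        if label is not None:
            labeled.append((compound_id, label))
    return labeled


if __name__ == "__main__":
    demo = [("cpd-1", 8.2), ("cpd-2", 6.8), ("cpd-3", 5.1)]
    print(build_classification_set(demo))
```

Dropping the intermediate band trades dataset size for cleaner class separation, which matches the reduction from 3 740 to 2 307 compounds described above.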
2.4. Feature selection
Optimal features were selected using several feature-selection methods. Dimension reduction was performed with principal component analysis, a high-correlation filter, and the random forest algorithm (Silipo et al., 2014). The identified features were tested using several machine learning models, and those with the highest accuracy were used as features for modeling.
2.5. Quantitative structure–activity relationship modeling
A previous study proposed r²m(test) for external validation of models (Roy et al., 2015a), with this value calculated using the square of the correlation coefficient between actual and predicted activities from the test dataset. For acceptable predictions, Δr²m(test) should be < 0.2 when r²m(test) is > 0.5.
2.7. Evaluation of the classification QSAR model
Internal validation and external validation datasets were used to test classification model performance. All models were evaluated as follows:
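The evaluation formulas themselves are not reproduced in this text. As a hedged sketch, the metrics named in the column headers of Tables 1 and 3 (sensitivity, specificity, precision, F-measure, accuracy) can be computed from confusion-matrix counts using their standard textbook definitions; whether the authors used exactly these forms is an assumption, and the function name is illustrative.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes occur
    and at least one compound is predicted active.
    """
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Made-up counts, used only to show the call; not data from the paper.
    for name, value in classification_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```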
Fig. 2. Analysis of the chemical space. Training set versus test set (external validation) defined by molecular weight (MW) and ALogP. (A) For the regression model,
green symbols (x) are results from the training set, and red symbols (∆) are results from the test set. (B) For the classification model, blue symbols (□) are results from
the training set, and red symbols (∆) are results from the test set.
Fig. 3. Feature selection. Seven features developed to obtain the best method using four learning models.
Fig. 4. Internal validation results using the regression model. Support vector regression produced the best performance among other models according to the lowest MSE.
Table 1
Internal validation results for the classification model.
Models TP FP TN FN Sensitivity Specificity F-Measure Precision Accuracy
Table 2
Statistical parameters for external validation of various models.
Metric            DL      XGBoost  MLR     RF      SVR     Standard
(R² − R′₀²)/R²    0.0023  0.0000   0.0086  0.0107  0.0006  < 0.1ᵃ
|R₀² − R′₀²|      0.3347  0.0704   0.1417  0.1349  0.0481  < 0.3ᵃ
k                 0.9979  1.0024   0.9976  1.0005  0.9975  0.85 ≤ k ≤ 1.15ᵃ
k′                0.9797  0.9845   0.9805  0.9867  0.9902  0.85 ≤ k′ ≤ 1.15ᵃ
R²m               0.2760  0.5181   0.3665  0.4272  0.5668  –
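The statistics in Table 2 can be sketched in plain Python as below. This follows one common convention for the Golbraikh–Tropsha parameters (regression through the origin of observed on predicted values for k and R₀², and the reverse for k′ and R′₀²) and the form r²m = r²(1 − √|r² − r₀²|); the exact convention used by the authors is not stated here, so the axis assignment, function names, and thresholds in `meets_criteria` (taken from Table 2 and the r²m rule in the Methods) should be read as assumptions.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def external_validation(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """External-validation statistics for observed vs. predicted activities."""
    y_bar, p_bar = mean(obs), mean(pred)
    s_yy = sum((y - y_bar) ** 2 for y in obs)
    s_pp = sum((p - p_bar) ** 2 for p in pred)
    s_yp = sum((y - y_bar) * (p - p_bar) for y, p in zip(obs, pred))
    r2 = s_yp ** 2 / (s_yy * s_pp)            # squared Pearson correlation

    dot = sum(y * p for y, p in zip(obs, pred))
    k = dot / sum(p * p for p in pred)        # slope of obs = k * pred
    k_prime = dot / sum(y * y for y in obs)   # slope of pred = k' * obs

    # Coefficients of determination for the two through-origin fits.
    r0_2 = 1 - sum((y - k * p) ** 2 for y, p in zip(obs, pred)) / s_yy
    r0p_2 = 1 - sum((p - k_prime * y) ** 2 for y, p in zip(obs, pred)) / s_pp

    rm2 = r2 * (1 - math.sqrt(abs(r2 - r0_2)))
    rm2_prime = r2 * (1 - math.sqrt(abs(r2 - r0p_2)))
    return {"r2": r2, "k": k, "k_prime": k_prime, "r0_2": r0_2,
            "r0p_2": r0p_2, "rm2": rm2, "delta_rm2": abs(rm2 - rm2_prime)}


def meets_criteria(s: Dict[str, float]) -> bool:
    """Check the thresholds listed in Table 2 plus the r2m rule from the Methods."""
    return ((s["r2"] - s["r0p_2"]) / s["r2"] < 0.1
            and abs(s["r0_2"] - s["r0p_2"]) < 0.3
            and 0.85 <= s["k"] <= 1.15
            and 0.85 <= s["k_prime"] <= 1.15
            and s["rm2"] > 0.5
            and s["delta_rm2"] < 0.2)


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    stats = external_validation([5.1, 6.3, 7.0, 8.4], [5.4, 6.0, 7.2, 8.1])
    print(stats, meets_criteria(stats))
```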
Table 3
External validation of the classification model.
Models TP FP TN FN Sensitivity Specificity F-measure Precision Accuracy
= 10.19) to 9 100 µM (pIC50 = 2.04) and a test set of 787 molecules (for external validation) with molecular activities ranging from 0.012 nM (pIC50 = 10.92) to 1 000 µM (pIC50 = 3).
For the classification model (Fig. 2B), we used a training set of 1 845 molecules (879 active and 976 inactive), whereas the test set (external validation) contained 462 molecules (237 active and 135 inactive). Analysis of the chemical space was performed to examine the diversity of the datasets. Fig. 2 shows plots of DPP-4-inhibitor compound diversity in each dataset according to molecular weight (MW) and partition coefficient (XLogP). The compounds had MWs ranging from 128.094 Da to 1 173.69 Da and partition coefficients ranging from −4.107 to 18.493, suggesting significant heterogeneity in the chemical space, potentially resulting in broader predictive ability for new compounds (Kong and Yan, 2017).
3.2. Feature selection
Feature selection was performed for seven feature types (Fig. 3) to identify the most optimal feature for use in compound identification, with this undertaken using the random forest algorithm. This process reduced 17 569 total features by 98.8%, resulting in 208 features for the QSAR regression model and 200 features for the QSAR classification model.
3.3. Optimization of machine learning algorithms
The model with the lowest error and highest accuracy was obtained from algorithm parameter optimization by searching for the lowest random error within a specific parameter range across 100 parameter values and 100 repetitions. Parameter-optimization results were as follows: deep learning using two dense layers with a learning rate of 0.01 and 100 neurons (RMSE = 0.8335; batches, 65; epochs, 505); XGBoost (RMSE = 0.7684; rounds, 1000; maximum depth, 13; and eta, 0.2994); multiple linear regression (RMSE = 0.9605; offset parameter, 0.23707); random forest (RMSE = 0.7765; number of models, 109; tree depth, 15); support vector regression (RMSE = 0.7573; cost parameters, 79; degree, 5).
For optimization of classification methods, the deep learning optimal parameters on the dense layer were obtained using a rectified linear unit (ReLU) weight-initiation strategy and a leaky ReLU activation function (batch size, 15; epochs, 843). The highest accuracies for each algorithm were 0.9431 (deep learning), 0.9404 (XGBoost; nRounds, 500; max Depth, 12), 0.9404 (random forest; nMethod, 216; treeDepth, 21), and 0.6938 (support vector regression; sigma, 0.9662; penalty, 25).
Support vector regression showed the best results, which agree with previous studies that applied the QSAR model to DPP-4-inhibitor analysis (Gu et al., 2013; Yang et al., 2013). Additionally, this method is widely used to manage high-dimensional variables, especially with small datasets, by using kernels to map nonlinear data into a high-dimensional feature space. XGBoost has not been extensively employed in QSAR modeling because it is new compared to other machine learning methods, although it shows a faster analytical capability and satisfactory predictive results (Babajide Mustapha and Saeed, 2016).
The optimization results using the algorithms showed accuracy of up to 94%, although support vector regression showed an accuracy of only 69%, suggesting that the hyperplane was unable to separate active and inactive compounds in the classification model. However, ensemble-based models, such as random forest and XGBoost, were able to predict active and inactive compounds well. These results contrasted with those of the regression QSAR model, which showed support vector regression as the best algorithm. In the regression method, the transformation of the variable to higher dimensions resulted in better prediction of pIC50 relative to other methods.
In the regression model, the goodness-of-fit from multiple linear regression and deep learning were low relative to results from support vector regression and ensemble methods. This is likely because multiple linear regressions are intended for linear data and show worse performance with nonlinear data. For deep learning, the poor results were likely due to the lack of training data, which hindered model performance.
Internal validation of the regression QSAR model showed < 25% difference for the results derived from all of the models, suggesting no overfitting (Veerasamy et al., 2011). Support vector regression showed the lowest error (Fig. 4), followed by XGBoost and random forest. Internal validation of the QSAR classification models (Table 1) revealed that deep learning, XGBoost, and random forest showed excellent performance (accuracy >90%), with random forest showing the best performance.
Although the QSAR regression models met the requirements for parameters k and k′, the R²₀ values for deep learning, multiple linear regression, and random forest were > 0.1, indicating that they were unsuitable as predictive QSAR models.
For verification of similarities between observed and predicted data, Table 2 shows that deep learning and multiple linear regression did not meet the R²m requirements (values <0.5), whereas the other methods were eligible, and only support vector regression showed a ΔR²m parameter making it eligible for the QSAR regression model.
Support vector regression fulfilled all requirements necessary for the QSAR regression model, suggesting that it can be used to predict hit compounds during virtual screening. Additionally, support vector regression showed a significant difference between the curves formed by predictive values and the zero intercept, suggesting increased model accuracy.
Table 3 shows external validation results of the classification method, and Fig. 5 shows ROC analysis of the accuracies of the deep learning, XGBoost, and random forest methods [all higher than previous studies (accuracy > 80%)] (Cai et al., 2017). Deep learning and XGBoost algorithms showed better performance on external validation; however, we used the random forest algorithm to build the virtual screening workflow based on its higher accuracy relative to the other algorithms.
3.6. Testing QSAR regression workflow on other targets
QSAR prediction of several targets (Table 4) produced results with a coefficient of determination reaching 0.7 for the support vector regression, XGBoost, and random forest models, suggesting that the workflow can model other targets from raw datasets downloaded from the ChEMBL database.
Table 4
Performance of QSAR workflow on various targets.
Target                                   Models         Q2      MSE     R2 (ext)  Dataset  Curation  Training  Validation  Test
Beta-1 adrenergic receptor (CHEMBL213)   Deep Learning  0.6601  0.4265  0.9134    1508     620       446       496         50
                                         MLR            0.1570  1.0578  -0.2724
                                         Random Forest  0.7349  0.3326  0.6462
                                         SVR            0.7312  0.3373  0.6515
                                         XGBoost        0.7099  0.3641  0.6676
Sigma Opioid receptor (CHEMBL233)        Deep Learning  0.6730  0.6113  0.0736    2280     1157      832       925         232
                                         MLR            0.4672  0.9959  0.5693
                                         Random Forest  0.7725  0.4253  0.7318
                                         SVR            0.7543  0.4593  0.7311
                                         XGBoost        0.7453  0.4762  0.7422
3.7. Virtual screening results
Virtual screening results of the ChEMBL, PubChem, and Molport databases identified several hit compounds (Fig. 6), which were compared with the DPP-4 inhibitors from the ChEMBL database (similarity values < 0.85) and with DPP-4 DUD-E decoys. We ultimately identified 2 716 potential hit compounds.
3.8. Molecular docking results
Molecules showing high potential inhibitory activity after docking with DPP-4 included CH0002, CH0003, and CH0001 (ChEMBL identifiers). Specifically, CH0002 showed slightly higher inhibitory activity than trelagliptin (PDB: 5KBY), CH0001 showed higher inhibitory activity than omarigliptin (PDB: 4PNZ), and CH0003 showed higher inhibitory activity than almost all ligands.
According to the proposal of Ojeda-Montes et al. (2018), to produce inhibitors that have potential activity against DPP4 and are selective against DPP8 and DPP9, a compound must have an aromatic ring so that it can form a π–π interaction with Phe357, a negatively charged group so that it can interact electrostatically with Arg358, and another aromatic ring to form an additional π–π interaction with Tyr547. The virtual screening hit compounds CH0001, CH0002, and CH0003 have various of these groups (Fig. 7). DPP4 inhibitors such as trelagliptin, alogliptin, omarigliptin, and carmegliptin also interact with these residues. For compound CH0003, docking with DPP4 (2ONC) did not show an interaction with these residues, which resulted in low selectivity with DPP8 and DPP9 (Ojeda-Montes et al., 2018).
Fig. 7. The interaction of compounds CH0001, CH0002, and CH0003 with DPP4. Ojeda-Montes et al. (2018) report that compounds selected for DPP8 and DPP9 have hydrophobic interactions with Phe357, Arg358, or Tyr547 residues of DPP4. (a) Hydrophobic interaction of the Pi-Pi T-Shaped type between the aromatic group of the molecule CH0001 and DPP4 (3KWF) at residues Phe357 and Tyr547. (b) Hydrophobic interaction of the Pi-Pi Stacked type between the aromatic group of the molecule CH0002 and DPP4 (4PNZ) at residue Phe357. (c) There was no hydrophobic interaction between CH0003 and DPP4 (2ONC) with Phe357, Arg358, or Tyr547 residues.
Generally, compounds are classified as having high (Ki < 1 µM), moderate (1 µM < Ki < 10 µM), or weak (Ki > 10 µM) inhibitory potential (Havale and Pal, 2009; Kang et al., 2014; Taur et al., 2012). Molecular docking results using DPP-8 indicated that CH0003 (Ki = 52.98 nM) and CH0001 (Ki = 310.16 nM) showed high potential inhibitory activity and that CH0002 (Ki = 1190 nM) showed moderate inhibitory activity, which was similar to trelagliptin (Ki = 1410 nM). For DPP-9, CH0003 (Ki = 22.27 nM) and CH0001 (Ki = 332.85 nM) showed high inhibitory activity, whereas CH0002 (Ki = 1220 nM) showed moderate inhibitory activity, which was similar to carmegliptin (Ki = 1730 nM). Based on the molecular docking of hit compounds against DPP4, DPP8, and DPP9 (Table 5), the best compound from the virtual screening study was CH0002 (Fig. 8), which showed high inhibitory potential against DPP4 but only moderate inhibition against both DPP8 and DPP9.
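The Ki bands cited in Section 3.8 (high below 1 µM, moderate between 1 and 10 µM, weak above 10 µM) can be applied to the docking-derived Ki values with a small helper. This is a sketch only: the function name is hypothetical, and assigning the exact boundary values (1 µM and 10 µM) to the moderate band is an assumption, since the cited ranges leave them undefined.

```python
def ki_band(ki_nm: float) -> str:
    """Classify an inhibition constant given in nM as 'high', 'moderate', or 'weak'."""
    if ki_nm <= 0:
        raise ValueError("Ki must be a positive concentration")
    ki_um = ki_nm / 1000.0        # convert nM to µM
    if ki_um < 1.0:
        return "high"
    if ki_um > 10.0:
        return "weak"
    return "moderate"             # assumption: boundary values fall here


if __name__ == "__main__":
    # DPP-8 docking Ki values (nM) quoted in Section 3.8.
    dpp8 = {"CH0003": 52.98, "CH0001": 310.16, "CH0002": 1190.0, "trelagliptin": 1410.0}
    for name, ki in dpp8.items():
        print(f"{name}: Ki = {ki} nM -> {ki_band(ki)}")
```

For a selective DPP-4 inhibitor the desired pattern is a high band against DPP-4 combined with a moderate or weak band against DPP-8 and DPP-9, which is the pattern reported for CH0002.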
Table 5
Molecular docking results of hit compounds with DPP-4, DPP-8, and DPP-9.
Macromolecule (PDB ID)   Ligand   Binding energy (kcal/mol)   Inhibition constant, Ki (nM)   Molecule
Data Availability
Data are available at Hermansyah, Oky; Bustamam, Alhadi; Yanuar, Arry (2019), “Dataset for QSAR Modeling of DPP-4 Inhibitors,” Mendeley Data, v2. https://doi.org/10.17632/4sw5hr2yz7.2.
Acknowledgments
We thank Prof. Heru Suhartanto (Faculty of Computer Sciences, Universitas Indonesia) for providing advice on and discussion of machine learning models.
Consent for Publication
Not applicable.
References
Al-Fakih, A.M., Algamal, Z.Y., Lee, M.H., Aziz, M., Ali, H.T.M., 2019. A QSAR model for predicting antidiabetic activity of dipeptidyl peptidase-IV inhibitors by enhanced binary gravitational search algorithm. SAR QSAR Environ. Res. 30, 403–416. https://doi.org/10.1080/1062936X.2019.1607899.
Alam, F., Islam, M.A., Kamal, M.A., Gan, S.H., 2018. Updates on managing type 2 diabetes mellitus with natural products: towards antidiabetic drug development. Curr. Med. Chem. 25 (39), 5395–5431. https://doi.org/10.2174/0929867323666160813222436. PMID: 27528060.
Babajide Mustapha, I., Saeed, F., 2016. Bioactive molecule prediction using extreme gradient boosting. Molecules 21, 983. https://doi.org/10.3390/molecules21080983.
Baldi, P., Brunak, S., 2001. Bioinformatics: The Machine Learning Approach. MIT Press.
Beisken, S., Meinl, T., Wiswedel, B., de Figueiredo, L.F., Berthold, M., Steinbeck, C., 2013. KNIME-CDK: Workflow-driven cheminformatics. BMC Bioinformatics 14, 257. https://doi.org/10.1186/1471-2105-14-257. PMID: 24103053; PMCID: PMC3765822.
Berman, H.M., 2000. The protein data bank. Nucleic Acids Res. 28, 235–242. https://doi.org/10.1093/nar/28.1.235.
Biftu, T., Sinha-Roy, R., Chen, P., Qian, X., Feng, D., Kuethe, J.T., Scapin, G., Gao, Y.D., Yan, Y., Krueger, D., Bak, A., Eiermann, G., He, J., Cox, J., Hicks, J., Lyons, K., He, H., Salituro, G., Tong, S., Patel, S., Doss, G., Petrov, A., Wu, J., Xu, S.S., Sewall, C., Zhang, X., Zhang, B., Thornberry, N.A., Weber, A.E., 2014. Omarigliptin (MK-3102): a novel long-acting DPP-4 inhibitor for once-weekly treatment of type 2 diabetes. J. Med. Chem. 57, 3205–3212. https://doi.org/10.1021/jm401992e.
Bitencourt-Ferreira, G., de Azevedo, W.F., 2019a. Machine Learning to Predict Binding Affinity. In: de Azevedo Jr., W.F. (Ed.), Docking Screens for Drug Discovery. Springer New York, New York, NY, pp. 251–273. https://doi.org/10.1007/978-1-4939-9752-7_16.
Bitencourt-Ferreira, G., Duarte da Silva, A., Filgueira de Azevedo, W. Jr., 2021. Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2. Curr. Med. Chem. 28 (2), 253–265. https://doi.org/10.2174/2213275912666191102162959.
Burness, C.B., 2015. Omarigliptin: First Global Approval. Drugs 75 (16), 1947–1952. https://doi.org/10.1007/s40265-015-0493-8.
Cai, J., Li, C., Liu, Z., Du, J., Ye, J., Gu, Q., Xu, J., 2017. Predicting DPP-IV inhibitors with machine learning approaches. J. Comput.-Aided Mol. Des. 31, 393–402. https://doi.org/10.1007/s10822-017-0009-6.
Cherkasov, A., Muratov, E.N., Fourches, D., Varnek, A., Baskin, I.I., Cronin, M., Dearden, J., Gramatica, P., Martin, Y.C., Todeschini, R., Consonni, V., Kuz’min, V.E., Cramer, R., Benigni, R., Yang, C., Rathman, J., Terfloth, L., Gasteiger, J., Richard, A., Tropsha, A., 2014. QSAR modeling: where have you been? Where are you going to? J. Med. Chem. 57, 4977–5010. https://doi.org/10.1021/jm4004285.
Chylewska, A., Biedulska, M., Sumczynski, P., Makowski, M., 2018. Metallopharmaceuticals in therapy - a new horizon for scientific research. Curr. Med. Chem. 25 (15), 1729–1791. https://doi.org/10.2174/0929867325666171206102501. PMID: 29210637.
da Silva, A.D., Bitencourt-Ferreira, G., de Azevedo Jr., W.F., 2020. Taba: a tool to analyze the binding affinity. J. Comput. Chem. 41, 69–73. https://doi.org/10.1002/jcc.26048.
Danishuddin, Khan, A.U., 2016. Descriptors and their selection methods in QSAR analysis: paradigm for drug design. Drug Discov. Today 21, 1291–1302. https://doi.org/10.1016/j.drudis.2016.06.013.
Davies, M., Nowotka, M., Papadatos, G., Dedman, N., Gaulton, A., Atkinson, F., Bellis, L., Overington, J.P., 2015. ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Res. 43, W612–W620. https://doi.org/10.1093/nar/gkv352.
Feng, J., Zhang, Z., Wallace, M.B., Stafford, J.A., Kaldor, S.W., Kassel, D.B., Navre, M., Shi, L., Skene, R.J., Asakawa, T., Takeuchi, K., Xu, R., Webb, D.R., Gwaltney, S.L., 2007. Discovery of alogliptin: a potent, selective, bioavailable, and efficacious inhibitor of dipeptidyl peptidase IV. J. Med. Chem. 50, 2297–2300. https://doi.org/10.1021/jm070104l.
Gallwitz, B., 2019. Clinical Use of DPP-4 Inhibitors. Front. Endocrinol. 10, 389. https://doi.org/10.3389/fendo.2019.00389.
Gaulton, A., Bellis, L.J., Bento, A.P., Chambers, J., Davies, M., Hersey, A., Light, Y., McGlinchey, S., Michalovich, D., Al-Lazikani, B., Overington, J.P., 2012. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107. https://doi.org/10.1093/nar/gkr777.
Gaulton, A., Hersey, A., Nowotka, M., Bento, A.P., Chambers, J., Mendez, D., Mutowo, P., Atkinson, F., Bellis, L.J., Cibrián-Uhalte, E., Davies, M., Dedman, N., Karlsson, A., Magariños, M.P., Overington, J.P., Papadatos, G., Smit, I., Leach, A.R., 2017. The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954. https://doi.org/10.1093/nar/gkw1074.
Golbraikh, A., Tropsha, A., 2002. Beware of q2! J. Mol. Graph. Model. 20, 269–276. https://doi.org/10.1016/S1093-3263(01)00123-1.
Gramatica, P., 2013. On the Development and Validation of QSAR Models. In: Reisfeld, B., Mayeno, A.N. (Eds.), Computational Toxicology, Volume II. Humana Press, Totowa, NJ, pp. 499–526. https://doi.org/10.1007/978-1-62703-059-5_21.
Grimshaw, C.E., Jennings, A., Kamran, R., Ueno, H., Nishigaki, N., Kosaka, T., Tani, A., Sano, H., Kinugawa, Y., Koumura, E., Shi, L., Takeuchi, K., 2016. Trelagliptin (SYR-472, Zafatek), novel once-weekly treatment for type 2 diabetes, inhibits dipeptidyl peptidase-4 (DPP-4) via a non-covalent mechanism. PLOS ONE 11, e0157509. https://doi.org/10.1371/journal.pone.0157509.
Gu, T., Yang, X., Li, M., Wu, M., Su, Q., Lu, W., Zhang, Y., 2013. Predicting the DPP-IV inhibitory activity pIC₅₀ based on their physicochemical properties. BioMed. Res. Int. 2013, 798743. https://doi.org/10.1155/2013/798743.
Havale, S.H., Pal, M., 2009. Medicinal chemistry approaches to the inhibition of dipeptidyl peptidase-4 for the treatment of type 2 diabetes. Bioorg. Med. Chem. 17, 1783–1802. https://doi.org/10.1016/j.bmc.2009.01.061.
Huan, Y., Jiang, Q., Liu, J., Shen, Z., 2015. Establishment of a dipeptidyl peptidases (DPP) 8/9 expressing cell model for evaluating the selectivity of DPP4 inhibitors. J. Pharmacol. Toxicol. Methods 71, 8–12. https://doi.org/10.1016/j.vascn.2014.11.002.
Hughes, J.P., Rees, S., Kalindjian, S.B., Philpott, K.L., 2011. Principles of early drug discovery. Br. J. Pharmacol. 162, 1239–1249. https://doi.org/10.1111/j.1476-5381.2010.01127.x.
Kang, N.S., Ahn, J.H., Kim, S.S., Chae, C.H., Yoo, S.-E., 2007. Docking-based 3D-QSAR study for selectivity of DPP4, DPP8, and DPP9 inhibitors. Bioorg. Med. Chem. Lett. 17, 3716–3721. https://doi.org/10.1016/j.bmcl.2007.04.031.
Kang, S., Tang, W., Li, H., Chreifi, G., Martásek, P., Roman, L.J., Poulos, T.L., Silverman, R.B., 2014. Nitric oxide synthase inhibitors that interact with both heme propionate and tetrahydrobiopterin show high isoform selectivity. J. Med. Chem. 57, 4382–4396. https://doi.org/10.1021/jm5004182.
Kausar, S., Falcao, A.O., 2018. An automated framework for QSAR model building. J. Cheminform. 10, 1. https://doi.org/10.1186/s13321-017-0256-5.
Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B.A., Thiessen, P.A., Yu, B., Zaslavsky, L., Zhang, J., Bolton, E.E., 2018. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 47, D1102–D1109. https://doi.org/10.1093/nar/gky1033.
Kim, S., Thiessen, P.A., Bolton, E.E., Chen, J., Fu, G., Gindulyte, A., Han, L., He, J., He, S., Shoemaker, B.A., Wang, J., Yu, B., Zhang, J., Bryant, S.H., 2016. PubChem substance and compound databases. Nucleic Acids Res. 44, D1202–D1213. https://doi.org/10.1093/nar/gkv951.
Kong, Y., Yan, A., 2017. QSAR models for predicting the bioactivity of Polo-like Kinase 1 inhibitors. Chemom. Intell. Lab. Syst. 167, 214–225. https://doi.org/10.1016/j.chemolab.2017.06.011.
Kumar, R., Sharma, A., Siddiqui, M.H., Tiwari, R.K., 2018. Prediction of drug-plasma protein binding using artificial intelligence based algorithms. Comb. Chem. High. Throughput Screen. 21 (1), 57–64. https://doi.org/10.2174/1386207321666171218121557. PMID: 29256344.
Lipinski, C.A., Lombardo, F., Dominy, B.W., Feeney, P.J., 2001. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews 46, 3–26. https://doi.org/10.1016/S0169-409X(00)00129-0.
Liu, W., Lu, H., Cao, C., Jiao, Y., Chen, G., 2018. An improved quantitative structure property relationship model for predicting thermal conductivity of liquid aliphatic alcohols. J. Chem. Eng. Data 63, 4735–4740. https://doi.org/10.1021/acs.jced.8b00764.
Makrilakis, K., 2019. The role of DPP-4 inhibitors in the treatment algorithm of type 2 diabetes mellitus: when to select, what to expect. Int. J. Environ. Res. Public Health 16, 2720. https://doi.org/10.3390/ijerph16152720.
Martin, Y.C., Kofron, J.L., Traphagen, L.M., 2002. Do structurally similar molecules have similar biological activity? J. Med. Chem. 45, 4350–4358. https://doi.org/10.1021/jm020155c.
Mattei, P., Boehringer, M., Di Giorgio, P., Fischer, H., Hennig, M., Huwyler, J., Koçer, B., Kuhn, B., Loeffler, B.M., MacDonald, A., Narquizian, R., Rauber, E., Sebokova, E., Sprecher, U., 2010. Discovery of carmegliptin: a potent and long-acting dipeptidyl peptidase IV inhibitor for the treatment of type 2 diabetes. Bioorg. Med. Chem. Lett. 20, 1109–1113. https://doi.org/10.1016/j.bmcl.2009.12.024.
Mazanetz, M.P., Marmon, R.J., Reisser, C.B.T., Morao, I., 2012. Drug discovery applications for KNIME: an open source data mining platform. Curr. Top. Med. Chem. 12, 1965–1979. https://doi.org/10.2174/156802612804910331.
McKeage, K., 2015. Trelagliptin: first global approval. Drugs 75, 1161–1164. https://doi.org/10.1007/s40265-015-0431-9.
Mozafari, Z., Arab Chamjangali, M., Arashi, M., 2020. Combination of least absolute shrinkage and selection operator with Bayesian Regularization artificial neural network (LASSO-BR-ANN) for QSAR studies using functional group and molecular docking mixed descriptors. Chemom. Intell. Lab. Syst. 200, 103998. https://doi.org/10.1016/j.chemolab.2020.103998.
Myint, K.-Z., Wang, L., Tong, Q., Xie, X.-Q., 2012. Molecular fingerprint-based artificial neural networks QSAR for ligand biological activity predictions. Mol. Pharm. 9, 2912–2923. https://doi.org/10.1021/mp300237z.
Mysinger, M.M., Carchia, M., Irwin, J.J., Shoichet, B.K., 2012. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582–6594. https://doi.org/10.1021/jm300687e.
Neves, B.J., Braga, R.C., Melo-Filho, C.C., Moreira-Filho, J.T., Muratov, E.N., Andrade, C.H., 2018. QSAR-based virtual screening: advances and applications in drug discovery. Front. Pharmacol. 9, 1275. https://doi.org/10.3389/fphar.2018.01275.
Ojeda-Montes, M.J., Gimeno, A., Tomas-Hernández, S., Cereto-Massagué, A., Beltrán-Debón, R., Valls, C., Mulero, M., Pujadas, G., Garcia-Vallvé, S., 2018. Activity and selectivity cliffs for DPP-IV inhibitors: Lessons we can learn from SAR studies and their application to virtual screening. Med. Res. Rev. 38 (6), 1874–1915. https://doi.org/10.1002/med.21499. Epub 2018 Apr 16. PMID: 29660786.
Patel, B.D., Ghate, M.D., 2014. Recent approaches to medicinal chemistry and therapeutic potential of dipeptidyl peptidase-4 (DPP-4) inhibitors. Eur. J. Med. Chem. 74, 574–605. https://doi.org/10.1016/j.ejmech.2013.12.038.
Pei, L., Shen, X., Yan, Y., Tan, C., Qu, K., Zou, J., Wang, Y., Ping, F., 2020. Virtual screening of the multi-pathway and multi-gene regulatory molecular mechanism of dachengqi decoction in the treatment of stroke based on network pharmacology. Comb. Chem. High. Throughput Screen. 23 (8), 775–787. https://doi.org/10.2174/1386207323666200311113747. PMID: 32160845.
Popovic-Djordjevic, J.B., Jevtic, I.I., Stanojkovic, T.P., 2018. Antidiabetics: structural diversity of molecules with a common aim. Curr. Med. Chem. 25 (18), 2140–2165.
Roy, K., Kar, S., Das, R.N., 2015b. QSAR/QSPR Methods. In: Roy, K., Kar, S., Das, R.N. (Eds.), A Primer on QSAR/QSPR Modeling: Fundamental Concepts. Springer International Publishing, Cham, pp. 61–103. https://doi.org/10.1007/978-3-319-17281-1_3.
Santos, L.H.S., Ferreira, R.S., Caffarena, E.R., 2019. Integrating Molecular Docking and Molecular Dynamics Simulations. In: de Azevedo Jr., W.F. (Ed.), Docking Screens for Drug Discovery. Springer New York, New York, NY, pp. 13–34. https://doi.org/10.1007/978-1-4939-9752-7_2.
Selvaraj, C., Tripathi, S., Reddy, K., Singh, S.K., 2011. Tool development for Prediction of pIC50 values from the IC50 values-A pIC50 value calculator, Current Trends in Biotechnology and Pharmacy.
Sesti, G., Avogaro, A., Belcastro, S., Bonora, B.M., Croci, M., Daniele, G., Dauriz, M., Dotta, F., Formichi, C., Frontoni, S., Invitti, C., Orsi, E., Picconi, F., Resi, V., Bonora, E., Purrello, F., 2019. Ten years of experience with DPP-4 inhibitors for the treatment of type 2 diabetes mellitus. Acta Diabetol. 56, 605–617. https://doi.org/10.1007/s00592-018-1271-3.
Shamsara, J., 2019. A random forest model to predict the activity of a large set of soluble epoxide hydrolase inhibitors solely based on a set of simple fragmental descriptors. Comb. Chem. High. Throughput Screen. 22 (8), 555–569. https://doi.org/10.2174/1386207322666191016110232. PMID: 31622216.
Shi, J., Zhao, G., Wei, Y., 2018. Computational QSAR model combined molecular descriptors and fingerprints to predict HDAC1 inhibitors. Med Sci. (Paris) 34, 52–58.
Silipo, R., Adae, I., Hart, A., Berthold, M., 2014. Seven techniques for dimensionality reduction: missing values, low variance filter, high correlation filter, pca, random forests, backward feature elimination, and forward feature construction. Knime 1–21.
Sokolović, D., Ranković, J., Stanković, V., Stefanović, R., Karaleić, S., Mekić, B., Milenković, V., Kocić, J., Veselinović, A.M., 2017. QSAR study of dipeptidyl peptidase-4 inhibitors based on the Monte Carlo method. Med. Chem. Res. 26, 796–804. https://doi.org/10.1007/s00044-017-1792-2.
https://doi.org/10.2174/0929867325666171205145309. PMID: 29210642. Taur, J.-S., Schuck, E.L., Wong, N.Y., 2012. A transcellular assay to assess the P-gp
Ramesh, Muthusamy, Muthuraman, Arunachalam, 2020R. Quantitative structure- inhibition in early stage of drug development. Drug Metab. Lett. 6, 285–291. https://
activity relationship (QSAR) studies for the inhibition of MAOs. Comb. Chem. High. doi.org/10.2174/1872312811206040008.
Throughput Screen. 23 (9) https://doi.org/10.2174/ Veerasamy, R., Rajak, H., Jain, A., Sivadasan, S., Christapher, P.V., Agrawal, R.K., 2011.
1386207323666200324173231. Validation of QSAR models - strategies and importance. Int. J. Drug Des. Disco.
Ripley, B.D., 1996. Pattern Recognition and Neural Networks. Cambridge University Wang, Z.-F., Hu, Y.-Q., Zhang, Q.-G.W., R, 2019. Virtual screening of potential anti-
Press, Cambridge https://doi.org/DOI: 10.1017/CBO9780511812651. fatigue mechanism of polygonati rhizoma based on network pharmacology. Comb.
Ross, B., Krapp, S., Augustin, M., Kierfersauer, R., Arciniega, M., Geiss-Friedlander, R., Chem. High. Throughput Screen. https://doi.org/10.2174/
Huber, R., 2018. Structures and mechanism of dipeptidyl peptidases 8 and 9, 1386207322666191106110615.
important players in cellular homeostasis and cancer. Proc. Natl. Acad. Sci. 115, Wójcikowski, M., Siedlecki, P., Ballester, P.J., 2019. In: de Azevedo Jr., W.F. (Ed.),
E1437–E1445. https://doi.org/10.1073/pnas.1717565115. Building Machine-Learning Scoring Functions for Structure-Based Prediction of
Ross, B.H., 2019. Improvement of Protein Crystal Diffraction Using Post-Crystallization Intermolecular Binding Affinity BT - Docking Screens for Drug Discovery. Springer
Methods: Infrared Laser Radiation Controls Crystal Order. Thesis. 〈https://doi.org/ New York, New York, NY, pp. 1–12. https://doi.org/10.1007/978-1-4939-9752-7_1.
10.2210/PDB6HP8/PDB〉. Xiong, Z., Cui, Y., Liu, Z., Zhao, Y., Hu, M., Hu, J., 2020. Evaluating explorative
Roy, K., Kar, S., Das, R., 2015a. A primer on QSAR/QSPR modeling: fundamental prediction power of machine learning algorithms for materials discovery using k-fold
concepts. 〈https://doi.org/10.1007/978–3-319–17281-1〉. forward cross-validation. Comput. Mater. Sci. 171, 109203 https://doi.org/
Roy, K., Kar, S., Das, R.N., 2015a. In: Roy, K., Kar, S., Das, R.N. (Eds.), Statistical Methods 10.1016/j.commatsci.2019.109203.
in QSAR/QSPR BT - A Primer on QSAR/QSPR Modeling: Fundamental Concepts. Yang, X., Li, M., Su, Q., Wu, M., Gu, T., Lu, W., 2013. QSAR studies on pyrrolidine amides
Springer International Publishing, Cham, pp. 37–59. https://doi.org/10.1007/978- derivatives as DPP-IV inhibitors for type 2 diabetes. Med. Chem. Res. 22,
3-319-17281-1_2. 5274–5283. https://doi.org/10.1007/s00044-013-0527-2.