Professional Documents
Culture Documents
com
ScienceDirect
Journal of Taibah University for Science 11 (2017) 422–433
Received 15 April 2016; received in revised form 28 May 2016; accepted 23 June 2016
Available online 21 July 2016
Abstract
To establish a quantitative structure-activity relationship model of the binding affinity constants (−log Ki ) of 41 flavonoid
derivatives towards the GABA (A) receptor, the DFT-B3LYP method with basis set 6-31G (d) was performed to gain insights into
the chemical structure and property information for the studied compounds. The best topological and electronic descriptors were
selected. This work was conducted with principal component analysis (PCA), multiple linear regression (MLR), multiple non-linear
regression (MNLR) and artificial neural network (ANN). According to these analyses, we propose quantitative models and interpret
the activity of the compounds based on multivariate statistical analysis. The statistical results of the MLR, MNLR and ANN indicate
that the determination coefficients R2 were 0.896, 0.925 and 0.916, respectively. The results show that the three modelling methods
can predict the studied activity well and may be useful for predicting the biological activity of new compounds. The statistical
results indicate that the models are statistically significant and stable with data variation in the external validation.
© 2016 The Authors. Production and hosting by Elsevier B.V. on behalf of Taibah University. This is an open access article under
the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: QSAR model; DFT study; Flavonoid derivatives; GABA (A) receptor; Artificial neural network (ANN)
http://dx.doi.org/10.1016/j.jtusci.2016.06.005
1658-3655 © 2016 The Authors. Production and hosting by Elsevier B.V. on behalf of Taibah University. This is an open access article under the
CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
M. Ghamali et al. / Journal of Taibah University for Science 11 (2017) 422–433 423
Table 1
List of 41 flavonoids and their binding affinities towards GABA (A) receptor (−log Ki , Ki in nM).
N R1 R2 R3 R4 R1 R2 R3 −log Ki
1 H H H H H H H 6.000
2 H F H H H OH H 5.600
3 H Cl H H H OH H 6.070
4 H Br H H H OH H 6.220
5* H F H H H NO2 H 6.740
6 H Cl H H H NO2 H 8.100
7 H Cl H H H H OCH3 5.900
8* H Br H H H H OCH3 5.680
9* H Br H H NO2 H H 6.680
10 H NO2 H H H H Br 7.600
11* H Cl H H F H H 6.380
12 H Br H H F H H 6.420
13 H H H H H F H 5.450
14 H F H H H F H 6.040
15 H Cl H H H F H 6.930
16* H Br H H H F H 7.380
17 H H H H H H F 5.440
18* H F H H H H F 5.600
19 H Cl H H H H F 6.740
20* H Br H H H H F 6.940
21 H H H H H Cl H 6.210
22 H F H H H Cl H 6.700
23* H Cl H H H Cl H 7.640
24 H Br H H H Cl H 7.770
25 H H H H H Br H 6.380
26 H F H H H Br H 6.630
27 H Cl H H H Br H 7.640
28 H Br H H H Br H 7.720
29* H Br H H H H H 7.150
30* H Br H H H H NO2 6.700
31 H NO2 H H H NO2 H 7.920
32* H Br H H H NO2 H 9.000
33 OH Br OH Br H H H 6.150
34 OH H OH H H H H 5.520
35* OH H OH H H H OH 5.520
36* OH H OH H Cl H H 5.100
37 OH H OH H F H H 5.100
38 OH OCH3 OH H H H OH 6.000
39 OH OH OH H H H H 5.250
40 OH OH OH H H H OH 4.920
41 OH H OH OCH3 H H H 5.690
* Test set.
dipole moment (DM) (Debye), total energy ET (eV), volume MV (cm3 ); molecular weight MW (g/mol);
absolute hardness η (eV), absolute electronegativity χ molar refractivity MR (cm3 ); parachor Pc (cm3 ); den-
(eV), and reactivity index ω (eV). η, χ and ω were deter- sity D (g/cm3 ); refractive index n; surface tension γ
mined using the following equations [25]: (dyne/cm3 ). Thus, the descriptor data matrix of dimen-
(ELUMO − EHOMO ) (ELUMO − EHOMO ) sion (41*14) was generated as shown in Table 2.
η= , χ= ,
2 2
2.3. Statistical analysis
μ2
ω=
2η The quantitative structure-activity relationship
(QSAR) study is a statistical approach to establish
The ACD/ChemSketch program [26] was used to empirical models that relate the biological activity of
calculate the following topological descriptors: molar compounds to their chemical structures. In this QSAR
Table 2
Values of the obtained parameters of the studied flavonoid derivatives.
N −log Ki ET EHOMO ELUMO DM ω η χ MW MR MV Pc n γ D
1 6.000 −19,825.38 −6.35 −1.80 4.20 3.642 2.278 −4.073 222.239 64.200 179.200 475.700 1.635 49.500 1.239
2 5.600 −24,575.59 −6.31 −1.91 5.95 3.841 2.199 −4.110 256.229 66.080 181.900 498.300 1.646 56.300 1.408
3 6.070 −34,388.26 −6.37 −1.98 6.43 3.978 2.192 −4.176 272.683 70.980 189.600 528.100 1.671 60.100 1.437
4 6.220 −91,884.65 −6.35 −1.98 6.29 3.973 2.186 −4.168 317.134 73.770 193.800 542.000 1.686 61.000 1.635
5* 6.740 −28,095.96 −6.73 −2.78 1.76 5.734 1.975 −4.759 285.227 70.740 195.300 540.100 1.644 58.400 1.460
6 8.100 −37,908.63 −6.80 −2.82 2.26 5.813 1.988 −4.808 301.681 75.640 203.000 569.900 1.667 62.000 1.485
7 5.900 −35,458.63 −6.13 −1.84 7.34 3.700 2.147 −3.986 286.710 75.780 215.200 571.500 1.621 49.700 1.332
425
* Test set
426 M. Ghamali et al. / Journal of Taibah University for Science 11 (2017) 422–433
study, quantitative descriptors are used to explain the and the details of this algorithm are provided elsewhere
chemical structure and describe the relationship between [33].
the chemical structure and the biological activity. To Testing the stability, predictive capacity and general-
explain the structure-activity relationship, these 14 ization ability of the models are notably important steps
descriptors were calculated for the 41 molecules using in a QSAR study. To validate the predictive capacity of
the Gaussian03W and ChemSketch programs. a QSAR model, there are two basic principles: inter-
The quantitative descriptors of the flavonoid deriva- nal validation and external validation. Cross-validation
tives were studied using statistical methods based on is a popular method for internal validation. In this study,
the principal component analysis (PCA) [27] with the the internal predictive capability of the model was eval-
software XLSTAT version 2013 [28]. The PCA is a use- uated using the leave-one-out cross-validation (R2 cv ).
ful statistical technique to obtain the maximal amount A good R2 cv often indicates good robustness and high
of encoded information in the compound structures and internal predictive capacity of the QSAR model. How-
understand the distribution of the compounds [29]. This ever, recent studies [34] indicate that there is no evident
method is an essentially descriptive statistical method correlation between the value of R2 cv and the actual pre-
to graphically present the maximum information in the dictive capacity of a QSAR model, which suggests that
data, as shown in Tables 1 and 2. R2 cv remains inadequate as a reliable estimate of the
The multiple linear regression (MLR) analysis with model’s predictive capacity for all new chemicals. To
backward elimination of variables is used to model determine both the generalizability of QSAR models for
the structure-activity relationship. This mathematical new chemicals and the true predictive capacity of the
technique minimizes the difference between actual and models, a statistical external validation is applied in the
predicted values. It is also used to select the descriptors model development step by properly using a prediction
as input variables in the multiple nonlinear regression set for validation.
(MNLR) and artificial neural network (ANN). To further refine the predictive ability of the devel-
MLR and MNLR were generated using the software oped QSAR models, another group of metrics, the rm 2
XLSTAT version 2013. To predict −log Ki , the equa- metrics, which determines the proximity between the
tions were justified by the determination coefficient (R2 ), observed and predicted activities, was introduced by Roy
mean squared error (MSE), Fisher’s criterion (F) and and Roy [35]. The rm 2 metrics are calculated based on
significance level (P). the correlation of the observed and predicted response
The ANN analysis was performed using Matlab soft- data. Presently two different indicators are calculated
ware version 2009a Neural Fitting tool (nftool) toolbox for both training (internal validation) and test (external
on a data set of the compounds [30]. A number of validation) sets: rm 2 and rm 2 . For an acceptable QSAR
individual models of the ANN were designed, built model, rm 2 should be >0.5, and rm 2 should be <0.2.
and trained. Three components constitute a neural net-
work: the processing elements or nodes, the topology
3. Results and discussion
of the connections among the nodes, and the learn-
ing rule by which new information is encoded in the
3.1. Data set for analysis
network. Although there many different ANN models,
the most frequently used type of ANN in QSAR is the
A QSAR study was performed to study the interac-
three-layered feed-forward network [31]. In this type of
tion of 41 flavonoids with the GABA (A) receptor, as
network, the neurons are arranged in layers as an input
previously reported [23], to determine the quantitative
layer, a hidden layer and an output layer. Each neuron in
relationship between the structure as well as the biolog-
any layer is fully connected with the neurons of a suc-
ical activity. The values of the calculated descriptors are
ceeding layer, and there are no connections among the
shown in Table 2.
neurons in the same layer.
According to the supervised learning that we used
here, the networks were taught with examples of input 3.2. Principal component analysis
patterns and the corresponding target outputs. Through
an iterative process, the connection weights were mod- In total, 14 descriptors that encode the 41 molecules
ified until the network provided the desired results for were submitted to principal component analysis
the training set of data. A back-propagation algorithm (PCA)[36]. The first three principal axes sufficiently
was used to minimize the error function. This algorithm describe the information provided by the data matrix.
was described with a simple application example [32], Indeed, the percentages of variance are 49.82%; 24.74%
M. Ghamali et al. / Journal of Taibah University for Science 11 (2017) 422–433 427
1
total information is estimated to be 86.15%.
Principal component analysis (PCA) [37] was con-
0.456
ducted to identify the link among different variables.
1
Table 3 presents the correlation matrix that represents
the correlations among the fourteen descriptors.
0.955
0.616
The obtained matrix provides information on the high
1
or low relationship among the variables. In general,
co-linearity (r > 0.5) was observed among most of the
0.433
0.350
0.701
variables and between the variables and −log Ki . A
Pc
1
high interrelationship was observed between ELUMO and
ω(r = −0.996), and a low interrelationship was observed
0.784
−0.191
−0.302
0.411
between −log Ki and DM (r = 0.024). Additionally, to
MV
1
decrease the redundancy in our data matrix, the highly
correlated descriptors (R ≥ 0.9) were excluded.
0.782
0.983
0.460
0.328
0.768
MR
1
3.3. Multiple linear regression MLR
0.901
0.751
0.847
0.341
0.166
0.901
Many attempts have been made to develop a relation-
MW
ship with the indicator variable of the biological activity
1
−log Ki , but the best relationship that was obtained using
this method only corresponds to the linear combination
−0.395
−0.325
−0.533
−0.346
0.268
0.306
−0.201
of several selected descriptors: the energy EHOMO , dipole
1
χ
0.608
−0.568
−0.632
−0.532
−0.671
−0.224
−0.199
−0.481
The resulting equation is:
1
η
− 1.734 × 10−2 × γ
ω
(1)
0.026
−0.051
−0.063
0.290
0.293
0.398
0.259
−0.103
−0.198
0.155
DM
R2 = 0.896; R2 cv = 0.825; rm 2
1
Ntraining = 28;
(LOO) = 0.732; rm (LOO) = 0.140; MSE = 0.100;
2
−0.065
−0.996
0.754
0.980
−0.470
−0.428
−0.574
−0.455
0.165
0.203
−0.287
0.883
−0.056
−0.856
0.358
0.959
−0.261
−0.157
−0.436
−0.167
0.395
0.431
−0.065
EHOMO
ELUMO
MV
MR
Pc
D
ω
χ
η
γ
n
428 M. Ghamali et al. / Journal of Taibah University for Science 11 (2017) 422–433
Fig. 2. Graphical representation of calculated and observed toxicity and the residues values calculated by MLR.
M. Ghamali et al. / Journal of Taibah University for Science 11 (2017) 422–433 429
Fig. 3. Graphical representation of calculated and observed toxicity and the residues values calculated by MNLR.
common tool to study multidimensional data. We used shown in Fig. 3. As observed, these data are uniformly
the data matrix of the proposed descriptors by the MLR, distributed around the regression line.
which correspond to 28 compounds (training set).
The obtained equation is: 3.5. Artificial neural networks ANN
− log Ki = −46.438 − 7.014 × EHOMO + 0.498
ANN has become an important and widely used non-
× DM + 0.534 × MR + 2.733 × 10−2 linear modelling technique for QSAR studies. It can
× γ − 0.408 × EHOMO
2
− 6.768 × 10−2 be used to generate a predictive model of the quan-
titative structure-activity relationship (QSAR) between
× DM2 − 2.876 × 10−3 × MR2 the molecular descriptors, which are obtained from the
MLR, and the observed activity. The ANN calculated
− 3.161 × 10−4 × γ 2 (2)
activity model was developed using the characteristics
of the studied compounds. The correlation of the pre-
The obtained parameters that describe the electronic dicted and observed activities and the residual graph of
and topological aspects of the studied molecules are: absolute numbers are shown in Fig. 4.
Ntest = 13; R2 pred = 0.874; rm 2 (test) = 0.695; rm 2 Ntest = 13; R2 pred = 0.712; rm 2 (test) = 0.666; rm 2
(test) = 0.131; MSEtest = 0.116 (test) = 0.123; MSEtest = 0.181
The QSAR model expressed by Eq. (2) was cross- The obtained determination coefficient (R2 ) value is
validated by its appreciable R2 cv values (R2 cv = 0.784), 0.916 for this data set of flavonoid derivatives. The cross-
which was obtained using the leave-one-out (LOO) validated squared correlation coefficient (R2 cv = 0.745)
method. A value of R2 cv greater than 0.5 is the essen- and mean squared error (MSE = 0.085) suggest the good
tial condition to qualify a QSAR model as valid [34]. internal consistency and predictive ability of the bio-
In addition, the metric values (rm 2 and rm 2 ) indicate logical activity with a low MSE. The ANN model also
that the QSAR model is acceptable. The robustness and shows satisfactory performance in the external validation
predictive ability of the model were further supported (R2 pred = 0.712), which confirms that the stability predic-
by the significant R2 pred value (0.874) of the test set tion ability for new chemicals.
data. According to internal, external and metric values, We assessed the best QSAR models that were
the MNLR model is better than the MLR model. developed in this study. Based on previous results, a
The correlations of the predicted and observed activ- comparison of the quality of the MLR, MNLR and
ities and the residual graph of the absolute numbers are ANN models shows that the MNLR model truly reflects
430 M. Ghamali et al. / Journal of Taibah University for Science 11 (2017) 422–433
Fig. 4. Graphical representation of calculated and observed toxicity and the residues values calculated by ANN.
better predictive capability than the MLR and ANN mod- applicability must be defined. The predicted compounds
els because the MNLR approach provides better results in this domain may be considered reliable. The applica-
than MLR and ANN. MNLR establishes a satisfactory bility domain was discussed based on the Williams graph
relationship between the molecular descriptors and the in Fig. 5, where the standardized residuals and leverage
activity of the studied compounds. The principal perfor- values (hi ) are plotted. Based on the calculation of the
mance metrics of the three models are shown in Table 5. leverage hi for each compound, the QSAR model is used
to predict the compound activity:
3.6. Domain of applicability −1 T
hi = xi (XT X) xi (i = 1, . . ., n)
To evaluate the reliability of any QSAR model and where xi is the row vector of the descriptors of compound
its power to predict new compounds, the domain of i, and X is the variable matrix, which is deduced from the
Table 5
Performance comparison between models obtained by MLR and ANN.
Model Training set Test set
R2 cv rm (LOO)
2
rm 2 (LOO) MSE R2 pred rm 2 (test) rm 2 (test) MSE
Finally, the accuracy and predictability of the pro- synthesized by solution phase combinatorial chemistry, Biochem.
posed models were illustrated by comparing key Biophys. Res. Commun. 249 (1998) 481–485.
statistical indicators such, as R or R2 , of different models [16] K. Dekermendjian, P. Kahnberg, M.R. Witt, et al., The use of
a pharmacophore model for identification of novel ligands for
using different statistical tools and descriptors, as shown the benzodiazepine binding site of the GABAA receptor, J. Med.
in Table 6. Chem. 42 (1999) 4343–4350.
[17] X. Hong, A.J. Hopffinger, 3D-pharmacophores of flavonoid
binding at the benzodiazepine GABAA receptor site using
Acknowledgment
4D-QSAR analysis, J. Chem. Inform. Model. 43 (2003)
324–326.
We are grateful to the “Association Marocaine des [18] E.D. Cox, H.D. Arauzo, Q. Huang, M.S. Reddy, C. Ma, B. Harris,
Chimistes Théoriciens” (AMCT) for its pertinent help R. McKernan, P. Skolnick, J.M. Cook, J. Med. Chem. 41 (1998)
with the programs. 2537–2552.
[19] H.D. Arauzo, G.E. Evoniuk, P. Skolnick, J.M. Cook, J. Med.
Chem. 34 (1991) 1754–1756.
References [20] W. Zhang, K.F. Koehler, P. Zhang, J.M. Cook, Development of
a comprehensive pharmacophore model for the benzodiazepine
[1] H. Kubinyi, QSAR: hansch analysis and related approaches, in: R. receptor, Drug Des. Dev. 12 (1995) 193–248.
Mannhold, K Prokgsgaard-Larsen, H. Timmerman (Eds.), Meth- [21] M. Marder, G. Estiú, L.B. Blanch, H. Viola, C. Wasowski, J.H.
ods and Principles in Medicinal Chemistry, VCH, Weinheim, Medina, A.C. Paladini, Molecular modelling and QSAR analysis
1993. of the interaction of flavone derivatives with the benzodiazepine
[2] C.D. Carlo, N. Mascolo, A.A. Izzo, F. Capasso, Flavonoid: old binding site of the GABAA receptor complex, Bioorg. Med.
and new aspects of a class of natural therapeutic drugs, Life Sci. Chem. 9 (2001) 323–335.
65 (1999) 337–353. [22] D. Hadjipavlou-Litina, R. Garg, C. Hansch, Comparative
[3] P.C.H. Haollman, I.C.W. Arts, Flavonols, flavones and flavanols- quantitative structure-activity relationship studies (QSAR) on
nature, occurrence and dietary burden, J. Sci. Food Agric. 80 non-benzodiazepine compounds binding to benzodiazepine
(2000) 1081–1093. receptor (BzR), Chem. Rev. 104 (2004) 3751–3793.
[4] S.A. Aherne, N.M. O’Brien, Dietary flavonols: chemistry, food [23] X. Huang, T. Liu, J. Gu, X. Luo, R. Ji, Y. Cao, H. Xue, J.T. Wong,
content, and metabolism, Nutrition 18 (2002) 75–81. B.L. Wong, G. Pei, H. Jiang, K. Chen, J. Med. Chem. 44 (2001)
[5] T.P.T. Cushnie, A.J. Lamb, Antimicrobial activity of flavonoids, 1883–1891.
Int. J. Antimicrob. Agents 26 (2005) 343–356. [24] M.J. Frisch, et al., Gaussian 03, Revision B.01, Gaussian, Inc.,
[6] D. Clarke, H. Wiseman, Phytoestrogens, in: J. Gilbert, H.Z. Pittsburgh, PA, 2003.
Senyuva (Eds.), Bioactive Compounds in Food, John Wiley & [25] U. Sakar, R. Parthasarathi, V. Subramanian, P.K. Chattaraji, Tox-
Sons, Oxford, 2008, pp. 173–198. icity analysis of polychlorinated dibenzofurans through global, J.
[7] A. Andres, S.M. Donovan, M.S. Kuhlenschmidt, Soy isoflavones Mol. Des. IECMD (2004) 1–24.
and virus infections, J. Nutr. Biochem. 20 (2009) 563–569. [26] Advanced Chemistry Development Inc., Toronto, Canada, 2009
[8] P. Rathee, H. Chaudhary, S. Rathee, D. Rathee, V. Kumar, K. www.acdlabs.com/resources/freeware/chemsketch/.
Kohli, Mechanism of action of flavonoids as anti-inflammatory [27] M. Larif, A. Adad, R. Hmammouchi, A.I. Taghki, A. Soulaymani,
agents, Inflamm. Allergy Drug Targets 8 (2009) 229–235. A. Elmidaoui, M. Bouachrine, T. Lakhlifi, Biological activities of
[9] D. Amić, D.D. Amić, D. Bešlo, N. Trinajstić, Structure-radical triazine derivatives combining DFT and QSAR results, Arab. J.
scavenging activity relationships of flavonoids, Croat. Chem. Chem. (2015), http://dx.doi.org/10.1016/j.arabjc.2012.12.033 (in
Acta 76 (1) (2003) 55–61. press).
[10] P.A. Stefanic, A. Krbavcic, T. Solmajer, QSAR of flavonoids: [28] XLSTAT Software, XLSTAT Company, 2013 http://www.xlstat.
4. Differential inhibition of aldose reductase and p56lck protein com.
tyrosine kinase, Croat. Chem. Acta 75 (2002) 517–529. [29] S. Chtita, M. Larif, M. Ghamali, M. Bouachrine, T. Lakhlifi,
[11] P.M. Sivakumar, S.K. Geetha Babu, D. Mukesh, QSAR studies on DFT-based QSAR Studies of MK801 derivatives for non com-
chalcones and flavonoids as anti-tuberculosis agents using genetic petitive antagonists of NMDA using electronic and topological
function approximation (GFA) method, Chem. Pharm. Bull. 55 descriptors, J. Taibah Univ. Sci. 9 (2) (2014) 143–154.
(1) (2007) 44–49. [30] H. Demuth, M. Hugan, M. Beal, Neural Network Toolbox: For
[12] B.F. Rasuleva, N.D. Abdullaev, V.N. Syrov, J. Leszczynski, A Use with MATHLAB, User Guid’s, Version 9, 2011.
quantitative structure–activity relationship (QSAR) study of the [31] R. Hmamouchi, M. Larif, A. Adad, M. Bouachrine, T. Lakhlifi,
antioxidant activity of flavonoids, QSAR Comb. Sci. 24 (2005) J. Comp. Meth. Mol. Des. 4 (3) (2014) 61–71.
1056–1065. [32] D. Cherqaoui, D. Villemin, J. Chem. Soc. Faraday Trans. 90
[13] S.V. Argyropoulos, D. Nutt, The use of benzodiazepines in anx- (1994) 97–102.
iety and other disorders, Eur. Neuropsychopharmacol. Suppl. 6 [33] J.A. Freeman, D.M. Skapura, Neural Networks. Algorithms,
(1999) s407–s412. Applications, and Programming Techniques, Addison-Wesley,
[14] P.R. Duchowicz, M.G. Vitale, E.A. Castro, J.C. Autino, G.P. Reading, MA, 1991.
Romanelli, D.O. Bennardi, QSAR modeling of the interaction [34] A. Golbraikh, A. Tropsha, Beware of q2 , J. Mol. Graph. Model.
of flavonoids with GABA(A) receptor, Eur. J. Med. Chem. 43 20 (2002) 269–276.
(2008) 1593–1602. [35] P.P. Roy, K. Roy, QSAR Comb. Sci. 27 (2008) 302.
[15] M. Marder, H. Viola, J.A. Bacigaluppo, et al., Detection of benzo- [36] STATITCF Software, Technical Institute of Cereals and Fodder,
diazepine receptor ligands in small libraries of flavone derivatives Paris, France, 1987.
M. Ghamali et al. / Journal of Taibah University for Science 11 (2017) 422–433 433
[37] M. Ghamali, S. Chtita, R. Hmammouchi, A. Adad, M. [38] P. Gramatica, Principales of QSAR models validation: internal
Bouachrine, T. Lakhlifi, The inhibitory activity of aldose reduc- and external, QSAR Comb. Sci. 26 (2007) 694–701.
tase of flavonoids compounds. Combining DFT and QSAR cal- [39] T. March, Some Physico-Chemical Factors, Dissertation Paper,
culations, J. Taibah Univ. Sci. (2016), http://dx.doi.org/10.1016/ Harold Lyche & Co., Drammen, Norway, 1954.
j.jtusci.2015.09.006 (in press).