To cite this article: P.M. Khan & K. Roy (2021) QSPR modelling for investigation of different
properties of aminoglycoside-derived polymers using 2D descriptors, SAR and QSAR in
Environmental Research, 32:7, 595-614, DOI: 10.1080/1062936X.2021.1939150
a Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), Kolkata, India; b Drug Theoretics and Cheminformatics Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
Introduction
Gene therapy involves delivering exogenous DNA into mammalian cells to treat numerous genetic disorders [1] and many other diseases, such as neurodegenerative [2] and infectious disorders [3], AIDS [4] and different types of cancer [5–7]. Membrane-impermeable DNA has been delivered into cells using either viral vectors or synthetically designed non-viral vectors [8,9]. Viral vectors are more widely used than non-viral vectors for delivering exogenous DNA into target cells. However, their clinical risks (i.e. immunogenicity, insertional mutagenesis, limited carrying capacity and viral degradation) and potentially high production costs limit their therapeutic applications [10,11]. To overcome these problems, researchers worldwide have concentrated mainly on developing efficient and biocompatible non-viral vectors (i.e. cationic lipids and polymers) [12–15]. Non-viral vectors offer certain advantages over viral vectors, such as lower production cost, flexibility in chemical design and development, nearly unlimited DNA-carrying capacity and safety [16]. However, they also suffer from drawbacks such as low transgene expression efficacy and high cytotoxicity [17]. Attention has therefore shifted towards designing novel non-viral vehicles with higher transgene efficacy and biocompatibility through appropriate alterations of their chemical structure [18,19]. Polymers have been extensively explored as delivery vehicles for small molecules (drugs) as well as macromolecules (i.e. DNA, RNA and proteins) [20,21]. In the present study, we employed a quantitative structure–property relationship (QSPR) approach to predict different properties of aminoglycoside-derived polymers. A combinatorial library of these polymers was synthesized and characterized by another group [19,22]. The aminoglycoside core was chosen for polymer synthesis mainly because it carries several functional groups, such as hydrophilic sugars, hydroxyl and amine groups, which facilitate both the design of different derivatives and the polymerization process [23]. Furthermore, polymers derived from aminoglycoside cores are more likely to be biodegradable because of the glycosidic linkages in these molecules [24].
Only a few QSPR models have been proposed to predict the percentage DNA binding and transgene expression efficacy of aminoglycoside-derived polymers. For example, Rege et al. [25] proposed a nonlinear support vector machine (SVM) model to predict the DNA binding efficacy of an aminoglycoside-polyamine library. Their final SVM model, with a cross-validated r² of 0.97, comprises several features encoding molecular size, basicity, methylene spacing between amine centres, hydrogen-bond donor groups and positively charged groups [25]. Similarly, Potta et al. [19] proposed a nonlinear SVM model to predict transgene expression mediated by aminoglycoside-derived polymers, using 30 polymers in the training set. The final model (r² = 0.78) comprises five structural features, namely PEOE_VSA_PPOS, log P, RECON_SIEPMax, BCUT_PEOE_3 and RECON_PIPMax, but it was validated on only three test set compounds. Miryala et al. [26] reported the parallel synthesis and QSPR modelling of a small library of 27 lipopolymers, obtained by conjugating three alkanoyl chlorides (i.e. hexanoyl, myristoyl and stearoyl chloride) with three aminoglycoside-derived polymers (neomycin, paromomycin and apramycin cross-linked with glycerol diglycidyl ether (GDE)), as novel non-viral vectors for transgene delivery and expression in target cells. Their final SVM model comprises six features, ALKYL_rsynth, ALKYL_vdw_vol, AMINO_Q_VSA_FFPOS, AMINO_a_nN, vsa_other and AMINO_PEOE_RPC+, which encode the synthetic feasibility of the alkyl group, the van der Waals volume of the alkyl group, the fractional positive polar van der Waals surface area of the aminoglycoside, the number of nitrogen atoms, the van der Waals surface area of other atoms and the relative positive partial charge of the aminoglycoside, respectively. This model, with an internal r² of 0.8365 and an external predictive r²pred of 0.6543, was selected as the best model using the online Learning Equipment (SOLE) platform, a web-based machine learning system [26]. Zhen et al. [22] also proposed two-step chemometric models to predict transgene expression mediated by aminoglycoside-derived polymers. The first step involved in developing QSPR models of different
SAR AND QSAR IN ENVIRONMENTAL RESEARCH 597
development to enhance the predictive performance for the external set compounds with the lowest error. The present findings provide new insight into the design of an aminoglycoside-derived polymer library based on the identified physicochemical properties and allow crucial properties of polymeric vehicles to be predicted before their synthesis.
number of model descriptors. PLS is robust in the sense that it can handle numerous, intercorrelated and noisy variables. The final QSPR models for percentage DNA binding and for transgene expression mediated by aminoglycoside-derived polymers were obtained with two and three latent variables, respectively (the latent variables, rather than the original descriptors, are the actual regression variables), and were subsequently validated with different internal and external validation parameters to judge their acceptability. The internal parameters are the determination coefficient (r²), the leave-one-out cross-validated Q² of the training set (Q²LOO), r²m(LOO) [36,37] and the mean absolute error of the training set (MAEtrain(100%)). The external parameters, in contrast, assess the predictive ability of the models on the test set compounds, using the external predictive variance r²pred, r²m(test) and the mean absolute error of the test set (MAEtest(95%)), among others [38]. Figure 1 provides a detailed schematic overview of the protocol employed in the present study for QSPR model development.
Figure 1. Detailed schematic overview of the protocol employed in the present study for QSPR model
development.
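The internal and external validation metrics named above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' software: `fit_predict` is a hypothetical refit-and-predict callable standing in for the PLS model, r²m follows the commonly cited unscaled form r²m = r²(1 − √|r² − r0²|) averaged over both axis orientations [36,37], and the 5% trimming in MAEtest(95%) rounds to the nearest whole compound (also an assumption).

```python
import numpy as np


def r2(y_obs, y_pred):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y_obs - y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)


def loo_predictions(X, y, fit_predict):
    """Leave-one-out predictions; fit_predict(X_tr, y_tr, X_new) is a hypothetical callable."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop compound i, refit, then predict it
        preds[i] = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
    return preds


def q2_loo(y, y_loo):
    """Q2_LOO = 1 - PRESS / SS_tot, i.e. r2 computed on the LOO predictions."""
    return r2(y, y_loo)


def r2_pred(y_test, y_test_pred, y_train_mean):
    """External predictive variance; deviations are taken from the TRAINING-set mean."""
    y_test, y_test_pred = np.asarray(y_test, float), np.asarray(y_test_pred, float)
    press = np.sum((y_test - y_test_pred) ** 2)
    return 1.0 - press / np.sum((y_test - y_train_mean) ** 2)


def _r0_squared(a, b):
    """r0^2 for regressing a on b through the origin (slope k = sum(a*b) / sum(b*b))."""
    k = np.sum(a * b) / np.sum(b * b)
    return 1.0 - np.sum((a - k * b) ** 2) / np.sum((a - a.mean()) ** 2)


def r2m_metrics(y_obs, y_pred):
    """Average r2_m and delta r2_m over both axis orientations (unscaled form)."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    r2_corr = np.corrcoef(y_obs, y_pred)[0, 1] ** 2
    rm = r2_corr * (1.0 - np.sqrt(abs(r2_corr - _r0_squared(y_obs, y_pred))))
    rm_rev = r2_corr * (1.0 - np.sqrt(abs(r2_corr - _r0_squared(y_pred, y_obs))))
    return (rm + rm_rev) / 2.0, abs(rm - rm_rev)


def mae_95(y_obs, y_pred):
    """MAE after discarding the 5% of compounds with the largest absolute residuals."""
    abs_err = np.sort(np.abs(np.asarray(y_obs, float) - np.asarray(y_pred, float)))
    n_drop = int(round(0.05 * len(abs_err)))   # assumption: nearest whole compound
    return float(abs_err[: len(abs_err) - n_drop].mean())
```

Note that r²pred measures test-set deviations from the training-set mean, which is what distinguishes it from simply computing r² on the test set.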
Figure 2. Scatter plots of observed vs predicted DNA binding of QSPR models obtained using
aminoglycoside-derived polymers.
To determine the importance of each descriptor in the final equation (1), we performed a variable importance in projection (VIP) analysis. Descriptors with VIP scores greater than one contribute most to explaining polymer DNA binding and are considered the most important variables in the QSPR model. In our case, two variables, F08[C-N] and gmax, are the key descriptors, while PW3 and B10[N-N], with VIP scores below one, are less important (Figure S4 in supporting information). We also performed a loading plot analysis to identify the most influential descriptors in the final model. F08[C-N] and gmax lie far from the origin of the plot and are therefore the most influential descriptors, while PW3 and B10[N-N] lie slightly closer to the origin and are less influential (Figure 3).
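VIP scores are derived from the PLS weights and the y-variance explained by each latent variable. The sketch below is a minimal NumPy illustration, not the SIMCA-P implementation: it fits a PLS1 model by NIPALS on autoscaled descriptors and computes VIP_j = sqrt(p Σa SSa (wja/‖wa‖)² / Σa SSa) with SSa = qa² taᵀta. Because the squared VIP scores of the p descriptors sum to p, the "greater than one" rule amounts to comparing each descriptor with the average importance.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 by NIPALS on autoscaled X and mean-centred y.
    Returns X-weights W (p x A), scores T (n x A), X-loadings P (p x A), y-loadings q (A,)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscaling
    y = y - y.mean()
    n, p = X.shape
    W, P = np.zeros((p, n_components)), np.zeros((p, n_components))
    T, q = np.zeros((n, n_components)), np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = X @ w                       # scores of latent variable a
        tt = t @ t
        p_a = X.T @ t / tt              # X-loading
        q_a = y @ t / tt                # y-loading
        X = X - np.outer(t, p_a)        # deflate X
        y = y - q_a * t                 # deflate y
        W[:, a], T[:, a], P[:, a], q[a] = w, t, p_a, q_a
    return W, T, P, q


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a), SS_a = q_a^2 * t_a.t_a."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)            # y-variance explained per latent variable
    w_norm2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w_norm2 @ ss) / ss.sum())
```

For example, `W, T, P, q = pls1_nipals(X, y, 2)` followed by `vip_scores(W, T, q)` gives one score per descriptor; plotting the first two columns of W (or P) against each other gives a loading plot in which influential descriptors lie far from the origin.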
The final QSPR equation is based on four descriptors calculated using the AlvaDesc software tool. All of them contribute positively to polymer-DNA binding efficacy, suggesting that higher descriptor values result in higher DNA binding and vice versa. The first descriptor, gmax, belongs to the atom-type electrotopological state (E-state) indices and stands for the maximum E-state value in the molecule [40]. A high E-state value is generally associated with the most electronegative atoms in a molecule, and its selection is likely related to structural alerts in which such atoms lie adjacent to electrophilic centres [41,42]. Careful analysis of the data revealed that compounds #13, 14 and 16 in the training set and compounds #15 and 29 in the test set have higher gmax values because they contain an amide functional group (an electronegative nitrogen atom attached to the electrophilic carbonyl carbon of the amide bond), which leads to higher DNA binding efficacy of these polymers. Conversely, compound #18 has a lower gmax value and a low DNA binding efficacy.

Figure 3. Loading plot of the final QSPR model obtained using aminoglycoside-derived polymers for prediction of percentage DNA binding.
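To illustrate what gmax captures, here is a simplified Kier–Hall E-state calculation [40] on a hydrogen-depleted graph. Assumptions: only second-row atoms (C, N, O, F) are handled, so the intrinsic state reduces to I = (δv + 1)/δ with δv = Zv − h, and the perturbation term uses (dij + 1)²; AlvaDesc's implementation may differ in details, so the numbers are illustrative only. Acetamide is our own toy example, not a compound from the dataset.

```python
from collections import deque

# Valence electron counts for second-row atoms only (principal quantum number N = 2).
VALENCE_ELECTRONS = {"C": 4, "N": 5, "O": 6, "F": 7}


def topological_distances(n_atoms, bonds):
    """Adjacency lists and all-pairs bond-count distances (BFS) of the H-depleted graph."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for source in range(n_atoms):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[source][v] is None:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return adj, dist


def e_state_indices(atoms, bonds):
    """atoms: list of (element, attached H count); bonds: heavy-atom index pairs.
    Assumes a connected molecule with at least two heavy atoms."""
    n = len(atoms)
    adj, dist = topological_distances(n, bonds)
    intrinsic = []
    for i, (element, n_h) in enumerate(atoms):
        delta = len(adj[i])                          # sigma (heavy-atom) neighbours
        delta_v = VALENCE_ELECTRONS[element] - n_h   # valence delta for second-row atoms
        intrinsic.append((delta_v + 1) / delta)      # intrinsic state I = (dv + 1) / d
    # E_i = I_i + sum_j (I_i - I_j) / (d_ij + 1)^2
    return [
        intrinsic[i]
        + sum((intrinsic[i] - intrinsic[j]) / (dist[i][j] + 1) ** 2 for j in range(n) if j != i)
        for i in range(n)
    ]


# Toy example: acetamide, CH3-C(=O)-NH2 (heavy atoms: 0 = CH3, 1 = C, 2 = O, 3 = NH2).
atoms = [("C", 3), ("C", 0), ("O", 0), ("N", 2)]
bonds = [(0, 1), (1, 2), (1, 3)]
e_states = e_state_indices(atoms, bonds)
gmax = max(e_states)   # the carbonyl oxygen: 83/9, about 9.22
```

In this toy case the maximum falls on the amide oxygen, while the carbonyl carbon receives a negative E-state, which matches the qualitative link between high E-state values and electronegative atoms next to electrophilic centres.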
The second important descriptor in the equation is F08[C-N], which belongs to the class of 2D atom pair descriptors [43] and stands for the frequency of C–N atom pairs at topological distance 8 in the molecule. Its positive coefficient indicates that more C–N pairs at topological distance eight result in higher DNA binding. For example, compound #1, with a higher frequency of C–N pairs at topological distance eight, shows higher DNA binding efficacy. A previous report also showed that a shorter distance between nitrogen atoms within the aminoglycoside core results in higher polymer DNA binding [19].
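F-type atom pair descriptors such as F08[C-N] can be computed by counting element pairs at a given bond-count distance. The sketch below is a minimal illustration, assuming a hydrogen-depleted graph given as an element list plus a bond list (our own hypothetical input format) and counting each unordered pair once; the 10-atom chain is a toy example, not a polymer from the dataset.

```python
from collections import deque
from itertools import combinations


def bond_distances_from(adj, source):
    """Bond-count distances from one atom to every reachable atom (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def atom_pair_frequency(elements, bonds, a, b, k):
    """F_k[a-b]: number of unordered (a, b) atom pairs exactly k bonds apart."""
    adj = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    dists = [bond_distances_from(adj, i) for i in range(len(elements))]
    target = sorted((a, b))
    return sum(
        1
        for i, j in combinations(range(len(elements)), 2)
        if sorted((elements[i], elements[j])) == target and dists[i].get(j) == k
    )


# Toy chain N-C-C-C-C-C-C-C-C-N (10 heavy atoms, hydrogens suppressed).
elements = ["N"] + ["C"] * 8 + ["N"]
bonds = [(i, i + 1) for i in range(9)]
f08_cn = atom_pair_frequency(elements, bonds, "C", "N", 8)   # N0-C8 and C1-N9 -> 2
f09_nn = atom_pair_frequency(elements, bonds, "N", "N", 9)   # N0-N9 -> 1
```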
The next descriptor in the final model is PW3, a shape topological descriptor denoting the path/walk 3 – Randić shape index [44]. Its positive regression coefficient indicates that an increase in PW3 increases polymer DNA binding and vice versa. For example, compound #16, with a higher PW3 value, shows higher polymer-DNA binding.
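The path/walk shape index can be sketched as the mean, over atoms, of the ratio between the number of simple paths and the number of walks of length k starting at each atom; this is our reading of the Randić definition [44], not AlvaDesc's code. Walk counts are obtained as row sums of the k-th power of the adjacency matrix, and n-butane is a toy skeleton chosen only because its counts are easy to verify by hand.

```python
import numpy as np


def atomic_path_counts(adj, k):
    """Number of simple paths of length k (no repeated atoms) starting at each atom."""
    def count_from(node, visited, remaining):
        if remaining == 0:
            return 1
        return sum(count_from(nb, visited | {nb}, remaining - 1)
                   for nb in adj[node] if nb not in visited)
    return np.array([count_from(i, {i}, k) for i in range(len(adj))], dtype=float)


def atomic_walk_counts(adj, k):
    """Number of walks of length k starting at each atom: row sums of A^k."""
    n = len(adj)
    A = np.zeros((n, n))
    for i, neighbours in enumerate(adj):
        A[i, neighbours] = 1.0
    return np.linalg.matrix_power(A, k).sum(axis=1)


def path_walk_shape_index(adj, k=3):
    """PW_k = (1/n_atoms) * sum_i paths_k(i) / walks_k(i); assumes no isolated atoms."""
    return float(np.mean(atomic_path_counts(adj, k) / atomic_walk_counts(adj, k)))


# Toy example: n-butane carbon skeleton C0-C1-C2-C3 (hydrogen-suppressed).
butane = [[1], [0, 2], [1, 3], [2]]
pw3 = path_walk_shape_index(butane, k=3)   # (1/3 + 0 + 0 + 1/3) / 4 = 1/6
```

Because walks may revisit atoms while paths may not, the ratio shrinks in compact or highly branched regions and grows along extended chains, which is why it is treated as a shape index.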
The final variable in the equation is B10[N-N], which also belongs to the 2D atom pair descriptors [43]; it denotes the presence or absence of an N–N pair at topological distance 10 in the molecule, and its presence increases polymer-DNA binding. For example, compound #13 shows excellent polymer-DNA binding owing to a nitrogen–nitrogen atom pair separated by a topological distance of 10. This suggests that the distribution of nitrogen atoms in the macromolecule is essential for polymer-DNA binding efficacy. A previous report also indicated that cationic nitrogen atoms in the molecule may bind to the phosphate groups of DNA [25].
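Unlike the F-type counts, a B-type descriptor such as B10[N-N] is binary. A minimal sketch, assuming the same hypothetical element-list and bond-list input, builds the topological distance matrix with Floyd–Warshall and reports whether any matching pair lies exactly k bonds apart; the two chains are toy examples.

```python
import numpy as np


def distance_matrix(n_atoms, bonds):
    """Topological (bond-count) distance matrix via Floyd-Warshall; inf if disconnected."""
    d = np.full((n_atoms, n_atoms), np.inf)
    np.fill_diagonal(d, 0.0)
    for i, j in bonds:
        d[i, j] = d[j, i] = 1.0
    for m in range(n_atoms):
        d = np.minimum(d, d[:, [m]] + d[[m], :])   # allow paths through atom m
    return d


def atom_pair_presence(elements, bonds, a, b, k):
    """B_k[a-b]: 1 if at least one (a, b) atom pair lies exactly k bonds apart, else 0."""
    d = distance_matrix(len(elements), bonds)
    el = np.array(elements)
    pair_mask = (el[:, None] == a) & (el[None, :] == b)
    return int(np.any(pair_mask & (d == k)))


# Toy chains: N-(C)9-N has its nitrogens 10 bonds apart; N-(C)8-N only 9 bonds apart.
long_chain = ["N"] + ["C"] * 9 + ["N"]
short_chain = ["N"] + ["C"] * 8 + ["N"]
b10_long = atom_pair_presence(long_chain, [(i, i + 1) for i in range(10)], "N", "N", 10)    # 1
b10_short = atom_pair_presence(short_chain, [(i, i + 1) for i in range(9)], "N", "N", 10)   # 0
```

Shortening the chain by a single carbon switches the descriptor off, which illustrates how sensitive these atom pair descriptors are to the exact spacing of nitrogen atoms.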
604 P.M. KHAN AND K. ROY
Besides this, we performed a Y-randomization study to check whether the final model was obtained by chance. The analysis generated 100 randomized models by shuffling the response values while keeping the descriptor values intact. If the r²Y intercept and Q²Y intercept of the generated models exceed 0.3 and 0.05, respectively, the final model is considered to have been obtained by chance. The plot revealed that both intercepts were below these thresholds (r²Y = 0.0266 and Q²Y = −0.336), suggesting that the proposed model was not obtained by chance (Figure S5 in supporting information).
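The Y-randomization intercepts can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: ordinary least squares stands in for the PLS model, and the intercepts are obtained SIMCA-style by fitting straight lines to the r² and Q²LOO values of the original and permuted models against the absolute correlation between original and permuted responses (using the absolute value is our assumption).

```python
import numpy as np


def _ols_fit_predict(X_tr, y_tr, X_new):
    """Ordinary least squares with intercept, used here as a stand-in for PLS."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def _r2_and_q2(X, y):
    """Fitted r2 and leave-one-out Q2 for one (original or permuted) response vector."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum((y - _ols_fit_predict(X, y, X)) ** 2) / ss_tot
    loo = np.array([
        _ols_fit_predict(np.delete(X, i, axis=0), np.delete(y, i), X[i:i + 1])[0]
        for i in range(len(y))
    ])
    return r2, 1.0 - np.sum((y - loo) ** 2) / ss_tot


def y_randomization(X, y, n_perm=100, seed=0):
    """Return (r2Y intercept, Q2Y intercept): intercepts of straight lines fitted to the
    r2 and Q2 values of the original and permuted models against |corr(y, y_permuted)|."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    r2_0, q2_0 = _r2_and_q2(X, y)
    corr, r2s, q2s = [1.0], [r2_0], [q2_0]          # the original model sits at corr = 1
    for _ in range(n_perm):
        y_perm = rng.permutation(y)                 # shuffle responses, keep descriptors
        r2_p, q2_p = _r2_and_q2(X, y_perm)
        corr.append(abs(np.corrcoef(y, y_perm)[0, 1]))
        r2s.append(r2_p)
        q2s.append(q2_p)
    r2_intercept = np.polyfit(corr, r2s, 1)[1]      # polyfit returns [slope, intercept]
    q2_intercept = np.polyfit(corr, q2s, 1)[1]
    return r2_intercept, q2_intercept
```

The returned values would then be checked against the thresholds quoted above (0.3 for the r²Y intercept, 0.05 for the Q²Y intercept).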
Finally, to define the applicability domain (AD) of the proposed model in chemical space, we performed an AD study using the DModX approach. All compounds in both the training and test sets were found to lie within the domain (Figure S6 in supporting information).
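DModX measures how far a compound lies from the model plane in descriptor space. The sketch below is a simplified NumPy version under explicit assumptions: a PCA model on autoscaled descriptors is swapped in for the PLS X-model, SIMCA's correction factor for training-set observations is omitted, and the critical limit (for example an F-distribution quantile such as one obtained with `scipy.stats.f.ppf`) is left to the caller.

```python
import numpy as np


def fit_dmodx(X_train, n_components):
    """Fit a PCA model on autoscaled training descriptors and return a function that
    gives normalised DModX (residual SD of each compound / pooled training residual SD)."""
    X_train = np.asarray(X_train, float)
    mean, std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    n, k = Z.shape
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                                 # loadings (k x A)
    E = Z - Z @ P @ P.T                                     # training X-residuals
    # Pooled residual SD; the extra 1 accounts for mean-centring. Requires k > A.
    s0 = np.sqrt(np.sum(E ** 2) / ((n - n_components - 1) * (k - n_components)))

    def dmodx(X_new):
        Zn = (np.asarray(X_new, float) - mean) / std
        En = Zn - Zn @ P @ P.T                              # part not explained by the model
        s_i = np.sqrt(np.sum(En ** 2, axis=1) / (k - n_components))
        return s_i / s0

    return dmodx
```

A compound whose normalised DModX exceeds the chosen critical value would be flagged as outside the AD; compounds that respect the correlation structure of the training descriptors stay near one.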
Consensus Predictor tool [39] to improve the quality of predictions for the external validation set. A scatter plot of observed vs predicted values for the training and test set compounds is given in Figure 4, which indicates the goodness of fit and of the predictions. The four selected individual models and their statistical parameters are shown in Table 1.
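The consensus schemes listed in Table 1 (CM 0 to CM 2) can be sketched as simple averages over individual-model predictions. This is not the Intelligent Consensus Predictor itself: the `qualified` mask is a hypothetical stand-in for whatever criterion the tool uses to qualify a model for a compound, the weights (for example Q²LOO values) are an assumption, and the compound-wise best selection of CM 3 is not reproduced.

```python
import numpy as np


def consensus_predictions(preds, qualified=None, weights=None):
    """preds: (n_models, n_compounds) predictions of the individual models.
    qualified: optional boolean mask of the same shape (hypothetical 'qualified' flags).
    weights: optional per-model weights (Q2_LOO here is an assumption).
    Returns CM0-, CM1- and CM2-style predictions; every compound needs >= 1 qualified model."""
    preds = np.asarray(preds, float)
    mask = np.ones(preds.shape, bool) if qualified is None else np.asarray(qualified, bool)
    w = np.ones(preds.shape[0]) if weights is None else np.asarray(weights, float)
    cm0 = preds.mean(axis=0)                                     # all models
    cm1 = (preds * mask).sum(axis=0) / mask.sum(axis=0)          # mean of qualified models
    w_masked = mask * w[:, None]                                 # zero weight if unqualified
    cm2 = (preds * w_masked).sum(axis=0) / w_masked.sum(axis=0)  # weighted mean of qualified
    return cm0, cm1, cm2
```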
To identify the relative importance of each variable in the final models, we performed a variable importance plot (VIP) analysis using the SIMCA-P software tool [45]. The descriptors are presented in descending order of relative importance (Figure S7 in supporting information). To verify the VIP results, we also performed a loading plot analysis to identify the most influential descriptors and their relative significance in the final models (Figure 5).
The F09[C-O] descriptor, which appears in three of the four final individual models (IMs 1–3), is the second most important descriptor in QSPR models 1 and 2 and the most important one in model 3, with a VIP score above one. F09[C-O] belongs to the class of 2D atom pair descriptors and stands for the frequency of carbon–oxygen atom pairs at topological distance 9 [43]. It correlates positively with the response in all the selected models, indicating that relative luciferase expression increases with the frequency of carbon–oxygen pairs at topological distance nine and vice versa. Previous studies reported that a higher number of oxygen atoms in the molecular building blocks results in higher transgene expression efficacy in the cell [19]. Close analysis of the present data revealed that polymers with the cross-linker GDE, which carries one extra oxygen atom, have higher values of this descriptor than those with RDE and other linkers, except PPEGDE and PEGDE. Similarly, more oxygen atoms in the
Figure 4. Scatter plots of observed vs predicted luciferase expression efficacy of four individual QSPR
models obtained using aminoglycoside-derived polymers.
Table 1. Final individual and consensus QSPR models for predicting aminoglycoside-derived polymer-mediated transgene expression, with the statistical values of the internal and external validation parameters.

Individual model equations (response: Log10(RLU/mg)):
IM 1: Log10(RLU/mg) = 7.283 + 0.0407 F09[C-O] − 0.0942 F09[O-O] − 0.0158 F10[C-O] − 4.6334 ETA_EtaP_L
IM 2: Log10(RLU/mg) = 6.301 + 0.0436 F09[C-O] − 0.106 nROR + 2.041 minssCH2 − 0.01116 ETA_Beta
IM 3: Log10(RLU/mg) = 6.005 + 0.0323 F09[C-O] − 0.086 F09[O-O] − 0.0056 F10[C-O] − 0.0180 NssCH2
IM 4: Log10(RLU/mg) = 6.747 + 0.460 MaxssssC + 0.019 SsOH + 3.400 minssCH2 − 0.0619 C-006

Consensus models:
CM 0: average of predictions from all input individual models
CM 1: average of predictions from 'qualified' individual models
CM 2: weighted average of predictions from 'qualified' individual models
CM 3: best selection of predictions (compound-wise) from 'qualified' individual models

Model | LVs | r² | Q²LOO | Avg r²m(LOO) | Δr²m(LOO) | r²pred | Avg r²m(test) | Δr²m(test) | MAEtest(95%) | RMSEc | RMSEp | SEE
IM 1 | 3 | 0.779 | 0.717 | 0.618 | 0.134 | 0.903 | 0.542 | 0.142 | 0.133 | 0.582 | 0.196 | 0.635
IM 2 | 3 | 0.786 | 0.713 | 0.609 | 0.156 | 0.861 | 0.522 | 0.205 | 0.174 | 0.573 | 0.235 | 0.626
IM 3 | 3 | 0.784 | 0.710 | 0.605 | 0.145 | 0.852 | 0.282 | 0.308 | 0.157 | 0.575 | 0.243 | 0.628
IM 4 | 3 | 0.782 | 0.702 | 0.601 | 0.113 | 0.843 | 0.461 | 0.171 | 0.162 | 0.578 | 0.250 | 0.632
CM 0 | - | - | - | - | - | 0.939 | 0.699 | 0.142 | 0.11259 | - | 0.1557 | -
CM 1 | - | - | - | - | - | 0.939 | 0.699 | 0.142 | 0.11259 | - | 0.1557 | -
CM 2 | - | - | - | - | - | 0.939 | 0.691 | 0.157 | 0.11257 | - | 0.1555 | -
CM 3 | - | - | - | - | - | 0.941 | 0.743 | 0.025 | 0.11839 | - | 0.1531 | -

LVs = latent variables; r² = determination coefficient; Q²LOO = leave-one-out cross-validated Q²; Avg r²m and Δr²m = average and difference of the r²m metrics for the training set (LOO) and the test set; r²pred = external set predictivity; MAEtest(95%) = mean absolute error of the test set after removal of the 5% of compounds with the highest residuals; RMSEc = root mean square error of the training set; RMSEp = root mean square error of the test set; SEE = standard error of estimate of the training set.
Figure 5. Loading plot of the four individual PLS models developed for prediction of polymer-
mediated transgene expression.
aminoglycoside core also increase the F09[C-O] values; for example, paromomycin, which has more oxygen atoms in its core, gives a higher value than the other aminoglycoside cores. The present model suggests that a higher frequency of C–O atom pairs separated by topological distance 9 (rather than the number of oxygen atoms alone) contributes to higher transgene expression efficacy, owing to the increased hydrogen-bonding potential and polarization of the polymers. For example, compound #32 (Streptomycin-PEGDE) shows a very low relative luciferase expression in the cell, in line with its lower value of this descriptor compared with molecule #25 (Paromomycin-GDE).
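To make the individual equations and the simplest consensus scheme concrete, the sketch below evaluates the four linear models with coefficients as we read them from Table 1 (signs follow the descriptor discussion in the text) and averages them in the manner of CM 0. Descriptor values must be supplied by the user from a descriptor calculator; none are taken from the dataset here.

```python
# Intercepts and coefficients of IM 1-4 as we read them from Table 1 (response: Log10(RLU/mg)).
MODELS = {
    "IM1": (7.283, {"F09[C-O]": 0.0407, "F09[O-O]": -0.0942, "F10[C-O]": -0.0158,
                    "ETA_EtaP_L": -4.6334}),
    "IM2": (6.301, {"F09[C-O]": 0.0436, "nROR": -0.106, "minssCH2": 2.041,
                    "ETA_Beta": -0.01116}),
    "IM3": (6.005, {"F09[C-O]": 0.0323, "F09[O-O]": -0.086, "F10[C-O]": -0.0056,
                    "NssCH2": -0.0180}),
    "IM4": (6.747, {"MaxssssC": 0.460, "SsOH": 0.019, "minssCH2": 3.400,
                    "C-006": -0.0619}),
}


def predict_log_rlu(model, descriptors):
    """Linear PLS equation: intercept + sum(coefficient * descriptor value)."""
    intercept, coefficients = MODELS[model]
    return intercept + sum(c * descriptors[name] for name, c in coefficients.items())


def consensus_cm0(descriptors):
    """CM 0-style consensus: plain average of the four individual predictions."""
    return sum(predict_log_rlu(m, descriptors) for m in MODELS) / len(MODELS)
```

Raising F09[C-O] by one pair raises the IM 1 prediction by 0.0407 log units, while one extra O–O pair at distance nine lowers it by 0.0942, which is the quantitative form of the opposing trends discussed in this section.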
Another descriptor, F10[C-O], which appears in two final individual models (IMs 1 and 3), also belongs to the class of 2D atom pair descriptors [43] and stands for the frequency of carbon–oxygen atom pairs at topological distance 10. In contrast to the F09[C-O] descriptor, relative luciferase expression increases as the frequency of carbon–oxygen pairs at topological distance 10 decreases, and vice versa. For example, compound #5 (Neomycin-PEGDE) shows lower relative luciferase expression owing to a higher frequency of carbon–oxygen pairs at topological distance 10. This observation suggests that carbon and oxygen atoms should specifically be separated by a topological distance of nine, and that lengthening this separation by a single bond leads to a decline in relative luciferase expression in the cells.
The next descriptor, F09[O-O], stands for the frequency of oxygen–oxygen atom pairs at topological distance nine [43]. It is the most important descriptor in the first individual QSPR model, with a VIP score above one. It correlates negatively with the response in all the models in which it appears, indicating that relative luciferase expression increases as the frequency of oxygen–oxygen pairs at topological distance nine decreases, and vice versa. For example, compound #26 (Paromomycin-PPGDE) shows a very low relative luciferase expression in the cell owing to its higher F09[O-O] value, whereas molecule #43 (Sisomicin-BGDE) shows higher luciferase expression owing to a low frequency of oxygen–oxygen pairs at topological distance nine.
The ETA_EtaP_L descriptor belongs to the extended topochemical atom (ETA) descriptor class [46,47] and is the least contributing descriptor in the first individual model. It represents local connectedness relative to molecular size and encodes information on branching, the presence of heteroatoms and unsaturation [48]. Its negative correlation with relative luciferase expression suggests that an increase in branching/unsaturation relative to molecular size lowers relative luciferase expression in the cells. Close observation of the data revealed that polymers with the RDE, EGDE and GDE cross-linkers have lower ETA_EtaP_L values, owing respectively to the presence of a phenyl group (which imparts unsaturation and decreases polymer mass density but increases polymer hydrophobicity), a short chain length (size) and branching in the cross-linker chain. For example, compound #17 (Apramycin-CDDE), with a higher ETA_EtaP_L value, shows lower luciferase expression than compound #12 (Streptomycin-EGDE), which has a lower ETA_EtaP_L value.
The nROR descriptor appears in the second individual model (IM 2) and is its most important descriptor according to the VIP scores. nROR belongs to the class of functional group count descriptors [43] and represents the number of aliphatic ether groups in the molecular building block. Its negative coefficient indicates that cross-linkers carrying more aliphatic ether groups result in lower transgene expression efficacy, and vice versa. Accordingly, cross-linkers with aliphatic ether groups or long chains, including PEGDE and PPGDE, have been reported to be unsuitable for developing polymeric gene delivery vehicles [22]. For example, compound #5 (Neomycin-PEGDE) shows lower expression owing to its higher number of aliphatic ether groups, whereas molecule #42 (Kanamycin-RDE) shows higher expression owing to its lower number of such groups.
The next descriptor in individual model 2 is ETA_Beta, which belongs to the class of extended topochemical atom (ETA) indices [46,47] and is a measure of the electronic environment of the molecule. It
Conclusion
In the present study, we generated QSPR models to predict polymer DNA binding and polymer-mediated transgene expression efficacy of aminoglycoside-derived polymers. The final QSPR models were obtained using the partial least squares (PLS) regression technique on two datasets of different sizes (33 and 44 aminoglycoside-derived polymers for polymer DNA binding and polymer-mediated transgene expression, respectively). We found that several structural attributes contribute to polymer DNA binding as well as to polymer-mediated transgene expression. For polymer DNA binding, a higher maximum E-state value, a higher path/walk 3 – Randić shape index, the presence of a pair of nitrogen atoms separated by topological distance ten and a higher frequency of C–N pairs at topological distance eight result in higher binding. For polymer-mediated transgene expression, higher values of the frequency of carbon–oxygen pairs at topological distance 9, the minimum atom-type E-state of –CH2–, the sum of E-state indices of –OH groups and the maximum atom-type E-state of >C< result in higher transgene expression in the cells. Conversely, higher values of the frequency of carbon–oxygen pairs at topological distance 10, the frequency of oxygen–oxygen pairs at topological distance nine, the number of aliphatic ether groups in the molecular building block and the number of CH2RX fragments (where X represents an electronegative atom such as oxygen, nitrogen or sulphur) are associated with lower transgene expression efficacy. The present findings provide new insight into the design of aminoglycoside-derived polymer libraries based on the identified physicochemical properties and into predicting crucial properties of polymeric vehicles before their synthesis.
Acknowledgements
PMK thanks the National Institute of Pharmaceutical Education and Research Kolkata and the Ministry of Chemicals & Fertilizers, Department of Pharmaceuticals, Government of India, for financial assistance in the form of a fellowship.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Funding
This work was supported by the Ministry of Chemicals and Fertilizers, Govt. of India.
ORCID
K. Roy http://orcid.org/0000-0003-4486-8074
References
[1] W.F. Anderson, Gene therapy for genetic diseases, Hum. Gene Ther. 5 (1994), pp. 281–282.
doi:10.1089/hum.1994.5.3-281.
[2] M.G. Kaplitt, A. Feigin, C. Tang, H.L. Fitzsimons, P. Mattis, P.A. Lawlor, R.J. Bland, D. Young,
K. Strybing, D. Eidelberg, and M.J. During, Safety and tolerability of gene therapy with an
adeno-associated virus (AAV) borne GAD gene for Parkinson’s disease: An open label, phase
I trial, Lancet 369 (2007), pp. 2097–2105. doi:10.1016/S0140-6736(07)60982-9.
[3] B.A. Bunnell and R.A. Morgan, Gene therapy for infectious diseases, Clin. Microbiol. Rev. 11
(1998), pp. 42–56. doi:10.1128/CMR.11.1.42.
[4] R. Wolkowicz and G. Nolan, Gene therapy progress and prospects: Novel gene therapy
approaches for AIDS, Gene Ther. 12 (2005), pp. 467–476. doi:10.1038/sj.gt.3302488.
[5] N.A. Horn, J.A. Meek, G. Budahazi, and M. Marquet, Cancer gene therapy using plasmid DNA:
Purification of DNA for human clinical trials, Hum. Gene Ther. 6 (1995), pp. 565–573.
doi:10.1089/hum.1995.6.5-565.
[6] Z.R. Yang, H.F. Wang, J. Zhao, Y.Y. Peng, J. Wang, B.A. Guinn, and L.Q. Huang, Recent
developments in the use of adenoviruses and immunotoxins in cancer gene therapy, Cancer
Gene Ther. 14 (2007), pp. 599–615. doi:10.1038/sj.cgt.7701054.
[7] G. Ermak, Emerging Medical Technologies, World Scientific Publishing Co. Pte. Ltd., Singapore, 2015.
[8] H. Yin, R.L. Kanasty, A.A. Eltoukhy, A.J. Vegas, J.R. Dorkin, and D.G. Anderson, Non-viral vectors
for gene-based therapy, Nat. Rev. Genet. 15 (2014), pp. 541–555. doi:10.1038/nrg3763.
[9] C.E. Thomas, A. Ehrhardt, and M.A. Kay, Progress and problems with the use of viral vectors for
gene therapy, Nat. Rev. Genet. 4 (2003), pp. 346–358. doi:10.1038/nrg1066.
[10] N. Bessis, F.J. GarciaCozar, and M.C. Boissier, Immune responses to gene therapy vectors:
Influence on vector function and effector mechanisms, Gene Ther. 11 (2004), pp. S10–S17.
doi:10.1038/sj.gt.3302364.
[11] C. Baum, O. Kustikova, U. Modlich, Z. Li, and B. Fehse, Mutagenesis and oncogenesis by chromosomal insertion of gene transfer vectors, Hum. Gene Ther. 17 (2006), pp. 253–263. doi:10.1089/hum.2006.17.253.
[12] D. Niculescu-Duvaz, J. Heyes, and C.J. Springer, Structure-activity relationship in cationic lipid mediated gene transfection, Curr. Med. Chem. 10 (2003), pp. 1233–1261. doi:10.2174/0929867033457476.
[13] S.K. Samal, M. Dash, S. Van Vlierberghe, D.L. Kaplan, E. Chiellini, C. Van Blitterswijk, L. Moroni,
and P. Dubruel, Cationic polymers and their therapeutic potential, Chem. Soc. Rev. 41 (2012),
pp. 7147–7194.
[14] D. Pezzoli, F. Olimpieri, C. Malloggi, S. Bertini, A. Volonterio, and G. Candiani, Chitosan-graft-
branched polyethylenimine copolymers: Influence of degree of grafting on transfection behavior,
PLoS One 7 (2012), pp. e34711. doi:10.1371/journal.pone.0034711.
[15] R. Labas, F. Beilvert, B. Barteau, S. David, R. Chèvre, and B. Pitard, Nature as a source of inspiration for cationic lipid synthesis, Genetica 138 (2010), pp. 153–168. doi:10.1007/s10709-009-9405-8.
[16] A.D. Miller, The problem with cationic liposome/micelle-based non-viral vector systems for gene therapy, Curr. Med. Chem. 10 (2003), pp. 1195–1211. doi:10.2174/0929867033457485.
[17] H. Gonzalez, S.J. Hwang, and M.E. Davis, New class of polymers for the delivery of macromolecular therapeutics, Bioconjug. Chem. 10 (1999), pp. 1068–1074. doi:10.1021/bc990072j.
[18] J.H. Jeong, S.W. Kim, and T.G. Park, Molecular design of functional polymers for gene therapy,
Prog. Polym. Sci. 32 (2007), pp. 1239–1274.
[19] T. Potta, Z. Zhen, T.S.P. Grandhi, and M.D. Christensen, Discovery of antibiotics-derived polymers for gene delivery using combinatorial synthesis and cheminformatics modeling, Biomaterials 35 (2014), pp. 1977–1988. doi:10.1016/j.biomaterials.2013.10.069.
[20] B. Shi, M. Zheng, W. Tao, R. Chung, D. Jin, D. Ghaffari, and O.C. Farokhzad, Challenges in DNA
delivery and recent advances in multifunctional polymeric DNA delivery systems,
Biomacromolecules 18 (2017), pp. 2231–2246. doi:10.1021/acs.biomac.7b00803.
[21] W. Wagner, S. Sakiyama-Elbert, and G. Zhang, Biomaterials Science: An Introduction to
Materials in Medicine, Academic Press, London, 2020.
[22] Z. Zhen, T. Potta, M.D. Christensen, E. Narayanan, K. Kanagal, C.M. Breneman, and K. Rege, Accelerated materials discovery using chemical informatics investigation of polymer physicochemical properties and transgene expression efficacy, ACS Biomater. Sci. Eng. 5 (2019), pp. 654–669. doi:10.1021/acsbiomaterials.8b00963.
[23] M. Chen, M. Hu, D. Wang, G. Wang, X. Zhu, D. Yan, and J. Sun, Multifunctional hyperbranched
glycoconjugated polymers based on natural aminoglycosides, Bioconjug. Chem. 23 (2012), pp.
1189–1199. doi:10.1021/bc300016b.
[24] N.D. Stebbins, M.A. Ouimet, and K.E. Uhrich, Antibiotic-containing polymers for localized, sustained drug delivery, Adv. Drug Deliv. Rev. 78 (2014), pp. 77–87. doi:10.1016/j.addr.2014.04.006.
[25] K. Rege, A. Ladiwala, S. Hu, M. Breneman, J.S. Dordick, and S.M. Cramer, Investigation of
DNA-binding properties of an aminoglycoside-polyamine library using Quantitative
Structure-Activity Relationship (QSAR) models, J. Chem. Inf. Model. 45 (2005), pp. 1854–1863.
doi:10.1021/ci050082g.
[26] B. Miryala, Z. Zhen, T. Potta, C.M. Breneman, and K. Rege, Parallel synthesis and quantitative structure-activity relationship (QSAR) modeling of aminoglycoside-derived lipopolymers for transgene expression, ACS Biomater. Sci. Eng. 1 (2015), pp. 656–668. doi:10.1021/acsbiomaterials.5b00045.
[27] K. Roy, S. Kar, and R. Das, Understanding the Basics of QSAR for Applications in Pharmaceutical
Sciences and Risk Assessment, Academic press, New York, 2015.
[28] S. Paz-Abuin, M.P. Pellin, M. Paz-Pazos, and A. Lopez-Quintela, Influence of the reactivity of
amine hydrogens and the evaporation of monomers on the cure kinetics of epoxy-amine: Kinetic
questions, Polymer (Guildf) 38 (1997), pp. 3795–3804. doi:10.1016/S0032-3861(96)00957-3.
[29] C.W. Yap, PaDEL-descriptor: An open source software to calculate molecular descriptors and
fingerprints, J. Comput. Chem. 32 (2011), pp. 1466–1474. doi:10.1002/jcc.21707.
[30] AlvaDesc (software for molecular descriptors calculation) version 2.0.2, 2020. Available at https://www.alvascience.com.
[31] R.W. Kennard and L.A. Stone, Computer aided design of experiments, Technometrics 11 (1969),
pp. 137–148. doi:10.1080/00401706.1969.10490666.
[32] H. Golmohammadi, Z. Dashtbozorgi, and W.E. Acree Jr, Quantitative structure–activity relationship prediction of blood-to-brain partitioning behavior using support vector machine, Eur. J. Pharm. Sci. 47 (2012), pp. 421–429. doi:10.1016/j.ejps.2012.06.021.
[33] K. Roy, S. Kar, and R.N. Das, A Primer on QSAR/QSPR Modeling: Fundamental Concepts, Springer,
New York, 2015.
[34] P.M. Khan and K. Roy, Current approaches for choosing feature selection and learning algorithms in quantitative structure–activity relationships (QSAR), Expert Opin. Drug Discov. 13 (2018), pp. 1075–1089. doi:10.1080/17460441.2018.1542428.
[35] S. Wold, M. Sjöström, and L. Eriksson, PLS-regression: A basic tool of chemometrics, Chemom.
Intell. Lab. Syst. 58 (2001), pp. 109–130. doi:10.1016/S0169-7439(01)00155-1.
[36] K. Roy and I. Mitra, On various metrics used for validation of predictive QSAR models with
applications in virtual screening and focused library design, Comb. Chem. High Throughput
Screen. 14 (2011), pp. 450–474. doi:10.2174/138620711795767893.
[37] K. Roy, I. Mitra, P. Ojha, S. Kar, R.N. Das, and H. Kabir, Introduction of rm2 (rank) metric
incorporating rank-order predictions as an additional tool for validation of QSAR/QSPR models,
Chemom. Intell. Lab. Syst. 118 (2012), pp. 200–210. doi:10.1016/j.chemolab.2012.06.004.
[38] K. Roy, R.N. Das, P. Ambure, and R.B. Aher, Be aware of error measures. Further studies on
validation of predictive QSAR models, Chemom. Intell. Lab. Syst. 152 (2016), pp. 18–33.
doi:10.1016/j.chemolab.2016.01.008.
[39] K. Roy, P. Ambure, S. Kar, and P. Kumar Ojha, Is it possible to improve the quality of predictions
from an “intelligent” use of multiple QSAR/QSPR/QSTR models? J. Chemom. 32 (2018), pp.
e2992. doi:10.1002/cem.2992.
[40] L.H. Hall and L.B. Kier, Electrotopological state indices for atom types: A novel combination of
electronic, topological, and valence state information, J. Chem. Inf. Comput. Sci. 35 (1995), pp.
1039–1045. doi:10.1021/ci00028a014.
[41] J. Votano, M. Parham, L. Hall, L. Kier, S. Oloff, A. Tropsha, Q. Xie, and W. Tong, Three new
consensus QSAR models for the prediction of Ames genotoxicity, Mutagenesis 19 (2004), pp.
365–377. doi:10.1093/mutage/geh043.
[42] S. Gupta, N. Basant, D. Mohan, and K.P. Singh, Inter-moieties reactivity correlations: An
approach to estimate the reactivity endpoints of major atmospheric reactants towards organic
chemicals, RSC Adv. 6 (2016), pp. 50297–50305. doi:10.1039/C6RA06805G.
[43] R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, Vol. 11, John Wiley & Sons,
New Jersey, 2008.
[44] Y.S. Prabhakar, R.K. Rawal, M.K. Gupta, V.R. Solomon, and S.B. Katti, Topological descriptors in
modeling the HIV inhibitory activity of 2-aryl-3-pyridyl-thiazolidin-4-ones, Comb. Chem. High
Throughput Screen. 8 (2005), pp. 431–437. doi:10.2174/1386207054546531.
[45] Z. Wu, D. Li, J. Meng, and H. Wang, Introduction to SIMCA-P and its application, in Handbook of
Partial Least Squares, V.V. Esposito, W.W. Chin, J. Henseler, and H. Wang (Eds.), Berlin-
Heidelberg, Springer, 2010, pp. 757–774.
[46] K. Roy, Quantitative Structure-Activity Relationships in Drug Design, Predictive Toxicology, and
Risk Assessment, IGI Global, Hershey, Pennsylvania, 2015.
[47] K. Roy and G. Ghosh, Introduction of extended topochemical atom (ETA) indices in the valence electron mobile (VEM) environment as tools for QSAR/QSPR studies, Internet Electron. J. Mol. Des. 2 (2003), pp. 599–620.
[48] A. Karmakar, P. Ambure, T. Mallick, S. Das, K. Roy, and N.A. Begum, Exploration of synthetic
antioxidant flavonoid analogs as acetylcholinesterase inhibitors: An approach towards finding
their quantitative structure–activity relationship, Med. Chem. Res. 28 (2019), pp. 723–741.
doi:10.1007/s00044-019-02330-8.