Article info
Keywords: Pharmaceutics; Modeling; Machine learning; Gradient boosting

Abstract
In this study, drug solubility was determined using theoretical approaches. Because of its importance to the expansion of the pharmaceutical industry, this study models Lenalidomide solubility in supercritical carbon dioxide using several tree-based machine learning techniques. Since drug solubility varies significantly with temperature and pressure, these two quantities are used as the input features of the models. Experimental data were collected and used both to train the machine learning models and to test them. The results are useful for the production of nanomedicines with enhanced solubility in solvents. Decision Tree (DT), Extra Trees (ET), and Gradient Boosting (GB) models are used and optimized with the sine cosine algorithm (SCA) to obtain more robust models for predicting drug solubility in the solvent. The resulting models, called SCA-DT, SCA-ET, and SCA-GB, have R2 scores of 0.932, 0.951, and 0.997, respectively, and RMSE values of 0.0948, 0.0822, and 0.0203, respectively. SCA-GB is therefore identified as the best model in this research for predicting Lenalidomide solubility in the solvent.
1. Introduction
It has been reported that a large proportion of newly discovered drugs are poorly soluble in aqueous solutions, so they must be administered at high doses to reach a therapeutic effect. Consequently, patients taking such high doses experience more side effects [1–3]. Poorly water-soluble drugs can be classified into different categories; according to the Biopharmaceutics Classification System (BCS), for example, class II comprises drugs with poor solubility [4,5]. Several techniques have been developed to improve the solubility of these drugs and thereby enhance their bioavailability.
Nanomedicine technology is one of the techniques devised to improve the solubility of poorly water-soluble drugs [6–8]. It relies on the fact that the solubility of particles in a solvent increases significantly as the particle size is reduced to the nanoscale. This solubility enhancement is attributed to the larger surface area of nanoparticles and the consequent increase in their surface free energy [9].
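As a rough geometric illustration of the surface-area argument (assuming idealized spherical particles of diameter $d$, which real drug particles are not), the surface-to-volume ratio is

$$ \frac{A}{V} = \frac{\pi d^2}{\pi d^3 / 6} = \frac{6}{d}, $$

so reducing the particle diameter from 10 µm to 100 nm increases the surface area per unit volume of drug by a factor of $(10\ \mu\text{m})/(100\ \text{nm}) = 100$.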
* Corresponding author. Department of Pharmaceutical Chemistry, College of Pharmacy, University of Ha’il, Ha’il, 81442, Saudi Arabia.
E-mail address: b.huwaimel@uoh.edu.sa (B. Huwaimel).
https://doi.org/10.1016/j.csite.2023.103005
Received 17 February 2023; Received in revised form 17 March 2023; Accepted 11 April 2023
Available online 11 April 2023
2214-157X/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
B. Huwaimel and T.N. Alharby Case Studies in Thermal Engineering 45 (2023) 103005
Table 1
Dataset of Lenalidomide solubility in sc-CO2 [16].
However, nanoparticles are unstable and prone to agglomerating into micron-sized particles, which again reduces drug solubility [10]. Sophisticated techniques therefore need to be developed for the efficient formulation of nanomedicines in the pharmaceutical industry [7,11,12].
In process engineering, green technologies offer superior process economics and sustainability. One green technology that can be employed in the pharmaceutical industry is the nanonization of drug particles using supercritical solvents (SC solvents), in which a compressed gas such as CO2 in its supercritical state is used. Because no organic solvents are involved, the method can be regarded as a sustainable, green route for manufacturing drug substances with improved solubility. It has recently been applied to the nanonization of several active pharmaceutical ingredients (APIs), and the results indicated that the process is promising for producing poorly soluble APIs at the nanoscale [13–15]. Sajadian et al. [16] measured the solubility of Lenalidomide in supercritical CO2 and developed thermodynamic models to correlate the dataset.
As the first step towards developing a supercritical process for API nanonization, the solubility of the API in the supercritical solvent is determined, typically by techniques such as the gravimetric method. The measured solubility indicates whether the process can be adopted for the drug. In addition, computational methods such as machine learning can be developed to study the supercritical processing of APIs holistically; once trained, such models estimate API solubility over a diverse range of conditions without the need for additional experiments [17,18].
Machine learning (ML) models are widely used to analyze experimental data and to provide robust predictions in many scientific fields. Decision trees (DTs) are a popular type of ML model that can be employed to estimate API solubility in different solvents. The DT algorithm can be applied to both classification and regression problems: by recursively splitting the data into smaller and smaller subsets, it produces a tree-like structure [19–21]. In this research, we used three tree-based models to predict API solubility: a single DT and two DT-based ensemble methods, Extra Trees (ET) and Gradient Boosting (GB). All of these models are optimized using the sine cosine algorithm (SCA).
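The recursive partitioning described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes a regression tree that chooses each binary split by minimizing the sum of squared errors (SSE) of the two child nodes and predicts the mean target in each leaf. The function names and the toy temperature/pressure values are hypothetical and are not the Lenalidomide data of Table 1.

```python
from statistics import mean

def sse(ys):
    """Sum of squared errors around the mean: node impurity for regression."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y):
    """Search every feature and midpoint threshold for the binary split
    with the lowest total SSE of the two child nodes."""
    best = None  # (cost, feature index, threshold)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively partition (X, y) into binary pieces; leaves store the mean target."""
    split = best_split(X, y) if depth < max_depth and len(y) > 1 else None
    if split is None or split[0] >= sse(y):  # no split reduces the impurity
        return {"leaf": mean(y)}
    _, j, t = split
    go_left = [row[j] <= t for row in X]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, g in zip(X, go_left) if g],
                           [v for v, g in zip(y, go_left) if g], depth + 1, max_depth),
        "right": build_tree([r for r, g in zip(X, go_left) if not g],
                            [v for v, g in zip(y, go_left) if not g], depth + 1, max_depth),
    }

def predict(tree, x):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical toy data: [temperature (K), pressure (MPa)] -> solubility (arbitrary units)
X = [[308, 12], [308, 18], [318, 12], [318, 18], [328, 12], [328, 18]]
y = [1.0, 2.0, 1.5, 3.0, 2.0, 4.0]
tree = build_tree(X, y, max_depth=3)
print(predict(tree, [318, 18]))
```

With `max_depth=0` the tree collapses to a single leaf that predicts the mean of `y`; larger depths let it follow the training points more closely, which is why depth is a typical hyperparameter for a DT.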
Gradient boosting trains a sequence of weak learners, each added to improve the estimate of the target response made by the learners before it. In machine learning, a weak learner (here, a DT) is a model that performs only marginally better than chance. Gradient boosting trees use shallow decision trees as weak learners. As more trees are added, the loss function of the ensemble model is minimized, and the user can choose the loss function that best suits the task at hand.
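A minimal sketch of this procedure, assuming squared-error loss (for which the negative gradient is simply the residual) and depth-1 trees (decision stumps) as the shallow weak learners. The names `fit_stump` and `gradient_boost`, the learning rate, and the toy data are hypothetical; this illustrates the technique and is not the tuned SCA-GB model of this study.

```python
from statistics import mean

def fit_stump(X, r):
    """Fit a depth-1 regression tree (decision stump) to the targets r."""
    best = None  # (sse, feature, threshold, left value, right value)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = mean(left), mean(right)
            cost = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or cost < best[0]:
                best = (cost, j, t, lv, rv)
    if best is None:  # all rows identical: fall back to a constant learner
        c = mean(r)
        return lambda x: c
    _, j, t, lv, rv = best
    return lambda x: lv if x[j] <= t else rv

def gradient_boost(X, y, n_estimators=100, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fit to the current
    residuals y - F(x), i.e. the negative gradient of (y - F)^2 / 2."""
    f0 = mean(y)  # initial constant model
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [yi - fi for yi, fi in zip(y, F)]
        h = fit_stump(X, residuals)
        stumps.append(h)
        # Shrinkage: move only a fraction of the way along the new learner.
        F = [fi + learning_rate * h(row) for fi, row in zip(F, X)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in stumps)

# Hypothetical toy data: [temperature (K), pressure (MPa)] -> solubility (arbitrary units)
X = [[308, 12], [308, 18], [318, 12], [318, 18], [328, 12], [328, 18]]
y = [1.0, 2.0, 1.5, 3.0, 2.0, 4.0]
model = gradient_boost(X, y, n_estimators=200)
print(model([323, 15]))
```

Because each stump is a least-squares fit to the residuals, adding it with a learning rate between 0 and 1 never increases the training error, so the ensemble improves step by step, one tree at a time.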
The Extra Trees method [22] extends the random forest to reduce the risk of over-fitting. In the Extra Trees (ET) algorithm, each node is split by considering a random subset of features; for each candidate feature, a cut-point is drawn at random rather than optimized, and the best of these random candidate splits is selected. In addition, unlike the random forest, ET trains each regression tree on the entire training dataset rather than on a bootstrap sample [23].
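The randomized split rule can be sketched as follows. This is a simplified illustration under stated assumptions, not the library implementation used in the study: at each node it samples `k` candidate features (defaulting to all features, the usual choice for regression in [22]), draws one uniform random cut-point per feature between that feature's minimum and maximum in the node, keeps the candidate with the lowest child SSE, and grows every tree on the full training set. The function names and toy data are hypothetical.

```python
import random
from statistics import mean

def sse(ys):
    """Sum of squared errors around the node mean."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def random_split(X, y, k, rng):
    """Sample k features, draw one random cut-point per feature, keep the best."""
    best = None  # (cost, feature, threshold)
    for j in rng.sample(range(len(X[0])), k):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        if lo == hi:
            continue  # feature is constant in this node
        t = rng.uniform(lo, hi)  # random cut-point, not an optimized one
        left = [yi for xi, yi in zip(col, y) if xi < t]
        right = [yi for xi, yi in zip(col, y) if xi >= t]
        if not left or not right:
            continue
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, j, t)
    return best

def build_extra_tree(X, y, k, rng):
    """Grow one randomized tree until each node is pure or cannot be split."""
    split = random_split(X, y, k, rng) if len(set(y)) > 1 else None
    if split is None:
        return {"leaf": mean(y)}
    _, j, t = split
    go_left = [row[j] < t for row in X]

    def side(seq, left):
        return [v for v, g in zip(seq, go_left) if g == left]

    return {"feature": j, "threshold": t,
            "left": build_extra_tree(side(X, True), side(y, True), k, rng),
            "right": build_extra_tree(side(X, False), side(y, False), k, rng)}

def predict(tree, x):
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] < tree["threshold"] else tree["right"]
    return tree["leaf"]

def extra_trees(X, y, n_trees=50, k=None, seed=0):
    """Average n_trees randomized trees; k defaults to all features."""
    rng = random.Random(seed)
    k = len(X[0]) if k is None else k
    # No bootstrap: unlike a random forest, every tree is grown on the full training set.
    trees = [build_extra_tree(X, y, k, rng) for _ in range(n_trees)]
    return lambda x: mean(predict(tree, x) for tree in trees)

# Hypothetical toy data: [temperature (K), pressure (MPa)] -> solubility (arbitrary units)
X = [[308, 12], [308, 18], [318, 12], [318, 18], [328, 12], [328, 18]]
y = [1.0, 2.0, 1.5, 3.0, 2.0, 4.0]
model = extra_trees(X, y, n_trees=20, seed=1)
print(model([323, 15]))
```

Individual trees are high-variance because their cut-points are random; averaging many of them is what smooths the prediction between training points.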
Here, for the first time, we introduce a machine learning-based computational approach for correlating Lenalidomide solubility in supercritical carbon dioxide. The methods employed in this study for modeling the solubility values are Decision Tree (DT), Extra Trees (ET), and Gradient Boosting (GB) models optimized with the SCA.
Fig. 1. Correlation between input variables and the output for drug solubility data.
In order for the Decision Tree to develop, the data must be repeatedly partitioned into two parts (binary splits). To evaluate the candidate partitions on every feature, a measure of randomness (impurity) such as entropy is used [28,29].
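For a classification tree, the entropy criterion can be computed as sketched below. This is a minimal illustration with hypothetical helper names; for regression targets such as solubility, CART-style trees typically use the reduction in squared error instead, so the sketch shows the measure named above rather than the exact criterion of the models in this study.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in a node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy decrease achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["high", "high", "low", "low"]
print(information_gain(parent, ["high", "high"], ["low", "low"]))  # perfect split: 1.0
print(information_gain(parent, ["high", "low"], ["high", "low"]))  # uninformative split: 0.0
```

The tree-growing procedure evaluates this gain for every candidate partition and keeps the largest one.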
Table 2
Final statistical results for comparing the developed models.
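The two metrics used in this work to compare the models, R2 and RMSE, can be computed as sketched below. This assumes the standard definitions (coefficient of determination and root-mean-square error); the units or scaling of the solubility values behind the reported numbers are not restated here, so the toy values below only illustrate the formulas and do not reproduce the reported results.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, y_true), rmse(y_true, y_true))  # perfect fit: 1.0 0.0
print(r2_score(y_true, [2.5] * 4))  # always predicting the mean: 0.0
```

R2 is scale-free, whereas RMSE is expressed in the units of the target, so RMSE values are only comparable between models evaluated on the same data.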
The optimal number of iterations for a given shrinkage hyperparameter can range from a few hundred to several million. All of the base models (DTs) in the ensemble must be evaluated, which can be time-consuming. The learning process is inherently sequential and therefore difficult to parallelize [33,34].
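To see why a smaller shrinkage calls for more iterations, consider an idealized assumption (not a property of real trees) in which every base learner reproduces the current residual exactly. With shrinkage $\nu$, each update $F_m = F_{m-1} + \nu\, r_{m-1}$ leaves the residual $r_m = (1-\nu)\, r_{m-1}$, so

$$ r_M = (1-\nu)^M r_0, \qquad M \ge \frac{\ln \varepsilon}{\ln(1-\nu)} $$

iterations are needed to shrink the residual to a fraction $\varepsilon$ of its initial size. For $\varepsilon = 0.01$, $\nu = 0.1$ requires $M = 44$ iterations, whereas $\nu = 0.01$ requires $M = 459$.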
$$ r_1(t) = a\left(1 - \frac{t}{T}\right) $$
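As a worked example of the schedule $r_1(t) = a(1 - t/T)$, take $a = 2$ (the value used in the original SCA [36]) and a hypothetical budget of $T = 100$ iterations:

$$ r_1(0) = 2, \quad r_1(25) = 2(1 - 0.25) = 1.5, \quad r_1(50) = 1, \quad r_1(100) = 0 $$

Since $|\sin r_2|$ and $|\cos r_2|$ never exceed 1, the factor $r_1 \sin(r_2)$ (or $r_1 \cos(r_2)$) can exceed 1 in magnitude only while $r_1 > 1$, which the original SCA associates with exploration; once $r_1 < 1$, the search exploits the region around the destination.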
$$
x_i^{t+1} =
\begin{cases}
x_i^t + r_1 \sin(r_2)\,\left| r_3 p_i^t - x_i^t \right|, & r_4 \ge 0.5 \\
x_i^t + r_1 \cos(r_2)\,\left| r_3 p_i^t - x_i^t \right|, & r_4 < 0.5
\end{cases}
$$
where $p_i^t$ is the position of the best solution found so far in the $i$-th dimension and $x_i^t$ is the position of the current solution in the $i$-th dimension at the $t$-th iteration. The parameter $r_1$, defined by the first equation above, determines the region in which the next position is searched; $a$ is a constant set to 2 in the original SCA [36], and $t$ and $T$ are the current and maximum iteration counts, respectively. The parameter $r_2$ is a random number in $[0, 2\pi]$ that determines how far the solution moves towards or away from the destination. The parameter $r_3$ is a random number in $[0, 2]$ that assigns a random weight to the destination, either emphasizing ($r_3 > 1$) or de-emphasizing ($r_3 < 1$) its effect on the distance. Finally, $r_4$ is a random number in $[0, 1]$ that switches between the sine and cosine components with equal probability.
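A minimal Python sketch of the update rule above, assuming a box-constrained minimization problem and following the sine/cosine assignment used in this section (sine when $r_4 \ge 0.5$). The function name `sine_cosine_algorithm`, the population size, the iteration budget, and the clipping of agents to the bounds are illustrative assumptions. When tuning a tree model, the objective would hypothetically be a validation error of the DT, ET, or GB model as a function of its hyperparameters; here a simple sphere function stands in for it.

```python
import math
import random

def sine_cosine_algorithm(objective, bounds, n_agents=20, max_iter=200, a=2.0, seed=0):
    """Minimise `objective` over the box `bounds` (list of (low, high) pairs)
    with the sine cosine algorithm; returns (best position, best value)."""
    rng = random.Random(seed)
    agents = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    best, best_score = None, math.inf
    for t in range(max_iter):
        # Evaluate all agents and update the destination p (best solution so far).
        for x in agents:
            score = objective(x)
            if score < best_score:
                best, best_score = x[:], score
        r1 = a * (1 - t / max_iter)  # decreases linearly from a towards 0
        for x in agents:
            for i, (lo, hi) in enumerate(bounds):
                r2 = rng.uniform(0, 2 * math.pi)
                r3 = rng.uniform(0, 2)
                r4 = rng.random()
                wave = math.sin(r2) if r4 >= 0.5 else math.cos(r2)
                x[i] += r1 * wave * abs(r3 * best[i] - x[i])
                x[i] = min(max(x[i], lo), hi)  # assumption: clip agents to the box
    # Final evaluation after the last position update.
    for x in agents:
        score = objective(x)
        if score < best_score:
            best, best_score = x[:], score
    return best, best_score

def sphere(x):
    """Test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)

best, score = sine_cosine_algorithm(sphere, [(-5.0, 5.0)] * 2)
print(best, score)
```

For integer hyperparameters such as tree depth, a common approach (again an assumption here) is to round the continuous agent position before evaluating the model.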
5. Conclusion
This investigation models the solubility of lenalidomide using a variety of tree-based approaches because of the topic's significance to the growth of the pharmaceutical industry. Decision Tree (DT), Extra Trees (ET), and Gradient Boosting (GB) models are used, and the SCA algorithm is applied to optimize them in order to obtain more robust models. The resulting models, referred to as SCA-DT, SCA-ET, and SCA-GB, have R2 scores of 0.932, 0.951, and 0.997, respectively, and RMSE values of 0.0948, 0.0822, and 0.0203, respectively. SCA-GB is therefore presented as the most suitable model in this investigation. The evaluation results also indicated that pressure has a significant impact on the variation of lenalidomide solubility in the solvent, owing to the compressibility of the solvent in its supercritical state.
Author statement
Bader Huwaimel: Conceptualization, Writing – Original draft, Formal analysis, Validation. Tareq Nafea Alharby: Conceptualization, Writing – Review & Editing, Resources, Software, Validation, Investigation.
Data availability
Acknowledgements
The researchers would like to thank the Medical and Diagnostic Research Center, University of Ha’il, Ha’il, Saudi Arabia, for its support.
References
[1] W. Liu, et al., The effect of mucin on supersaturation of poorly water-soluble drugs with different crystallization behavior and in vitro-in vivo correlation,
J. Drug Deliv. Sci. Technol. 78 (2022), 103973.
[2] X. Liu, et al., Improving solubility of poorly water-soluble drugs by protein-based strategy: a review, Int. J. Pharm. 634 (2023), 122704.
[3] S. Salunke, et al., Oral drug delivery strategies for development of poorly water soluble drugs in paediatric patient population, Adv. Drug Deliv. Rev. 190 (2022),
114507.
[4] A. Charalabidis, et al., The biopharmaceutics classification System (BCS) and the biopharmaceutics drug disposition classification System (BDDCS): beyond
guidelines, Int. J. Pharm. 566 (2019) 264–281.
[5] A.M. Saeed, et al., Comparative bioavailability of two formulations of biopharmaceutical classification System (BCS) class IV drugs: a case study of lopinavir/ritonavir, J. Pharmaceut. Sci. 110 (12) (2021) 3963–3968.
[6] M.C. Operti, et al., PLGA-based nanomedicines manufacturing: technologies overview and challenges in industrial scale-up, Int. J. Pharm. 605 (2021), 120807.
[7] M. Taleuzzaman, et al., Chapter 19 - good laboratory practice and current good manufacturing practice requirements in the development of cancer
nanomedicines, in: S. Beg, et al. (Eds.), Nanoformulation Strategies for Cancer Treatment, Elsevier, 2021, pp. 341–352.
[8] C. Webb, et al., Using microfluidics for scalable manufacturing of nanomedicines from bench to GMP: a case study using protein-loaded liposomes, Int. J. Pharm.
582 (2020), 119266.
[9] M. Faizan, et al., Entropy analysis of Sutterby nanofluid flow over a Riga sheet with gyrotactic microorganisms and Cattaneo–Christov double diffusion, Mathematics 10 (17) (2022) 3157.
[10] J. Ouyang, et al., 2D materials-based nanomedicine: from discovery to applications, Adv. Drug Deliv. Rev. 185 (2022), 114268.
[11] Z. Pei, et al., Current perspectives and trend of nanomedicine in cancer: a review and bibliometric analysis, J. Contr. Release 352 (2022) 211–241.
[12] S.L. van den Broek, V. Shalgunov, M.M. Herth, Transport of nanomedicines across the blood-brain barrier: challenges and opportunities for imaging and
therapy, Biomater. Adv. 141 (2022), 213125.
[13] M.A.S. Abourehab, et al., Theoretical investigations on the manufacture of drug nanoparticles using green supercritical processing: estimation and prediction of
drug solubility in the solvent using advanced methods, J. Mol. Liq. (2022), 120559.
[14] S.M. Abuzar, et al., Enhancing the solubility and bioavailability of poorly water-soluble drugs using supercritical antisolvent (SAS) process, Int. J. Pharm. 538
(1) (2018) 1–13.
[15] S.M. Alshahrani, et al., Green processing based on supercritical carbon dioxide for preparation of nanomedicine: model development using machine learning
and experimental validation, Case Stud. Therm. Eng. 41 (2023), 102620.
[16] S.A. Sajadian, et al., Experimental analysis and thermodynamic modelling of lenalidomide solubility in supercritical carbon dioxide, Arab. J. Chem. 15 (6)
(2022), 103821.
[17] F. An, et al., Machine learning model for prediction of drug solubility in supercritical solvent: modeling and experimental validation, J. Mol. Liq. 363 (2022),
119901.
[18] Y. Li, et al., Theoretical modeling study on preparation of nanosized drugs using supercritical-based processing: determination of solubility of Chlorothiazide in
Supercritical Carbon dioxide, J. Mol. Liq. (2022), 120984.
[19] R. Polikar, Ensemble learning, in: Ensemble Machine Learning, Springer, 2012, pp. 1–34.
[20] K.P. Murphy, Machine Learning: a Probabilistic Perspective, MIT press, 2012.
[21] T.M. Mitchell, The Discipline of Machine Learning, vol. 9, Carnegie Mellon University, School of Computer Science, Machine Learning, 2006.
[22] P. Geurts, D. Ernst, L. Wehenkel, Extremely randomized trees, Mach. Learn. 63 (1) (2006) 3–42.
[23] V. John, et al., Real-time lane estimation using deep features and extra trees regression, in: Image and Video Technology, Springer, 2015.
[24] M. Xu, et al., Decision tree regression for soft classification of remote sensing data, Remote Sens. Environ. 97 (3) (2005) 322–336.
[25] L. Breiman, et al., Classification and Regression Trees, 1984.
[26] M.W. Ahmad, M. Mourshed, Y. Rezgui, Trees vs Neurons: comparison between random forest and ANN for high-resolution prediction of building energy
consumption, Energy Build. 147 (2017) 77–89.
[27] V. Rodriguez-Galiano, et al., Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees
and support vector machines, Ore Geol. Rev. 71 (2015) 804–818.
[28] M. Mathuria, Decision tree analysis on j48 algorithm for data mining, Int. J. Adv. Res. Comput. Sci. Software Eng. 3 (6) (2013).
[29] A. Sakar, R.J. Mammone, Growing and pruning neural tree networks, IEEE Trans. Comput. 42 (3) (1993) 291–299.
[30] J.H. Friedman, Greedy function approximation: a gradient boosting machine, Ann. Stat. (2001) 1189–1232.
[31] A. Natekin, A. Knoll, Gradient boosting machines, a tutorial, Front. Neurorob. 7 (2013) 21.
[32] C. Kamath, E. Cantu-Paz, Creating Ensembles of Decision Trees through Sampling, Lawrence Livermore National Lab.(LLNL), Livermore, CA (United States),
2001.
[33] R. Seyghaly, et al., Interference recognition for fog enabled IoT architecture using a novel tree-based method, in: 2022 IEEE International Conference on Omni-Layer Intelligent Systems (COINS), IEEE Computer Society, 2022.
[34] T. Duan, et al., NGBoost: natural gradient boosting for probabilistic prediction, in: International Conference on Machine Learning, PMLR, 2020.
[35] L. Schmid, et al., Tree-based ensembles for multi-output regression: comparing multivariate approaches with separate univariate ones, Comput. Stat. Data Anal.
179 (2023), 107628.
[36] S. Mirjalili, SCA: a sine cosine algorithm for solving optimization problems, Knowl. Base Syst. 96 (2016) 120–133.
[37] C. Li, et al., An exploitation-boosted sine cosine algorithm for global optimization, Eng. Appl. Artif. Intell. 117 (2023), 105620.