org/jced Article
process of the ternary mixture of the two solutes and IL. Also, the estimation of the property γ∞ is more economical and easier than establishing LLE and VLE phase diagrams,3 which makes studying this parameter more advantageous in the design and optimization of separation processes. Because of the rise in research studies aiming to estimate γ∞, a large number of papers have reported values of γ∞ for different solutes in more than 200 ILs since 2001.3

Clearly, experimental characterization is the most reliable means of estimating the properties of matter. However, given the large number of possible ILs, estimated at 250,000 based on currently available binary anion/cation systems,8 experimental measurements are costly, often difficult, and time-consuming, which is why modeling methods for IDAC have arisen.

Thermodynamic models based on the activity coefficient are the most widely used approach to predict IDAC. These models are classified into predictive models, such as the group contribution model UNIFAC,9 Scatchard−Hildebrand, Flory−Huggins, and COSMO, and semi-predictive models, which require fitting the parameters of a Gibbs free energy model, such as Van Laar, Margules, NRTL, and UNIQUAC.10 In that context, many correlative models predicting IDAC by simple statistical regression11−18 based on least squares have been proposed. In addition, other models based on advanced machine learning algorithms have been developed and appear more robust. However, despite their fast computation, these techniques require a large amount of data and usually do not generalize well outside the ranges and classes of the systems from which they were established, and their accuracy is conditional on the availability and validity of experimental data because they are correlative methods.19 More advanced computational techniques based on quantum mechanical calculations and statistical thermodynamics, such as the conductor-like screening model for real solvents (COSMO-RS), appear capable of circumventing the challenges and limitations encountered in correlative modeling, mainly because they are more predictive: they do not rely on experimental data for calculations. Although they require sophisticated materials and user expertise and are the most computationally demanding, they are promising for virtually screening and evaluating different combinations of anions/cations forming the ILs.19,20 The σ-profile of ions may alternatively be obtained using nonproprietary software such as COSMO-SAC.21 An overview of the COSMO-RS technique, along with related research studies and papers, is given in the following section.

So far, computational modeling of IDAC has been proposed in numerous papers, mainly based on correlative methods such as neural networks22,23 or on quantum mechanical/statistical thermodynamic methods.3,18,24−27 As correlative models depend on the availability of accurate experimental data, and the latter models are complicated to use and require expertise, it is interesting to explore a hybrid approach in which results and data obtained from quantum mechanical/statistical thermodynamic calculations are correlated using a robust machine learning algorithm to develop a model for the estimation of IDAC of different solutes in ILs. To the best of our knowledge, only one paper18 has combined the two computational methods, modeling the IDAC of water in ILs based on temperature and sigma profiles of anions and cations using multilinear regression, and no similar research has involved different solutes in ILs. Therefore, the present work aims at modeling the IDAC of seventeen solutes in 44 imidazolium-based ILs, based exclusively on temperature and COSMO-RS sigma profiles of both the solutes and solvents, using a support vector machine (SVM) technique associated with the dragonfly swarm algorithm for better optimization of hyper-parameters.

2. METHODS AND DATASET

2.1. COSMO-RS. COSMO-RS, which is based on quantum calculations combined with a continuum treatment of the solvent, is possibly the most popular approach for modeling solubility.19 The method was developed and first applied by Klamt and co-workers.25 The researchers adopted the conductor-like screening model for real-solvent behavior, which mainly consists of carrying out quantum mechanical calculations on a given individual solute embedded in a conducting continuum; hence, explicitly modeling solvent molecules is unnecessary. The resulting surface charge densities and geometries are then used to compute the interactions between the solute and solvent by way of pairwise interacting surface segments. Subsequently, a statistical thermodynamic treatment is used to estimate the activity coefficients of solutes.19 An in-depth description of COSMO-RS can be found in the literature.28

Numerous papers have reported the prescreening of solutes in ILs based on COSMO-RS,3,18,29−31 and the method gave good qualitative and, in many cases, quantitative predictions of the IDAC of solutes in ILs. Also, the ability of the method to perform virtual screening of different solutes in different ILs at a fast pace has played a major role in its attractiveness. In the present contribution, COSMO-RS sigma profiles of seventeen solutes and of the different ions forming the ILs are considered as molecular descriptors of the binary solute/IL mixture and are subsequently used for modeling the IDAC.

2.2. SVM for Regression. SVM is a correlative computational technique like the artificial neural network (ANN); however, it is built on structural risk minimization32 instead of the empirical risk minimization on which ANN is based, which is the reason this technique is robust and accurate, as stated by Yoon et al.33 Therefore, it is a reliable algorithm for both classification and regression.34 For regression analysis, SVM regression (SVMr) theoretically reduces the prediction error in a learning process and decreases the problem of overfitting often encountered in machine learning,35 leading in many cases to a better performance of SVMr compared to other learning methods.33,36−39 As explained by Parveen et al.,40 SVM regression analyzes a training data set TD = {(a1,b1), (a2,b2), ..., (aN,bN)}, where ai is a vector of real independent variables and bi is the corresponding real dependent variable. Accordingly, in feature space, the regression equation can be estimated by:

$$z(a, w) = w \cdot \phi(a) + c \qquad (1)$$

where w represents the weight vector, ϕ(a) corresponds to the feature function, w·ϕ(a) is the dot product, and c is a constant. Theoretically, SVM reduces the prediction error through minimization of the following equation

$$Q(f) = C\,\frac{1}{N}\sum_{i=1}^{N} L_{\varepsilon}\big(b_i, z(a_i, w)\big) + \frac{1}{2}\lVert w \rVert^{2} \qquad (2)$$

and
B https://dx.doi.org/10.1021/acs.jced.0c00168
J. Chem. Eng. Data XXXX, XXX, XXX−XXX
Journal of Chemical & Engineering Data pubs.acs.org/jced Article
$$L_{\varepsilon}\big(b, z(a, w)\big) = \begin{cases} 0 & \text{if } |b - z(a, w)| \le \varepsilon \\ |b - z(a, w)| - \varepsilon & \text{otherwise} \end{cases}$$

Movement of dragonflies in nature toward food can be

• Alignment principle: represents the matching of velocity among dragonflies belonging to the same group, and it is defined as $A_i = \frac{1}{n}\sum_{j=1}^{n} V_j$, where $V_j$ is the velocity of the j-th neighboring individual and n is the number of neighboring individuals.

wherein Nj(σ) represents the number of segments with a discretized surface charge density σ, Aj the total cavity surface area, and Aj(σ) the total surface area of all of the segments with a particular charge density σ. aeff is known as the effective surface area and represents the theoretical contact surface
Table 2. Most Important Information About the Data Adjusted in This Work

variable category   factor    unit     domain                 SD         variance       kurtosis
input               M (IL)    g/mol    [169.247−643.79]       131.327    17,251.967     −0.3689
input               T         K        [293.15−413]           21.1643    448.0465       −0.51456
input               S1        e/nm2    [1.2583−3.86975]       0.4504     0.2030         2.4220
input               S2        e/nm2    [11.50845−48.7078]     6.2359     38.8849        5.7886
input               S3        e/nm2    [0−25.8853]            6.523      42.5475        −0.83620
input               S4        e/nm2    [0−5.281]              1.7947     3.2214         −0.2266
input               M (Sol)   g/mol    [18.015−198.394]       30.6218    935.636        1.445
input               S5        e/nm2    [4.327−31.699]         4.842      23.401476      0.98834
output              IDAC               [0.013−4222.32]        331.527    109,945.803    51.072
between molecules. In some works,50 the parameter aeff is adjusted to a value of 7.1 Å2.

Usually, the σ-profile of a solute or a solvent is divided into four sections, and each section is defined by an interval of σ (e/nm2). The sections were previously considered as molecular descriptors by Gonfa.18

In this work, the sigma profiles of the solutes, anions, and cations of the IL are partitioned into two sections. Table 1 shows the σ-profile descriptors considered in the present study and their ranges.

S1 represents the hydrogen bond donor character of the cation, and S2 corresponds to the nonpolar character of the cation and its hydrogen bond acceptor behavior. Similarly, S3 indicates the hydrogen bond donor character in addition to the nonpolarity of the anion, and S4 designates the strong hydrogen bond acceptor behavior of the anion, whereas S5 is the summation of the three characters of the solutes (hydrogen bond donor character, nonpolar character, and hydrogen bond acceptor behavior). The fusion of the three σ-profile regions of the solute is justified by the idea that, at infinite dilution, the solute molecules are theoretically entirely surrounded by solvent molecules and, in that specific condition, the electric surface density distribution by region is not as important as the total value of σ.

3.1.2. Statistical Data Analysis. Table 2 summarizes the dependent and independent variables, their units of measure, the domains studied, and their standard deviation, variance, and kurtosis.

3.1.2.1. Kernel Density Distribution. The probability density function (pdf) of a random variable can be represented nonparametrically as a kernel distribution when a parametric distribution cannot describe the data, or to avoid making assumptions about the data distribution. A smoothing function and a bandwidth value control the smoothness of the resultant
density curve and characterize the kernel distribution. For a given predictive factor x, the estimated pdf of the variable represents its kernel density estimator and is defined as

$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} k\!\left(\frac{x - x_i}{h}\right) \qquad (11)$$

where x1, x2, ..., xn are random sample values from an unknown distribution, n is the sample size, k represents the kernel smoothing function, and h is the bandwidth.51

Accordingly, the probability density distribution of the predictive variables considered in this work is represented in Figure 1.

3.1.2.2. Least Absolute Shrinkage and Selection Operator (Lasso). To evaluate the accuracy of a model trained using a machine learning method, optimization of a loss function is required. In regression, the value of the predicted output is continuous and depends on the choice and collection of inputs that best describe the problem at hand. Also, any interactions that may exist between inputs should be closely identified, and for that purpose the binary interactions between inputs are illustrated in Figure 2.

Training a model consists of minimizing the quadratic loss, which is the sum of squared deviations between the predicted and actual output values. The multivariable linear regression problem is represented by eq 12

$$\gamma_j^{\infty} = \beta_1 X_{1j} + \beta_2 X_{2j} + \beta_3 X_{3j} + \dots + \beta_i X_{ij} + \beta_0 \qquad (12)$$

where X1j, X2j, X3j, ..., Xij are the i input variables corresponding to the output γj∞ in row j of the dataset, β1, β2, β3, ..., βi are their respective coefficients, and β0 is the intercept. Minimizing the loss function and finding the optimal β coefficients through an iterative process allow the development of a trained regression model. However, overfitting is often encountered in machine learning algorithms; therefore, attention should be given to the optimal number of input variables, as an excess of them may lead to noisier samples and, subsequently, to a trained model that memorizes the noise instead of learning the trend of the data.

To overcome this problem, regularization techniques such as Lasso and Ridge lead to simple models that usually do not overfit. In brief, in a regularized machine learning model, the loss function includes an additional term, known as the regularization term, that has to be minimized as well; for Lasso, the loss is defined as

$$L = \sum_{j}\left(\gamma_j^{\infty,\mathrm{pred}} - \gamma_j^{\infty,\mathrm{real}}\right)^2 + \lambda\sum_{i}|\beta_i| \qquad (13)$$

The regularization term in the Lasso loss function not only penalizes the function for high values of the coefficients β, but it also sets them to zero if they are not relevant. As a result, the model will include the most pertinent inputs, and the least significant ones will be discarded. The results of Lasso regularization applied to the problem discussed in this work, using different types of interaction matrices, can be accessed in the Supporting Information.

3.1.2.3. Ridge Regularization. Ridge regularization, also known as Tikhonov regularization, was the most popular technique for shrinking the coefficients of the inputs and improving the predictive accuracy of correlative models prior to Lasso. The difference between these two regularization techniques lies in the penalty term of the loss function L, as follows

$$L = \sum_{j}\left(\gamma_j^{\infty,\mathrm{pred}} - \gamma_j^{\infty,\mathrm{real}}\right)^2 + \lambda\sum_{i}\beta_i^{2} \qquad (14)$$

Although Ridge regularization penalizes high values of the coefficients β through the sum of their squares in the loss function, it does not eliminate the least significant covariates as Lasso does; as a result, it reduces overfitting but does not set the coefficients of irrelevant inputs to zero. Outcomes of Ridge regularization of the present data and a statistical analysis of
input variables' distribution are available in the Supporting Information.

3.2. Modeling IDAC. 3.2.1. SVMr Model. In this paper, the SVMr learning algorithm is used for nonlinear modeling of the IDAC of 17 solutes in imidazolium-based ILs using the largest available/accessible literature data. The optimal SVMr model is obtained through the process illustrated in Figure 3.

Figure 3. Flowchart of the SVMr optimization process.

performed randomly. In the next step, the kernel function that best fits the data trend and the hyper-parameters are optimized within the following ranges: C ∈ [10^−3, 10^3], ε ∈ [0, 10^−2], γ ∈ [10^−3, 10], and n ∈ [2, 6], while the different kernel functions tested are linear, polynomial with different degrees, and Gaussian and radial basis function (RBF). The optimal SVMr model is achieved through repetition of the steps listed above and is selected based on the best predictive ability of the model, which is measured in this work by the root mean squared error (RMSE).

Based on the results, the Gaussian function is the kernel function that best fits the data in this work, and the hyper-parameters C, ε, and γ have the values of 91, 0.0586, and 2, respectively. The dataset used, the Matlab code, and the model parameters that allow reproduction of the results are available in the Supporting Information.

Different statistical evaluation criteria are adopted in this work in order to assess the predictive ability of the developed models: the RMSE, the mean absolute error (MAE), the mean absolute percentage error (MAPE), the coefficient of correlation (R), the determination coefficient (R2), and the intercept coordinate τ. Mathematical equations for these parameters are

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}\right)^2} \qquad (15)$$
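As an illustration of how such error criteria can be evaluated, the following minimal Python sketch computes the RMSE of eq 15 together with the MAE and MAPE. The MAE and MAPE formulas used here are the standard textbook definitions and are an assumption, as are the function names and the sample values; none of them is taken from the Supporting Information, where the authors' own implementation is in Matlab.

```python
import math

def rmse(exp, cal):
    """Root mean squared error between experimental and calculated values (eq 15)."""
    n = len(exp)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(exp, cal)) / n)

def mae(exp, cal):
    """Mean absolute error (standard definition, assumed here)."""
    n = len(exp)
    return sum(abs(e - c) for e, c in zip(exp, cal)) / n

def mape(exp, cal):
    """Mean absolute percentage error in percent (standard definition, assumed here).

    Requires every experimental value to be nonzero.
    """
    n = len(exp)
    return 100.0 * sum(abs((e - c) / e) for e, c in zip(exp, cal)) / n

# Hypothetical IDAC values, for illustration only.
gamma_exp = [1.0, 2.0, 4.0]
gamma_cal = [1.5, 2.0, 3.0]

print("RMSE:", rmse(gamma_exp, gamma_cal))
print("MAE :", mae(gamma_exp, gamma_cal))
print("MAPE:", mape(gamma_exp, gamma_cal))
```

Because the IDAC values in this work span several orders of magnitude, a relative criterion such as the MAPE complements the RMSE, which is dominated by the largest values.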
ILs                                               solutes
IM-2,1 SCN             IM-6,1 BF4                 N-PENTANE
IM-2,1 AC              IM-2,1 OTF                 N-HEXANE
IM-4,1 CL              IM-4,1 PO2-O1,O1           N-HEPTANE
IM-2,1 DCA             IM-6,1 TCB                 N-OCTANE
IM-2OH,1 DCA           IM-2,1 SO3-PH1             N-NONANE
IM-4,1 SCN             IM-4,1 OTF                 N-DECANE
IM-2,1 BF4             IM-10,1 BF4                N-UNDECANE
IM-4,1 AC              IM-4,1 SO3-PH1             N-DODECANE
IM-2,1 CCN3            IM-6,1 OTF                 N-TRIDECANE
IM-6,1 CL              IM-10,1 TCB                N-TETRADECANE
IM-4,1 DCA             IM-1,1 NTF2                CYCLOPENTANE
IM-2,1 SO3-1           IM-2,1 NTF2                METHYLCYCLOPENTANE
IM-2OH,1 BF4           IM-2OH,1 NTF2              CYCLOHEXANE
IM-1,1 PO2-O1,O1       IM-4,1 NTF2                METHYLCYCLOHEXANE
IM-2,1 TFA             IM-6,1 NTF2                CYCLOHEPTANE
IM-6,1 SCN             IM-6,1,1 NTF2              CYCLOOCTANE
IM-4,1 BF4             IM-1O6,1 NTF2              WATER
IM-2,1 TCB             IM-2,1 FAP
IM-4,1 CCN3            IM-2OH,1 FAP
IM-4,1 SO3-1           IM-1O6,1O6 NTF2
IM-2,1 PO2-O1,O1       IM-6,1 FAP
IM-4,1 TFA             IM-10,10,1 NTF2
learning algorithm, associated with the DA algorithm for optimization of its hyper-parameters. The division proportions of the data for training and test, along with the ranges of the hyper-parameters [...] previous section. The steps leading to the development of the optimal DA-SVMr hybrid algorithm are illustrated in the flowchart presented in Figure 5.

$$R = \frac{\sum_{i=1}^{N}\left(\gamma_i^{\infty,\mathrm{exp}} - \bar{\gamma}^{\mathrm{exp}}\right)\left(\gamma_i^{\infty,\mathrm{cal}} - \bar{\gamma}^{\infty,\mathrm{cal}}\right)}{\sqrt{\sum_{i=1}^{N}\left(\gamma_i^{\infty,\mathrm{exp}} - \bar{\gamma}^{\mathrm{exp}}\right)^2 \times \sum_{i=1}^{N}\left(\gamma_i^{\infty,\mathrm{cal}} - \bar{\gamma}^{\infty,\mathrm{cal}}\right)^2}} \qquad (18)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\gamma_i^{\mathrm{exp}} - \gamma_i^{\mathrm{cal}}\right)^2}{\sum_{i=1}^{n}\left(\gamma_i^{\mathrm{exp}} - \bar{\gamma}^{\mathrm{exp}}\right)^2} \qquad (19)$$
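The two criteria of eqs 18 and 19 can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard Pearson form for the coefficient of correlation R; the function names and the sample values are hypothetical and do not come from the authors' Matlab code.

```python
import math

def pearson_r(exp, cal):
    """Coefficient of correlation R between experimental and calculated values (eq 18)."""
    mean_exp = sum(exp) / len(exp)
    mean_cal = sum(cal) / len(cal)
    num = sum((e - mean_exp) * (c - mean_cal) for e, c in zip(exp, cal))
    den = math.sqrt(
        sum((e - mean_exp) ** 2 for e in exp) * sum((c - mean_cal) ** 2 for c in cal)
    )
    return num / den

def r_squared(exp, cal):
    """Determination coefficient R2 = 1 - SS_res / SS_tot (eq 19)."""
    mean_exp = sum(exp) / len(exp)
    ss_res = sum((e - c) ** 2 for e, c in zip(exp, cal))
    ss_tot = sum((e - mean_exp) ** 2 for e in exp)
    return 1.0 - ss_res / ss_tot

# Hypothetical values, for illustration only.
gamma_exp = [1.0, 2.0, 3.0, 4.0]
gamma_cal = [1.1, 1.9, 3.2, 3.8]

print("R :", pearson_r(gamma_exp, gamma_cal))
print("R2:", r_squared(gamma_exp, gamma_cal))
```

Note that R2 as defined in eq 19 is not, in general, the square of R from eq 18: R measures linear association only, whereas R2 also penalizes systematic offsets between calculated and experimental values.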
In order to evaluate the predictive ability of the optimized
SVMr model, the validation agreement vector, and the
validation agreement plot of the predicted versus experimental
response variable from the validation dataset only are analyzed.
Using the Matlab function “postreg”, the linear regression of
the calculated versus experimental output is visualized.
The total data agreement scheme is shown in Figure 4,
which illustrates the SVMr model’s calculated outputs versus
Figure 7. William’s Diagram for detection of outliers for the DA-SVM model.
$$\cdots = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{\cdots}{\gamma_i^{\infty,\mathrm{exp}}}\right)^{2} \qquad (25)$$

$$\mathrm{RAE} = \sum_{i=1}^{N}\frac{\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}}{\gamma_i^{\infty,\mathrm{exp}}} \qquad (26)$$

$$A_f = 10^{\left(\sum_{i=1}^{N}\left|\log\frac{\gamma_i^{\infty,\mathrm{cal}}}{\gamma_i^{\infty,\mathrm{exp}}}\right|\right)/N} \qquad (27)$$

$$B_f = 10^{\left(\sum_{i=1}^{N}\log\frac{\gamma_i^{\infty,\mathrm{cal}}}{\gamma_i^{\infty,\mathrm{exp}}}\right)/N} \qquad (28)$$

The values of the ten statistical evaluation indices of the three models discussed in this work for prediction (validation data) are summarized in Table 5.

Based on the models' RMSE, it can be seen that the two SVMr-based models surpassed the COSMO-RS model.3 The difference between the two models (SVMr and DA-SVMr) is insignificant compared to the difference between these two models and the COSMO-RS one. Moreover, the values of R2 (for validation) and ΔR2, in addition to the RMSE, imply that the DA-SVMr model has more predictive power than the SVMr. This means that the two models proposed in this work

dataset and from the subset of data which is used in the assessment of the DA-SVMr optimization performance. For this purpose, the Matlab function "holdout" for cross-validation is used to ensure that the optimal DA-SVMr model is not contingent on the validation data. Consequently, the improved predictive accuracy of the model is the result of the swarm optimization of the learning machine hyper-parameters and not the result of overfitting.59,60

4. CONCLUSIONS

In this study, two SVMr models were developed for the estimation of the IDAC of seventeen solutes in 44 imidazolium-based ILs based on temperature, molecular weight of solute and solvent, and five COSMO-RS σ-profile descriptors for both ILs and solutes. Experimental data were selected from the literature and closely analyzed statistically. Two machine learning regularization techniques, known as Lasso and Ridge, were applied in this study to illustrate the decisive selection of input parameters. The two developed models show good correlative and predictive accuracy. Statistical comparison of the different models reveals a weak performance of a COSMO-RS model3 in predicting IDAC in contrast to the two SVMr models. The results confirm that COSMO-RS is a good source of structural molecular information about the
molecules of solutes and solvents (ILs). Furthermore, the DA-SVMr model outperformed the SVM model, with a validation correlation coefficient of 0.9954 and an RMSE equal to 0.1785. These results demonstrate the improved performance of SVM for regression when the hyper-parameters are optimized using DA. Despite the better results and improved accuracy obtained with the DA-SVMr model, associating SVMr with other algorithms for optimization of hyper-parameters is highly recommended, as this may lead to even more accurate models.

■ ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jced.0c00168.

ZIP file contains ‘data_ss.xlsx’ and ‘data_ss_Info.docx’ providing the data points used for modeling. It also encloses two folders ‘DA-SVM’ and ‘SVM’ which include the two models developed in this work using Matlab, along with a ‘README_File.docx’ that describes each of the other files and how they are used together (ZIP)

■ AUTHOR INFORMATION
Corresponding Author
Hania Benimam − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria; orcid.org/0000-0002-9786-7166; Phone: +213 669542184; Email: hbenimam@gmail.com

Authors
Cherif Si Moussa − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria; Department of Process Engineering, Institute of Technology, University Dr. Yahia Fares of Medea, Medea 26000, Algeria
Mohamed Hentabli − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria; Department of Process Engineering, Institute of Technology, University Dr. Yahia Fares of Medea, Medea 26000, Algeria; Laboratory of Quality Control, Physico-Chemical Department, SAIDAL of Medea, Medea 26000, Algeria; orcid.org/0000-0002-6693-0708
Salah Hanini − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria
Maamar Laidi − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria; Department of Process Engineering, Institute of Technology, University Dr. Yahia Fares of Medea, Medea 26000, Algeria
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jced.0c00168

Notes
The authors declare no competing financial interest.

■ REFERENCES
(1) Welton, T. Room-Temperature Ionic Liquids. Solvents for Synthesis and Catalysis. Chem. Rev. 1999, 99, 2071−2084.
(2) Freemantle, M. An Introduction to Ionic Liquids; Royal Society of Chemistry, 2010.
(3) Paduszyński, K. An Overview of the Performance of the COSMO-RS Approach in Predicting the Activity Coefficients of Molecular Solutes in Ionic Liquids and Derived Properties at Infinite Dilution. Phys. Chem. Chem. Phys. 2017, 19, 11835−11850.
(4) Lei, Z.; Dai, C.; Zhu, J.; Chen, B. Extractive Distillation with Ionic Liquids: A Review. AIChE J. 2014, 60, 3312−3329.
(5) Kato, R.; Krummen, M.; Gmehling, J. Measurement and Correlation of Vapor−liquid Equilibria and Excess Enthalpies of Binary Systems Containing Ionic Liquids and Hydrocarbons. Fluid Phase Equilib. 2004, 224, 47−54.
(6) Brennecke, J. F.; Maginn, E. J. Ionic Liquids: Innovative Fluids for Chemical Processing. AIChE J. 2001, 47, 2384−2389.
(7) Kazakov, A.; Magee, J. W.; Chirico, R. D.; Diky, V.; Muzny, C. D.; Kroenlein, K.; Frenkel, M. NIST Standard Reference Database 147: NIST Ionic Liquids Database (ILThermo). Version 2.0; National Institute of Standards and Technology: Gaithersburg, MD, 2013.
(8) Paduszyński, K. In Silico Calculation of Infinite Dilution Activity Coefficients of Molecular Solutes in Ionic Liquids: Critical Review of Current Methods and New Models Based on Three Machine Learning Algorithms. J. Chem. Inf. Model. 2016, 56, 1420−1437.
(9) Weidlich, U.; Gmehling, J. A Modified UNIFAC Model. 1. Prediction of VLE, hE, and γ∞. Ind. Eng. Chem. Res. 1987, 26, 1372−1381.
(10) Simoni, L. D.; Lin, Y.; Brennecke, J. F.; Stadtherr, M. A. Modeling Liquid−Liquid Equilibrium of Ionic Liquid Systems with NRTL, Electrolyte-NRTL, and UNIQUAC. Ind. Eng. Chem. Res. 2008, 47, 256−272.
(11) Eike, D. M.; Brennecke, J. F.; Maginn, E. J. Predicting Infinite-Dilution Activity Coefficients of Organic Solutes in Ionic Liquids. Ind. Eng. Chem. Res. 2004, 43, 1039−1048.
(12) Tämm, K.; Burk, P. QSPR Analysis for Infinite Dilution Activity Coefficients of Organic Compounds. J. Mol. Model. 2006, 12, 417−421.
(13) Xi, L.; Sun, H.; Li, J.; Liu, H.; Yao, X.; Gramatica, P. Prediction of Infinite-Dilution Activity Coefficients of Organic Solutes in Ionic Liquids Using Temperature-Dependent Quantitative Structure−Property Relationship Method. Chem. Eng. J. 2010, 163, 195−201.
(14) Grubbs, L. M.; Ye, S.; Saifullah, M.; McMillan-Wiggins, M. C.; Acree, W. E.; Abraham, M. H.; Twu, P.; Anderson, J. L. Correlations for Describing Gas-to-Ionic Liquid Partitioning at 323 K Based on Ion-Specific Equation Coefficient and Group Contribution Versions of the Abraham Model. Fluid Phase Equilib. 2011, 301, 257−266.
(15) Mutelet, F.; Ortega-Villa, V.; Moïse, J.-C.; Jaubert, J.-N.; Acree, W. E. Prediction of Partition Coefficients of Organic Compounds in Ionic Liquids Using a Temperature-Dependent Linear Solvation Energy Relationship with Parameters Calculated through a Group Contribution Method. J. Chem. Eng. Data 2011, 56, 3598−3606.
(16) Stephens, T. W.; Chou, V.; Quay, A. N.; Shen, C.; Dabadge, N.; Tian, A.; Loera, M.; Willis, B.; Wilson, A.; Acree, W. E.; Twu, P.; Anderson, J. L.; Abraham, M. H. Thermochemical Investigations of Solute Transfer into Ionic Liquid Solvents: Updated Abraham Model Equation Coefficients for Solute Activity Coefficient and Partition Coefficient Predictions. Phys. Chem. Liq. 2014, 52, 488−518.
(17) Gonfa, G.; Bustam, M. A.; Sharif, A. M.; Mohamad, N.; Ullah, S. Tuning Ionic Liquids for Natural Gas Dehydration Using COSMO-RS Methodology. J. Nat. Gas Sci. Eng. 2015, 27, 1141−1148.
(18) Gonfa, G.; Bustam, M. A.; Shariff, A. M.; Muhammad, N.; Ullah, S. Quantitative Structure−activity Relationships (QSARs) for Estimation of Activity Coefficient at Infinite Dilution of Water in Ionic Liquids for Natural Gas Dehydration. J. Taiwan Inst. Chem. Eng. 2016, 66, 222−229.
(19) Shiflett, M. B.; Maginn, E. J. The Solubility of Gases in Ionic Liquids. AIChE J. 2017, 63, 4722−4737.
(20) Matheswaran, P.; Wilfred, C. D.; Kurnia, K. A.; Ramli, A. Prediction of Activity Coefficient of Sulfones at Infinite Dilution in Ionic Liquids and Their Modeling Using COSMO-RS. AIP Conference Proceedings; AIP Publishing, 2016; Vol. 1787, p 020007.
(21) Wang, S.; Sandler, S. I.; Chen, C.-C. Refinement of COSMO−SAC and the Applications. Ind. Eng. Chem. Res. 2007, 46, 7275−7288.
(22) Ajmani, S.; Rogers, S. C.; Barley, M. H.; Burgess, A. N.; Livingstone, D. J. Characterization of Mixtures Part 1: Prediction of Infinite-Dilution Activity Coefficients Using Neural Network-Based QSPR Models. QSAR Comb. Sci. 2008, 27, 1346−1361.
(23) Nami, F.; Deyhimi, F. Prediction of Activity Coefficients at Infinite Dilution for Organic Solutes in Ionic Liquids by Artificial Neural Network. J. Chem. Thermodyn. 2011, 43, 22−27.
(24) Banerjee, T.; Khanna, A. Infinite Dilution Activity Coefficients for Trihexyltetradecyl Phosphonium Ionic Liquids: Measurements and COSMO-RS Prediction. J. Chem. Eng. Data 2006, 51, 2170−2177.
(25) Diedenhofen, M.; Eckert, F.; Klamt, A. Prediction of Infinite Dilution Activity Coefficients of Organic Compounds in Ionic Liquids Using COSMO-RS. J. Chem. Eng. Data 2003, 48, 475−479.
(26) Freire, M. G.; Ventura, S. P. M.; Santos, L. M. N. B. F.; Marrucho, I. M.; Coutinho, J. A. P. Evaluation of COSMO-RS for the Prediction of LLE and VLE of Water and Ionic Liquids Binary Systems. Fluid Phase Equilib. 2008, 268, 74−84.
(27) Kurnia, K. A.; Pinho, S. P.; Coutinho, J. A. P. Evaluation of the Conductor-like Screening Model for Real Solvents for the Prediction of the Water Activity Coefficient at Infinite Dilution in Ionic Liquids. Ind. Eng. Chem. Res. 2014, 53, 12466−12475.
(28) Diedenhofen, M.; Klamt, A. COSMO-RS as a Tool for Property Prediction of IL Mixtures-A Review. Fluid Phase Equilib. 2010, 294, 31−38.
(29) Kumar, A. A. P.; Banerjee, T. Thiophene Separation with Ionic Liquids for Desulphurization: A Quantum Chemical Approach. Fluid Phase Equilib. 2009, 278, 1−8.
(30) Kurnia, K. A.; Pinho, S. P.; Coutinho, J. A. P. Evaluation of the Conductor-like Screening Model for Real Solvents for the Prediction of the Water Activity Coefficient at Infinite Dilution in Ionic Liquids. Ind. Eng. Chem. Res. 2014, 53, 12466−12475.
(31) Matheswaran, P.; Wilfred, C. D.; Kurnia, K. A.; Ramli, A. Overview of Activity Coefficient of Thiophene at Infinite Dilution in Ionic Liquids and Their Modeling Using COSMO-RS. Ind. Eng. Chem. Res. 2016, 55, 788−797.
(32) Vapnik, A. V. The Nature of Statistical Learning Theory; Springer-Verlag: New York, 1995.
(33) Yoon, H.; Jun, S.-C.; Hyun, Y.; Bae, G.-O.; Lee, K.-K. A Comparative Study of Artificial Neural Networks and Support Vector Machines for Predicting Groundwater Levels in a Coastal Aquifer. J. Hydrol. 2011, 396, 128−138.
(34) Drucker, H.; Burges, C. J. C.; Kaufman, L.; Smola, A. J.; Vapnik, V. Support Vector Regression Machines. Advances in Neural Information Processing Systems; MIT Press, 1997; pp 155−161.
(35) Yu, P.-S.; Chen, S.-T.; Chang, I.-F. Support Vector Regression for Real-Time Flood Stage Forecasting. J. Hydrol. 2006, 328, 704−716.
(36) Wu, C.-H.; Ho, J.-M.; Lee, D. T. Travel-Time Prediction with Support Vector Regression. IEEE Trans. Intell. Transp. Syst. 2004, 5, 276−281.
(37) Wen, Y. F.; Cai, C. Z.; Liu, X. H.; Pei, J. F.; Zhu, X. J.; Xiao, T. T. Corrosion Rate Prediction of 3C Steel under Different Seawater Environment by Using Support Vector Regression. Corros. Sci. 2009, 51, 349−355.
(38) Chevalier, R. F.; Hoogenboom, G.; McClendon, R. W.; Paz, J. A. Support Vector Regression with Reduced Training Sets for Air Temperature Prediction: A Comparison with Artificial Neural Networks. Neural Comput. Appl. 2011, 20, 151−159.
(39) He, Z.; Wen, X.; Liu, H.; Du, J. A Comparative Study of Artificial Neural Network, Adaptive Neuro Fuzzy Inference System and Support Vector Machine for Forecasting River Flow in the Semiarid Mountain Region. J. Hydrol. 2014, 509, 379−386.
(40) Parveen, N.; Zaidi, S.; Danish, M. Support Vector Regression Model for Predicting the Sorption Capacity of Lead (II). Perspect. Sci. 2016, 8, 629−631.
(41) Vapnik, V.; Golowich, S. E.; Smola, A. J. Support Vector Method for Function Approximation, Regression Estimation and Signal Processing. In Advances in Neural Information Processing Systems; MIT Press, 1997; pp 281−287.
(42) Beni, G.; Wang, J. Swarm Intelligence in Cellular Robotic Systems. Robots and Biological Systems: Towards a New Bionics?; Springer Berlin Heidelberg: Berlin, Heidelberg, 1993; pp 703−712.
(43) Bonabeau, E.; Dorigo, M.; Theraulaz, G. Swarm Intelligence: From Natural to Artificial Systems; Oxford University Press: New York, 1999.
(44) Tharwat, A.; Moemen, Y. S.; Hassanien, A. E. Classification of Toxicity Effects of Biotransformed Hepatic Drugs Using Whale Optimized Support Vector Machines. J. Biomed. Inf. 2017, 68, 132−149.
(45) Mirjalili, S. Dragonfly Algorithm: A New Meta-Heuristic Optimization Technique for Solving Single-Objective, Discrete, and Multi-Objective Problems. Neural Comput. Appl. 2016, 27, 1053−1073.
(46) Mullins, E.; Oldland, R.; Liu, Y. A.; Wang, S.; Sandler, S. I.; Chen, C.-C.; Zwolak, M.; Seavey, K. C. Sigma-Profile Database for Using COSMO-Based Thermodynamic Methods. Ind. Eng. Chem. Res. 2006, 45, 4389−4415.
(47) Klamt, A. The COSMO and COSMO-RS Solvation Models. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2011, 1, 699−709.
(48) Klamt, A. Conductor-like Screening Model for Real Solvents: A New Approach to the Quantitative Calculation of Solvation Phenomena. J. Phys. Chem. 1995, 99, 2224−2235.
(49) Eckert, F.; Klamt, A. Fast Solvent Screening via Quantum Chemistry: COSMO-RS Approach. AIChE J. 2002, 48, 369−385.
(50) Klamt, A.; Jonas, V.; Bürger, T.; Lohrenz, J. C. W. Refinement and Parametrization of COSMO-RS. J. Phys. Chem. A 1998, 102, 5074−5085.