
pubs.acs.org/jced Article

Dragonfly-Support Vector Machine for Regression Modeling of the Activity Coefficient at Infinite Dilution of Solutes in Imidazolium Ionic Liquids Using σ‑Profile Descriptors
Hania Benimam,* Cherif Si Moussa, Mohamed Hentabli, Salah Hanini, and Maamar Laidi
Cite This: https://dx.doi.org/10.1021/acs.jced.0c00168

ABSTRACT: Ionic liquids (ILs) have shown remarkable potential for applications in separation, such as extractive distillation and liquid−liquid extraction. Crucial to these applications is the estimation of a significant property of ILs, namely the infinite dilution activity coefficient (IDAC) of different solutes in ILs. In this context, the present paper aims to model the IDAC of 17 solutes in 44 imidazolium ILs using 2666 experimental data points gathered from the literature, based on the support vector machine for regression (SVMr) learning algorithm. Two models are developed, one based on SVMr and the other based on the dragonfly algorithm (DA) associated with SVMr. Both models consider the same set of predictive variables, which are the temperature, the molecular weights of the solute and solvent, and five conductor-like screening model for real solvents (COSMO-RS) σ-profile descriptors related to the solute and IL. The DA is applied for optimization of the SVMr hyper-parameters. The results show the superiority of the DA-SVMr model, demonstrated by its correlation coefficient (R) and root mean square error values of 0.996 and 0.170, respectively.

Received: February 16, 2020
Accepted: May 14, 2020

1. INTRODUCTION

As the sustainable development of processes continues to receive a great amount of attention, the introduction of cleaner technologies has become a high priority, and opting for eco-friendly solvents to replace harmful solvents has preoccupied researchers in both academia and industry.1 Ionic liquids (ILs) are organic salts that consist entirely of ions and have melting points much lower than those of conventional inorganic ionic compounds.2 As explained by Paduszyński,3 the Coulomb forces between ions in ILs are strong, which results in their extremely low volatility and is the reason why they are considered eco-friendly. The high solvating power of ILs for a large variety of materials, including organic, inorganic, and organometallic compounds, enhances their performance as solvents and, in particular, their capacity for selective separation of similar compounds.3 Moreover, the properties of ILs can be adjusted and tuned to selectively separate similar compounds by a careful choice of the anion and cation combination.3

The numerous pros of ILs, compared to the cons of volatile organic compounds (VOCs), have resulted in many applications in extractive distillation and liquid−liquid extraction,4,5 leading to more environmentally friendly processes owing to the nonvolatility of ILs compared to that of VOCs, and to enhanced separation processes obtained by tuning key parameters such as capacity and selectivity through a judicious choice of anions and cations in designing the ILs. Unarguably, such applications of ILs in actual processes require their characterization by establishing accurate quantitative structure−property relationships, which give insight into the properties and behavior of the IL based on its molecular structure.6 In a separation process that involves the use of an IL as an entrainer, the liquid−liquid equilibrium (LLE) and vapor−liquid equilibrium (VLE) data of the ternary system including the IL are a preliminary requirement for the design and optimization of the extraction phase, which is why considerable efforts have been made to report experimental LLE and VLE results for different systems involving ILs.7 The infinite dilution activity coefficient (IDAC, γ∞) of a solute in an IL is another significant thermodynamic property of ILs; it gives information about their affinity and essentially justifies opting for a particular IL over others in a given separation process. At infinite dilution of a solute in an IL, the interactions between molecules of the IL and those of the solute are predominant compared to the interactions between IL and IL, and the difference in the values of γ∞ of two solutes reveals the IL's selectivity in the separation

© XXXX American Chemical Society https://dx.doi.org/10.1021/acs.jced.0c00168


A J. Chem. Eng. Data XXXX, XXX, XXX−XXX
Journal of Chemical & Engineering Data pubs.acs.org/jced Article

process of the ternary mixture of the two solutes and IL. Also, the estimation of γ∞ is more economical and easier than establishing LLE and VLE phase diagrams,3 which makes studying this parameter more advantageous in the design and optimization of separation processes. Because of the rise of research studies aiming at the estimation of γ∞, a large number of papers have reported values of γ∞ for different solutes in more than 200 ILs since 2001.3

Clearly, experimental characterization is the most reliable means for estimating the properties of a substance, but given the ample availability of ILs, which are estimated at 250,000 based on currently available binary anion/cation systems,8 experimental measurements are costly, often difficult, and time-consuming, which is why modeling methods for IDAC have arisen.

Thermodynamic models based on the activity coefficient are the most extensive approach to predict IDAC. These models are classified into predictive models, such as the group contribution model UNIFAC,9 Scatchard−Hildebrand, Flory−Huggins, and COSMO, and semi-predictive models, which necessitate fitting the Gibbs free energy model parameters, such as Van Laar, Margules, NRTL, and UNIQUAC.10 In that context, many correlative models predicting IDAC by simple statistical regression11−18 based on least squares have been proposed. In addition, other models based on advanced machine learning algorithms have been developed and seemed more robust. However, despite their fast computation, these techniques require a large amount of data and usually do not extrapolate well outside the ranges and classes of the systems from which they were established, and their accuracy is conditional on the availability and validity of experimental data because they are correlative methods.19 More advanced computational techniques based on quantum mechanical calculations and statistical thermodynamics, such as the conductor-like screening model for real solvents (COSMO-RS), appear capable of circumventing the challenges and limitations encountered in correlative modeling, mainly because they are more predictive: they do not rely on experimental data for calculations. Although they require sophisticated materials and user expertise and are the most computationally demanding, they are promising for virtually screening and evaluating different combinations of anions/cations forming the ILs.19,20 The σ-profile of ions may alternatively be obtained using nonproprietary software such as COSMO-SAC.21 An overview of the COSMO-RS technique along with related research studies is given in the following section.

So far, computational modeling of IDAC has been proposed in numerous papers, mainly based on correlative methods such as neural networks22,23 or on quantum mechanical/statistical thermodynamic methods.3,18,24−27 As correlative models depend on the availability of accurate experimental data, and the latter models are complicated to use and require expertise, it is interesting to explore a hybrid approach where results and data obtained from quantum mechanical/statistical thermodynamic calculations are correlated using a robust machine learning algorithm to develop a model for the estimation of IDAC of different solutes in ILs. To the best of our knowledge, there has been only one paper18 that combined the two computational methods for modeling IDAC of water in ILs based on temperature and sigma profiles of anions and cations using multilinear regression, and there has not been any similar research that involved different solutes in ILs. Therefore, the present work aims at modeling IDAC of seventeen solutes in 44 imidazolium-based ILs, based exclusively on temperature and COSMO-RS sigma profiles of both the solutes and solvents, and using a support vector machine (SVM) technique associated with the dragonfly swarm algorithm for better optimization of hyper-parameters.

2. METHODS AND DATASET

2.1. COSMO-RS. COSMO-RS, which is based on quantum calculations combined with a continuum treatment of the solvent, is possibly the most popular approach for modeling solubility.19 The method was developed and applied for the first time by Klamt and co-workers.25 The researchers adopted the conductor-like screening model for real solvent behavior, which mainly consists of carrying out quantum mechanical calculations on a given individual solute embedded in a conducting continuum, so that explicit modeling of solvent molecules is unnecessary. The resulting surface charge densities and geometries are then used to compute the interactions between the solute and solvent by way of pairwise interacting surface segments. Subsequently, a statistical thermodynamic treatment is used to estimate the activity coefficients of solutes.19 An in-depth description of COSMO-RS can be found in the literature.28

Numerous papers have reported the prescreening of solutes in ILs based on COSMO-RS,3,18,29−31 and the method gave good qualitative and, in many cases, quantitative predictions of IDAC of solutes in ILs. Also, the ability of the method to perform virtual screening of different solutes in different ILs at a fast pace has played a major role in its attractiveness. In the present contribution, COSMO-RS sigma profiles of seventeen solutes and of the different ions forming the ILs are considered as molecular descriptors of the binary solute/IL mixture and are subsequently used for modeling the IDAC.

2.2. SVM for Regression. SVM is a correlative computational technique like the artificial neural network (ANN); however, it is built on structural risk minimization32 instead of the empirical risk minimization on which ANN is based, which is why the technique is robust and accurate, as stated by Yoon et al.33 Therefore, it is a reliable algorithm for both classification and regression.34 For regression analysis, SVM regression (SVMr) theoretically reduces the prediction error in a learning process and decreases the problem of overfitting often encountered in machine learning,35 leading in many cases to a better performance of SVMr compared to other learning methods.33,36−39 As explained by Parveen et al.,40 SVM regression analyzes a training data set TD = {($a_1$, $b_1$), ($a_2$, $b_2$), ..., ($a_N$, $b_N$)}, where $a_i$ is a vector of real independent variables and $b_i$ is the corresponding real dependent variable. Accordingly, in feature space, the regression equation can be estimated by

$$ z(a, w) = w \cdot \phi(a) + c \tag{1} $$

where $w$ represents the weight vector, $\phi(a)$ corresponds to the feature function, $w \cdot \phi(a)$ is the dot product, and $c$ is a constant. Theoretically, SVM reduces the prediction error through minimization of the following equation

$$ Q(f) = C\,\frac{1}{N}\sum_{i=1}^{N} L_\varepsilon\bigl(b_i, z(a_i, w)\bigr) + \frac{1}{2}\lVert w \rVert^{2} \tag{2} $$

and

$$ L_\varepsilon\bigl(b, z(a, w)\bigr) = \begin{cases} 0 & \text{if } |b - z(a, w)| \le \varepsilon \\ |b - z(a, w)| - \varepsilon & \text{otherwise} \end{cases} \tag{3} $$

Q(f) represents the empirical error, and the term C sets the trade-off between the empirical error and the model complexity given by the right-hand term of eq 2; the loss function is expressed in eq 3 and is called the ε-insensitive loss function.41 By introducing the Lagrange multipliers $\beta$ and $\beta^*$, the optimization problem is transformed into a dual problem. The coefficients with nonzero values and their corresponding input vectors $a_i$ are named the support vectors. The resulting regression equation is the following

$$ z(a, \beta_i, \beta_i^*) = \sum_{i=1}^{N_{sv}} (\beta_i - \beta_i^*)\bigl(\phi(a) \cdot \phi(a_i)\bigr) + c \tag{4} $$

By introducing the kernel function $k(a, a_i) = \phi(a) \cdot \phi(a_i)$, the above function takes the following form

$$ z(a, \beta_i, \beta_i^*) = \sum_{i=1}^{N_{sv}} (\beta_i - \beta_i^*)\,k(a, a_i) + c \tag{5} $$

The Karush−Kuhn−Tucker conditions are employed to calculate the value of the constant c. In SVMr modeling, the so-called hyper-parameters are the cost parameter C and the radius of the insensitive tube ε, in addition to the kernel function's parameter γ, and these parameters are directly associated with the SVMr's performance for the problem at hand. In the present work, SVMr modeling is carried out using MATLAB 2018. The importance of the values that the hyper-parameters take in SVMr requires the optimization of these parameters. Basically, there are infinitely many possible values that the hyper-parameters may take, so testing all values and combinations of values is computationally and practically impossible. Therefore, for the present study, the dragonfly algorithm (DA), a swarm intelligence (SI) algorithm, has been used with SVM for the optimization of hyper-parameters.

2.3. Dragonfly Algorithm. SI is a field of study first proposed by Beni and Wang in 1993.42 The study aims at understanding the different interactions between individual living creatures that may result in social intelligence. Accordingly, it simulates and implements the collective and social intelligence behavior of a whole population based on rules discovered among some members of the population.43 Tharwat and co-workers44 summarized the principles governing any swarm behavior as:
• Separation principle: represents the constant collision avoidance between individuals in a population, and it follows the equation $S_i = -\sum_{i=1}^{n} (x - x_i)$, where $x$ represents the current solution's position, $x_i$ is the position of the neighboring individual i, and n is the total number of members in the neighborhood.
• Cohesion principle: this principle is based on the tendency of individuals to move toward the center of mass of the neighborhood, and it is described as $C_i = \frac{\sum_{i=1}^{n} x_i}{n} - x$.
• Alignment principle: represents the velocity matching of dragonflies belonging to the same group, and it is defined as $A_i = \frac{\sum_{i=1}^{n} V_i}{n}$, where $V_i$ is the velocity of the neighboring individual i.

The movement of dragonflies in nature toward food can be expressed as $F_i = x^{+} - x$, where $x^{+}$ represents the food's position, and their movement away from the enemy is $E_i = x^{-} + x$, where $x^{-}$ is the enemy's position.45

The location of a dragonfly is the sum of its position $x$ and its step $\Delta x$ and can be expressed as $x_{t+1} = x_t + \Delta x_{t+1}$, where t is the current iteration and $\Delta x_{t+1}$ is determined using the following equation

$$ \Delta x_{t+1} = sS_i + aA_i + cC_i + fF_i + eE_i + \omega\,\Delta x_t \tag{6} $$

where $S_i$ is the separation for $x_i$ and s is its separation weight; $A_i$ is the alignment of $x_i$ and a is its alignment weight; $C_i$ is the cohesion of $x_i$ and c is its weight; $F_i$ is the food attraction of $x_i$ and f is the food parameter; $E_i$ is the distraction from the enemy and e is its factor; and ω is the inertia weight. The factors s, a, c, f, and e are determined during optimization and have a double action, that of exploration and that of exploitation, in the DA.45

3. RESULTS AND DISCUSSION

3.1. Data Collection and Analysis. 3.1.1. Sigma Profile. A brief description of the COSMO-RS sigma profile similar to that of Mullins et al.46 is presented here. In theory, COSMO-based models picture the solute molecule inside a cavity surrounded by a uniform medium representing the solvent, and based on a set of explicit rules and the atoms' structural aspects, the model builds the cavity hypothetically within an ideal conductor, assuming a perfect solvation process. The dipole and higher moments of the solute molecule induce charge from the adjacent medium to the surface of the cavity by eliminating the electric field inside the conductor and lateral to the surface. The resulting surface charge is defined by the following equation

$$ \phi_{tot} = \phi_{sol} + D q^{*} = 0 \tag{7} $$

where $\phi_{tot}$ is the total potential on the surface of the cavity, $\phi_{sol}$ is the potential on the solute molecule because of the charge distribution, and D represents the Coulomb interaction matrix defining the interactions among surface charges and is a function of its geometry.47 The surface charge distribution in a dielectric solvent is estimated by fitting the surface charge distribution in a conductor (σ*).

The COSMO calculation response (σ*) is then averaged over a circular surface section to attain a new surface charge distribution labeled σ, which characterizes the probability distribution of a molecular surface section that has a specific electric charge density; this probability distribution is referred to as the sigma profile p(σ) of a molecule,46 and it is defined by Klamt48,49 for a molecule j as

$$ p_j(\sigma) = \frac{N_j(\sigma)}{N_j} = \frac{A_j(\sigma)}{A_j} \tag{8} $$

$$ N_j = \sum_{\sigma} N_j(\sigma) = \frac{A_j}{a_{eff}} \tag{9} $$

$$ A_j = \sum_{\sigma} A_j(\sigma) \tag{10} $$

wherein $N_j(\sigma)$ represents the quantity of segments with a discretized surface charge density σ, $A_j$ the total cavity surface area, and $A_j(\sigma)$ the total surface area of all of the segments with a particular charge density σ. $a_{eff}$ is known as the effective surface area and represents the theoretical contact surface

between molecules. In some works,50 the parameter $a_{eff}$ is adjusted to a value of 7.1 Å².

Usually, the σ-profile of a solute or a solvent is divided into four sections, and each section is defined by an interval of σ (e/nm²). The sections were previously considered as molecular descriptors by Gonfa.18

In this work, the sigma profiles of the anions and cations of the IL are each partitioned into two sections, while that of the solute is taken as a single section. Table 1 shows the σ-profile descriptors considered in the present study and their ranges.

Table 1. σ-Profile Descriptors and the Margins Limiting Each Section

descriptor group    cation          cation         anion          anion         solute
descriptor          S1              S2             S3             S4            S5
σ (e/nm²)           [−2.5, −1.0]    [−1.0, 2.0]    [−1.1, 1.5]    [1.5, 3.0]    [−2.5, 2.5]

S1 represents the hydrogen bond donor character of the cation, and S2 corresponds to the nonpolar character of the cation and its hydrogen bond acceptor behavior. Similarly, S3 indicates the hydrogen bond donor character in addition to the nonpolarity of the anion, and S4 designates the strong hydrogen bond acceptor behavior of the anion. S5, in turn, is the summation of the three characters of the solutes (the hydrogen bond donor character, the nonpolar character, and the hydrogen bond acceptor behavior). The fusion of the three σ-profile regions of the solute is justified by the idea that at infinite dilution, the molecules of solutes are theoretically entirely surrounded by solvent molecules, and in that specific condition, the electric surface density distribution by region is not as important as the total value of σ.

3.1.2. Statistical Data Analysis. Table 2 summarizes the dependent and independent variables, their units of measure, the domains studied, and their standard deviation, variance, and kurtosis.

Table 2. Most Important Information About the Data Adjusted in This Work

variable category   factors   unit     domain                SD         variance       kurtosis
input               M (IL)    g/mol    [169.247−643.79]      131.327    17,251.967     −0.3689
input               T         K        [293.15−413]          21.1643    448.0465       −0.51456
input               S1        e/nm²    [1.2583−3.86975]      0.4504     0.2030         2.4220
input               S2        e/nm²    [11.50845−48.7078]    6.2359     38.8849        5.7886
input               S3        e/nm²    [0−25.8853]           6.523      42.5475        −0.83620
input               S4        e/nm²    [0−5.281]             1.7947     3.2214         −0.2266
input               M (Sol)   g/mol    [18.015−198.394]      30.6218    935.636        1.445
input               S5        e/nm²    [4.327−31.699]        4.842      23.401476      0.98834
output              IDAC               [0.013−4222.32]       331.527    109,945.803    51.072

Figure 1. Probability density distribution of variables.

3.1.2.1. Kernel Density Distribution. The probability density function (pdf) of a random variable can be represented nonparametrically as a kernel distribution when a parametric distribution cannot describe the data, which avoids making assumptions about the data distribution. A smoothing function and a bandwidth value control the smoothness of the resultant

density curve and characterize the kernel distribution. For a given predictive factor x, the estimated pdf of the variable represents its kernel density estimator and is defined as

$$ \hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} k\!\left(\frac{x - x_i}{h}\right) \tag{11} $$

where $x_1, x_2, ..., x_n$ are random sample values from an unidentified distribution, n is the data sample size, k represents the kernel smoothing function, and h is the bandwidth.51 Accordingly, the probability density distribution of the predictive variables considered in this work is represented in Figure 1.

3.1.2.2. Least Absolute Shrinkage and Selection Operator (Lasso). To evaluate the accuracy of a model trained using a machine learning method, an optimization of a loss function is required. In regression, the value of the predicted output is continuous and relies on the choice and collection of inputs that best describe the problem at hand. Also, any interferences that may exist between inputs should be closely identified, and for that purpose the binary interactions between inputs are illustrated in Figure 2.

Figure 2. Input variables binary interactions.

Training a model consists of minimizing the quadratic loss, which is the sum of squared deviations between the predicted and actual output values. The multivariable linear regression problem is represented by eq 12

$$ \gamma_j^{\infty} = \beta_1 X_{1j} + \beta_2 X_{2j} + \beta_3 X_{3j} + \dots + \beta_i X_{ij} + \beta_0 \tag{12} $$

where $X_{1j}, X_{2j}, X_{3j}, ..., X_{ij}$ are the i input variables corresponding to the output $\gamma_j^{\infty}$ in line j of the dataset, $\beta_1, \beta_2, \beta_3, ..., \beta_i$ are their respective coefficients, and $\beta_0$ is the intercept coefficient. Minimization of the loss function and finding optimal β coefficients through an iterative process allow the development of a trained model for regression. However, a problem of overfitting the data is often encountered in machine learning algorithms; therefore, attention should be given to the optimal number of independent variables, as an excess of them may lead to noisier samples and subsequently to a trained model that memorizes the noise instead of learning the trend of the data. To overcome this problem, regularization techniques such as Lasso and Ridge are used, as they lead to simple models that usually do not overfit. In brief, in a regularized machine learning model, the loss function includes an additional term, known as the regularization term, that has to be minimized as well; for Lasso the loss is defined as

$$ L = \sum_{j} \left(\gamma_j^{\infty,\mathrm{pred}} - \gamma_j^{\infty,\mathrm{real}}\right)^{2} + \lambda \sum |\beta| \tag{13} $$

The regularization term in the Lasso loss function not only punishes the function for high values of the coefficients β, but it also sets them to zero if they are not relevant. As a result, the model will include the most pertinent inputs, and the least significant ones will be discarded. The results of Lasso regularization applied to the problem discussed in this work and using different types of interaction matrices can be accessed in the Supporting Information.

3.1.2.3. Ridge Regularization. Ridge regularization, also known as Tikhonov regularization, was, prior to Lasso, the most popular technique used for shrinking the input coefficients and improving the predictive accuracy of correlative models. The difference between these two regularization techniques is in the penalty term in the loss function L, as follows

$$ L = \sum_{j} \left(\gamma_j^{\infty,\mathrm{pred}} - \gamma_j^{\infty,\mathrm{real}}\right)^{2} + \lambda \sum \beta^{2} \tag{14} $$

Although Ridge regularization punishes high values of the coefficients β through the sum of their squares in the loss function, it does not eliminate the least significant covariates as Lasso does; as a result, it reduces overfitting but does not set the irrelevant inputs' coefficients to zero. Outcomes of Ridge regularization of the present data and a statistical analysis of

input variables' distribution are available in the Supporting Information.

3.2. Modeling IDAC. 3.2.1. SVMr Model. In this paper, the SVMr learning algorithm is used for nonlinear modeling of the IDAC of 17 solutes in imidazolium-based ILs using the largest available/accessible literature data. The optimal SVMr model is obtained through the process illustrated in Figure 3.

Figure 3. Flowchart of the SVMr optimization process.

First, the data reporting IDAC of 17 solutes in 44 imidazolium-based ILs are collected and organized in an Excel file to be processed by the machine learning algorithm. The ILs and solutes involved in this study are recapitulated in Table 3. Second, the total dataset is divided into training and test sets using the holdout function included in the Statistics and Machine Learning Toolbox of Matlab. A quarter of the whole data is set aside for the test, and the choice of selected points is performed randomly. In the next step, the kernel function that best fits our data trend and the hyper-parameters are optimized within the following ranges: C ∈ [10^−3, 10^3], ε ∈ [0, 10^−2], γ ∈ [10^−3, 10], and n ∈ [2, 6]; the different kernel functions tested are linear, polynomial with different degrees, Gaussian, and radial basis function (RBF). The optimal SVMr model is achieved through a repetition of the steps listed above and is selected based on the best predictive ability of the model, which is measured in this work by the root mean squared error (RMSE).

Based on the results, the Gaussian function is the kernel function that best fits the data in this work, and the hyper-parameters C, ε, and γ have the values of 91, 0.0586, and 2, respectively. The dataset used, the Matlab code, and the model parameters that allow the reproduction of the results are available in the Supporting Information.

Different statistical evaluation criteria are adopted in this work in order to assess the predictive ability of the developed models: the RMSE, the mean absolute error (MAE), the mean absolute percentage error (MAPE), the coefficient of correlation (R), the determination coefficient (R2), and the intercept coordinate τ. Mathematical equations for these parameters are

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}\right)^{2}} \tag{15} $$

$$ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left|\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}\right| \tag{16} $$

$$ \mathrm{MAPE}\ (\%) = \frac{100}{N}\sum_{i=1}^{N} \left|\frac{\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}}{\gamma_i^{\infty,\mathrm{exp}}}\right| \tag{17} $$

Table 3. ILs and Solutes Implicated in the Present Work

ILs                    ILs                  solutes
IM-2,1 SCN             IM-6,1 BF4           N-PENTANE
IM-2,1 AC              IM-2,1 OTF           N-HEXANE
IM-4,1 CL              IM-4,1 PO2-O1,O1     N-HEPTANE
IM-2,1 DCA             IM-6,1 TCB           N-OCTANE
IM-2OH,1 DCA           IM-2,1 SO3-PH1       N-NONANE
IM-4,1 SCN             IM-4,1 OTF           N-DECANE
IM-2,1 BF4             IM-10,1 BF4          N-UNDECANE
IM-4,1 AC              IM-4,1 SO3-PH1       N-DODECANE
IM-2,1 CCN3            IM-6,1 OTF           N-TRIDECANE
IM-6,1 CL              IM-10,1 TCB          N-TETRADECANE
IM-4,1 DCA             IM-1,1 NTF2          CYCLOPENTANE
IM-2,1 SO3-1           IM-2,1 NTF2          METHYLCYCLOPENTANE
IM-2OH,1 BF4           IM-2OH,1 NTF2        CYCLOHEXANE
IM-1,1 PO2-O1,O1       IM-4,1 NTF2          METHYLCYCLOHEXANE
IM-2,1 TFA             IM-6,1 NTF2          CYCLOHEPTANE
IM-6,1 SCN             IM-6,1,1 NTF2        CYCLOOCTANE
IM-4,1 BF4             IM-1O6,1 NTF2        WATER
IM-2,1 TCB             IM-2,1 FAP
IM-4,1 CCN3            IM-2OH,1 FAP
IM-4,1 SO3-1           IM-1O6,1O6 NTF2
IM-2,1 PO2-O1,O1       IM-6,1 FAP
IM-4,1 TFA             IM-10,10,1 NTF2
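
The first three criteria (eqs 15−17) translate directly into code. The following is a minimal Python sketch, not the authors' Matlab implementation; the function names and the three sample values are hypothetical and serve only to illustrate the formulas.

```python
import math


def rmse(exp, cal):
    """Root mean squared error between experimental and calculated values (eq 15)."""
    n = len(exp)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(exp, cal)) / n)


def mae(exp, cal):
    """Mean absolute error (eq 16)."""
    return sum(abs(e - c) for e, c in zip(exp, cal)) / len(exp)


def mape(exp, cal):
    """Mean absolute percentage error in percent (eq 17); exp must not contain zeros."""
    return 100.0 / len(exp) * sum(abs((e - c) / e) for e, c in zip(exp, cal))


# Hypothetical IDAC values, for illustration only (not taken from the dataset).
gamma_exp = [1.0, 2.0, 4.0]
gamma_cal = [1.1, 1.8, 4.4]

print(rmse(gamma_exp, gamma_cal), mae(gamma_exp, gamma_cal), mape(gamma_exp, gamma_cal))
```

Because MAPE divides each deviation by the experimental value, it weights errors on small IDAC values more heavily than RMSE or MAE do, which is worth keeping in mind given the wide IDAC domain reported in Table 2.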

$$ R = \frac{\sum_{i=1}^{N} \left(\gamma_i^{\infty,\mathrm{exp}} - \bar{\gamma}^{\mathrm{exp}}\right)\left(\gamma_i^{\infty,\mathrm{cal}} - \bar{\gamma}^{\mathrm{cal}}\right)}{\sqrt{\sum_{i=1}^{N} \left(\gamma_i^{\infty,\mathrm{exp}} - \bar{\gamma}^{\mathrm{exp}}\right)^{2} \times \sum_{i=1}^{N} \left(\gamma_i^{\infty,\mathrm{cal}} - \bar{\gamma}^{\mathrm{cal}}\right)^{2}}} \tag{18} $$

$$ R^{2} = 1 - \frac{\sum_{i=1}^{N} \left(\gamma_i^{\mathrm{exp}} - \gamma_i^{\mathrm{cal}}\right)^{2}}{\sum_{i=1}^{N} \left(\gamma_i^{\mathrm{exp}} - \bar{\gamma}^{\mathrm{exp}}\right)^{2}} \tag{19} $$

In order to evaluate the predictive ability of the optimized SVMr model, the validation agreement vector and the validation agreement plot of the predicted versus experimental response variable from the validation dataset only are analyzed. Using the Matlab function "postreg", the linear regression of the calculated versus experimental output is visualized.

The total data agreement scheme is shown in Figure 4, which illustrates the SVMr model's calculated outputs versus the experimental data with an agreement vector [α (slope), τ (y intercept), R (correlation coefficient)] = [0.9871, 0.0466, 0.9951]. The results reveal an overall accuracy of the developed model in correlating the total data. Moreover, the preponderance of the test data follows exactly the trend of the SVMr model linear regression, except for a few points (<3% of the test data) that yielded higher deviations, which demonstrates the prediction precision of the optimized model, appraised by its RMSE, R, and R2 values of 0.1827, 0.9951, and 0.9899, respectively.

Figure 4. Scatter plot of the global data adjusted by the optimal SVMr model.

As mentioned above, in the previous SVMr modeling, the optimization of the three hyper-parameters, which are the regularization constant C, the ε-insensitive loss function parameter, and the kernel function's parameter γ, is crucial to the performance of the model and is performed during the training phase through a grid search within the parameter intervals listed earlier and based on holdout cross-validation. However, this optimization technique is quite traditional, and the chance of missing the optimal target is high because of the unavoidable gaps inside each interval. Therefore, it is interesting to adopt other types of hyper-parameter optimization, among which is an SI-based method known as DA.

3.2.2. DA-SVMr Model. Previous papers have coupled SI optimization algorithms with learning machines, and the results were promising.52−54 The proposed model is based on the SVMr learning algorithm, associated with the DA for optimization of its hyper-parameters. The division proportions of the data for training and test, along with the ranges of the hyper-parameters, are the same as those for SVMr discussed in the previous section. The steps leading to the development of the optimal DA-SVMr hybrid algorithm are illustrated in the flowchart presented in Figure 5.

Figure 5. Flowchart of the DA-SVM optimization process.

The machine learning algorithm principle remains the same; however, the DA initially feeds the SVM with a random combination of hyper-parameters within their ranges. The steps beginning with the division of data and up until the development of the SVMr model are repeated, and the minimum value of RMSE obtained through five iterations of a simulation is saved as the best. Then, the DA generates a new population of hyper-parameters for the SVMr algorithm, and the same set of steps is repeated in order to achieve a new best RMSE; this is done for 100 trials, amongst which the minimum RMSE corresponds to the resultant optimal DA-SVMr model.

The same process explained above is tested with different kernel functions, namely, linear, polynomial, RBF, and Gaussian. Results of DA-SVMr modeling of the same data using different kernel functions are summarized in Table 4.

Based on the results obtained, the linear kernel function gave only fair modeling results by DA-SVMr and yielded the highest errors and the lowest correlation coefficient among all kernel functions. The nonlinearity of the system studied and its thermodynamic and physicochemical complexity are the main reasons for the poor linear kernel performance. On the other hand, the polynomial kernel degree is finely tuned to an optimal fractional order, which usually gives better results than integer orders. Although the polynomial kernel of optimized degree 3.33 and the Gaussian kernel gave better results than the linear one, the RBF outperformed all tested functions, and the DA-SVMr RBF-based model yielded the most accurate predictions.

The hyper-parameters optimized by DA for hybrid modeling of IDAC based on SVM for regression (SVMr) using the RBF kernel function are C, ε, and γ, estimated at 176.023, 0.032, and 2.190, respectively.

All the necessary files for reproduction of the results are available in the Supporting Information.

As shown in Figure 6, the optimal DA-SVMr model using the RBF exhibited a remarkable efficiency in correlating the

Table 4. Results of DA-SVMr Modeling Using Different Kernel Functions

            DA-optimized hyper-parameters                         DA-SVMr model statistical parameters
kernel      C         ε       γ       n       no. of support vectors  R        R2      RMSE    MAPE (%)  MAE
linear      21.368    0.7967  -       -       1748                    0.8758   0.6891  0.8926  30.608    0.4996
RBF         176.023   0.0319  2.1901  -       1457                    0.9954   0.9906  0.1785  3.7801    0.0979
polynomial  1         0.1321  -       3.3273  1485                    0.9799   0.9588  0.3635  12.6783   0.2716
Gaussian    244.2572  0.0458  1.8723  -       1421                    0.99295  0.9855  0.2141  4.336     0.1203

global data, confirmed by a correlation coefficient value of 0.99575 and a significantly low RMSE of 0.1696. Also, the predictive ability of the DA-SVM model is demonstrated by testing 666 data points, which follow closely the trend of the experimental IDAC, except for a few points for which the deviations between experimental and model-predicted values are noteworthy. The predictive correlation coefficient R, RMSE, and MAPE are 0.9954, 0.1785, and 3.7801%, respectively. Also, the marginal loss between the R2 for training and testing of the optimal DA-SVMr model is insignificant, suggesting its high generalization performance in terms of R2.

Figure 6. Scatter plot of the global data adjusted by the optimal DA-SVMr model.

3.2.3. Outlier Detection for the DA-SVM Model. As shown in Figure 6, there are data points relatively distant from the bisector line. Because a large experimental dataset is used in this study to develop the regression model, errors and uncertainties resulting from laboratory measurements may affect the accuracy of the data points, which influences the validity of the model.55 The data points distant from the general trend of the majority of data points are called outliers. Furthermore, discovering such data points is crucial for improving the model accuracy by eliminating the suspected inexact experimental data, which weaken the model performance.56 In the present study, the Leverage mathematical technique is used to detect outliers, where the residual values and a Hat matrix are employed. Mohammadi et al.57 gave a detailed path for creating the Hat values, and the main steps are the following

$$H = X (X^{T} X)^{-1} X^{T} \tag{20}$$

where X designates the input experimental observation matrix (m × n), m is the number of observations, and n is the number of model input parameters. The Hat values are deduced directly from the diagonal of the matrix H created by eq 20.

$$\mathrm{Hat} = \mathrm{diag}(H) \tag{21}$$

In our study, the DA-SVM model presents higher accuracy than the SVM model, which is why it is considered for the outlier search. Consequently, a William’s diagram representing graphically the normalized residuals according to the Leverage (Hat) values is established and allows the classification of the total data points into validated data and outliers (suspected data). The critical leverage H* in the plot is defined by the following equation

$$H^{*} = \frac{3(n + 1)}{m} \tag{22}$$

The normalized residuals are calculated using eq 23

Figure 7. William’s Diagram for detection of outliers for the DA-SVM model.
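
The leverage quantities of eqs 20−22 and the ±3 limit on the normalized residuals can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation: the function name `leverage_analysis`, the demo arrays, and the use of NumPy are assumptions made for illustration, and the residuals are scaled here by their standard deviation, which is an assumed reading of the normalization.

```python
import numpy as np


def leverage_analysis(X, y_exp, y_cal, residual_limit=3.0):
    """Hat values, critical leverage H*, normalized residuals, and a
    boolean mask of suspected points (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y_exp = np.asarray(y_exp, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    m, n = X.shape  # m observations, n model inputs

    # Diagonal of H = X (X^T X)^-1 X^T (eqs 20 and 21), computed
    # without forming the full m x m matrix. The pseudo-inverse is an
    # assumption that keeps the sketch stable for collinear inputs.
    xtx_inv = np.linalg.pinv(X.T @ X)
    hat = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    # Critical leverage (eq 22).
    h_star = 3.0 * (n + 1) / m

    # Residuals scaled by their standard deviation (assumed normalization).
    residuals = y_exp - y_cal
    r_norm = residuals / np.std(residuals)

    # A point is suspected if it exceeds either limit.
    suspected = (hat > h_star) | (np.abs(r_norm) > residual_limit)
    return hat, h_star, r_norm, suspected


# Synthetic demo data (not the IDAC dataset): 50 points, 3 inputs,
# with one deliberately large residual at index 0.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_cal_demo = np.zeros(50)
y_exp_demo = np.where(np.arange(50) % 2 == 0, 0.1, -0.1)
y_exp_demo[0] = 10.0

hat, h_star, r_norm, suspected = leverage_analysis(X_demo, y_exp_demo, y_cal_demo)
print("critical leverage:", h_star)
print("point 0 suspected:", bool(suspected[0]))
```

With the eight input variables and 2666 data points of this study, the same formula gives 3(8 + 1)/2666 ≈ 0.0101, which is the critical leverage reported for Figure 7.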


$$(R_{\mathrm{Norm}})_i = \frac{\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}}{\mathrm{Var}\left(\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}\right)}, \quad i = 1, \ldots, m \tag{23}$$

Normalized residuals equal to ±3 are considered margin limits for validated data, whereas residuals outside this range define suspected data. In addition, all data points with Hat values > H* are also considered suspected data. The results of the outlier analysis are presented in Figure 7.

Based on the abovementioned figure, the H* value is 0.0101, and only the data represented by green circles are valid data for which the model’s accuracy is confirmed. This set has 2573 data points out of the 2666 overall data that were originally used to develop the DA-SVM model and represents 96.51% of the overall experimental data. The doubtful data are designated by blue triangles and comprise 93 outliers, corresponding to 3.49% of the overall experimental data. Moreover, the validity of the model within the range of the validated and suspected data is clearly perceived.

3.3. Comparison of Models. The SVMr and DA-SVMr models developed in this work and the COSMO-RS-based model3 for the estimation of IDAC are compared in terms of certain statistical indices for the determination of the models’ performance. The evaluation criteria listed above, namely, the RMSE, the MAE, the MAPE, the coefficient of correlation (R), and the determination coefficient (R2), given by eqs 15−19, respectively, are calculated to evaluate the performance of each model, thereby highlighting the model that gave the most accurate predictions. In addition, other evaluation terms are considered, namely, the mean square error (MSE), the mean relative squared error (MRSE), the relative absolute error (RAE), the accuracy factor (Af), and the bias factor (Bf), given by eqs 24−28, respectively.58

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}\right)^{2} \tag{24}$$

$$\mathrm{MRSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}}{\gamma_i^{\infty,\mathrm{exp}}}\right)^{2} \tag{25}$$

$$\mathrm{RAE} = \sum_{i=1}^{N} \frac{\gamma_i^{\infty,\mathrm{exp}} - \gamma_i^{\infty,\mathrm{cal}}}{\gamma_i^{\infty,\mathrm{exp}}} \tag{26}$$

$$A_f = 10^{\left(\sum_{i=1}^{N} \left|\log \frac{\gamma_i^{\infty,\mathrm{cal}}}{\gamma_i^{\infty,\mathrm{exp}}}\right| / N\right)} \tag{27}$$

$$B_f = 10^{\left(\sum_{i=1}^{N} \log \frac{\gamma_i^{\infty,\mathrm{cal}}}{\gamma_i^{\infty,\mathrm{exp}}} / N\right)} \tag{28}$$

The values of the ten statistical evaluation indices of the three models discussed in this work for prediction (validation data) are summarized in Table 5.

Table 5. Statistical Evaluation of the Models’ Performance

        COSMO-RS3   SVM      DA-SVM
RMSE    1.1180      0.2421   0.1785
MAE     0.7822      0.1077   0.0979
MAPE    22.7504     3.6446   3.7801
MSE     1.2499      0.0586   0.0319
MRSE    0.1905      0.0101   0.0116
RAE     507.9285    6.7443   7.7576
Af      1.2724      1.0382   1.0503
Bf      0.9703      0.9975   0.9907
R       0.8412      0.9912   0.9954
R2      0.7076      0.9825   0.9908

Based on the models’ statistical parameter RMSE, it can be seen that the two SVMr-based models surpassed the COSMO-RS model.3 The difference between the two models (SVMr and DA-SVMr) is insignificant compared to the difference between either of them and the COSMO-RS model. Moreover, the values of R2 (for validation) and ΔR2, in addition to the RMSE, imply that the DA-SVMr model has more predictive power than the SVMr. This means that the two models proposed in this work allow, to a large extent, the representation of the nonlinear relationship between the IDAC of solutes in 44 imidazolium-based ILs and the molecular structural charge density of the solute and solvent.

Despite the fact that COSMO-RS is an advanced technique that is considered predictive insofar as it does not rely on experimental data for predicting IDAC, this hybrid quantum mechanical thermodynamic method provides rough estimations of the parameter, which can be clearly seen in Figure 8. In contrast, the σ-profiles of molecules delivered by COSMO-RS3 seem to be closely relevant and quantitatively representative of the molecular-level interactions between the solute and the solvent without explicit knowledge of these interactions. The judicious choice of the COSMO-RS σ-profile descriptors as inputs for the development of correlative models based on the SVM algorithm resulted in significantly more accurate models for IDAC estimation, and Table 5 confirms these results in terms of the statistical evaluation parameters of the three models.

By associating DA with SVM regression and using COSMO-RS σ-profile descriptors as input variables, the DA-SVMr model combines inputs with approximately zero uncertainty and finely selected SVMr hyper-parameters. Underlying the good results obtained by the DA-SVMr model is the absolute independence of the validation dataset from the training dataset and from the subset of data used in the assessment of the DA-SVMr optimization performance. For this purpose, the Matlab “holdout” function for cross-validation is used to ensure that the DA-SVMr optimal model is not contingent on the validation data. Consequently, the improved predictive accuracy of the model is the result of the swarm optimization of the learning machine hyper-parameters and certainly not the result of overfitting.59,60

4. CONCLUSIONS
In this study, two SVMr models were developed for the estimation of IDAC of seventeen solutes in 44 imidazolium-based ILs based on temperature, molecular weight of solute and solvent, and five COSMO-RS σ-profile descriptors for both ILs and solutes. Experimental data are selected from the literature and closely analyzed statistically. Two advanced regularization techniques, known as Lasso and Ridge, are applied in this study to illustrate the decisive selection of input parameters. The two developed models show good correlative and predictive accuracy. The statistical comparison of different models reveals a weak performance of a COSMO-RS model3 in predicting IDAC in contrast to the two SVMr models. The results confirm that the COSMO-RS is a good source of structural molecular information about the

Figure 8. Parity plots of the experimental versus calculated γi∞ by the three models. (a) COSMO-RS,3 (b) SVMr model, and (c) DA-SVMr model.

molecules of solutes and solvents (ILs). Furthermore, the DA-SVMr model outperformed the SVM model with a validation correlation coefficient of 0.9954 and an RMSE equal to 0.1785. These results demonstrate the improved performance of SVM for regression when the hyper-parameters are optimized using DA. Despite the better results and improved accuracy obtained with the DA-SVMr model, associating SVMr with other SI algorithms for optimization of hyper-parameters is highly recommended, as this may lead to even more accurate models.

ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jced.0c00168.

ZIP file contains ‘data_ss.xlsx’ and ‘data_ss_Info.docx’ providing the data points used for modeling. It also encloses two folders ‘DA-SVM’ and ‘SVM’ which include the two models developed in this work using Matlab, along with a ‘README_File.docx’ that describes each of the other files and how they are used together (ZIP)

AUTHOR INFORMATION
Corresponding Author
Hania Benimam − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria; orcid.org/0000-0002-9786-7166; Phone: +213 669542184; Email: hbenimam@gmail.com

Authors
Cherif Si Moussa − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria; Department of Process Engineering, Institute of Technology, University Dr. Yahia Fares of Medea, Medea 26000, Algeria
Mohamed Hentabli − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria; Department of Process Engineering, Institute of Technology, University Dr. Yahia Fares of Medea, Medea 26000, Algeria; Laboratory of Quality Control, Physico-Chemical Department, SAIDAL of Medea, Medea 26000, Algeria; orcid.org/0000-0002-6693-0708
Salah Hanini − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria
Maamar Laidi − Biomaterials and Transport Phenomena Laboratory (LBMPT), University Yahia Fares of Medea, Medea 26000, Algeria; Department of Process Engineering, Institute of Technology, University Dr. Yahia Fares of Medea, Medea 26000, Algeria
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jced.0c00168

Notes
The authors declare no competing financial interest.

REFERENCES
(1) Welton, T. Room-Temperature Ionic Liquids. Solvents for Synthesis and Catalysis. Chem. Rev. 1999, 99, 2071−2084.
(2) Freemantle, M. An Introduction to Ionic Liquids; Royal Society of Chemistry, 2010.
(3) Paduszyński, K. An Overview of the Performance of the COSMO-RS Approach in Predicting the Activity Coefficients of Molecular Solutes in Ionic Liquids and Derived Properties at Infinite Dilution. Phys. Chem. Chem. Phys. 2017, 19, 11835−11850.
(4) Lei, Z.; Dai, C.; Zhu, J.; Chen, B. Extractive Distillation with Ionic Liquids: A Review. AIChE J. 2014, 60, 3312−3329.
(5) Kato, R.; Krummen, M.; Gmehling, J. Measurement and Correlation of Vapor−liquid Equilibria and Excess Enthalpies of Binary Systems Containing Ionic Liquids and Hydrocarbons. Fluid Phase Equilib. 2004, 224, 47−54.
(6) Brennecke, J. F.; Maginn, E. J. Ionic Liquids: Innovative Fluids for Chemical Processing. AIChE J. 2001, 47, 2384−2389.
(7) Kazakov, A.; Magee, J. W.; Chirico, R. D.; Diky, V.; Muzny, C. D.; Kroenlein, K.; Frenkel, M. NIST Standard Reference Database 147: NIST Ionic Liquids Database (ILThermo). Version 2.0; National Institute of Standards and Technology: Gaithersburg, MD, 2013.
(8) Paduszyński, K. In Silico Calculation of Infinite Dilution Activity Coefficients of Molecular Solutes in Ionic Liquids: Critical Review of Current Methods and New Models Based on Three Machine Learning Algorithms. J. Chem. Inf. Model. 2016, 56, 1420−1437.
(9) Weidlich, U.; Gmehling, J. A Modified UNIFAC Model. 1. Prediction of VLE, hE, and γ∞. Ind. Eng. Chem. Res. 1987, 26, 1372−1381.
(10) Simoni, L. D.; Lin, Y.; Brennecke, J. F.; Stadtherr, M. A. Modeling Liquid−Liquid Equilibrium of Ionic Liquid Systems with NRTL, Electrolyte-NRTL, and UNIQUAC. Ind. Eng. Chem. Res. 2008, 47, 256−272.
(11) Eike, D. M.; Brennecke, J. F.; Maginn, E. J. Predicting Infinite-Dilution Activity Coefficients of Organic Solutes in Ionic Liquids. Ind. Eng. Chem. Res. 2004, 43, 1039−1048.


(12) Tämm, K.; Burk, P. QSPR Analysis for Infinite Dilution Activity Coefficients of Organic Compounds. J. Mol. Model. 2006, 12, 417−421.
(13) Xi, L.; Sun, H.; Li, J.; Liu, H.; Yao, X.; Gramatica, P. Prediction of Infinite-Dilution Activity Coefficients of Organic Solutes in Ionic Liquids Using Temperature-Dependent Quantitative Structure−Property Relationship Method. Chem. Eng. J. 2010, 163, 195−201.
(14) Grubbs, L. M.; Ye, S.; Saifullah, M.; McMillan-Wiggins, M. C.; Acree, W. E.; Abraham, M. H.; Twu, P.; Anderson, J. L. Correlations for Describing Gas-to-Ionic Liquid Partitioning at 323 K Based on Ion-Specific Equation Coefficient and Group Contribution Versions of the Abraham Model. Fluid Phase Equilib. 2011, 301, 257−266.
(15) Mutelet, F.; Ortega-Villa, V.; Moïse, J.-C.; Jaubert, J.-N.; Acree, W. E. Prediction of Partition Coefficients of Organic Compounds in Ionic Liquids Using a Temperature-Dependent Linear Solvation Energy Relationship with Parameters Calculated through a Group Contribution Method. J. Chem. Eng. Data 2011, 56, 3598−3606.
(16) Stephens, T. W.; Chou, V.; Quay, A. N.; Shen, C.; Dabadge, N.; Tian, A.; Loera, M.; Willis, B.; Wilson, A.; Acree, W. E.; Twu, P.; Anderson, J. L.; Abraham, M. H. Thermochemical Investigations of Solute Transfer into Ionic Liquid Solvents: Updated Abraham Model Equation Coefficients for Solute Activity Coefficient and Partition Coefficient Predictions. Phys. Chem. Liq. 2014, 52, 488−518.
(17) Gonfa, G.; Bustam, M. A.; Sharif, A. M.; Mohamad, N.; Ullah, S. Tuning Ionic Liquids for Natural Gas Dehydration Using COSMO-RS Methodology. J. Nat. Gas Sci. Eng. 2015, 27, 1141−1148.
(18) Gonfa, G.; Bustam, M. A.; Shariff, A. M.; Muhammad, N.; Ullah, S. Quantitative Structure−activity Relationships (QSARs) for Estimation of Activity Coefficient at Infinite Dilution of Water in Ionic Liquids for Natural Gas Dehydration. J. Taiwan Inst. Chem. Eng. 2016, 66, 222−229.
(19) Shiflett, M. B.; Maginn, E. J. The Solubility of Gases in Ionic Liquids. AIChE J. 2017, 63, 4722−4737.
(20) Matheswaran, P.; Wilfred, C. D.; Kurnia, K. A.; Ramli, A. Prediction of Activity Coefficient of Sulfones at Infinite Dilution in Ionic Liquids and Their Modeling Using COSMO-RS. AIP Conference Proceedings; AIP Publishing, 2016; Vol. 1787, p 020007.
(21) Wang, S.; Sandler, S. I.; Chen, C.-C. Refinement of COSMO−SAC and the Applications. Ind. Eng. Chem. Res. 2007, 46, 7275−7288.
(22) Ajmani, S.; Rogers, S. C.; Barley, M. H.; Burgess, A. N.; Livingstone, D. J. Characterization of Mixtures Part 1: Prediction of Infinite-Dilution Activity Coefficients Using Neural Network-Based QSPR Models. QSAR Comb. Sci. 2008, 27, 1346−1361.
(23) Nami, F.; Deyhimi, F. Prediction of Activity Coefficients at Infinite Dilution for Organic Solutes in Ionic Liquids by Artificial Neural Network. J. Chem. Thermodyn. 2011, 43, 22−27.
(24) Banerjee, T.; Khanna, A. Infinite Dilution Activity Coefficients for Trihexyltetradecyl Phosphonium Ionic Liquids: Measurements and COSMO-RS Prediction. J. Chem. Eng. Data 2006, 51, 2170−2177.
(25) Diedenhofen, M.; Eckert, F.; Klamt, A. Prediction of Infinite Dilution Activity Coefficients of Organic Compounds in Ionic Liquids Using COSMO-RS. J. Chem. Eng. Data 2003, 48, 475−479.
(26) Freire, M. G.; Ventura, S. P. M.; Santos, L. M. N. B. F.; Marrucho, I. M.; Coutinho, J. A. P. Evaluation of COSMO-RS for the Prediction of LLE and VLE of Water and Ionic Liquids Binary Systems. Fluid Phase Equilib. 2008, 268, 74−84.
(27) Kurnia, K. A.; Pinho, S. P.; Coutinho, J. A. P. Evaluation of the Conductor-like Screening Model for Real Solvents for the Prediction of the Water Activity Coefficient at Infinite Dilution in Ionic Liquids. Ind. Eng. Chem. Res. 2014, 53, 12466−12475.
(28) Diedenhofen, M.; Klamt, A. COSMO-RS as a Tool for Property Prediction of IL Mixtures-A Review. Fluid Phase Equilib. 2010, 294, 31−38.
(29) Kumar, A. A. P.; Banerjee, T. Thiophene Separation with Ionic Liquids for Desulphurization: A Quantum Chemical Approach. Fluid Phase Equilib. 2009, 278, 1−8.
(30) Kurnia, K. A.; Pinho, S. P.; Coutinho, J. A. P. Evaluation of the Conductor-like Screening Model for Real Solvents for the Prediction of the Water Activity Coefficient at Infinite Dilution in Ionic Liquids. Ind. Eng. Chem. Res. 2014, 53, 12466−12475.
(31) Matheswaran, P.; Wilfred, C. D.; Kurnia, K. A.; Ramli, A. Overview of Activity Coefficient of Thiophene at Infinite Dilution in Ionic Liquids and Their Modeling Using COSMO-RS. Ind. Eng. Chem. Res. 2016, 55, 788−797.
(32) Vapnik, V. N. The Nature of Statistical Learning Theory; Springer-Verlag: New York, 1995.
(33) Yoon, H.; Jun, S.-C.; Hyun, Y.; Bae, G.-O.; Lee, K.-K. A Comparative Study of Artificial Neural Networks and Support Vector Machines for Predicting Groundwater Levels in a Coastal Aquifer. J. Hydrol. 2011, 396, 128−138.
(34) Drucker, H.; Burges, C. J. C.; Kaufman, L.; Smola, A. J.; Vapnik, V. Support Vector Regression Machines. Advances in Neural Information Processing Systems; MIT Press, 1997; pp 155−161.
(35) Yu, P.-S.; Chen, S.-T.; Chang, I.-F. Support Vector Regression for Real-Time Flood Stage Forecasting. J. Hydrol. 2006, 328, 704−716.
(36) Wu, C.-H.; Ho, J.-M.; Lee, D. T. Travel-Time Prediction with Support Vector Regression. IEEE Trans. Intell. Transp. Syst. 2004, 5, 276−281.
(37) Wen, Y. F.; Cai, C. Z.; Liu, X. H.; Pei, J. F.; Zhu, X. J.; Xiao, T. T. Corrosion Rate Prediction of 3C Steel under Different Seawater Environment by Using Support Vector Regression. Corros. Sci. 2009, 51, 349−355.
(38) Chevalier, R. F.; Hoogenboom, G.; McClendon, R. W.; Paz, J. A. Support Vector Regression with Reduced Training Sets for Air Temperature Prediction: A Comparison with Artificial Neural Networks. Neural Comput. Appl. 2011, 20, 151−159.
(39) He, Z.; Wen, X.; Liu, H.; Du, J. A Comparative Study of Artificial Neural Network, Adaptive Neuro Fuzzy Inference System and Support Vector Machine for Forecasting River Flow in the Semiarid Mountain Region. J. Hydrol. 2014, 509, 379−386.
(40) Parveen, N.; Zaidi, S.; Danish, M. Support Vector Regression Model for Predicting the Sorption Capacity of Lead (II). Perspect. Sci. 2016, 8, 629−631.
(41) Vapnik, V.; Golowich, S. E.; Smola, A. J. Support Vector Method for Function Approximation, Regression Estimation and Signal Processing. In Advances in Neural Information Processing Systems; MIT Press, 1997; pp 281−287.
(42) Beni, G.; Wang, J. Swarm Intelligence in Cellular Robotic Systems. Robots and Biological Systems: Towards a New Bionics?; Springer Berlin Heidelberg: Berlin, Heidelberg, 1993; pp 703−712.
(43) Bonabeau, E.; Dorigo, M.; Theraulaz, G. Swarm Intelligence: From Natural to Artificial Systems; Oxford University Press: New York, 1999.
(44) Tharwat, A.; Moemen, Y. S.; Hassanien, A. E. Classification of Toxicity Effects of Biotransformed Hepatic Drugs Using Whale Optimized Support Vector Machines. J. Biomed. Inf. 2017, 68, 132−149.
(45) Mirjalili, S. Dragonfly Algorithm: A New Meta-Heuristic Optimization Technique for Solving Single-Objective, Discrete, and Multi-Objective Problems. Neural Comput. Appl. 2016, 27, 1053−1073.
(46) Mullins, E.; Oldland, R.; Liu, Y. A.; Wang, S.; Sandler, S. I.; Chen, C.-C.; Zwolak, M.; Seavey, K. C. Sigma-Profile Database for Using COSMO-Based Thermodynamic Methods. Ind. Eng. Chem. Res. 2006, 45, 4389−4415.
(47) Klamt, A. The COSMO and COSMO-RS Solvation Models. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2011, 1, 699−709.
(48) Klamt, A. Conductor-like Screening Model for Real Solvents: A New Approach to the Quantitative Calculation of Solvation Phenomena. J. Phys. Chem. 1995, 99, 2224−2235.
(49) Eckert, F.; Klamt, A. Fast Solvent Screening via Quantum Chemistry: COSMO-RS Approach. AIChE J. 2002, 48, 369−385.
(50) Klamt, A.; Jonas, V.; Bürger, T.; Lohrenz, J. C. W. Refinement and Parametrization of COSMO-RS. J. Phys. Chem. A 1998, 102, 5074−5085.


(51) Hill, P. D. Kernel Estimation of a Distribution Function. Commun. Stat. Theor. Methods 1985, 14, 605−620.
(52) Khaouane, L.; Si-Moussa, C.; Hanini, S.; Benkortbi, O.
Optimization of Culture Conditions for the Production of Pleuro-
mutilin from Pleurotus Mutilus Using a Hybrid Method Based on
Central Composite Design, Neural Network, and Particle Swarm
Optimization. Biotechnol. Bioprocess Eng. 2012, 17, 1048−1054.
(53) Amroune, M.; Bouktir, T.; Musirin. Power System Voltage
Stability Assessment Using a Hybrid Approach Combining Dragonfly
Optimization Algorithm and Support Vector Regression. Arabian J.
Sci. Eng. 2018, 43, 3023−3036.
(54) Keskes, S.; Hanini, S.; Hentabli, M.; Laidi, M. Artificial
Intelligence and Mathematical Modelling of the Drying Kinetics of
Pharmaceutical Powders. Kem. Ind. 2020, 69, 137−152.
(55) Rousseeuw, P. J.; Leroy, A. M. Robust Regression and Outlier
Detection; Wiley: New York, 1987; Vol. 254.
(56) Hosseinzadeh, M.; Hemmati-Sarapardeh, A. Toward a
Predictive Model for Estimating Viscosity of Ternary Mixtures
Containing Ionic Liquids. J. Mol. Liq. 2014, 200, 340−348.
(57) Mohammadi, A. H.; Gharagheizi, F.; Eslamimanesh, A.; Richon,
D. Evaluation of Experimental Data for Wax and Diamondoids
Solubility in Gaseous Systems. Chem. Eng. Sci. 2012, 81, 1−7.
(58) Soleimani, R.; Saeedi Dehaghani, A. H.; Shoushtari, N. A.;
Yaghoubi, P.; Bahadori, A. Toward an Intelligent Approach for
Predicting Surface Tension of Binary Mixtures Containing Ionic
Liquids. Korean J. Chem. Eng. 2018, 35, 1556−1569.
(59) Huang, C.-L.; Wang, C.-J. A GA-Based Feature Selection and
Parameters Optimization for Support Vector Machines. Expert Syst.
Appl. 2006, 31, 231−240.
(60) Lin, S.-W.; Ying, K.-C.; Chen, S.-C.; Lee, Z.-J. Particle Swarm
Optimization for Parameter Determination and Feature Selection of
Support Vector Machines. Expert Syst. Appl. 2008, 35, 1817−1824.
