
Journal of CO2 Utilization 58 (2022) 101926



journal homepage: www.elsevier.com/locate/jcou

Turning deep-eutectic solvents into value-added products for CO2 capture: A desirability-based virtual screening study
Amit Kumar Halder a, Pravin Ambure a, Yunierkis Perez-Castillo b, M. Natália D.S. Cordeiro a, *
a LAQV@REQUIMTE/Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
b Bio-Cheminformatics Research Group and Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito 170504, Ecuador

ARTICLE INFO

Keywords: Deep eutectic solvents; CO2 absorption; Viscosity; QSPR; Desirability-based screening and ranking

ABSTRACT

Carbon dioxide capture and storage is one of the primary mitigation approaches endorsed by current global climate change policies, since CO2 is the gas that contributes most to the greenhouse effect. Recently, deep eutectic solvents (DESs) have emerged as promising green solvents for CO2 capture, particularly in industrial applications. Compared to conventional ionic liquids, for example, DESs display similar solvent properties but are far cheaper, easier to prepare, and greener. Yet their major drawback is their high viscosity, which hampers their handling and fails to meet industrial demands. Further efforts are thus required to develop new DESs that combine a high capacity for CO2 uptake with low viscosity. In this work, we adopted a desirability-based decision approach to jointly handle these two conflicting properties. We began by setting up Quantitative Structure-Property Relationships (QSPR) models for predicting the two properties based on a dataset of known binary DESs. Then, desirability functions were derived from the individual QSPR models and combined into an overall desirability-based criterion that was applied to screen and rank two libraries, one comprising experimentally reported DESs and another one with newly designed DESs. The latter enabled us to propose novel efficient DESs for CO2 uptake, i.e., with the most suitable trade-offs between CO2 absorption capacity and viscosity. Finally, but most importantly, this work demonstrates the usefulness of the desirability-based approach for the rational discovery of deep eutectic solvents or other materials to suit particular sustainable applications.

1. Introduction

Despite the present and future threat posed by climate change, emissions of greenhouse gases have continued to grow, especially those of carbon dioxide (CO2). Undeniably, CO2 is one of the greenhouse gases that contribute most to climate change and global warming [1]. The largest human source of carbon dioxide is the burning of fossil fuels like coal, natural gas and oil. It has been observed that CO2 generated from fossil fuels and energy-related operations contributes up to 60% to global warming, and the control of the emission of this greenhouse gas is thus required to sustain environmental balance [2]. Currently, several CO2 capture technologies are available. Among these, the amine scrubbing technique, which resorts to the use of amine-based aqueous solutions, is the most frequently applied [3–6]. Nevertheless, this technique has various flaws, such as being itself environmentally hazardous and energetically expensive [3]. Innovative post-combustion technological solutions for removing acid gases like CO2 have thus become essential to achieve cleaner energy production, transportation, and/or transformation into value-added products. As such, non-aqueous solvents such as pure or functionalised ionic liquids (ILs) have been thought to be, at least for a certain time, an alternative green solution for CO2 capture [7–9]. Apart from having high CO2 absorption capacity, ILs were considered green solvents mainly due to their low volatility and high stability. But for most ILs, the synthesis is difficult and cost-prohibitive, and they often show significant toxicity and poor biodegradability.

Recently, deep eutectic solvents (DESs), a new class of solvents analogous to ILs, have gained increased attention as promising green solvents for CO2 capture, especially for industrial applications [10,11]. DESs are generally prepared by combining Lewis bases, i.e. hydrogen bond acceptors (HBAs), with acids or hydrogen bond donors (HBDs) in a specific molar ratio so that these components may associate with each other through hydrogen bond interactions to produce a eutectic mixture, the melting point of which is lower than that of either of its components [12,

* Corresponding Author.
E-mail address: ncordeir@fc.up.pt (M.N.D.S. Cordeiro).

https://doi.org/10.1016/j.jcou.2022.101926
Received 12 August 2021; Received in revised form 22 January 2022; Accepted 2 February 2022
Available online 14 February 2022
2212-9820/© 2022 Elsevier Ltd. All rights reserved.

13]. In 2003, Abbott et al. reported the first DES, which was prepared by mixing choline chloride (ChCl) and urea in a molar ratio of 1:2. Although both these components have high melting points (i.e. 302 °C and 133 °C for ChCl and urea, respectively), the resultant DES was shown to have a very low melting point of 12 °C [14]. Subsequently, Abbott et al. exemplified how DESs could be formed between choline chloride and a range of carboxylic acids and alcohols [15].

So far, a wide variety of DESs have been reported in the literature [12,16]. As solvents, DESs dissolve a broad spectrum of materials such as sugars, salts, drugs, amino acids, proteins, etc. Apart from being easy to prepare, cheap, biodegradable and renewable, most DESs are less toxic, and therefore more environmentally friendly, than conventional organic solvents and even their IL counterparts [12,17]. Yet, a number of physicochemical properties should be investigated during the development of DESs, or of any other solvent, to judge their usage in various chemical and industrial processes [10,12,18,19]. In particular, viscosity is one such crucial property that must be investigated when assessing the overall potential and feasibility of DESs for industrial applications [18–23]. Generally, the viscosity of the targeted solvents is required for the calculation of the mass transfer and reaction rates [23,24]. That is, without knowledge of the viscosity, industrial process equipment, pipelines and pumping cannot be specified [22,25]. Indeed, many DESs display high viscosity, which restricts their industrial application due to both the high pumping costs and the poor heat and mass transfer involved in their processing [8,26]. Therefore, a problem facing DESs' development efforts for potential application in CO2 capture is to find the best compromise between various, often competing objectives. Clearly, the ideal deep eutectic solvent should have the highest affinity to CO2 but also the lowest possible viscosity [12], which underlines the multi-objective nature of such development efforts [18,27,28]. In this context, the so-called multicriteria decision (MCD) techniques have been specifically conceived for dealing with this type of problem. The aim of MCD techniques is the simultaneous examination of more than one dependent property of the system, based on compromises and trade-offs among the various objectives. In so doing, local optimal solutions corresponding to one particular objective can be avoided by concomitantly satisfying all the targeted objectives. MCD approaches, either based on desirability functions or on the Pareto-front concept, have been widely used in scoring the important properties of compounds with applications to diverse research areas [29,30].

In the past years, MCD techniques have been applied to in silico modelling approaches, such as Quantitative Structure-Activity/Property Relationships (QSAR/QSPR), but mostly focusing on improving the pharmaceutical profile of drugs regarding their therapeutic efficacy, selectivity, bioavailability, and toxicity [30–34]. In the present work, we developed QSPR models that can be exploited for jointly handling multiple properties of interest to the DESs' development. To do so, we resort to a desirability-based MCD approach that condenses all of the properties' objectives into a single-valued objective function [35]. This approach is applied here to a set of binary DESs by simultaneously considering their best CO2 absorption capacity and viscosity properties, as well as suggesting new enhanced DESs for CO2 uptake. The latter is further probed by applying a desirability-based virtual screening to rank and identify novel DESs with acceptable trade-offs between both properties from two newly built libraries of DESs.

2. Materials and methods

2.1. Datasets collection and curation

Seventy-three binary DESs with measured viscosity (η) and CO2 absorption capacity (mCO2) were retrieved from the literature [27,36–40]. Here, the challenge was to collect data for which both these DESs' response properties have been reported. Besides, since both largely depend on temperature and pressure, the data had to be compiled and curated in such a way that the influence of the latter on the response variables was minimised as much as possible. To do so, for the CO2 capture capacity, only dataset samples measured within a pressure range of 0.1–0.5 MPa were considered and, similarly, for the viscosity, only dataset samples measured at a pressure of 0.1 MPa. Concerning temperature, the dataset samples considered for both properties had been measured in the 298.15–308.15 K range.

The collected chemical structures and properties were appropriately curated following the data curation workflow recommended by Fourches et al. [41]. Further, all the response property values were log-transformed (log mCO2 and log η) for practical use in the subsequent QSPR modelling. The distribution of the resultant response variables is shown in Fig. 1, whereas details of the corresponding dataset samples are provided in the Supplementary Material (Table S1).

2.2. Molecular descriptors

The codification of the molecular structure of the DES components relied upon the different sets of 0–2D descriptors available in the Dragon software [42], which have a long history in structure-activity and structure-property relationships. 3D or higher dimensional descriptors were avoided because their calculation, besides being more computationally demanding, requires first finding the most important structural conformations of the DES components at a given temperature, which in turn clearly depend on the quantum mechanical method employed [43].

Since DESs are basically mixtures, the molecular descriptors calculated for each component (MDi) were used to obtain weighted mixture descriptors as suggested earlier by Oprisiu et al. [44]. Two different methods of determining this type of descriptor were implemented in this work, as detailed below.

Method 1: Sum of weighted descriptors. In this case, the final descriptors (MDpmix) are determined simply as follows:

$$
\mathrm{MD}_{\mathrm{pmix}} = X_1\,\mathrm{MD}_1 + X_2\,\mathrm{MD}_2 \tag{1}
$$

in which Xi stands for the molar fraction of the component i (i = 1, 2) present in the DES.

Method 2: Sum and absolute difference of weighted descriptors. In this case, the final descriptors (MDpmix and MDnmix) are determined by the following two alternative expressions:

$$
\mathrm{MD}_{\mathrm{pmix}} = X_1\,\mathrm{MD}_1 + X_2\,\mathrm{MD}_2, \qquad
\mathrm{MD}_{\mathrm{nmix}} = \left| X_1\,\mathrm{MD}_1 - X_2\,\mathrm{MD}_2 \right| \tag{2}
$$

In both cases, the descriptors of the individual components (MD1 and MD2) are weighted as per their molar fractions (X1 and X2, respectively), and component 1 is the cationic part or HBA/salt while component 2 is the HBD. Additionally, two indicator parameters, namely 'presence of chloride' and 'presence of bromide', were also included for the models' generation. These two binary parameters simply denote whether 'chloride' or 'bromide' ions are present in the salt/HBA of the DESs or not. Calculation of such mixture descriptors was carried out using an in-house, Python-based, open source tool (QSAR-Mx, available at https://github.com/ncordeirfcup/QSAR-Mx). After calculation of these descriptors, constant descriptors were removed using the same tool and setting a variance cut-off of 0.001.

2.3. Dataset division

As usual, the selected dataset samples should be split up into training and test subsets for the subsequent QSPR modelling. In this work, the QSPR models were generated with data-distributions resulting from the 'compounds-out' division scheme. This division strategy, previously proposed by Oprisiu et al. [44], is considered the strongest one for mixtures, especially when the models are used for predicting novel


compounds that are absent in the modelling dataset [45,46]. In this 'compounds-out' scheme, at least one compound of the dataset is never placed in the training set. On the one hand, such a scheme ensures a powerful validation for the derived QSPR models by placing dissimilar chemicals in the training and test sets. On the other hand, the overall predictivity of the generated models may be significantly reduced when dealing with small datasets [45,46]. Also, depending on the size of the dataset, many possible 'compounds-out' data-distributions can occur. Therefore, it may well be a challenge to find the most suitable data-distribution that serves the purpose of the compounds-out validation (i.e., placing dissimilar samples in the training and test sets) and that, at the same time, does not compromise the statistical significance of the derived QSPR model. In this work, to overcome this challenge, an automatic generation of multiple data-distributions based on the 'compounds-out' division scheme was allowed. Subsequently, multiple QSPR models were developed using the resultant data-distributions. Details of this automatic generation procedure are described in the recent work by Halder et al. [47].

Fig. 1. Distribution of the two response variables in the modelling dataset.

2.4. QSPR modelling and validation

As regards the modelling technique, we opted for a sequential forward stepwise-multiple linear regression (SFS-MLR) based approach. The following three different conditions were employed for scoring the SFS feature selection:

(a) Determination coefficient (R²), no cross-validation;
(b) Negative mean absolute error (NMAE), no cross-validation;
(c) Determination coefficient (R²), five-fold cross-validation.

In addition, several diagnostic statistical tools were applied for evaluating the derived MLR models and, according to those, the best QSPR models were selected. Goodness-of-fit of the models was assessed by examining standard statistics such as the determination coefficient (R²) and its adjusted value (R²adj), the Fisher's statistic (F) and its related probability factor, as well as the standard error of estimate (SEE). To check the internal predictivity, the leave-one-out cross-validation (LOO-CV) technique was adopted and several statistics were determined from it, namely: the Q²LOO (LOO cross-validation R²) [48] and MAELOO (LOO mean absolute error), along with the average statistic R²m(LOO) (Eq. 3) and its associated standard deviation ΔR²m(LOO) (Eq. 4) [49,50].

$$
R^2_{m(\mathrm{LOO})} = R^2 \left( 1 - \sqrt{R^2 - R_0^2} \right) \tag{3}
$$

where R² and R²0 are the determination coefficients between the observed and the LOO-predicted activities of the compounds on the whole dataset without and with the intercept set to zero, respectively.

$$
\Delta R^2_{m(\mathrm{LOO})} = \left| R^2_m - R'^2_m \right| \tag{4}
$$

where for R²m the observed values are considered in the Y-axis and the predicted values in the X-axis, whereas for R′²m the values of these axes are just exchanged.

Additionally, considering the concept of compound-out cross-validation mentioned earlier, the calculation of another statistical parameter, Q²LCO (i.e., the leave-chemical-out cross-validation R²), is here proposed. To do so, all mixtures (with observed property Yi) formed by each chemical that belongs to component 1 (HBAs in this case) of the training dataset are removed in turn and, after each removal, their predicted values (ŶL(HBA)O) are obtained with the model derived using the remaining training set samples. A similar procedure is applied to each chemical belonging to component 2 (i.e., HBDs) to obtain ŶL(HBD)O. The parameter Q²LCO is then calculated as:

$$
Q^2_{\mathrm{LCO}} = \frac{1}{2}\left[ \left( 1 - \frac{\sum_i \left( Y_i - \hat{Y}_{L(\mathrm{HBA})O} \right)^2}{\sum_i \left( Y_i - Y_m \right)^2} \right) + \left( 1 - \frac{\sum_i \left( Y_i - \hat{Y}_{L(\mathrm{HBD})O} \right)^2}{\sum_i \left( Y_i - Y_m \right)^2} \right) \right] \tag{5}
$$

In Eq. (5), Ym is the average observed property for the training set samples. In fact, though Q²LCO is based on the well-known leave-many-out cross-validation approach [50], it is particularly useful for the internal validation of models developed with mixtures.

Similarly, another statistical parameter, MAELCO (the leave-compounds-out based mean absolute error), was also calculated as follows:

$$
\mathrm{MAE}_{\mathrm{LCO}} = \frac{1}{2}\left[ \frac{\sum_i \left| Y_i - \hat{Y}_{L(\mathrm{HBA})O} \right|}{N} + \frac{\sum_i \left| Y_i - \hat{Y}_{L(\mathrm{HBD})O} \right|}{N} \right] \tag{6}
$$

In Eq. (6), N stands for the total number of datapoints of the training set. A large difference between the values of Q²LOO and Q²LCO (or MAELOO and MAELCO) for the derived model indicates that the model does not fit satisfactorily the mixtures of at least one component. Such models should then be avoided as they do not satisfy the internal predictivity criteria based on the compounds-out cross-validation. In addition to these, the final models were subjected to 5-fold cross-validation checks performed with the Weka 3.8 software (https://www.cs.waikato.ac.nz/ml/weka/). From these, values for the statistical parameters Q²5-fold and MAE5-fold were calculated to further confirm the internal predictivity of the models.

To assess the external predictive ability of the models, similar statistical parameters such as R²Pred [51], MAEtest [49], average R²m(test), and ΔR²m(test) [52] were determined. A Y-randomization procedure was also applied to further check whether or not the derived models result from chance correlations. The latter consisted in carrying out 1000 random modifications of the Y-response vector and analysing the resulting cR²p (Eq. 7) statistical parameter values [53].
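The mixture descriptors of Section 2.2 (Eqs. 1 and 2) and the subsequent removal of near-constant descriptors can be illustrated with a minimal sketch. It is not the QSAR-Mx implementation: the function names `mixture_descriptors` and `drop_low_variance` are hypothetical, and the use of the population variance with a strict "greater than" comparison against the 0.001 cut-off is an assumption.

```python
import numpy as np


def mixture_descriptors(md1, md2, x1, method=2):
    """Combine the descriptors of a binary DES into mixture descriptors.

    md1, md2: descriptor vectors of component 1 (HBA/salt) and 2 (HBD).
    x1: molar fraction of component 1 (x2 = 1 - x1 for a binary mixture).
    method=1 returns MDpmix (Eq. 1); method=2 returns MDpmix followed by
    MDnmix (Eq. 2).
    """
    md1 = np.asarray(md1, dtype=float)
    md2 = np.asarray(md2, dtype=float)
    x2 = 1.0 - x1
    pmix = x1 * md1 + x2 * md2            # sum of weighted descriptors
    if method == 1:
        return pmix
    nmix = np.abs(x1 * md1 - x2 * md2)    # absolute weighted difference
    return np.concatenate([pmix, nmix])


def drop_low_variance(X, cutoff=0.001):
    """Remove descriptor columns whose variance does not exceed the cut-off.

    Assumption: population variance and a strict comparison; the source
    only states the cut-off value.
    """
    X = np.asarray(X, dtype=float)
    keep = X.var(axis=0) > cutoff
    return X[:, keep], keep


# Illustrative values only: a 1:2 HBA:HBD mixture gives x1 = 1/3.
example = mixture_descriptors([3.0, 0.0], [0.0, 6.0], x1=1 / 3, method=2)
```

For a 1:2 DES such as ChCl:urea, the molar fractions are X1 = 1/3 and X2 = 2/3, so each mixture descriptor is dominated by the HBD contribution unless the HBA descriptor is much larger.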

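The leave-chemical-out statistics of Eqs. (5) and (6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: an ordinary least-squares fit stands in for the final MLR model, the descriptor set is kept fixed across folds, and the names `fit_mlr`, `leave_chemical_out` and `lco_statistics` are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def leave_chemical_out(X, y, labels):
    """Predict every mixture with all mixtures of its chemical held out."""
    y_hat = np.empty_like(y)
    for chem in np.unique(labels):
        held = labels == chem
        coef = fit_mlr(X[~held], y[~held])
        y_hat[held] = predict_mlr(coef, X[held])
    return y_hat


def lco_statistics(X, y, hba, hbd):
    """Return (Q2_LCO, MAE_LCO) as the HBA/HBD averages of Eqs. (5)-(6)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)   # Ym: training-set mean
    q2_parts, mae_parts = [], []
    for labels in (np.asarray(hba), np.asarray(hbd)):
        y_hat = leave_chemical_out(X, y, labels)
        q2_parts.append(1.0 - np.sum((y - y_hat) ** 2) / ss_tot)
        mae_parts.append(np.mean(np.abs(y - y_hat)))
    return float(np.mean(q2_parts)), float(np.mean(mae_parts))
```

Comparing the returned values with their LOO counterparts gives the gap discussed in the text: a large gap would suggest that the mixtures of at least one chemical are poorly predicted once that chemical is unseen.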

$$
{}^{c}R^2_p = R \times \sqrt{R^2 - R_r^2} \tag{7}
$$

in which R²r is the scrambled (randomised) determination coefficient.

The primary goal of any QSPR modelling approach is to set up validated models that will guarantee accurate and precise predictions for new substances. This encompasses, apart from inspecting the model's predictivity, defining its applicability domain (AD), i.e., the response and chemical structure space in which the model makes reliable predictions. Towards that end, the AD of the derived QSPR models was determined by applying both the leverage [54] and the standardisation approaches [55]. The leverage approach provides a measure of the distance between the descriptor values for a sample and the mean of descriptor values for all samples. Therefore, the leverage values vs. standardised residuals were plotted for the training and test sample sets. From such a plot, known as the Williams plot, any sample with a leverage value (h) higher than the warning leverage h* (usually fixed at 3(p + 1)/N, in which p is the number of descriptors of the model and N the number of training set samples) was identified as an influential outlier, whereas any sample with a standardised residual greater than 3 standard deviation units was identified as a response outlier. The standardisation technique is another AD approach that has been introduced more recently for determining only structural outliers [55]. In the present work, a data-sample was considered as a structural outlier based on a consensus prediction, i.e., only when both the leverage and standardisation AD-based approaches predict it as an outlier. All the latter statistical parameters and AD results were computed using our in-house QSAR-Mx tool and the plots were made with Matplotlib version 3.1.3 [56].

2.5. Desirability-based MCD ranking

In this work, a desirability-based multicriteria decision procedure [30–34] was employed to simultaneously consider and then rank the DESs based on their desirable properties' profile, i.e., in the present case, with high CO2 absorption capacity and low viscosity. Towards this aim, we first employed the developed QSPR models to predict the CO2 absorption capacity and viscosity property values. Then, the predicted property values (Ŷi) are scaled to their respective desirability (di) values by means of Derringer's desirability functions [35]. Each response property is independently transformed into a desirability value by an arbitrary function, that is, its original predicted value is range scaled between 0 and 1 by:

$$
d_i = \left[ \frac{\hat{Y}_i - L_i}{U_i - L_i} \right]^{s} \tag{8}
$$

where Li and Ui are the selected lower and upper values of the property to scale. The value of the exponent s is determined by setting a property's value to a desirability value equal to 0.5 [34]. Assuming that the property value x is assigned a desirability of 0.5, s can be computed with the following equation:

$$
s = \frac{\log(0.5)}{\log\left( \dfrac{x - L_i}{U_i - L_i} \right)} \tag{9}
$$

In this work, the median of each property (observed values) was set to a desirability value of 0.5. In so doing, x was −0.555 for log mCO2 and +1.806 for log η, resulting in s values of 1.0049 and 1.1140 for log mCO2 and log η, respectively. Details of these calculations can be found in Table S2 of the Supplementary Material.

Further, the desirability function for each targeted response property must be defined depending on whether it ought to be maximised, minimised, or assigned a target value. Regarding the CO2 absorption property, this should be maximised and so the DES with the highest CO2 absorption value is considered to be the most desirable one (di = 1). For computing di, the target (Ti) or upper value Ui was set to a CO2 absorption capacity of +0.926 (in log units, i.e., log mCO2), coinciding with the DES with the highest CO2 absorption capacity in the data set. Likewise, its lower (undesirable) value Li was set to −2.046, corresponding to the one with the lowest CO2 absorption capacity (refer to Table S2). The function applied to log mCO2 was thus defined as:

$$
d_i =
\begin{cases}
0 & \text{if } \hat{Y}_i \le L_i \\[4pt]
\left[ \dfrac{\hat{Y}_i - L_i}{T_i - L_i} \right]^{s} & \text{if } L_i < \hat{Y}_i < T_i \\[8pt]
1 & \text{if } \hat{Y}_i \ge T_i = U_i
\end{cases} \tag{10}
$$

By contrast, the viscosity property should be minimised for reaching a desirable DES. Thus, the target (Ti) or lower value Li was set to a viscosity of +0.763 (in log units, i.e., log η) and the higher (undesirable) value Ui was set to +3.015, matching the DESs with the lowest and highest viscosity values in the data set, respectively (refer to Table S2). The desirability function applied to log η was then:

$$
d_i =
\begin{cases}
1 & \text{if } \hat{Y}_i \le T_i = L_i \\[4pt]
\left[ \dfrac{\hat{Y}_i - U_i}{T_i - U_i} \right]^{s} & \text{if } U_i > \hat{Y}_i > T_i \\[8pt]
0 & \text{if } \hat{Y}_i \ge U_i
\end{cases} \tag{11}
$$

Once the desirability functions for both properties are defined, the overall desirability 'D' of each DES can be computed as follows:

$$
D = \left( d_{\mathrm{CO_2}} \times d_{\mathrm{viscosity}} \right)^{1/2} \tag{12}
$$

This single D score provides an overall assessment of the desirability of the combined response property levels. Clearly, the values of D fall in the interval [0,1] and will grow as the balance of properties progressively turns out to be more favourable. Notice also that D will be zero if any di = 0, i.e., if any of the response properties is unacceptable. Then, the overall desirability D should be maximised over the independent variables' domain, i.e., the domain of the descriptors included in the individual predictive QSPR models, essentially to collectively identify their levels that will lead to the most desirable combination of response properties. In our case, the most desirable combination will be (in log units) when the CO2 absorption capacity is equal to or higher than 0.926 and the viscosity is equal to or lower than 0.763. The optimum overall desirability score D was determined here by using the generalised reduced gradient algorithm (GRG) for solving non-linear optimisation problems [57].

Additionally, to evaluate how reliable the predicted D values for the DESs are, the desirability's determination coefficient (R²D) and its adjusted value (R²D,adj) for each different training set, as previously suggested [31], were computed in an analogous manner to that employed for R² and R²adj. This is done, for example, for R²D using the following equation:

$$
R_D^2 = 1 - \frac{\sum_i \left( D_{Y_i} - D_{\hat{Y}_i} \right)^2}{\sum_i \left( D_{Y_i} - \bar{D}_{Y_i} \right)^2} \tag{13}
$$

where DYi and DŶi refer to the overall desirability values calculated from the observed and predicted response Yi values for each DES, and D̄Yi to the corresponding mean over all the Yi responses included. In much the same way [31], the overall desirability's LOO-CV coefficient Q²D,LOO was calculated just as Q²LOO. Similarly, for each test set, R²D,Pred values were calculated to evaluate the external predictivity. In this way, one can then infer how reliable the simultaneous optimisation of both responses over the independent variables' domain will be.

Finally, for a virtual screening exercise, two new libraries were built by either collecting other DES samples from the literature or by combining in different ways the chemicals present in the modelling dataset. Specifically, these two different libraries were built as follows:


(i) The first library comprised the two external validation datasets used for model validation plus a library of 207 reported DESs that were collected from the literature, from now on referred to as the 'experimental screening dataset'. This dataset is documented in detail in the Supplementary Material (Tables S1 and S3).
(ii) The second library included different combinations of the chemicals present in the modelling dataset, from now on referred to as the 'hypothetical screening dataset'. For this purpose, we started by generating a hypothetical library of DESs, resulting from all possible combinations of HBDs and HBAs of our DES modelling dataset, excluding those combinations already present as well as those pertaining to components belonging to the test set in both properties due to their inherent uncertainties (i.e.: HBAs thioacetamide chloride and urea chloride). The HBA:HBD ratios were varied from 1:1 to 1:7, leading to 1924 new hypothetical DESs. Details of this library (i.e., its composition, the DES calculated descriptors, and predicted properties as well as outlier information) are also given in Table S3 of the Supplementary Material.

The molecular descriptors included in the final MLR models developed for each property were computed for each sample in these screening libraries and then both log mCO2 and log η values were predicted. The latter were afterwards converted to desirability values according to the transformations previously described (Eqs. 10 and 11) and the overall desirability (D) for each DES was calculated from them (Eq. 12). As such, it was possible to inspect and rank the DESs in these new libraries according to the predicted D score based only on structural information. In so doing, we could recognise the most promising DESs for CO2 uptake having low viscosity, i.e., those having the highest D values that will logically be top-ranked, and discard the remaining ones ranked last.

3. Results and discussions

3.1. QSPR models

Following the strategy outlined before, we started by seeking the best QSPR (SFS-MLR) models for targeting the DESs' physicochemical properties under study, i.e., CO2 absorption capacity and viscosity. The workflow of the adopted procedure for selection of the best QSPR models is outlined in Fig. 2.

For both properties, 102 QSPR models were first set up based on the two methods for calculating the mixture descriptors, the several dataset distributions obtained by following the compounds-out validation scheme, along with the SFS scoring conditions (Fig. 2). These QSPR models and the attained statistical parameters are documented in detail in the Supplementary Material (Table S4). Among these, the best QSPR models were selected based upon their performance regarding internal and external predictivity, i.e., models CM09 and VM48 for the CO2 absorption capacity and viscosity, respectively (see Table S4). The compound-out division applied in the case of model CM09 returned a test set comprising all DESs with components such as benzyltrimethylammonium chloride, guanidium hydrochloride, N-benzyl-2-hydroxy-N,N-dimethyl ethanaminium chloride, tetrabutylammonium bromide, tetramethylammonium chloride, thioacetamide chloride, triethanolamine chloride, aminomethoxy propanol, ethylenediamine and glycerol, while the test set in model VM48 comprised all DESs with components such as benzyltriethylammonium chloride, methyltriphenylphosphonium bromide, tetrabutylammonium chloride, tetraethylammonium chloride, tetrapropylammonium chloride, thioacetamide chloride, triethylmethylammonium chloride, 1,2-propanediol, glycerol and octanoic acid.

The best predictive model CM09 found for the CO2 absorption capacity, expressed as log mCO2 (a five-variable equation), is detailed below along with the statistical regression parameters, while the meaning of its descriptors is shown in Table 1.
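The desirability transformations of Section 2.5 (Eqs. 9 to 12) can be sketched as follows, using the bounds and medians quoted there. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and writing Eqs. (10) and (11) as one function of a "worst" and a "best" bound is a design choice of this example. For the viscosity, the exponent is computed with the ratio of Eq. (11), an assumption that reproduces the reported s value to within rounding.

```python
import math

# Bounds and medians in log units, as quoted in Section 2.5.
CO2_WORST, CO2_BEST, CO2_MEDIAN = -2.046, 0.926, -0.555   # maximise
ETA_WORST, ETA_BEST, ETA_MEDIAN = 3.015, 0.763, 1.806     # minimise


def exponent_s(x_half, worst, best):
    """Exponent that assigns a desirability of 0.5 to x_half (cf. Eq. 9)."""
    return math.log(0.5) / math.log((x_half - worst) / (best - worst))


def desirability(y_pred, worst, best, s):
    """One-sided desirability: 0 at or beyond `worst`, 1 at or beyond `best`.

    With worst < best this is Eq. (10); with worst > best it is Eq. (11).
    """
    frac = (y_pred - worst) / (best - worst)
    frac = min(max(frac, 0.0), 1.0)       # clip outside the scaled range
    return frac ** s


def overall_desirability(log_mco2, log_eta):
    """Geometric mean of the two individual desirabilities (Eq. 12)."""
    d_co2 = desirability(log_mco2, CO2_WORST, CO2_BEST,
                         exponent_s(CO2_MEDIAN, CO2_WORST, CO2_BEST))
    d_eta = desirability(log_eta, ETA_WORST, ETA_BEST,
                         exponent_s(ETA_MEDIAN, ETA_WORST, ETA_BEST))
    return math.sqrt(d_co2 * d_eta)


def rank_by_desirability(predictions):
    """Sort {name: (log_mco2, log_eta)} by decreasing overall desirability."""
    scored = {name: overall_desirability(*vals)
              for name, vals in predictions.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```

A screening library would be ranked by passing its predicted (log mCO2, log η) pairs to `rank_by_desirability`; any DES predicted at or beyond an undesirable bound for either property receives D = 0 and falls to the bottom of the list.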

Fig. 2. Outline of the procedure adopted for selection of the best QSPR models.
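The leverage-based part of the applicability domain described in Section 2.4 can be illustrated with a short sketch. It assumes the usual hat-matrix definition of the leverage with an intercept column in the design matrix, which may differ in detail from the QSAR-Mx implementation; the function names are hypothetical and the standardisation approach is not covered.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage h = x (A^T A)^-1 x^T of each query sample.

    A is the training design matrix with an intercept column (an
    assumption of this sketch).
    """
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    core = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, core, Q)


def warning_leverage(p, n):
    """Warning leverage h* = 3(p + 1)/N for p descriptors, N training samples."""
    return 3.0 * (p + 1) / n


def williams_flags(h, std_residuals, h_star, resid_limit=3.0):
    """Boolean masks for the two outlier types read off a Williams plot."""
    h = np.asarray(h, dtype=float)
    std_residuals = np.asarray(std_residuals, dtype=float)
    return {
        "influential_outlier": h > h_star,
        "response_outlier": np.abs(std_residuals) > resid_limit,
    }
```

For a five-descriptor model trained on 40 samples, `warning_leverage(5, 40)` gives 0.45.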


Table 1
The five 2D-molecular descriptors used in the CO2 absorption model (Eq. 14).

Symbol          Definition [57–59]                                                  Class
totalcharge     Total charge of the species                                         Constitutional indices
MATS2i          Moran autocorrelation at lag 2/weighted by ionisation potentials    2D Autocorrelations
CATS2D_02_DN    CATS H-bond donor (D) negatively charged (N) at lag 2               2D CATS (a)
CATS2D_03_AP    CATS H-bond acceptor (A) positively charged (P) at lag 3            2D CATS (a)
B01[C−N]        Presence/absence of C−N at topological distance 1                   2D Atom pairs

(a) Chemically Advanced Template Search (CATS) descriptors specifically designed to identify scaffold hops [59].

$$
\begin{aligned}
\log m_{\mathrm{CO_2}} = {} & -1.569\,(\pm 0.131) \\
& - 1.007\,(\pm 0.325)\;\mathrm{totalcharge}_{\mathrm{pmix}} \\
& - 1.415\,(\pm 0.232)\;\mathrm{MATS2i}_{\mathrm{pmix}} \\
& - 0.984\,(\pm 0.197)\;\mathrm{CATS2D\_02\_DN}_{\mathrm{pmix}} \\
& - 1.020\,(\pm 0.120)\;\mathrm{CATS2D\_03\_AP}_{\mathrm{pmix}} \\
& + 2.063\,(\pm 0.079)\;\mathrm{B01[C{-}N]}_{\mathrm{pmix}}
\end{aligned} \tag{14}
$$

N = 40; R² = 0.969; R²Adj = 0.964; F(5, 34) = 211.05; Q²LOO = 0.954; MAELOO = 0.139; R²m(LOO) = 0.936; ΔR²m(LOO) = 0.023; Q²LCO = 0.936; MAELCO = 0.163; Ntest = 33; R²Pred = 0.865; MAEtest = 0.299; RMSEP = 0.373; R²m(test) = 0.801; ΔR²m(test) = 0.089; cR²p (1000 runs) = 0.907.

As seen, this model is very good in terms of both statistical significance and goodness of prediction. It accounts for ca. 97% of the variance in the data while at the same time showing very good internal predictivity, as manifested by the high LOO cross-validated statistics. Another important characteristic of the model is its high LCO cross-validation statistics. The low deviation between Q²LOO and Q²LCO (= 0.018) as well as between MAELOO and MAELCO (= 0.024) indicates that all the DES training mixtures are well-fitted by this model. The model is also extremely robust and shows very little dependence on the composition of the training set (ΔR²m(LOO) is only 0.023). The 5-fold cross-validation test gave rise to Q²5-fold and MAE5-fold values of 0.952 and 0.137, respectively. In addition, the model displays a high predictive ability for

[...]

were also included, i.e.: methyltrioctylammonium chloride, guaiacol, diethylene glycol, furfuryl chloride, decanoic acid and methyldiethanolamine. Model CM09 (Eq. 14) was found to have a very satisfactory predictivity on this external validation set as well, given the values obtained for R²Pred and MAEtest (0.873 and 0.225, respectively), which are even slightly better than those for the test set. Moreover, the values obtained for R²m(test), ΔR²m(test) and RMSEP (= 0.732, 0.085, and 0.277, respectively) further indicate the high external predictivity of this CO2 absorption model. The predictions of the model for the full DESs data are illustrated in Fig. 3.

Given that the real utility of the model relies on its ability to accurately predict the modelled property for new solvents, careful assessment of the model's true predictive power is of utmost importance. This includes the model validation but also the definition of its applicability domain in the space of descriptors used for its derivation. For that purpose, we established the MLR models' applicability domain by both the leverage and standardisation approaches. As per the former, we built a Williams plot using the leverage values calculated for each DES. As seen in Fig. 3, most of the DESs are within the applicability domain covered by ±3 standardised residual units. Considering the leverage threshold h* (= 0.45), only two training set samples (namely: SN15 and SN16, cf. Table S1) were found as structural outliers. The standardisation approach revealed the same two training set samples as structural outliers (Table S1). Nevertheless, these two lactic acid-based DESs were retained as they were very well-fitted by the model and have very low absolute residuals (< 0.05) [61]. The final results of the absorption model (model CM09) for the full data used are tabulated in the Supplementary Material (Table S5).

Let us now attempt to interpret the five selected descriptors responsible for triggering the DESs' capacity for CO2 capture (model CM09). According to Eq. 14, only descriptor B01[C−N]pmix contributes positively to the CO2 absorption capacity while all the remaining ones contribute negatively. Furthermore, B01[C−N]pmix is found to be the most significant descriptor in the model, suggesting the importance of the presence of C−N bonds in the DESs for having higher absorption capacity. In fact, HBD components containing the C−N scaffold (e.g., ethanolamine, urea, 3-amino-1-propanol, etc.) display higher CO2 absorption capacity compared to those that do not (e.g., phenol, 1,4-butanediol, acetic
the test set, judging from the obtained R2Pred, MAEtest, R2m(test) and ΔR2m acid, etc). The second most influential descriptor − MATS2ipmix, and the
(test) values (0.950, 0.299, 0.875 and 0.043, respectively). fourth one − totalchargepmix, both indicate that DESs containing high
Particular attention is to be paid equally to the degree of collinearity ionisation potential and charges are less likely to absorb CO2. Regarding
of the variables included in the model but that can easily be checked by the two CATS2D descriptors [60], also found to be important in the CO2
analysing the cross-correlation matrix. As can be seen in Table 2, pair absorption model, these simply reveal that H-bond acceptor and donor
correlations among the variables are always small (< 0.80), meaning properties at specific topological distances are key to the CO2 absorption
that the descriptors can be considered independent and severe over­ ability. Curiously, positive CATS2D_02_DNpmix correlations were found
estimation of chance correlation effects can be excluded. Lack of chance only in the modelling set DESs containing lactic acid as HBDs, thus
correlations in the model’ development is also confirmed by the high cR2P indicating that lactic acid is not a desirable component for DESs if high
value (= 0.907) obtained in the Y-randomisation test. CO2 absorption is expected.
For a final validation of this QSPR model, an additional dataset Moving on to the DESs’ viscosity, the resulting best-predictive model
containing 33 DESs for which the CO2 absorption capacity has been (a five-variable equation, model VM48) found for this property (log η) is
reported was employed. Details of this external validation dataset are given below together with the statistical parameters of the regression,
provided in the Supplementary Material (Table S1). Notice that to in­ while the selected descriptors are shown in Table 3.
crease the number of data samples, the pressure range of mCO2 mea­
surements considered for such external dataset is slightly higher
(0.09–0.60 MPa) than that of the modelling dataset (0.1–0.5 MPa). Be­
sides DESs with components that were missing in the modelling dataset

Table 2
Correlation matrix among the five variables of the CO2 absorption model (Eq. 14).

Descriptors         totalcharge_pmix  MATS2i_pmix  CATS2D_02_DN_pmix  CATS2D_03_AP_pmix  B01[C−N]_pmix
totalcharge_pmix    1.000             0.517        0.238              −0.337             −0.396
MATS2i_pmix                           1.000        −0.105             −0.121             0.047
CATS2D_02_DN_pmix                                  1.000              −0.172             −0.137
CATS2D_03_AP_pmix                                                     1.000              0.346
B01[C−N]_pmix                                                                            1.000
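The collinearity check described for Table 2 can be reproduced in a few lines. The following is a minimal sketch, not the authors' code: it stores the off-diagonal entries of Table 2 in a dictionary (the names `TABLE2_UPPER` and `max_abs_pair_correlation` are illustrative assumptions) and reports the largest absolute pair correlation, to be compared with the 0.80 cut-off quoted in the text.

```python
# Off-diagonal pair correlations copied from Table 2, keyed by descriptor pair.
TABLE2_UPPER = {
    ("totalcharge", "MATS2i"): 0.517,
    ("totalcharge", "CATS2D_02_DN"): 0.238,
    ("totalcharge", "CATS2D_03_AP"): -0.337,
    ("totalcharge", "B01[C-N]"): -0.396,
    ("MATS2i", "CATS2D_02_DN"): -0.105,
    ("MATS2i", "CATS2D_03_AP"): -0.121,
    ("MATS2i", "B01[C-N]"): 0.047,
    ("CATS2D_02_DN", "CATS2D_03_AP"): -0.172,
    ("CATS2D_02_DN", "B01[C-N]"): -0.137,
    ("CATS2D_03_AP", "B01[C-N]"): 0.346,
}


def max_abs_pair_correlation(pairs):
    """Return (pair, r) for the off-diagonal entry with the largest |r|."""
    pair = max(pairs, key=lambda k: abs(pairs[k]))
    return pair, pairs[pair]


CUTOFF = 0.80  # collinearity cut-off quoted in the text
worst_pair, worst_r = max_abs_pair_correlation(TABLE2_UPPER)
print(worst_pair, worst_r, abs(worst_r) < CUTOFF)
```

With the values of Table 2, the strongest pair is totalcharge/MATS2i (0.517), well below the cut-off.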


Fig. 3. Predicted vs. observed (left) and corresponding Williams plot (right) for the CO2 absorption capacity of the whole set of DESs.
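The leverage threshold quoted for the Williams plot is consistent with the commonly used warning-leverage rule h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training samples. That this is the rule applied here is an assumption (the defining equation is not part of this section), but it reproduces the reported h* = 0.45 for the five-descriptor CO2 model trained on 40 DESs. A minimal sketch, with illustrative function names and a hypothetical sample:

```python
def warning_leverage(n_descriptors: int, n_training: int) -> float:
    """Warning leverage h* = 3(p + 1)/n (assumed rule, see the text above)."""
    return 3.0 * (n_descriptors + 1) / n_training


def classify(leverage: float, std_residual: float, h_star: float,
             residual_limit: float = 3.0) -> str:
    """Place one sample on a Williams plot.

    A sample counts as a structural outlier if its leverage exceeds h*,
    and as a response outlier if |standardised residual| exceeds the limit.
    """
    high_leverage = leverage > h_star
    large_residual = abs(std_residual) > residual_limit
    if high_leverage and large_residual:
        return "structural and response outlier"
    if high_leverage:
        return "structural outlier"
    if large_residual:
        return "response outlier"
    return "inside applicability domain"


h_star_co2 = warning_leverage(n_descriptors=5, n_training=40)
print(round(h_star_co2, 3))
# Hypothetical leverage and residual, for illustration only.
print(classify(leverage=0.52, std_residual=0.4, h_star=h_star_co2))
```

The same rule gives 18/52 ≈ 0.346 for a five-descriptor model trained on 52 samples, which matches the h* reported for the viscosity model.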

Table 3
The five 2D-molecular descriptors used in the viscosity model (Eq. 15).

Symbol          Definition [55,56,59]                                                  Class
HyWi_B(s)       Hyper-Wiener-like index (log function) from the Burden matrix          2D matrix-based
                weighted by atomic intrinsic state (I-state)^a
B01[C−C]        Presence/absence of C−C at topological distance 1                      2D Atom pairs
SpMax2_Bh(s)    Highest 2nd eigenvalue of Burden matrix weighted by I-state            Burden eigenvalues
SM15_AEA(bo)    Spectral moment of order 15 from the augmented edge adjacency          Edge adjacency indices
                matrix weighted by bond order
P_VSA_ppp_ter   P_VSA-like on a potential pharmacophore point in terminal atoms^b      P_VSA-like descriptors
^a I-states are based on the Kier-Hall atomic electronegativity modified by the number of σ bonds, number of hydrogen atoms, number of electrons in π orbitals, and number of lone pair electrons [58,59].
^b P_VSA-like descriptors stand for the van der Waals surface area (VSA) with a particular property (P), in this case a potential pharmacophore point (ppp), i.e., a generalized atom type defined considering some physicochemical aspects [58,59,62].

$$
\begin{aligned}
\log \eta ={} & -4.984(\pm 0.422) + 1.962(\pm 0.119)\,\mathrm{HyWi\_B(s)}_{pmix} + 0.175(\pm 0.061)\,\mathrm{SM15\_AEA(bo)}_{pmix} \\
& - 0.591(\pm 0.066)\,\mathrm{SpMax2\_Bh(s)}_{nmix} + 0.014(\pm 0.002)\,\mathrm{P\_VSA\_ppp\_ter}_{nmix} + 0.763(\pm 0.237)\,\mathrm{B01[C{-}C]}_{nmix}
\end{aligned}
\tag{15}
$$

N = 52; R² = 0.877; R²_Adj = 0.864; F(5, 46) = 65.611; Q²_LOO = 0.836; MAE_LOO = 0.151; R²_m(LOO) = 0.771; ΔR²_m(LOO) = 0.082; Q²_LCO = 0.825; MAE_LCO = 0.159; N_test = 21; R²_Pred = 0.818; MAE_test = 0.161; RMSEP = 0.256; R²_m(test) = 0.741; ΔR²_m(test) = 0.0006; cR²_P (1000 runs) = 0.831.

Viscosity of mixtures such as DESs or ILs is a quite difficult physicochemical property to predict [63]. Even so, over the past decades, several QSPR models have been set up for modelling and predicting the viscosity of DESs, based either on classical thermodynamic relations or on the Conductor-like Screening Model for Real Solvents (COSMO-RS) approach [20–23,38]. Yet, despite their high predictive performance, such models focussed solely on particular classes of DESs (e.g.: on hydrophilic [20–22,38] or hydrophobic DESs [23]). In particular, those based on the COSMO-RS approach suffer from the same drawbacks as QSPR models based on 3D-descriptors [43], and overall they lack the in-depth validation that is crucial for efficiently coping with mixtures (see Sections 2.2 and 2.3). Thus, taking into account the limited data that had to be employed here, our model shows a satisfactory internal predictivity as it accounts for ca. 86% and 83% of the variance in the training and cross-validation data, respectively. The large average R²_m(LOO) value and low ΔR²_m(LOO) are also indicative of the model's goodness of fit. Another important characteristic of this model is the low deviation between each LOO parameter and the respective LCO parameter. For example, we obtained a difference of only 0.011 between Q²_LOO and the newly proposed Q²_LCO metric, which accounts for the internal compounds-out validation on the training set. In addition, the 5-fold cross-validation test afforded Q²_5-fold and MAE_5-fold values of 0.952 and 0.137, respectively. Besides, the model also shows good external predictive ability, judging from the attained R²_Pred value (= 0.818) but especially from the value of the average R²_m(test) (= 0.741; ΔR²_m(test) = 0.0006), because the latter comes from applying a more stringent external predictivity criterion and it is higher than 0.5 [50].

Additionally, one can see that the model's descriptors are not highly correlated by looking at the maximum pair correlation value obtained (0.766; see Table 4). Moreover, the high cR²_P value (= 0.831) achieved in the Y-randomisation test supports the uniqueness of the current model developed for the DESs' viscosity.

Table 4
Correlation matrix of the 5 variables of the viscosity model (Eq. 15).

Descriptors          HyWi_B(s)_pmix  SM15_AEA(bo)_pmix  SpMax2_Bh(s)_nmix  P_VSA_ppp_ter_nmix  B01[C−C]_nmix
HyWi_B(s)_pmix       1.000           −0.378             0.511              0.345               0.156
SM15_AEA(bo)_pmix                    1.000              −0.187             0.143               0.048
SpMax2_Bh(s)_nmix                                       1.000              0.753               0.766
P_VSA_ppp_ter_nmix                                                         1.000               0.510
B01[C−C]_nmix                                                                                  1.000

Similar to the CO2 absorption model, an external validation set (72 DESs) was considered to further justify the predictivity of this viscosity model. In this case, it included DESs with reported viscosity but not suitable for modelling the CO2 absorption capacity. Details of this external dataset are given in the Supplementary Material (Table S1). Values of R²_Pred (= 0.799) and MAE_test (= 0.232) were then obtained by fitting such dataset with our best viscosity model (Eq. 15), as well as the following for the other external validation parameters: R²_m(test) = 0.653; ΔR²_m(test) = 0.108; RMSEP = 0.327. It can therefore be inferred that, even though the sample size of this external validation dataset is as large as the modelling one, the model affords a fairly reliable external predictivity.

The predicted viscosity values calculated by the model against the experimental ones are plotted for all DESs in Fig. 4. This figure also depicts the Williams plot for checking the applicability domain of the MLR model (h* = 0.346). As can be seen, only two external dataset samples (namely: EV1 and EV6, cf. Table S1) have a leverage greater than h* but show standardised residuals within the AD limit, which implies that they are to be considered as structural outliers. Yet significantly, only EV6 is found as a possible structural outlier by the consensus prediction from both the leverage and standardisation approaches (see Tables S1 and S5 of the Supplementary Material).

Fig. 4. Predicted vs. observed viscosity (left) and Williams plot for the viscosity model (right) for the whole dataset of DESs.

The viscosity MLR model disclosed the importance of five mixed 2D descriptors (Eq. 15). Interestingly, four out of five descriptors have positive contributions to the viscosity (namely: descriptors HyWi_B(s), B01[C−C], SM15_AEA(bo), P_VSA_ppp_ter) and three of the five are based on the absolute difference between the weighted descriptor values of the DES' components, i.e.: descriptors B01[C−C], SpMax2_Bh(s), and P_VSA_ppp_ter (see Table 3). In addition, all these are topological graph-based descriptors whose interpretation is not as simple as that pertaining to the descriptors appearing in the CO2 absorption model (Table 2), since they condense a large amount of physicochemical information into a single number [58,59]. One should nevertheless remark that the major
contribution to the DES' viscosity comes from variable HyWi_B(s)_pmix followed by the remaining ones (Eq. 15). Generally, HyWi_B measures how compact a molecule is given its weight, meaning that the lower its value, the more compact the molecule is. One should point out also that ethylenediamine (EDA) based DESs show low values of HyWi_B(s)_pmix, indicating that DESs containing this component are to be preferred if a low viscosity is desired.

3.2. Desirability-based MCD results

Thus far, we have demonstrated that the QSPR models developed are good in terms of both their statistical significance and predictive ability. We may now proceed with an adequate level of confidence to the simultaneous consideration of the targeted properties for the set of DESs. We began by computing the individual d_i values for each DES, according to the desirability functions previously outlined, and then those were combined into the single overall desirability D by means of Eq. 12. For details regarding the observed and predicted desirability values obtained for each property, along with the single overall desirability values for the whole dataset of DESs, please refer to Table S2 in the Supplementary Material.

Table 5 shows the ten top-ranked DESs according to the computed overall desirability values, i.e., with the highest CO2 absorption and lowest viscosity. As can be observed, mostly ethylenediamine (EDA) based DESs have the larger desirability values and therefore these low viscosity DESs hold high-level CO2 absorption capacity.

The overall desirability function displays good statistical quality, judging from the values obtained for R²_D (= 0.916) and R²_D,adj (= 0.915). Meanwhile the value attained for Q²_D,LOO (= 0.916) affords further reliability to the method employed in predicting D. This therefore ensures the performance of the overall desirability function predictions based on the two derived MLR models and supports its use in future ranking efforts.

The optimisation of the overall desirability was then carried out to obtain the levels of the descriptors included in such models that simultaneously deliver the most desirable combination of both properties. The results of the desirability-based MCD procedure are shown in Table 6. That is, the table shows the theoretical values for the predictive descriptors required to reach a highly desirable DES candidate (overall desirability score, 'D' = 1.000) with the best possible compromise between the CO2 absorption capacity (1.042, in log units) and viscosity (0.456, in log units). These optimum values of the descriptors can serve as reference for a similarity-based ranking of novel DESs with an improved property profile, or even be employed as a "pattern generator" to support their future rational design.

Table 5
Observed (D) and predicted (Pred. D) values for the ten top-ranked DES samples from the modelling dataset.

Sample  HBA^a    HBD^b   HBA:HBD ratio  D      Pred. D
SN55    C1minC   EDA     1:4            0.986  0.916
SN48    MEAC     EDA     1:4            0.980  0.947
SN54    C1minC   EDA     1:3            0.945  0.863
SN47    MEAC     EDA     1:3            0.939  0.900
SN53    C1minC   EDA     1:2            0.874  0.779
SN46    MEAC     EDA     1:2            0.859  0.824
SN56    C1minC   DETA    1:4            0.844  0.777
SN49    MEAC     DETA    1:4            0.837  0.802
SN8     UC       EDA     1:3            0.766  0.825
SN66    C1minC   3A1Pr   1:3            0.737  0.848
^a C1minC: 1-Methylimidazolium chloride; MEAC: Monoethanolammonium chloride; UC: Urea chloride.
^b EDA: Ethylenediamine; DETA: Diethylenetriamine; 3A1Pr: 3-amino-1-propanol.

Table 6
Results of the desirability-based MCD procedure.

Optimum theoretical values for the predictive descriptors
CO2 absorption capacity               Viscosity
totalcharge_pmix = 0.000              HyWi_B(s)_pmix = 3.175
MATS2i_pmix = −0.388                  SM15_AEA(bo)_pmix = 0.000
CATS2D_02_DN_pmix = 0.000             SpMax2_Bh(s)_nmix = 2.629
CATS2D_03_AP_pmix = 0.000             P_VSA_ppp_ter_nmix = 27.540
B01[C−N]_pmix = 1.000                 B01[C−C]_nmix = 0.499
log mCO2 = 1.042; d_CO2 = 1.000       log η = 0.456; d_viscosity = 1.000
Overall desirability score 'D' = 1.000
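As a consistency check on Table 6, Eqs. 14 and 15 can be evaluated at the optimum descriptor values, and individual desirabilities can be combined into D. The sketch below is illustrative rather than the authors' implementation: the function and variable names are hypothetical, and the overall desirability is assumed to be the geometric mean of the individual values (the usual Derringer-type form; Eq. 12 itself is not reproduced in this section).

```python
import math

# Regression coefficients copied from Eq. 14 (CO2 absorption) and Eq. 15 (viscosity).
CO2_MODEL = {
    "intercept": -1.569, "totalcharge_pmix": -1.007, "MATS2i_pmix": -1.415,
    "CATS2D_02_DN_pmix": -0.984, "CATS2D_03_AP_pmix": -1.020, "B01[C-N]_pmix": 2.063,
}
VISCOSITY_MODEL = {
    "intercept": -4.984, "HyWi_B(s)_pmix": 1.962, "SM15_AEA(bo)_pmix": 0.175,
    "SpMax2_Bh(s)_nmix": -0.591, "P_VSA_ppp_ter_nmix": 0.014, "B01[C-C]_nmix": 0.763,
}

# Optimum descriptor values copied from Table 6.
OPT_CO2 = {
    "totalcharge_pmix": 0.000, "MATS2i_pmix": -0.388, "CATS2D_02_DN_pmix": 0.000,
    "CATS2D_03_AP_pmix": 0.000, "B01[C-N]_pmix": 1.000,
}
OPT_VISCOSITY = {
    "HyWi_B(s)_pmix": 3.175, "SM15_AEA(bo)_pmix": 0.000, "SpMax2_Bh(s)_nmix": 2.629,
    "P_VSA_ppp_ter_nmix": 27.540, "B01[C-C]_nmix": 0.499,
}


def predict(model, descriptors):
    """Evaluate a linear (MLR) model: intercept + sum(coefficient * descriptor)."""
    return model["intercept"] + sum(
        coef * descriptors[name] for name, coef in model.items() if name != "intercept"
    )


def overall_desirability(d_values):
    """Geometric mean of the individual desirabilities (assumed form of Eq. 12)."""
    return math.prod(d_values) ** (1.0 / len(d_values))


log_m_co2 = predict(CO2_MODEL, OPT_CO2)
log_eta = predict(VISCOSITY_MODEL, OPT_VISCOSITY)
print(round(log_m_co2, 3), round(log_eta, 3))

# Individual desirabilities of one screened candidate (0.996 and 0.980).
print(round(overall_desirability([0.996, 0.980]), 3))
```

Evaluated this way, the two models return values within about 0.002 log units of those listed in Table 6 (1.042 and 0.456); differences of that size are expected because the published coefficients and descriptor values are rounded.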


3.3. Desirability-based virtual screening results

To explore the utility of the desirability-based MCD methodology herein described in proposing DESs having high CO2 uptake capacity and low viscosity, we performed two virtual screening experiments as described in the Material and methods section.

The ten first ranked DESs from the library denoted as experimental screening dataset according to the overall desirability predicted scores are listed in Table 7, whereas further details regarding all DESs of this library are reported as Supplementary Material (Table S6).

As can be seen, none of these top ranked DESs are found as structural outliers. Furthermore, one can notice that the DES with the best compromise between CO2 absorption and viscosity in this library contains ethylammonium chloride (EAC) as HBA and acetamide (AC) as HBD at a molar ratio of 1:1.5, giving an overall desirability of 0.770. Regarding the HBA component, choline chloride (ChCl) and tetrabutylammonium bromide (TBAB) are present in nine DESs across this top ranked subset. It can be seen also that monoethanolamine (MEA) and methyldiethanol amine (MDEA) are the most represented HBD components of these DESs. This ought to be expected since the ethylenediamine (EDA)-based DESs from the modelling dataset were found to have the best overall desirability values (cf. Table 5 and Table S2) and the structure of EDA is closely related to MEA, which is a well-known amine for industrial scrubbing [6]. Regardless of the fact that the CO2 absorption capacities of these amines (i.e., MEA, EDA and MDEA) are well established [6], it is evident that the desirability-based ranking could effectively distinguish among various amine based DESs depending on which component is paired to them and with which ratio. As an example, the top-ranked EAC-based DES (i.e., XVS25) was originally reported by Abbott et al. [64] with a comparatively low viscosity of 64 cP at 313 K. The two other EAC-based DESs (XVS26 with trifluoroacetamide and XVS27 with urea, cf. Table S6) reported in the same work [64] were found to have higher viscosities than the former DES, being thus worse-ranked. Similarly, the fourth-ranked DES XVS63 (retrieved from [65]) was also reported to have low viscosity (of ~30 cP at 300 K) in the work of Mjalli et al. [66]. In addition, the measured viscosities for the seventh-ranked XVS64 and tenth-ranked XVS65 samples were ~38 cP (at 300 K) and 48.6 cP (at 298.5 K), respectively [66]. In the same work of Mjalli et al. [66], two other top ranked DESs, namely XVS75 and XVS76, were also reported to have low viscosities (~36 cP at 300 K). Generally, the overall desirability decreases if the HBA:HBD ratio lowers and the experimental viscosities of the latter MEA-based DESs comply with that. Finally, one can notice also that the predicted desirability values obtained for this library are lower than those obtained for the modelling data set (cf. Table 7 vs. Table 5). The fact that the DESs profile of the experimental screening dataset is worse than that of the DESs from the modelling dataset is a consequence of them being evaluated for, at most, one of the properties here considered. This clearly highlights the importance of considering both properties simultaneously for finding DESs with high CO2 absorption ability and low viscosity.

Table 7
The ten top-ranked DESs in the experimental screening dataset according to the predicted overall desirability (Pred. D).

ID      HBA^a   HBD^b   HBA:HBD ratio  Pred. D  Outlier information
XVS25   EAC     AC      1:1.5          0.770    Not outlier
EC33    ChCl    MDEA    1:7            0.690    Not outlier
EC30    ChCl    MDEA    1:6            0.689    Not outlier
XVS63   ChCl    MEA     1:8            0.674    Not outlier
XVS35   TBAB    IMZ     3:7            0.656    Not outlier
EC28    TBAB    MDEA    1:4            0.652    Not outlier
XVS64   ChCl    MEA     1:6            0.646    Not outlier
XVS75   TBAB    MEA     1:5            0.644    Not outlier
XVS76   TBAB    MEA     1:4            0.619    Not outlier
XVS65   ChCl    MEA     1:5            0.618    Not outlier
^a EAC: Ethylammonium chloride; ChCl: Choline chloride; TBAB: Tetrabutylammonium bromide.
^b AC: Acetamide; MDEA: Methyldiethanol amine; MEA: Monoethanolamine; IMZ: Imidazole.

As can be seen in Table 8, which lists the top ten DES candidates identified from virtual screening of the library denoted as hypothetical screening dataset, the hypothetical DESs show a much better joint profile concerning the CO2 absorption capacity and viscosity than that of the DESs from the experimental screening set. Details about the full virtual screening results of such hypothetical DESs library are supplied as Supplementary Material (Table S6).

It can be noticed also that none of these top ranked hypothetical DESs are found as structural outliers and all of them contain ethylenediamine (EDA) as the HBD component. This is not a surprising result since, as previously discussed, the best desirability values among the DESs present in the modelling set were obtained when EDA is present as HBD. In fact, EDA is present in most of the top ranked hypothetical DESs (Pred. D > 0.50; see Table S6) and in 47 out of those with a predicted desirability higher than 0.9. In contrast, there is more diversity across the HBAs present in the best hypothetical DESs, with at least ten different components included. From those, monoethanolammonium chloride (MEAC), tetraethylammonium chloride (TEAC), triethylmethylammonium chloride (TEMAC), and tetrapropylammonium chloride (TPAC) stand out on the top ten ranked hypothetical DESs (Table 8). Noticeably, all the latter have a much higher HBA:HBD ratio than the actual one present in the modelling set (see Table 5 and S1). This is consistent with the previously discussed association between higher desirability values and higher HBA:HBD ratios, indicating in addition a way of improving the CO2 absorption/viscosity profile of such DESs.

Table 8
The ten top-ranked DESs in the hypothetical screening dataset according to the predicted overall desirability (Pred. D).

ID       HBA^a   HBD^b  HBA:HBD ratio  Pred. d_CO2  Pred. d_viscosity  Pred. D  Outlier information
HV0854   MEAC    EDA    1:6            1.000        0.997              0.999    Not outlier
HV1388   TEAC    EDA    1:7            0.993        1.000              0.996    Not outlier
HV0873   MEAC    EDA    1:7            1.000        0.986              0.993    Not outlier
HV1910   TEMAC   EDA    1:7            0.983        1.000              0.992    Not outlier
HV1369   TEAC    EDA    1:6            0.979        1.000              0.989    Not outlier
HV1651   TPAC    EDA    1:7            0.996        0.980              0.988    Not outlier
HV1633   TPAC    EDA    1:6            0.982        0.991              0.987    Not outlier
HV1891   TEMAC   EDA    1:6            0.968        1.000              0.984    Not outlier
HV1350   TEAC    EDA    1:5            0.962        1.000              0.981    Not outlier
HV0108   1-MIC   EDA    1:7            0.962        1.000              0.981    Not outlier
^a MEAC: Monoethanolammonium chloride; TEAC: Tetraethylammonium chloride; TEMAC: Triethylmethylammonium chloride; TPAC: Tetrapropylammonium chloride; 1-MIC: 1-Methylimidazolium chloride.
^b EDA: Ethylenediamine.

To test how the proposed desirability-based virtual screening approach compares to a distance-based one, the distances to the optimal solution found (Table 6) were computed for each sample in the hypothetical virtual screening set. These distances were then used as ranking criteria, thus following a similarity-based structural approach. Descriptors for the VS data and the optimal solution were scaled to the [0,1] interval to avoid the dominance of distances caused by descriptors covering larger intervals or having large values. The scaling was made considering the minimum and maximum values of each descriptor on the training data set. Details about the distance-based screening results obtained for the hypothetical DESs library are provided in Supplementary Material (Table S7).

As can be seen, the Euclidean distance performed better in retrieving more desirable DESs. However, overall, the similarity distance-based approach included in the top positions DESs with lower desirability than the purely desirability-based methodology. This is a consequence of having unbalanced desirability values for each property and it is not a surprising result. Distances to the optimal solution can be biased, for example, by descriptors that are very close to their optimal values for one property. The latter is the case for, e.g., HV1725, which ranks first in the distance approach despite having individual desirability values of 1 and 0.76 for the CO2 absorption and viscosity, respectively. In this case the calculation of the distance to the optimal solution is biased by values of the descriptors coding for CO2 absorption matching almost perfectly with the optimal descriptors' values.

To sum up, these results suggest that new DESs for CO2 uptake, not reported in the literature, could be obtained by properly combining already well-characterised components. In particular, the ranking of new improved DESs for this particular application can be guided by the overall desirability-based procedure.

4. Conclusions

Since DESs have attracted a great deal of interest from the scientific community as useful environment friendly solvents, quite a few in silico modelling studies have been reported so far to predict their physicochemical [20–23,38,40,65] or toxicological properties [66–68]. None of those studies have however attempted to find the most promising

DESs to suit a particular application. The results achieved in this work clearly highlight the benefits of resorting to a combined strategy of multi-objective targets and related desirability-based ranking as useful tools in the development of DESs for CO2 uptake. The data obtained allowed us to reach the theoretical levels of 0–2D descriptors leading to a desirable DES candidate on the basis of high CO2 absorption potential and low viscosity. This combined strategy can thus be efficiently employed as a virtual screening tool for identifying and prioritising new DESs with acceptable trade-offs between CO2 absorption capacity and viscosity. An increasing number of DESs are being synthesised and their physicochemical properties reported. In the near future, the desirability-based multicriteria decision approach adopted here can thus be exploited using a larger number of DES samples and other important properties (such as evaporation, surface tension, density, heat absorption capacity, etc.) for reaching improved industrial applications along with targeting other end-applications (e.g.: identifying promising CO2 conversion products). For example, a high evaporation rate of some DESs renders these less useful as an industrial solvent [69]. Therefore, gas evaporation and other properties can be added to our model as soon as enough data become available for modelling. Furthermore, our group made efforts to launch a publicly available software (QSAR-Mx) for more rapidly tackling virtual predictions using such an in silico based approach and offsetting the difficulties of handling more complex mixture data.

One should comment nevertheless here that a major limitation of the desirability-based approach applied is that it naturally depends on the bounds that need to be set up for the desirability functions. In this study, none of the known nor the hypothetical DESs achieved a global D = 1 score. However, it is possible that upcoming advances in this research area could lead to DESs with a global D = 1 score following the present modelling framework. It could be even the case that many DESs could be predicted with global D = 1 in the future due to the available data employed to train the current QSPR models, making them indistinguishable. To avoid this scenario, newly discovered DESs with improved CO2 absorption and viscosity profiles must be included in the modelling framework for QSPR models re-training and validation as well as for re-defining the bounds corresponding to the targeted properties.

To conclude, this study serves as proof of concept for further research in materials design, from the desirability-based multicriteria decision point of view, paving the way to increase the likelihood of new DES candidates to evolve into successful solvents to match particular sustainable applications.

CRediT authorship contribution statement

Amit Kumar Halder: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Visualization. Pravin Ambure: Methodology, Validation, Formal analysis, Investigation, Data curation. Yunierkis Perez-Castillo: Methodology, Validation, Formal analysis, Investigation, Writing – original draft. M. Natália D. S. Cordeiro: Conceptualization, Methodology, Validation, Investigation, Resources, Visualization, Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by UIDB/50006/2020 with funding from FCT/MCTES (Portugal) through national funds.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.jcou.2022.101926.

References

[1] M. Bui, C.S. Adjiman, A. Bardow, E.J. Anthony, A. Boston, S. Brown, P.S. Fennell, S. Fuss, A. Galindo, L.A. Hackett, J.P. Hallett, H.J. Herzog, G. Jackson, J. Kemper, S. Krevor, G.C. Maitland, M. Matuszewski, I.S. Metcalfe, C. Petit, G. Puxty, J. Reimer, D.M. Reiner, E.S. Rubin, S.A. Scott, N. Shah, B. Smit, J.P.M. Trusler, P. Webley, J. Wilcox, N. Mac Dowell, Carbon capture and storage (CCS): the way forward, Energ. Environ. Sci. 11 (2018) 1062–1176.
[2] Y.H. Cai, Y.L. Chen, Q.J. Li, L. Li, H.X. Huang, S.Y. Wang, W.X. Wang, CO2 hydrate capture and storage, Energy Technol. 5 (2017) 1195–1199.
[3] H.W. Ren, S.H. Lian, X. Wang, Y. Zhang, E.H. Duan, Exploiting the hydrophilic role of natural deep eutectic solvents for greening CO2 capture, J. Clean. Prod. 193 (2018) 802–810.
[4] F. Wang, J. Zhao, H. Miao, J.P. Zhao, H.C. Zhang, J.L. Yuan, J.Y. Yan, Current status and challenges of the ammonia escape inhibition technologies in ammonia-based CO2 capture process, Appl. Energ. 230 (2018) 734–749.
[5] R.B. Leron, M.H. Li, Solubility of carbon dioxide in a choline chloride-ethylene glycol based deep eutectic solvent, Thermochim. Acta 551 (2013) 14–19.
[6] M.A. Kuenemann, D. Fourches, Cheminformatics modeling of amine solutions for assessing their CO2 absorption properties, Mol. Inform. 36 (2017), 1600143.
[7] S.J. Zeng, X. Zhang, L.P. Bai, X.C. Zhang, H. Wang, J.J. Wang, D. Bao, M.D. Li, X.Y. Liu, S.J. Zhang, Ionic-liquid-based CO2 capture systems: structure, interaction and process, Chem. Rev. 117 (2017) 9625–9673.
[8] S.C. Tiwari, K.K. Pant, S. Upadhyayula, Efficient CO2 absorption in aqueous dual functionalized cyclic ionic liquids, J. CO2 Util. 45 (2021), 101416.
[9] D. Kowalska, J. Maculewicz, P. Stepnowski, J. Dolzonek, Ionic liquids as environmental hazards - crucial data in view of future PBT and PMT assessment, J. Hazard. Mater. 403 (2021), 123896.
[10] F.P. Pelaquim, A.M.B. Neto, I.A.L. Dalmolin, M.C. da Costa, Gas solubility using deep eutectic solvents: review and analysis, Ind. Eng. Chem. Res. 60 (2021) 8607–8620.
[11] S. Sarmad, J.-P. Mikkola, X. Ji, Carbon dioxide capture with ionic liquids and deep eutectic solvents: a new generation of sorbents, ChemSusChem 10 (2017) 324–352.


[12] G. Garcia, S. Aparicio, R. Ullah, M. Atilhan, Deep eutectic solvents: physicochemical properties and gas separation applications, Energy Fuels 29 (2015) 2616–2644.
[13] S.K. Shukla, J.P. Mikkola, Intermolecular interactions upon carbon dioxide capture in deep-eutectic solvents, Phys. Chem. Chem. Phys. 20 (2018) 24591–24601.
[14] A.P. Abbott, G. Capper, D.L. Davies, R.K. Rasheed, V. Tambyrajah, Novel solvent properties of choline chloride/urea mixtures, Chem. Commun. (2003) 70–71.
[15] A.P. Abbott, P.M. Cullis, M.J. Gibson, R.C. Harris, E. Raven, Extraction of glycerol from biodiesel into a eutectic based ionic liquid, Green Chem. 9 (2007) 868–872.
[16] J.D. Mota-Morales, R.J. Sanchez-Leija, A. Carranza, J.A. Pojman, F. del Monte, G. Luna-Barcenas, Free-radical polymerizations of and in deep eutectic solvents: green synthesis of functional materials, Prog. Polym. Sci. 78 (2018) 139–153.
[17] E.L. Smith, A.P. Abbott, K.S. Ryder, Deep eutectic solvents (DESs) and their applications, Chem. Rev. 114 (2014) 11060–11082.
[18] B.B. Hansen, S. Spittle, B. Chen, D. Poe, Y. Zhang, J.M. Klein, A. Horton, L. Adhikari, T. Zelovich, B.W. Doherty, B. Gurkan, E.J. Maginn, A. Ragauskas, M. Dadmun, T.A. Zawodzinski, G.A. Baker, M.E. Tuckerman, R.F. Savinell, J.R. Sangoro, Deep eutectic solvents: a review of fundamentals and applications, Chem. Rev. 121 (2021) 1232–1285.
[19] H. Vanda, Y.T. Dai, E.G. Wilson, R. Verpoorte, Y.H. Choi, Green solvents from ionic liquids and deep eutectic solvents to natural deep eutectic solvents, C. R. Chim. 21 (2018) 628–638.
[39] W.J. Guo, Y.C. Hou, S.H. Ren, S.D. Tian, W.Z. Wu, Formation of deep eutectic solvents by phenols and choline chloride and their physical properties, J. Chem. Eng. Data 58 (2013) 866–872.
[40] A. Bakhtyari, R. Haghbakhsh, A.R.C. Duarte, S. Raeissi, A simple model for the viscosities of deep eutectic solvents, Fluid Phase Equilibr. 521 (2020) 11262.
[41] D. Fourches, E. Muratov, A. Tropsha, Trust, but verify II: a practical guide to chemogenomics data curation, J. Chem. Inf. Model. 56 (2016) 1243–1252.
[42] A. Mauri, V. Consonni, M. Pavan, R. Todeschini, Dragon software: an easy approach to molecular descriptor calculations, Match-Commun. Math. Co. 56 (2006) 237–248.
[43] M. Hechinger, K. Leonhard, W. Marquardt, What is wrong with quantitative structure-property relations models based on three-dimensional descriptors? J. Chem. Inf. Model. 52 (2012) 1984–1993.
[44] I. Oprisiu, S. Novotarskyi, I.V. Tetko, Modeling of non-additive mixture properties using the online chemical database and modeling environment (OCHEM), J. Cheminform. 5 (2013) 4.
[45] E.N. Muratov, E.V. Varlamova, A.G. Artemenko, P.G. Polishchuk, V.E. Kuz’min, Existing and developing approaches for QSAR Analysis of Mixtures, Mol. Inform. 31 (2012) 202–221.
[46] E.N. Muratov, E.V. Varlamova, A.G. Artemenko, P.G. Polishchuk, L. Nikolaeva-Glomb, A.S. Galabov, V.E. Kuz’min, QSAR analysis of poliovirus inhibition by dual combinations of antivirals, Struct. Chem. 24 (2013) 1665–1679.
[20] R. Haghbakhsh, K. Parvaneh, S. Raeissi, A. Shariati, A general viscosity model for [47] A.K. Halder, Reza Haghbakhsh, I.V. Voroshylova, A.R.C. Duarte, M.N.D.
deep eutectic solvents: the free volume theory coupled with association equations S. Cordeiro, Density of deep eutectic solvents: the path forward cheminformatics-
of state, Fluid Phase Equilibr. 470 (2018) 193–202. driven reliable predictions for mixtures, Molecules 26 (2021) 5779.
[21] Y. Benguerba, I.M. Alnashef, A. Erto, M. Balsamo, B. Ernst, A quantitative [48] I.V. Tetko, V.Y. Tanchuk, A.E.P. Villa, Prediction of n-octanol/water partition
prediction of the viscosity of amine based DESs using Ss-profile molecular coefficients from PHYSPROP database using artificial neural networks and E-state
descriptors, J. Mol. Struct. 1184 (2019) 357–363. indices, J. Chem. Inf. Comput. Sci. 41 (2001) 1407–1421.
[22] T. Lemaoui, N.E.H. Hammoudi, I.M. Alnashef, M. Balsamo, A. Erto, B. Ernst, [49] P. Gramatica, On the development and validation of QSAR models, Methods Mol.
Y. Benguerba, Quantitative structure properties relationship for deep eutectic Biol. 930 (2013) 499–526.
solvents using Sσ− profile as molecular descriptors, J. Mol. Liq. 309 (2020), [50] P.P. Roy, S. Paul, I. Mitra, K. Roy, On two novel parameters for validation of
113165. predictive QSAR models, Molecules 14 (2009) 1660–1701.
[23] T. Lemaoui, A.S. Darwish, A. Attoui, F.A. Hatab, N.E.H. Hammoudi, Y. Benguerba, [51] A. Golbraikh, A. Tropsha, Beware of q2!, J. Mol. Graph. Model. 20 (2002) 269–276.
L.F. Vega, I.M. Alnashef, Predicting the density and viscosity of hydrophobic [52] P.K. Ojha, I. Mitra, R.N. Das, K. Roy, Further exploring r(m)(2) metrics for
eutectic solvents: towards the development of sustainable solvents, Green Chem. validation of QSPR models, Chemometr. Intell. Lab. Syst. 107 (2011) 194–205.
22 (2020) 8511–8530. [53] P.K. Ojha, K. Roy, Comparative QSARs for antimalarial endochins: importance of
[24] L.L. Zhang, J.X. Wang, Z.P. Liu, Y. Lu, G.W. Chu, W.C. Wang, J.F. Chen, Efficient descriptor-thinning and noise reduction prior to feature selection, Chemometr.
capture of carbon dioxide with novel mass-transfer intensification device using Intell. Lab. Syst. 109 (2011) 146–161.
ionic liquids, AIChE J. 59 (2013) 2957–2965. [54] P. Gramatica, Principles of QSAR models validation: internal and external, QSAR
[25] Y.A. Elhamarnah, M. Nasser, H. Qiblawey, A. Benamor, M. Atilhan, S. Aparicio, Comb. Sci. 26 (2007) 694–701.
A comprehensive review on the rheological behavior of imidazolium based ionic [55] K. Roy, S. Kar, P. Ambure, On a simple approach for determining applicability
liquids and natural deep eutectic solvents, J. Mol. Liq. 277 (2019) 932–958. domain of QSAR models, Chemometr. Intell. Lab. Syst. 145 (2015) 22–29.
[26] Q.H. Zhang, K.D. Vigier, S. Royer, F. Jerome, Deep eutectic solvents: syntheses, [56] J.D. Hunter, Matplotlib: a 2D graphics environment, Comput. Sci. Eng. 9 (2007)
properties and applications, Chem. Soc. Rev. 41 (2012) 7108–7146. 90–95.
[27] J. Trivedi, J.H. Lee, H.J. Lee, Y.K. Jeong, J.W. Choi, Deep eutectic solvents as [57] M. Grauer, G. Gruhn, L. Pollmer, Application of a generalized reduced gradient-
attractive media for CO2 capture, Green. Chem. 18 (2016) 2834–2842. method to process optimization problems, Hung. J. Ind. Chem. 7 (1979) 315–322.
[28] S. Sarmad, Y.J. Xie, J.P. Mikkola, X.Y. Ji, Screening of deep eutectic solvents [58] R. Todeschini, V. Consonni, Handbook of molecular descriptors, Wiley-VCH,
(DESs) as green CO2 sorbents: From solubility to viscosity, New J. Chem. 41 (2017) Weinheim; New York, 2000.
290–301. [59] R. Todeschini, V. Consonni, R. Todeschini. Molecular Descriptors for
[29] C.A. Nicolaou, N. Brown, Multi-objective optimization methods in drug design, Chemoinformatics, second ed.,, Wiley-VCH; John Wiley Distributor, Weinheim
Drug Discov. Today 10 (2013) e427–e435. Chichester, 2009.
[30] G. Lambrinidis, A. Tsantili-Kakoulidou, Challenges with multi-objective QSAR in [60] M. Reutlinger, C.P. Koch, D. Reker, N. Todoroff, P. Schneider, T. Rodrigues,
drug discovery, Expert Opin. Drug Discov. 13 (2018) 851–859. G. Schneider, Chemically advanced template search (CATS) for scaffold-hopping
[31] M. Cruz-Monteagudo, F. Borges, M.N.D.S. Cordeiro, Desirability-based and prospective target prediction for ‘orphan’ molecules, Mol. Inform. 32 (2013)
multiobjective optimization for global QSAR studies: Application to the design of 133–138.
novel NSAIDs with improved analgesic, antiinflammatory, and ulcerogenic [61] K. Khan, P.M. Khan, G. Lavado, C. Valsecchi, J. Pasqualini, D. Baderna, M. Marzo,
profiles, J. Comput. Chem. 29 (2008) 2445–2459. A. Lombardo, K. Roy, E. Benfenati, QSAR modeling of Daphnia magna and fish
[32] M. Cruz-Monteagudo, F. Borges, M.N.D.S. Cordeiro, Jointly handling potency and toxicities of biocides using 2D descriptors, Chemosphere 229 (2019) 8–17.
toxicity of antimicrobial peptidomimetics by simple rules from desirability theory [62] P. Labute, A widely applicable set of descriptors, J. Mol. Graph. Model. 18 (2000)
and chemoinformatics, J. Chem. Inf. Model. 51 (2011) 3060–3077. 464–477.
[33] A. Sanchez-Rodriguez, Y. Perez-Castillo, S.C. Schürer, O. Nicolotti, G. [63] S.A. Mirkhani, F. Gharagheizi, Predictive quantitative structure-property
F. Mangiatordi, F. Borges, M.N.D.S. Cordeiro, E. Tejera, J.L. Medina-Franco, relationship model for the estimation of ionic liquid viscosity, Ind. Eng. Chem. Res.
M. Cruz-Monteagudo, From flamingo dance to (desirable) drug discovery: a nature- 51 (2012) 2470–2477.
inspired approach, Drug Discov. Today 22 (2017) 1498–1502. [64] A.R. Abbott, G. Capper, S. Gray, Design of improved deep eutectic solvents using
[34] Y. Perez-Castillo, A. Sanchez-Rodriguez, E. Tejera, M. Cruz-Monteagudo, F. Borges, hole theory, Chem. Phys. Chem. 7 (2006) 803–806.
M.N.D.S. Cordeiro, L.T.T. Huong, P.-T. Hai, A desirability-based multi objective [65] R. Haghbakhsh, M. Taherzadeh, A.R.C. Duarte, S. Raeissi, A general model for the
approach for the virtual screening discovery of broad-spectrum anti-gastric cancer surface tensions of deep eutectic solvents, J. Mol. Liq. 307 (2020), 112972.
agents, PLoS One 13 (2018), e0192176. [66] F.S. Mjalli, G. Murshid, S. Al-Zakwani, A. Hayyan, Monoethanolamine-based deep
[35] G. Derringer, R. Suich, Simultaneous-optimization of several response variables, eutectic solvents, their synthesis and characterization, Fluid Phase Equilibr. 448
J. Qual. Technol. 12 (1980) 214–219. (2017) 30–40.
[36] A. Yadav, S. Trivedi, R. Rai, S. Pandey, Densities and dynamic viscosities of [67] A.K. Halder, M.N.D.S. Cordeiro, Probing the environmental toxicity of deep
(choline chloride plus glycerol) deep eutectic solvent and its aqueous mixtures in eutectic solvents and their components: an in silico modeling approach, ACS
the temperature range (283.15-363.15) K, Fluid Phase Equilibr. 367 (2014) Sustainable Chem. Eng. 7 (2019) 10649–10660.
135–142. [68] M. Bystrzanowska, M. Tobiszewski, Assessment and design of greener deep eutectic
[37] A. Yadav, S. Pandey, Densities and viscosities of (choline chloride plus urea) deep solvents – a multicriteria decision analysis, J. Mol. Liq. 321 (2021), 114878.
eutectic solvent and its aqueous mixtures in the temperature range 293.15 K to [69] Y. Chen, D.K. Yu, Y.H. Lu, G.H. Li, L. Fu, T.C. Mu, Volatility of deep eutectic solvent
363.15 K, J. Chem. Eng. Data 59 (2014) 2221–2229. choline chloride:n-methylacetamide at ambient temperature and pressure, Ind.
[38] F.S. Mjalli, J. Naser, Viscosity model for choline chloride-based deep eutectic Eng. Chem. Res. 58 (2019) 7308–7317.
solvents, Asia Pac. J. Chem. Eng. 10 (2015) 273–281.
