Automatic fitting of binary interaction parameters for multi-fluid Helmholtz-energy-explicit mixture models†
Ian H. Bell∗ and Eric W. Lemmon∗
National Institute of Standards and Technology, Boulder, CO, USA

E-mail: ian.bell@nist.gov; eric.lemmon@nist.gov

Abstract

In the highest-accuracy mixture models available today, these being the multi-fluid Helmholtz-energy-explicit formulations, there are a number of binary interaction parameters that must be obtained through correlation or estimation schemes. These binary interaction parameters are used to shape the thermodynamic surface and yield higher-fidelity predictions of various thermodynamic properties including vapor-liquid equilibria and homogeneous p-v-T data, amongst others.

In this work, we have used a novel and entirely automatic evolutionary optimization algorithm written in the Python programming language to fit the two most important interaction parameters for more than 1100 binary mixtures. This fitting algorithm can be run on multiple processors in parallel, resulting in a reasonable total running time for this large set of binary mixtures. For more than 830 of the binary pairs, the median absolute relative error in bubble-point pressure is less than 5%.

The source code for the fitter is provided as supplemental data, as well as the entire set of binary interaction parameters obtained and comparisons with the best experimental vapor-liquid-equilibrium data that are available.

† Commercial equipment, instruments, or materials are identified only in order to adequately specify certain procedures. In no case does such identification imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products identified are necessarily the best available for the purpose. Contribution of the National Institute of Standards and Technology, not subject to copyright in the US.

Introduction

There are many types of mixture models available to represent the thermodynamic properties of mixtures of fluids that cover both phase equilibria and homogeneous (liquid, gas, and supercritical) states. Over the years, the mixture models have increased in complexity and accuracy, largely driven by the increasing power of computers, which has both allowed the use of more computationally expensive equations of state and allowed correlators to leverage this computational power to fit more complex equations of state.

One of the most commonly used families of models is that of cubic equations of state. These cubic equations are straightforward to implement, even for mixtures, and can allow for solution of the density as a function of temperature and pressure in an explicit formulation, even though they are expressed in the form of pressure as a function of temperature and specific volume. These cubic equations of state can be converted to a Helmholtz-energy-explicit formulation through the use of the analysis of Bell and Jäger [1]. Other models have also been proposed, including the SAFT-based models, Lee-Kesler-Plöcker [2], and the non-dimensionalized
Helmholtz energy formulation utilized here. The multi-fluid mixture property model has seen wide application because it represents a fundamental thermodynamic potential, and therefore all other thermodynamic properties can be obtained by derivatives of the Helmholtz energy. This mixture formulation is that used in the state-of-the-art thermophysical property libraries REFPROP [3], CoolProp [4], and TREND [5].

Non-dimensionalized Helmholtz energy models

REFPROP 9.1 [3] is the current state-of-the-art in mixture binary parameters; it includes interaction parameters for 697 mixtures, with approximately 200 of these obtained from the literature [6–13] as well as mixture interaction parameters that were fitted through the use of in-house code. To the authors' knowledge, the only other libraries that implement a significant number of binary interaction parameters for high-accuracy mixture models are the open-source library CoolProp 5.1 [4], comprising 220 binary pairs, and the TREND 2.0 package from the University of Bochum, Germany [5], comprising 210 binary pairs.

In the multi-fluid mixture model, the pressure of the mixture can be obtained from

$$p = \rho R T \left[1 + \delta\,\frac{\partial \alpha^{\rm r}}{\partial \delta}(\tau, \delta)\right]. \qquad (1)$$

Other thermodynamic properties of interest (enthalpy, entropy, etc.) can be obtained in a similar fashion. For more information on calculating other thermodynamic properties from derivatives of the residual Helmholtz energy, see the work of Span [14]. The Helmholtz energy is expressed in terms of the reduced density $\delta = \rho/\rho_r(\bar{x})$ and the reciprocal reduced temperature $\tau = T_r(\bar{x})/T$. For further information, the reader is directed to the coverage of the mixture models in the literature [11–13].

According to the most recent formulation used in all of the state-of-the-art libraries, the reducing parameters for the mixture ($T_r$ and $v_r = 1/\rho_r$) can be given in a common form by

$$Y_r(\bar{x}) = \sum_{i=1}^{N} x_i^2\, Y_{c,i} + \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} 2 x_i x_j\, \frac{x_i + x_j}{\beta_{Y,ij}^2 x_i + x_j}\, Y_{ij} \qquad (2)$$

where $Y$ is the parameter of interest, either molar specific volume $v$ or temperature $T$. The necessary parameters are given by

$$\begin{array}{llll}
Y_r & Y_{c,i} & \beta_{Y,ij} & Y_{ij} \\
\hline
T_r & T_{c,i} & \beta_{T,ij} & \beta_{T,ij}\,\gamma_{T,ij}\,(T_{c,i}\,T_{c,j})^{0.5} \\
v_r & v_{c,i} & \beta_{v,ij} & \frac{1}{8}\,\beta_{v,ij}\,\gamma_{v,ij}\left(v_{c,i}^{1/3} + v_{c,j}^{1/3}\right)^3
\end{array}$$

These mixture reducing equations are simply weighting functions of the critical properties of the pure fluids that form the mixture.

For the $ij$ pair, there are a total of four adjustable parameters: $\beta_{T,ij}$, $\gamma_{T,ij}$, $\beta_{v,ij}$, and $\gamma_{v,ij}$. The $\gamma$ parameters are symmetric ($\gamma_{Y,ij} = \gamma_{Y,ji}$), while the $\beta$ parameters are not symmetric ($\beta_{Y,ij} = 1/\beta_{Y,ji}$), and thus the order of fluids in the binary pair is important and must be handled carefully when implementing this sort of mixture model. There is additionally an adjustable parameter $F_{ij}$ that is applied to the generalized departure function as described by Kunz and Wagner [12]. In this work, we do not consider the departure terms because of the paucity of experimental data for many of the mixtures under study; a significant body of experimental data is needed to be able to fit the departure function. Hence, the parameter $F_{ij}$ in the departure term is set equal to zero.

The mixture interaction parameters $\beta_{T,ij}$, $\gamma_{T,ij}$, $\beta_{v,ij}$, and $\gamma_{v,ij}$ are often obtained through manual fitting, some deterministic optimization routines, as well as user intervention to divide the experimental data into training and validation sets. The culmination of all this work is the set of binary interaction parameters in the REFPROP 9.1 library. Figure 1 shows a boxplot of the parameter distributions for all binary pairs that are included in REFPROP 9.1. As can be seen in the figure, the median value (given by the red bar inside the box) is very nearly unity for all four of the interaction parameters. Aside from $\gamma_{T,ij}$, the range
of the parameters is also quite narrow, being quite tightly clustered around unity.

Figure 1: Boxplot of the mixture interaction parameters in the REFPROP 9.1 library (blue box shows inter-quartile range containing 25th to 75th percentile of the data, fliers outside this range shown by markers, the red lines show median values).

While the work, which culminated in the mixture interaction parameters that are implemented in the REFPROP library (and the GERG models [12,13,15], amongst others), was able to use the binary-specific departure functions to strongly shape the thermodynamic property surface, the analysis here only fits two parameters, $\beta_{T,ij}$ and $\gamma_{T,ij}$, and sets the parameters $\beta_{v,ij}$ and $\gamma_{v,ij}$ to 1.0. The motivation for fitting only these two parameters is that they are the parameters that are the most important for fitting bubble-point data, and adjusting these parameters within reasonable ranges does not tend to result in significant (and undesired) deformation of the thermodynamic property surfaces.

The significance of these parameters can be seen by plotting phase envelopes for binary mixtures. Phase envelopes represent the saturation curve (dew-point and bubble-point), including the entire critical region, for a fixed mixture composition. Figure 2 shows several phase envelopes for an equimolar mixture of methane and ethane, demonstrating that while adjusting the interaction parameters deforms the phase envelopes, the deformation does not result in drastically different phase envelope shapes. Importantly, modifying the interaction parameters has a much more significant impact on the bubble-point lines than on the dew-point lines. This is important because fitting only dew point data to obtain the interaction parameters could yield significantly erroneous predictions for the bubble-point data. It is largely for this reason that the bubble-point data were used exclusively to fit the mixture interaction parameters in this work. Furthermore, for many mixtures, the only data available are bubble-point measurements.

Figure 2: Phase envelopes for an equimolar mixture of methane and ethane.

Data collection and preparation

The values obtained for the binary interaction parameters can only be as good as the experimental data upon which they are based. Therefore, it is of utmost importance to have as comprehensive and complete an experimental database to draw from as possible.

In this work, two large databases of experimental vapor-liquid equilibria data were merged into one joined database. The primary database for this study is the database ThermoDataEngine [16–18] that is developed at the National Institute of Standards and Technology in Boulder, Colorado. As of publication, ThermoDataEngine includes vapor-liquid equilibria
data for more than 40,000 binary mixtures. The second database of experimental data is an internal database developed by researchers at NIST over the last few decades.

Both ThermoDataEngine and the internal NIST mixture property database include other types of data, including homogeneous-phase pressure-density-temperature (p-ρ-T) data, virial coefficient data, etc.

Merging the two databases required careful reconciliation of their respective datasets. In many cases the same dataset was present in both databases, in which case only one copy was retained. Furthermore, the databases were in entirely different formats with incompatible metadata, requiring several steps of normalization to allow the databases to be merged into one master database.

In the end, the master database of experimental data was created in a form compatible with the Python package pandas, and a separate database of bibliographic information was generated in the BibTeX format.

Figure 3 shows the distribution of bubble-point data in the master database, sorted by the number of bubble-point data available. There are a number of popular mixtures that have been exceptionally well studied; in this case the five mixtures with the most experimental bubble-point data are ethanol-water, methanol-water, nitrogen-water, benzene-cyclohexane, and ammonia-water. On the other end of the spectrum are comparatively many mixtures for which there is only a small amount of data available, with as few as one bubble-point measurement.

Figure 3: Distribution of bubble-point data from reconciled databases.

Fitting Methodology

As described above, the goal of the fitting procedure is to arrive at the best values for $\beta_{T,ij}$ and $\gamma_{T,ij}$ for a given binary mixture of two pure fluids; this pair of optimal values represents a global minimization of the objective function (to be defined below). The fitting of binary interaction parameters is complicated by a number of unavoidable features of the fitting process:

• Modification of the interaction parameters can cause failure of flash routines that previously succeeded, or vice versa. Sometimes very small changes can cause this failure state transition, and only one datapoint may see this transition.
• The objective function is not smooth, making numerical partial differentiation of the objective function with respect to the interaction parameters difficult or impossible.
• A highly accurate and well constructed equation of state must already exist for each of the components in the mixture.
• Some of the experimental data points are questionable or are known to be faulty,
though that may not be noted as such in the respective database.
• The experimental uncertainties are in most cases unknown.

The overarching conclusion is therefore that deterministic (especially derivative-based) optimization routines are not viable for this problem. Therefore, another method must be selected that can accommodate the roughness of the objective function, is robust to transitions in the failure state of the flash routines, and can conveniently arrive at the optimal values for the parameters.

Before beginning an optimization campaign, it is crucial to understand the shape of the objective function. Figure 4 shows a plot of the objective function for the methane-isobutane binary pair, as well as cross-sections A and B cutting through the surface in the vicinity of the minimum value of the objective function. There is a clear minimum in the objective function near $\beta_{T,ij} = 1.01$ and $\gamma_{T,ij} = 1.2$. The color filling the plot is generated by interpolation of the objective function at the gridded data points. The objective function is not smooth due to the random selection of experimental data points, as demonstrated by the jagged form of the contours, especially at high values of $\gamma_{T,ij}$.

Figure 4: Presentation of the objective function over a regular grid for the binary pair methane-isobutane along with cross-sections cutting through the surface in the vicinity of the minimum of the surface. The objective function surface is truncated at a value of 300.

In the course of this work, several optimization methods were investigated, including brute-force optimization with cubic interpolation, amongst others. In the end, the most robust and computationally efficient solution proved to be genetic optimization. In particular, we employed the open-source DEAP Python package in this work [19], the details of which are described further on.

Optimization and Fitting

As described above, determination of the mixture interaction parameters is a challenging optimization problem that is not well suited to deterministic (derivative-based) optimization methods. For this reason, various stochastic optimization routines were considered.

The use of stochastic, or evolutionary, optimization routines is not a new concept. For instance, Schmidt and Lipson [20] used evolutionary optimization to distill governing physical laws from experimental data on (simple) physical systems.

Furthermore, other stochastic optimization methods are available, and Krink et al. [21] demonstrated that evolutionary optimization is one of the best optimization methodologies for noisy objective functions. They show that evolutionary optimization is superior to differential evolution for noisy optimization. This provides further support for the optimization scheme used here.

Fitting Process

The computational pipeline employed for this problem is to start with a library of many (approximately 160,000) experimental vapor-liquid-equilibrium data points spanning more than 1,200 binary mixtures for which reliable equations of state are available for both components. The current fitting methodology includes only the vapor-liquid-equilibrium data, and more specifically, the fitting procedure is based only on the use of bubble-point measurements. All of the bubble-point data for a given binary pair are extracted from the library into a pair-specific subset. This subset for the selected mixture forms the data source for the fitting routines.

The optimization routine has two independent variables, $\beta_{T,ij}$ and $\gamma_{T,ij}$. For a given set of $\beta_{T,ij}$ and $\gamma_{T,ij}$, a handful of datapoints (generally 5-10) are randomly selected from the dataset for the binary pair. For each of the selected datapoints in the handful, the bubble-point pressure is calculated as a function of the given bubble-point temperature and bulk mole fraction (the mole fractions of the liquid phase at the bubble point are equivalent to the bulk mole fractions). The signed error vector over this randomly selected handful of datapoints is calculated as

$$\vec{e}_S = \frac{\vec{p}_{\rm exp} - \vec{p}_{\rm calc}}{\vec{p}_{\rm exp}} \times 100 \qquad (3)$$

If a flash routine fails for one of the datapoints forming the handful, a very large number is entered in the appropriate location in $\vec{p}_{\rm calc}$. The error metric for the randomly selected handful is then simply the root-sum-of-squares

$$E_{\rm RSS} = \sqrt{\sum\left[(\vec{e}_S)^2\right]} \qquad (4)$$

With the root-sum-of-squares error (RSSE) metric for the randomly selected handful, at least one datapoint causing failure of a flash routine will also result in large errors for the handful.

In order to get a representative calculation of the error for the binary pair for a given set of $\beta_{T,ij}$ and $\gamma_{T,ij}$, this random-selection and error-calculation process must be carried out a number of times. In general, this process is repeated about 100 times, each time randomly selecting approximately 5 of the data points.

Selection of the appropriate error metric is key when considering all the calculated values and errors for one pair of $\beta_{T,ij}$ and $\gamma_{T,ij}$. In general, it is quite common to use either a root-sum-of-squares error (RSSE) or mean absolute error (MAE) metric. The weakness of these error metrics is that one very large value (for instance a failure of a flash routine entered as a large number) can dramatically skew the error metric. On the other hand, the use of the median absolute error rather than either RSSE or MAE results in an error that rejects the failures in the flash routine. So long as only a few randomly selected handfuls contain failures of the flash routines, the median absolute error metric will allow the optimization algorithm to reject these failures and keep moving towards the optimal solution.

Thus, the base objective function that is being minimized is:

$$\mathrm{OBJ}_0(\beta_{T,ij}, \gamma_{T,ij}) = \mathrm{median}([E_{{\rm RSS},1}, E_{{\rm RSS},2}, \ldots]) \qquad (5)$$

It is necessary to further constrain the objective function to yield more reliable flash routine evaluations. With the median error metric, if 49% of the experimental data points cause failures of the flash routines, the median error can still be quite low since the middle value is used when sorting the absolute error. As an example, we might have an error array $E_{\rm RSS}$ (sample data for illustrative purposes) that looks like

$$E_{\rm RSS} = [1, 2, 3, 10^{10}, 10^{10}] \qquad (6)$$

where $10^{10}$ are large values signifying the failure of a flash routine. In this case, 40% of the runs have failed, but we still yield a median error that is 3%.

In order to "push" the optimizer back to the best solution (trading flash routine failures for errors in prediction of VLE data), it is necessary to penalize solutions that have a low median error but cause a significant number of flash routine failures. To that end, the $\mathrm{OBJ}_0(\beta_{T,ij}, \gamma_{T,ij})$ function is multiplied by a penalty function defined by

$$F = 10^{f_{\rm fail}\, n_{\rm fail}} \qquad (7)$$

where $n_{\rm fail}$ is an integral penalty exponent (usually 2 or 3), which controls the sharpness of the penalty, and $f_{\rm fail}$ is the fraction (in the range 0 to 1) of the flash routine evaluations that fail. If none of the flash routine evaluations fail, the penalty factor $F$ is 1, and thus no penalty is applied. If all the flash routines fail, the $F$ factor is $10^{n_{\rm fail}}$.

Thus, the final objective function that is used in the optimizer is given by

$$\mathrm{OBJ}(\beta_{T,ij}, \gamma_{T,ij}) = F \cdot \mathrm{OBJ}_0(\beta_{T,ij}, \gamma_{T,ij}) \qquad (8)$$

Fitting Implementation

The DEAP package used (and as described above) is a flexible set of open-source tools for minimization of objective function(s). The emphasis is on evolutionary optimization, though other optimization methodologies are also supported. As such, it is relevant to briefly describe the methodology that is used in DEAP to arrive at the "best" solution. To begin with, a population of individuals (sometimes referred to as chromosomes in genetic optimization literature) is generated, each with its own fitness associated with how well the individual predicts the bubble-point pressures. In this case the fitness is the inverse of the objective function: the higher the fitness, the lower the objective function, and the better the values represent the experimental measurements.
After having randomly generated the initial population of individuals, a number of methods are used to combine and mutate the population of individuals. The primary types of operations in the genetic optimization are:

• Mutation: Similar to spontaneous mutation of genetic chromosomes in biological systems, the individual can mutate, and each of its parameters can be given an offset, where the offset is given by a Gaussian distribution centered around zero. The standard deviation of the Gaussian distribution is used to control the spread of the mutations, and a probabilistic parameter is applied to decide whether the individual should mutate or not.
• Cross-over / reproduction: In cross-over or reproduction, various models can be applied to govern the outcome of the interaction of two individuals. This is similar, in biological systems, to the resulting offspring's chromosomes when its parents reproduce sexually. Various models are available, including weighting of the parents' chromosomes, or allowing for weights that can yield chromosomes outside the range of the parents' chromosomes.

The extent of user involvement in running this code is tuning the width of the Gaussian distributions, and setting the probabilities of mutation, cross-over, and reproduction of the individuals. Otherwise, the code is entirely automatic and requires no user intervention. The following values were used in the final version of the fitter:

• Number of generations: 30
• Population size: 150
• Tournament selection size¹: 5
• Mutation probability: 50%
• Cross-over probability: 30%

¹ Tournament selection refers to the process of comparing a number of individuals, in randomly selected pools, and keeping the best individual from this pool to populate the next generation. The larger the pool size, the greater the selection pressure, corresponding to the aggressive selection of high-fitness individuals at the expense of population diversity.

The implementation is entirely based on a Python-based open-source framework, aside from the use of NIST REFPROP as the property backend to provide the bubble-point evaluation. All thermodynamic calls are made through the CoolProp library, which delegates to the REFPROP library to make the necessary bubble-point calculations. The system is constructed so that it can scale to multiple processes in parallel. Process-level parallelization (as opposed to thread-level parallelization) is required as REFPROP is not thread-safe, and multiple mixture interaction files in REFPROP must have the interaction parameters injected into them by the optimization routines (one mixture file per process). Finally, the fitting routines make heavy use of the pandas Python package for data management and data subsetting.

The flowchart in Figure 5 shows the way that the optimization is divided into sub-processes, each of which operates on a given binary mixture. The master process spawns a number of subprocesses, each of which is passed a subset of the data for the binary pair that is being calculated by the subprocess. In this way, the large job is subdivided into smaller segments that can be easily handled by one processor. As each subprocess finishes its optimization, it writes its data to a file and signals the master process that it has completed, and begins to work on the next mixture.

The source code for the fitting routines is available as an electronic appendix, along with sample data for one binary pair (propane + n-decane) taken from the work of Mansfield et al. [22]. These data serve as a real-world demonstration of the fitting methodology.
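The evolutionary loop described above can be sketched with the Python standard library alone. The code below is a stand-in for illustration and not the DEAP-based implementation: the function `evolve`, the blend cross-over weights, the mutation width `sigma`, the search bounds, and the smooth toy objective are all assumptions. Only the generation count, population size, tournament size, and the mutation and cross-over probabilities are taken from the values listed for the fitter.

```python
import random


def evolve(objective, bounds, n_gen=30, pop_size=150, tourn_size=5,
           p_mut=0.5, p_cx=0.3, sigma=0.05, seed=0):
    """Minimize objective(individual) with a small evolutionary loop."""
    rng = random.Random(seed)
    # Random initial population inside the (assumed) search bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=objective)
    for _ in range(n_gen):
        scores = [objective(ind) for ind in pop]

        def pick():
            # Tournament selection: copy the best of tourn_size random picks.
            pool = rng.sample(range(pop_size), tourn_size)
            return list(pop[min(pool, key=lambda i: scores[i])])

        children = [pick() for _ in range(pop_size)]
        # Blend cross-over on consecutive pairs; weights outside [0, 1] let
        # children leave the range spanned by their parents.
        for a, b in zip(children[::2], children[1::2]):
            if rng.random() < p_cx:
                for k in range(len(a)):
                    w = rng.uniform(-0.25, 1.25)
                    a[k], b[k] = (w * a[k] + (1 - w) * b[k],
                                  w * b[k] + (1 - w) * a[k])
        # Gaussian mutation centered on zero, applied with probability p_mut.
        for child in children:
            if rng.random() < p_mut:
                for k in range(len(child)):
                    child[k] += rng.gauss(0.0, sigma)
        pop = children
        best = min(pop + [best], key=objective)
    return best


def toy_objective(ind):
    """Smooth toy bowl with its minimum at (1.01, 1.2); not a real fit."""
    beta_t, gamma_t = ind
    return (beta_t - 1.01) ** 2 + (gamma_t - 1.2) ** 2


best = evolve(toy_objective, bounds=[(0.8, 1.2), (0.5, 2.0)])
```

The real objective is noisy and far more expensive per evaluation than this toy function, so the sketch only shows the control flow of selection, cross-over, and mutation.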
[Flowchart: the master process subsets the data for a binary pair and hands it to a process that starts at generation $i_{\rm gen} = 0$. For each individual ($\beta_{T,ij}$, $\gamma_{T,ij}$) of a generation, starting from $j = 0$, $N_{\rm select}$ experimental points are drawn, the error vector $\vec{e} = f(\vec{T}, z)$ and the error metric $E_j = f(\vec{e})$ are calculated, and $j$ is incremented while $j < N_{\rm loops}$; the individual error $f(E)$ is then stored. Once the generation is full, the individuals are mated, mutated, and crossed, $i_{\rm gen}$ is incremented, and the loop repeats until the maximum generation number is reached.]

Figure 5: Flowchart for the fitting process

Results

This fitting procedure was carried out for all the mixtures for which at least one experimental bubble-point data point could be found in the reconciled database. If bubble-point data were not available for at least two temperatures and two compositions, then $\beta_{T,ij}$ was set to 1.0 and only $\gamma_{T,ij}$ was fitted.

For more complex mixtures, adjusting $\gamma_{T,ij}$ and $\beta_{T,ij}$ does not provide enough flexibility to properly model the vapor-liquid equilibria and homogeneous fluid properties of the binary mixture in a consistent fashion. Highly asymmetric binary mixtures (helium + water being perhaps the most extreme example) further stress the flash routines, resulting in frequent failures due to insufficiently accurate starting values. These failures of the flash routines mean that potentially high-accuracy vapor-liquid equilibria data points are being rejected.

As a result of these challenges, the resulting binary interaction parameters were divided into two tiers. The first tier comprises mixtures for which it is believed that the interaction parameters are reliable and can be used for accurate predictions of the properties of the system. In order for a binary pair to be in tier #1, the following two conditions must both be met:

• The median absolute relative error (defined below) associated with the bubble-point data is less than 5%.
• Sufficient experimental data are available to fit both $\beta_{T,ij}$ and $\gamma_{T,ij}$ (experimental bubble point data for at least two temperatures and two compositions).

If either of the conditions is not met, the binary pair is relegated to tier #2.

The supplemental data associated with this paper include the set of interaction parameters that were obtained from the fitting procedure, as well as the complete set of bibliographic information associated with the data used for each of the binary pairs.

Accuracy

The tables in the supplemental information contain all of the mixture interaction parameters that were fit, along with information about the accuracy of each fit at predicting the bubble point data. The median absolute relative error
(MARE) parameter is defined by

$$\mathrm{MARE} = \mathrm{median}\left(\left|\frac{\vec{p}_{\rm exp} - \vec{p}_{\rm calc}}{\vec{p}_{\rm exp}}\right|\right) \times 100 \qquad (9)$$

The median absolute relative error is only a relevant metric in comparison to existing values for the error. Here we define an improvement-in-error term $I$ that quantifies the improvement compared with REFPROP

$$I = \begin{cases} (E_R - 1) \times 100 & \text{for } E_R \geq 1 \\ -\left(\dfrac{1}{E_R} - 1\right) \times 100 & \text{for } E_R < 1 \end{cases} \qquad (10)$$

where the error ratio $E_R$ is defined by

$$E_R = \frac{\mathrm{MARE}_{\rm REFPROP}}{\mathrm{MARE}_{\rm Bell}} \qquad (11)$$

The reason for this (admittedly convoluted) definition for the improvement in error is such that if one (but not both) of the $\mathrm{MARE}_{\rm REFPROP}$ or $\mathrm{MARE}_{\rm Bell}$ values is very nearly zero, the improvement term will neither go to zero nor infinity. The improvement $I$ should be interpreted in this way: a value of $I = 1$ is a 1% decrease in the MARE, and $I = -1$ is a 1% increase in the MARE. Basically, the more positive $I$ is, the better.

Figure 6 shows a plot of the MARE versus the improvement factor $I$. While more than 1100 mixtures were fit in this work, REFPROP includes fitted binary interaction parameter values for 522 mixtures, and those are plotted here. For approximately 300 of the binary mixtures, the current fitted parameters yield an improvement over the parameters in REFPROP; some of this measured improvement is a result of the datasets considered in the respective fitting process. On the other hand, there are many mixtures where the fitted parameters provide a worse representation of the VLE data than the use of the parameters from REFPROP.

It is challenging to draw clear conclusions from a comparison of the resultant errors caused by the use of the binary interaction parameters (BIPs) in REFPROP and those in this work. The BIPs in REFPROP were obtained by careful curation of the experimental data that were used in the fitting process, while in this work, the sheer amount of experimental data considered required an automatic approach. Furthermore, in many cases (the open markers), the developers of the given pair of BIPs also fit a departure function with the multiplication factor $F_{ij}$. The reader is directed to the description of the GERG model for a further discussion of the departure function. In many cases, two BIPs were fit, but the two parameters that were fit were $\gamma_{T,ij}$ and $\gamma_{v,ij}$ instead of the pair $\beta_{T,ij}$ and $\gamma_{T,ij}$ used in this work. Additionally, the fits in REFPROP are, in some cases, also based on p-v-T, heat capacity, speed of sound, and any other data available for a particular binary pair.

Figure 7 shows the overall MARE distribution for all the mixtures that were fit in this work. For nearly 1000 of the binary mixtures, the MARE is less than 10%; a 10% MARE can be considered as a sufficiently accurate representation of the data for many applications. More than 800 of the mixtures have a MARE less than 5%. In some cases this is because there was only one data point, which was fit very accurately, so the MARE is an imperfect metric for goodness-of-fit.

Figure 7: Distribution of MARE for the binary pairs fit in this work.
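The accuracy measures of Eqs. (9) to (11), together with the two tier criteria stated in the Results, are simple enough to write out directly. The following is a minimal Python sketch; the function names and argument order are assumptions made for the example and are not taken from the supplemental source code.

```python
import statistics


def mare(p_exp, p_calc):
    """Eq. (9): median absolute relative error, in percent."""
    rel = [abs((pe - pc) / pe) for pe, pc in zip(p_exp, p_calc)]
    return statistics.median(rel) * 100.0


def improvement(mare_refprop, mare_fit):
    """Eqs. (10) and (11): improvement I of the new fit relative to REFPROP."""
    er = mare_refprop / mare_fit  # error ratio, Eq. (11)
    if er >= 1.0:
        return (er - 1.0) * 100.0
    return -(1.0 / er - 1.0) * 100.0


def tier(mare_percent, n_temperatures, n_compositions):
    """Tier 1 requires MARE below 5 % and data at two or more temperatures
    and two or more compositions; everything else falls into tier 2."""
    enough_data = n_temperatures >= 2 and n_compositions >= 2
    return 1 if (mare_percent < 5.0 and enough_data) else 2
```

For instance, halving the MARE relative to REFPROP gives an error ratio of 2 and an improvement of 100, while doubling it gives an improvement of -100, so the measure is symmetric about zero.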
Figure 6: Improvement I and median absolute relative error (MARE) of the bubble point pre-
dictions. The improvement I is based on the median error over all experimental bubble point
measurements. The entries in the legend correspond to the number of interaction parameters that
were fit in REFPROP 9.1. The count of mixtures with this number of parameters fit in REFPROP
9.1 is given in parenthesis.

Repeatability
Due to the stochastic nature of the fitting pro-
cess, each time the optimization is executed,
the algorithm will arrive at a different opti-
mal solution for the mixture interaction pa-
rameters βT,ij and γT,ij . Ideally, the solutions
for the optimal mixture interaction parameters
should be tightly clustered around their respec-
tive mean values. In order to demonstrate the
repeatability of the optimization outlined here,
the optimizer was run 100 times for two mix-
tures. These mixtures were selected because
they demonstrated low computational require-
Figure 8: Deviations from the mean values
ments, while also having enough bubble point
(given by overbar) for βT,ij and γT,ij for the
measurements that the same points were not
binary mixtures n-heptane + n-hexane and re-
always sampled.
frigerant R143a + refrigerant R152a over 100
Figure 8 demonstrates the results from 100
runs of the fitter. Each marker represents the
runs of the optimizer for the interaction param-
results from one execution of the fitter.
eters for these mixtures. As seen in this figure,
the largest deviation of βT,ij and γT,ij from the
mean value is less than 0.2%. In the vast major- Error evolution
ity of cases, the deviations are less than 0.05%
from the mean value. As the evolutionary algorithm works to find an
optimal solution for the interaction parameters
βT,ij and γT,ij , the error decreases in a nearly
monotonic fashion. In the case of this optimization problem, there are sometimes steps
that result in an increase in the error due to the experimental data points that have
been selected. For instance, even if βT,ij and γT,ij are unchanged, the error can
sometimes increase due to the inclusion of one or more points with higher error.

Figure 9 shows the evolution of the fitness function over the 200 repeatability tests
described above (100 runs for each of the two mixtures). These profiles demonstrate the
characteristic error evolution: a steep initial decrease in the error, followed by
near-asymptotic behavior in the limit of an infinite number of generations. For both
mixtures, most runs arrive very near their optimal value after approximately 10
generations, and the remaining generations are used to refine the solution. In some
cases, there are local increases in the objective function (points with worse error),
but in the end, all runs arrive near the same solution.

Figure 9: Evolution of the fitness function for the binary mixtures n-heptane + n-hexane
and refrigerant R143a + refrigerant R152a from the repeatability tests.

Homologous families
As a demonstration of the output of the fitting procedure, the results for the family of
n-alkane + n-alkane mixtures are presented for all binary pairs where it was possible to
fit both βT,ij and γT,ij. Figure 10 presents the fitted values for βT,ij and γT,ij. A
few comments are required for this plot. As described above, βT,ij = 1/βT,ji. Thus the
order of fluids in the binary pair is of consequence. In order to yield a consistent set
of variables, the binary pair was sorted such that the first element in the binary pair
(i = 1) is that with the lower molar mass, and the second element (j = 2) in the binary
pair is that with the higher molar mass. Furthermore, the abscissa (x-coordinate) of the
plot is the ratio of the maximum molar mass in the binary pair to the minimum molar mass
in the binary pair.

The trend in interaction parameters for the homologous n-alkane + n-alkane family is
nearly linear, and as a result, linear curves were fitted to βT,ij and γT,ij as a
function of the ratio of molar masses. These linear correlations can be used to yield
reasonable predictions of the interaction parameters for binary n-alkane mixtures for
which no experimental data of any kind exist. This estimation scheme has been applied to
mixtures that include higher alkanes and has demonstrated reasonable extrapolation
ability.

Similar exercises could be carried out for other families of fluids. As another
demonstration of the strong familial trends, the obtained values of γT,ij are plotted
for the binary pairs containing carbon dioxide in Figure 11. This figure demonstrates
greater values of γT,ij for greater differences in critical temperature between carbon
dioxide and the other component.

Figure 10: Binary interaction parameters for the homologous n-alkane + n-alkane family

Figure 11: Values of γT,ij for the carbon dioxide family. The abscissa Tc refers to the
critical temperature of the other component in the mixture with carbon dioxide. The
value for water is not included.
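The sorting convention and linear correlation described for the n-alkane family can be sketched in a few lines of Python. This is only an illustrative sketch, not the fitter used in this work: the function names `canonical_pair` and `fit_line` are hypothetical, the molar masses are approximate standard values, and the β and γ numbers are invented placeholders rather than fitted results. The sketch also assumes that γT,ij is symmetric under a swap of components, as is usual in GERG-type models, while βT,ij is inverted.

```python
def canonical_pair(name_a, M_a, name_b, M_b, beta_ab, gamma_ab):
    """Order a pair so that component i (first) has the lower molar mass.

    beta is inverted when the order is swapped (beta_ij = 1/beta_ji);
    gamma is ASSUMED symmetric (gamma_ij = gamma_ji).
    Returns (name_i, name_j, M_max/M_min, beta_ij, gamma_ij).
    """
    if M_a <= M_b:
        return name_a, name_b, M_b / M_a, beta_ab, gamma_ab
    return name_b, name_a, M_a / M_b, 1.0 / beta_ab, gamma_ab


def fit_line(xs, ys):
    """Ordinary least-squares straight line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder inputs: molar masses (g/mol) are approximate standard values;
# the beta/gamma numbers are INVENTED for illustration, not fitted results.
raw = [
    ("ethane", 30.069, "methane", 16.0425, 1.0 / 0.99, 1.05),
    ("methane", 16.0425, "propane", 44.0956, 0.98, 1.10),
    ("n-butane", 58.1222, "methane", 16.0425, 1.0 / 0.97, 1.15),
]
pairs = [canonical_pair(*row) for row in raw]
ratios = [p[2] for p in pairs]
betas = [p[3] for p in pairs]
gammas = [p[4] for p in pairs]

beta_slope, beta_intercept = fit_line(ratios, betas)
gamma_slope, gamma_intercept = fit_line(ratios, gammas)

# Estimate parameters for a pair without data, here methane + n-pentane
ratio_new = 72.1488 / 16.0425
beta_est = beta_slope * ratio_new + beta_intercept
gamma_est = gamma_slope * ratio_new + gamma_intercept
print(f"beta_T ~ {beta_est:.4f}, gamma_T ~ {gamma_est:.4f}")
```

Sorting every pair before fitting keeps βT,ij on a single branch of the reciprocal relation, so that values entered in either component order fall on the same trend line.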

Conclusions
The binary interaction parameters βT,ij and γT,ij have been fitted for more than 1100
mixtures. The improved interaction parameters will be included in a future version of
REFPROP. The interaction parameters are obtained from an evolutionary optimization
algorithm based on an open-source Python framework. The fitting procedure conveniently
scales to large computational clusters.

Further work will entail the development of a predictive scheme for βT,ij and γT,ij
based on information about the pure fluids forming the mixture.

Acknowledgement The authors thank Ken Kroenlein and Chris Muzny of the National
Institute of Standards and Technology who facilitated the extraction of data from
ThermoDataEngine and made available computational resources used for this study, and
Marcia Huber, also of NIST, for background on genetic optimization.

Supporting Information Available:
a) Python source code of the fitter
b) Fitted binary interaction parameters
c) Comparison with the available experimental data
This material is available free of charge via the Internet at http://pubs.acs.org/.

Literature Cited

(1) Bell, I. H.; Jäger, A. Helmholtz energy transformations of common cubic equations
of state for use with pure fluids and mixtures. J. Res. NIST 2016, DOI:
10.6028/jres.121.011.
(2) Plocker, U.; Knapp, H.; Prausnitz, J. Calculation of high-pressure vapor-liquid
equilibria from a corresponding-states correlation with emphasis on asymmetric
mixtures. Industrial & Engineering Chemistry Process Design and Development 1978, 17,
324–332.

(3) Lemmon, E. W.; Bell, I. H.; Huber, M. L.; McLinden, M. O. NIST Standard Reference
Database 23: Reference Fluid Thermodynamic and Transport Properties-REFPROP, Version
9.1.1, National Institute of Standards and Technology. 2016.

(4) Bell, I. H.; Wronski, J.; Quoilin, S.; Lemort, V. Pure and Pseudo-pure Fluid
Thermophysical Property Evaluation and the Open-Source Thermophysical Property Library
CoolProp. Ind. & Eng. Chem. Res. 2014, 53, 2498–2508, DOI: 10.1021/ie4033999.

(5) Span, R.; Eckermann, T.; Herrig, S.; Hielscher, S.; Jäger, A.; Thol, M. TREND:
Thermodynamic Reference and Engineering Data 2.0. 2015.

(6) Lemmon, E. W.; Jacobsen, R. T.; Penoncello, S. G.; Friend, D. G. Thermodynamic
Properties of Air and Mixtures of Nitrogen, Argon, and Oxygen from 60 to 2000 K at
Pressures to 2000 MPa. J. Phys. Chem. Ref. Data 2000, 29, 331–385, DOI:
10.1063/1.1285884.

(7) Lemmon, E. W.; Jacobsen, R. T. Equations of State for Mixtures of R-32, R-125,
R-134a, R-143a, and R-152a. J. Phys. Chem. Ref. Data 2004, 33, 593–620, DOI:
10.1063/1.1649997.

(8) Lemmon, E. W.; Jacobsen, R. T. A Generalized Model for the Thermodynamic Properties
of Mixtures. Int. J. Thermophys. 1999, 20, 825–835, DOI: 10.1023/A:1022627001338.

(9) Akasaka, R. A Thermodynamic Property Model for the R-134a/245fa Mixtures. 15th
International Refrigeration and Air Conditioning Conference at Purdue, July 14-17,
2014. 2014.

(10) Akasaka, R. Thermodynamic property models for the difluoromethane (R-32) +
trans-1,3,3,3-tetrafluoropropene (R-1234ze(E)) and difluoromethane +
2,3,3,3-tetrafluoropropene (R-1234yf) mixtures. Fluid Phase Equilib. 2013, 358, 98–104,
DOI: 10.1016/j.fluid.2013.07.057.

(11) Gernert, G. J. A New Helmholtz Energy Model for Humid Gases and CCS Mixtures.
Ph.D. thesis, Ruhr-Universität Bochum, 2013.

(12) Kunz, O.; Klimeck, R.; Wagner, W.; Jaeschke, M. The GERG-2004 Wide-Range Equation
of State for Natural Gases and Other Mixtures; VDI Verlag GmbH, 2007.

(13) Kunz, O.; Wagner, W. The GERG-2008 Wide-Range Equation of State for Natural Gases
and Other Mixtures: An Expansion of GERG-2004. J. Chem. Eng. Data 2012, 57, 3032–3091,
DOI: 10.1021/je300655b.

(14) Span, R. Multiparameter Equations of State - An Accurate Source of Thermodynamic
Property Data; Springer, 2000.

(15) Gernert, J.; Span, R. EOS-CG: A Helmholtz energy mixture model for humid gases and
CCS mixtures. J. Chem. Thermodyn. 2016, 93, 274–293, DOI: 10.1016/j.jct.2015.05.015.

(16) Diky, V.; Chirico, R. D.; Muzny, C. D.; Kazakov, A. F.; Kroenlein, K.;
Magee, J. W.; Abdulagatov, I.; Kang, J. W.; Frenkel, M. ThermoData Engine (TDE)
software implementation of the dynamic data evaluation concept. 7. Ternary mixtures.
J. Chem. Inf. Model. 2011, 52, 260–276.

(17) Diky, V.; Chirico, R. D.; Muzny, C. D.; Kazakov, A. F.; Kroenlein, K.;
Magee, J. W.; Abdulagatov, I.; Kang, J. W.; Gani, R.; Frenkel, M. ThermoData Engine
(TDE): Software implementation of the dynamic data evaluation concept. 8. Properties of
material streams and solvent design. J. Chem. Inf. Model. 2012, 53, 249–266.

(18) Diky, V.; Chirico, R. D.; Muzny, C. D.; Kazakov, A. F.; Kroenlein, K.;
Magee, J. W.; Abdulagatov, I.; Frenkel, M. ThermoData Engine (TDE): Software
implementation of the dynamic data evaluation concept. 9. Extensible thermodynamic
constraints for pure compounds and new model developments. J. Chem. Inf. Model. 2013,
53, 3418–3430.

(19) Fortin, F.-A.; De Rainville, F.-M.; Gardner, M.-A.; Parizeau, M.; Gagné, C. DEAP:
Evolutionary Algorithms Made Easy. J. Mach. Learn. Res. 2012, 13, 2171–2175.

(20) Schmidt, M.; Lipson, H. Distilling free-form natural laws from experimental data.
Science 2009, 324, 81–85.

(21) Krink, T.; Filipič, B.; Fogel, G. B. Noisy optimization problems - A particular
challenge for differential evolution? CEC2004. Congress on Evolutionary Computation.
2004; pp 332–339.

(22) Mansfield, E.; Bell, I. H.; Outcalt, S. L. Bubble Point Measurements of n-Propane
+ n-Decane Binary Mixtures with Comparisons of Binary Mixture Interaction Parameters
for Linear Alkanes. J. Chem. Eng. Data 2016, In Press.
