Helmholtz energy formulation utilized here. The multi-fluid mixture property model has seen wide application because it represents a fundamental thermodynamic potential, and therefore all other thermodynamic properties can be obtained by derivatives of the Helmholtz energy. This mixture formulation is that used in the state-of-the-art thermophysical property libraries REFPROP [3], CoolProp [4], and TREND [5].

Non-dimensionalized Helmholtz energy models

v_r = 1/ρ_r) can be given in a common form by

\[
Y_r(\bar{x}) = \sum_{i=1}^{N} x_i^2 Y_{c,i} + \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} 2 x_i x_j \frac{x_i + x_j}{\beta_{Y,ij}^2 x_i + x_j} Y_{ij} \qquad (2)
\]

where Y is the parameter of interest, either molar specific volume v or temperature T. The necessary parameters are given by

Y_r    Y_{c,i}    β_{Y,ij}    Y_{ij}
T_r    T_{c,i}    β_{T,ij}    β_{T,ij} γ_{T,ij} (T_{c,i} T_{c,j})^{0.5}
v_r    v_{c,i}    β_{v,ij}    (1/8) β_{v,ij} γ_{v,ij} (v_{c,i}^{1/3} + v_{c,j}^{1/3})^3
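As an illustration of Eq. (2) and the parameter table, the reducing functions could be evaluated along the following lines. This is a minimal sketch rather than the implementation used in REFPROP, CoolProp, or TREND; the function names, the nested-list layout of the pair parameters, and the numerical values in the usage lines are all assumptions made for illustration.

```python
def pair_temperature(Tc_i, Tc_j, beta_T, gamma_T):
    """T_ij = beta_T * gamma_T * (Tc_i * Tc_j)**0.5, from the parameter table."""
    return beta_T * gamma_T * (Tc_i * Tc_j) ** 0.5


def pair_volume(vc_i, vc_j, beta_v, gamma_v):
    """v_ij = (1/8) * beta_v * gamma_v * (vc_i**(1/3) + vc_j**(1/3))**3."""
    return 0.125 * beta_v * gamma_v * (vc_i ** (1.0 / 3.0) + vc_j ** (1.0 / 3.0)) ** 3


def reducing_value(x, Y_c, beta, Y_pair):
    """Evaluate Eq. (2) for mole fractions x.

    Y_c[i] is the pure-fluid critical value, and beta[i][j] and Y_pair[i][j]
    are the pair parameters; only entries with i < j are read.
    """
    n = len(x)
    total = sum(x[i] ** 2 * Y_c[i] for i in range(n))
    for i in range(n - 1):
        for j in range(i + 1, n):
            if x[i] == 0.0 or x[j] == 0.0:
                continue  # the pair term vanishes and 0/0 is avoided
            weight = (x[i] + x[j]) / (beta[i][j] ** 2 * x[i] + x[j])
            total += 2.0 * x[i] * x[j] * weight * Y_pair[i][j]
    return total


# Usage with illustrative (assumed) numbers for an equimolar binary mixture.
Tc = [200.0, 400.0]            # critical temperatures in K, illustrative only
beta_T, gamma_T = 1.0, 1.1     # illustrative interaction parameters
T12 = pair_temperature(Tc[0], Tc[1], beta_T, gamma_T)
T_r = reducing_value([0.5, 0.5], Tc,
                     [[1.0, beta_T], [1.0 / beta_T, 1.0]],
                     [[0.0, T12], [T12, 0.0]])
print(T_r)
```

For a pure fluid (one mole fraction equal to one) the double sum vanishes and the function returns the critical value of that fluid, which is a convenient sanity check.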
of the parameters is also quite narrow, being quite tightly clustered around unity.

envelopes, the deformation does not result in drastically different phase envelope shapes. Importantly, modifying the interaction parameters has a much more significant impact on the bubble-point lines than on the dew-point lines. This is important because fitting only dew-point data to obtain the interaction parameters could yield significantly erroneous predictions for the bubble-point data. It is largely for this reason that the bubble-point data were used exclusively to fit the mixture interaction parameters in this work. Furthermore, for many mixtures, the only data available are bubble-point measurements.
data for more than 40,000 binary mixtures.
The second database of experimental data is
an internal database developed by researchers
at NIST over the last few decades.
Both ThermoDataEngine and the internal
NIST mixture property database include other
types of data, including homogeneous-phase
pressure-density-temperature (p-ρ-T ) data,
virial coefficient data, etc.
Merging the two databases required careful
reconciliation of their respective datasets. In
many cases the same dataset was present in
both databases, in which case only one copy
was retained. Furthermore, the databases were in entirely different formats with incompatible metadata, requiring several normalization steps before they could be merged into one master database.

Figure 3: Distribution of bubble-point data from reconciled databases.
In the end, the master database of experimental data was created in a form compatible with the Python package pandas, and a separate database of bibliographic information was generated in the BibTeX format.

Figure 3 shows the distribution of bubble-point data in the master database, sorted by the amount of bubble-point data available. There are a number of popular mixtures that have been exceptionally well studied; in this case the five mixtures with the most experimental bubble-point data are ethanol-water, methanol-water, nitrogen-water, benzene-cyclohexane, and ammonia-water. On the other end of the spectrum are many mixtures for which there is only a small amount of data available, with as few as one bubble-point measurement.

Fitting Methodology

As described above, the goal of the fitting procedure is to arrive at the best values for β_{T,ij} and γ_{T,ij} for a given binary mixture of two pure fluids; this pair of optimal values represents a global minimization of the objective function (to be defined below). The fitting of binary interaction parameters is complicated by a number of unavoidable features of the fitting process:

• Modification of the interaction parameters can cause failure of flash routines that previously succeeded, or vice versa. Sometimes very small changes can cause this failure state transition, and only one datapoint may see this transition.
• The objective function is not smooth, making numerical partial differentiation of the objective function with respect to the interaction parameters difficult or impossible.
• A highly accurate and well constructed equation of state must already exist for each of the components in the mixture.
• Some of the experimental data points are questionable or are known to be faulty,
though that may not be noted as such in the respective database.
• The experimental uncertainties are in most cases unknown.

The overarching conclusion is therefore that deterministic (especially derivative-based) optimization routines are not viable for this problem. Therefore, another method must be selected that can accommodate the roughness of the objective function, is robust to transitions in the failure state of the flash routines, and can conveniently arrive at the optimal values for the parameters.

Before beginning an optimization campaign, it is crucial to understand the shape of the objective function. Figure 4 shows a plot of the objective function for the methane-isobutane binary pair, as well as cross-sections A and B cutting through the surface in the vicinity of the minimum value of the objective function. There is a clear minimum in the objective function near β_{T,ij} = 1.01 and γ_{T,ij} = 1.2. The color filling the plot is generated by interpolation of the objective function at the gridded data points. The objective function is not smooth due to the random selection of experimental data points, as demonstrated by the jagged form of the contours, especially at high values of γ_{T,ij}.

Figure 4: Presentation of the objective function over a regular grid for the binary pair methane-isobutane along with cross-sections cutting through the surface in the vicinity of the minimum of the surface. The objective function surface is truncated at a value of 300.

In the course of this work, several optimization methods were investigated, including brute-force optimization with cubic interpolation, amongst others. In the end, the most robust and computationally efficient solution proved to be genetic optimization. In particular, we employed the open-source DEAP Python package in this work [19], the details of which are described further on.

Optimization and Fitting

As described above, determination of the mixture interaction parameters is a challenging optimization problem that is not well suited to deterministic (derivative-based) optimization methods. For this reason, various stochastic optimization routines were considered.

The use of stochastic, or evolutionary, optimization routines is not a new concept. For instance, Schmidt and Lipson [20] used evolutionary optimization to distill governing physical laws from experimental data on (simple) physical systems.

Furthermore, other stochastic optimization methods are available, and Krink et al. [21] demonstrated that evolutionary optimization is one of the best optimization methodologies for noisy objective functions. They show that evolutionary optimization is superior to differential evolution for noisy optimization. This provides further support for the optimization scheme used here.

Fitting Process

The computational pipeline employed for this problem is to start with a library of many (approximately 160,000) experimental vapor-liquid-equilibrium data points spanning more than 1,200 binary mixtures for which reliable equations of state are available for both components. The current fitting methodology includes only the vapor-liquid-equilibrium data, and more specifically, the fitting procedure is based only on bubble-point measurements. All of the bubble-point data for a given binary pair are extracted from the library into a pair-specific subset. This subset for the selected mixture forms the data source for the fitting routines.

The optimization routine has two independent variables, β_{T,ij} and γ_{T,ij}. For a given set of β_{T,ij} and γ_{T,ij}, a handful of datapoints (generally 5-10) are randomly selected from the dataset for the binary pair. For each of the selected datapoints in the handful, the bubble-point pressure is calculated as a function of the given bubble-point temperature and bulk mole fraction (the mole fractions of the liquid phase at the bubble point are equivalent to the bulk mole fractions). The signed error vector over this randomly selected handful of datapoints is calculated as

\[
\vec{e}_S = \frac{\vec{p}_{exp} - \vec{p}_{calc}}{\vec{p}_{exp}} \times 100 \qquad (3)
\]

If a flash routine fails for one of the datapoints forming the handful, a very large number is entered in the appropriate location in \vec{p}_{calc}. The error metric for the randomly selected handful is then simply the root-sum-of-squares

data for illustrative purposes) that looks like

\[
E_{RSS} = [1, 2, 3, 10^{10}, 10^{10}] \qquad (6)
\]
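The evaluation of one randomly selected handful, as described for Eq. (3), might be sketched as follows. This is only a sketch under stated assumptions: `bubble_pressure` is a hypothetical callable standing in for the flash routine of the property library, a `ValueError` is assumed to signal a flash failure, the magnitude of the failure value is assumed, and the root-sum-of-squares is written as the square root of the sum of squared entries of the signed error vector.

```python
import math
import random

FAILURE_VALUE = 1.0e10  # stand-in for the "very large number"; magnitude assumed


def signed_percent_errors(p_exp, p_calc):
    """Signed relative error in percent for each datapoint, as in Eq. (3)."""
    return [(pe - pc) / pe * 100.0 for pe, pc in zip(p_exp, p_calc)]


def root_sum_of_squares(errors):
    """Square root of the sum of squared errors (assumed form of the metric)."""
    return math.sqrt(sum(e * e for e in errors))


def handful_error(points, bubble_pressure, beta_T, gamma_T, n_select=5, rng=random):
    """Error metric for one random handful of (T, x, p_exp) bubble points.

    points must be a list; bubble_pressure(T, x, beta_T, gamma_T) is a
    hypothetical flash call that raises ValueError when it fails.
    """
    handful = rng.sample(points, min(n_select, len(points)))
    p_exp, p_calc = [], []
    for T, x, p in handful:
        p_exp.append(p)
        try:
            p_calc.append(bubble_pressure(T, x, beta_T, gamma_T))
        except ValueError:
            p_calc.append(FAILURE_VALUE)  # failed flash: enter a very large number
    return root_sum_of_squares(signed_percent_errors(p_exp, p_calc))


# Toy stand-ins for the flash routine, for demonstration only.
def toy_bubble_pressure(T, x, beta_T, gamma_T):
    return 100.0 * beta_T * gamma_T


def failing_bubble_pressure(T, x, beta_T, gamma_T):
    raise ValueError("flash failed")


toy_points = [(300.0, 0.25, 100.0), (300.0, 0.50, 100.0), (300.0, 0.75, 100.0)]
print(handful_error(toy_points, toy_bubble_pressure, 1.1, 1.0))
```

Because the handful is redrawn on every call, two evaluations at identical parameter values generally differ, which is the source of the roughness seen in the objective function.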
After having randomly generated the initial population of individuals, a number of methods are used to combine and mutate the population of individuals. The primary types of operations in the genetic optimization are:

• Mutation: Similar to spontaneous mutation of genetic chromosomes in biological systems, the individual can mutate, and each of its parameters can be given an offset, where the offset is given by a Gaussian distribution centered around zero. The standard deviation of the Gaussian distribution is used to control the spread of the mutations, and a probabilistic parameter is applied to decide whether the individual should mutate or not.

• Cross-over / reproduction: In cross-over or reproduction, various models can be applied to govern the outcome of the interaction of two individuals. This is similar, in biological systems, to the resulting offspring's chromosomes when its parents reproduce sexually. Various models are available, including weighting of the parents' chromosomes, or allowing for weights that can yield chromosomes outside the range of the parents' chromosomes.

The extent of user involvement in running this code is tuning the width of the Gaussian distributions, and setting the probabilities of mutation, cross-over, and reproduction of the individuals. Otherwise, the code is entirely automatic and requires no user intervention. The following values were used in the final version of the fitter:

• Number of generations: 30
• Population size: 150
• Tournament selection size¹: 5
• Mutation probability: 50%
• Cross-over probability: 30%

¹ Tournament selection refers to the process of comparing a number of individuals, in randomly selected pools, and keeping the best individual from this pool to populate the next generation. The larger the pool size, the greater the selection pressure, corresponding to the aggressive selection of high-fitness individuals at the expense of population diversity.

The implementation is entirely based on a Python-based open-source framework, aside from the use of NIST REFPROP as the property backend to provide the bubble-point evaluation. All thermodynamic calls are made through the CoolProp library, which delegates to the REFPROP library to make the necessary bubble-point calculations. The system is constructed so that it can scale to multiple processes in parallel. Process-level parallelization (as opposed to thread-level parallelization) is required as REFPROP is not thread-safe, and multiple mixture interaction files in REFPROP must have the interaction parameters injected into them by the optimization routines (one mixture file per process). Finally, the fitting routines make heavy use of the pandas Python package for data management and data subsetting.

The flowchart in Figure 5 shows the way that the optimization is divided into sub-processes, each of which operates on a given binary mixture. The master process spawns a number of subprocesses, each of which is passed a subset of the data for the binary pair that is being calculated by the subprocess. In this way, the large job is subdivided into smaller segments that can be easily handled by one processor. As each subprocess finishes its optimization, it writes its data to a file and signals the master process that it has completed, and begins to work on the next mixture.

The source code for the fitting routines is available as an electronic appendix, along with sample data for one binary pair (propane + n-decane) taken from the work of Mansfield et al. [22]. These data serve as a real-world demonstration of the fitting methodology.
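The genetic operations and settings listed above can be illustrated with a short standard-library sketch. It is not the DEAP-based code of this work: the operators are hand-written stand-ins, the mutation width `SIGMA`, the blend weighting, the initial parameter bounds, and the toy objective are all assumptions, and only the five tabulated settings come from the text.

```python
import random

N_GENERATIONS = 30      # settings quoted in the text
POPULATION_SIZE = 150
TOURNAMENT_SIZE = 5
P_MUTATION = 0.5
P_CROSSOVER = 0.3
SIGMA = 0.05            # assumed width of the Gaussian mutation


def tournament(population, fitnesses, k, rng):
    """Return a copy of the best (lowest-error) individual in a random pool of k."""
    pool = rng.sample(range(len(population)), k)
    best = min(pool, key=lambda idx: fitnesses[idx])
    return list(population[best])


def mutate(individual, sigma, rng):
    """Offset every parameter by a zero-centered Gaussian deviate."""
    return [gene + rng.gauss(0.0, sigma) for gene in individual]


def blend(parent_a, parent_b, rng, alpha=0.5):
    """Weighted cross-over; weights outside [0, 1] allow children outside the parents' range."""
    child_a, child_b = [], []
    for ga, gb in zip(parent_a, parent_b):
        w = rng.uniform(-alpha, 1.0 + alpha)
        child_a.append(w * ga + (1.0 - w) * gb)
        child_b.append(w * gb + (1.0 - w) * ga)
    return child_a, child_b


def evolve(objective, bounds, seed=0):
    """Minimize objective over individuals such as [beta_T, gamma_T]."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(POPULATION_SIZE)]
    best, best_fitness = None, float("inf")
    for _ in range(N_GENERATIONS):
        fitnesses = [objective(ind) for ind in population]
        for ind, fit in zip(population, fitnesses):
            if fit < best_fitness:
                best, best_fitness = list(ind), fit
        children = [tournament(population, fitnesses, TOURNAMENT_SIZE, rng)
                    for _ in range(POPULATION_SIZE)]
        for i in range(0, POPULATION_SIZE - 1, 2):
            if rng.random() < P_CROSSOVER:
                children[i], children[i + 1] = blend(children[i], children[i + 1], rng)
        population = [mutate(c, SIGMA, rng) if rng.random() < P_MUTATION else c
                      for c in children]
    return best, best_fitness


# Toy smooth objective with a minimum at (1.01, 1.2), for demonstration only.
def toy_objective(ind):
    return (ind[0] - 1.01) ** 2 + (ind[1] - 1.2) ** 2


best, best_fitness = evolve(toy_objective, [(0.8, 1.2), (0.8, 1.6)])
print(best, best_fitness)
```

A larger `TOURNAMENT_SIZE` makes each pool more likely to contain one of the current best individuals, which is the selection pressure described in the footnote.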
[Figure 5: flowchart of the optimization. Recoverable labels: Master process; Subset data for binary pair; Process; Generation (i_gen = 0, i_gen = i_gen + 1); Individual (j = 0, j = j + 1); Get individual (β_{T,ij}, γ_{T,ij}); Get N_select experimental points.]

two compositions then β_{T,ij} was set to 1.0 and only γ_{T,ij} was fitted.

For more complex mixtures, adjusting γ_{T,ij} and β_{T,ij} does not provide enough flexibility to properly model the vapor-liquid equilibria and homogeneous fluid properties of the binary mixture in a consistent fashion. Highly asymmetric binary mixtures (helium + water being perhaps the most extreme example) further stress the flash routines, resulting in frequent failures due to insufficiently accurate starting values. These failures of the flash routines mean that potentially high-accuracy vapor-liquid equilibria data points are being rejected.

As a result of these challenges, the resulting
(MARE) parameter is defined by

\[
\mathrm{MARE} = \mathrm{median}\left( \left| \frac{\vec{p}_{exp} - \vec{p}_{calc}}{\vec{p}_{exp}} \right| \right) \times 100 \qquad (9)
\]

The median absolute relative error is only a relevant metric in comparison to existing values for the error. Here we define an improvement-in-error term I that quantifies the improvement compared with REFPROP

\[
I = \begin{cases} (E_R - 1) \times 100 & \text{for } E_R \ge 1 \\ -\left( \dfrac{1}{E_R} - 1 \right) \times 100 & \text{for } E_R < 1 \end{cases} \qquad (10)
\]

where the error ratio E_R is defined by

\[
E_R = \frac{\mathrm{MARE}_{\mathrm{REFPROP}}}{\mathrm{MARE}_{\mathrm{Bell}}} \qquad (11)
\]

The reason for this (admittedly convoluted) definition of the improvement in error is that if one (but not both) of the MARE_REFPROP or MARE_Bell values is very nearly zero, the improvement term will neither go to zero nor infinity. The improvement I should be interpreted as follows: a value of I = 1 is a 1% decrease in the MARE, and I = −1 is a 1% increase in the MARE. Basically, the more positive I is, the better.

Figure 6 shows a plot of the MARE versus the improvement factor I. While more than 1100 mixtures were fit in this work, REFPROP includes fitted binary interaction parameter values for 522 mixtures, and those are plotted here. For approximately 300 of the binary mixtures, the current fitted parameters yield an improvement over the parameters in REFPROP; some of this measured improvement is a result of the datasets considered in the respective fitting process. On the other hand, there are many mixtures where the fitted parameters provide a worse representation of the VLE data than the use of the parameters from REFPROP.

It is challenging to draw clear conclusions from a comparison of the resultant errors caused by the use of the binary interaction parameters (BIP) in REFPROP and those in this work. The BIPs in REFPROP were obtained by careful curation of the experimental data that were used in the fitting process, while in this work, the sheer amount of experimental data considered required an automatic approach. Furthermore, in many cases (the open markers), the developers of the given pair of BIPs also fit a departure function with the multiplication factor F_{ij}. The reader is directed to the description of the GERG model for a further discussion of the departure function. In many cases, two BIP parameters were fit, but the two parameters that were fit were γ_{T,ij} and γ_{v,ij} instead of the pair β_{T,ij} and γ_{T,ij} used in this work. Additionally, the fits in REFPROP are, in some cases, also based on p-v-T, heat capacity, speed of sound, and any other data available for a particular binary pair.

Figure 7 shows the overall MARE distribution for all the mixtures that were fit in this work. For nearly 1000 of the binary mixtures, the MARE is less than 10%; a 10% MARE can be considered as a sufficiently accurate representation of the data for many applications. More than 800 of the mixtures have a MARE less than 5%. In some cases this is because there was only one data point, which was fit very accurately, so the MARE is an imperfect metric for goodness-of-fit.

Figure 7: Distribution of MARE for the binary pairs fit in this work.
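Equations (9) to (11) translate directly into a few lines of Python. The sketch below is illustrative only; the function and argument names are hypothetical, with `mare_bell` denoting the MARE obtained with the parameters of this work and `mare_refprop` the MARE obtained with the REFPROP parameters.

```python
from statistics import median


def mare(p_exp, p_calc):
    """Median absolute relative error in percent, Eq. (9)."""
    return median(abs((pe - pc) / pe) for pe, pc in zip(p_exp, p_calc)) * 100.0


def improvement(mare_refprop, mare_bell):
    """Improvement-in-error term I, Eqs. (10) and (11).

    Both MARE values must be nonzero for the ratio and its inverse to exist.
    """
    error_ratio = mare_refprop / mare_bell          # Eq. (11)
    if error_ratio >= 1.0:
        return (error_ratio - 1.0) * 100.0          # new parameters are better
    return -(1.0 / error_ratio - 1.0) * 100.0       # new parameters are worse


# Halving the MARE and doubling it give values of equal magnitude and opposite sign.
print(improvement(2.0, 1.0), improvement(1.0, 2.0))
```

The two branches make the measure antisymmetric under an exchange of the two MARE values, so an improvement and the equivalent degradation are reported with the same magnitude.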
Figure 6: Improvement I and median absolute relative error (MARE) of the bubble point predictions. The improvement I is based on the median error over all experimental bubble point measurements. The entries in the legend correspond to the number of interaction parameters that were fit in REFPROP 9.1. The count of mixtures with this number of parameters fit in REFPROP 9.1 is given in parentheses.
Repeatability
Due to the stochastic nature of the fitting process, each time the optimization is executed, the algorithm will arrive at a different optimal solution for the mixture interaction parameters β_{T,ij} and γ_{T,ij}. Ideally, the solutions for the optimal mixture interaction parameters should be tightly clustered around their respective mean values. In order to demonstrate the repeatability of the optimization outlined here, the optimizer was run 100 times for two mixtures. These mixtures were selected because they demonstrated low computational requirements, while also having enough bubble point measurements that the same points were not always sampled.

Figure 8 demonstrates the results from 100 runs of the optimizer for the interaction parameters for these mixtures. As seen in this figure, the largest deviation of β_{T,ij} and γ_{T,ij} from the mean value is less than 0.2%. In the vast majority of cases, the deviations are less than 0.05% from the mean value.

Figure 8: Deviations from the mean values (given by overbar) for β_{T,ij} and γ_{T,ij} for the binary mixtures n-heptane + n-hexane and refrigerant R143a + refrigerant R152a over 100 runs of the fitter. Each marker represents the results from one execution of the fitter.

Error evolution

As the evolutionary algorithm works to find an optimal solution for the interaction parameters β_{T,ij} and γ_{T,ij}, the error decreases in a nearly monotonic fashion. In the case of this optimization problem, there are sometimes steps that result in an increase in the error due to the experimental data points that have been selected. For instance, even if β_{T,ij} and γ_{T,ij} are unchanged, the error can sometimes increase due to the inclusion of one or more points with higher error.

Fig. 9 shows the evolution of the fitness function over the 200 repeatability tests described above. These profiles demonstrate the characteristic error evolution, that of a steep decrease in the error initially, followed by a near-asymptotic behavior in the limit of an infinite number of generations. For both mixtures, most runs arrive very near their optimal value after approximately 10 generations, and the remaining generations are used to refine the solution. In some cases, there are local increases in the objective function (points with worse error), but in the end, all runs arrive near the same solution.

Homologous families

As a demonstration of the output of the fitting procedure, the results for the family of n-alkane + n-alkane mixtures are presented for all binary pairs where it was possible to fit both β_{T,ij} and γ_{T,ij}. Figure 10 presents the fitted values for β_{T,ij} and γ_{T,ij}. A few comments are required for this plot. As described above, β_{T,ij} = 1/β_{T,ji}. Thus the order of fluids in the binary pair is of consequence. In order to yield a consistent set of variables, the binary pair was sorted such that the first element in the binary pair (i = 1) is that with the lower molar mass, and the second element (j = 2) in the binary pair is that with the higher molar mass. Furthermore, the abscissa (x-coordinate) of the plot is the ratio of the maximum molar mass in the binary pair to the minimum molar mass in the binary pair.

The trend in interaction parameters for the homologous n-alkane + n-alkane family is nearly linear, and as a result, linear correlations were fitted to β_{T,ij} and γ_{T,ij} as a function of the ratio of molar masses. These linear correlations can be used to yield reasonable predictions of the interaction parameters for binary n-alkane mixtures for which no experimental data of any kind exist. This estimation scheme has been applied to mixtures that include higher alkanes and has demonstrated reasonable extrapolation ability.

Similar exercises could be carried out for other families of fluids. As another demonstration of the strong familial trends, the obtained values of γ_{T,ij} are plotted for the binary pairs containing carbon dioxide in Figure 11. This figure demonstrates greater values of γ_{T,ij} for greater differences in critical temperature between carbon dioxide and the other component.

Figure 11: Values of γ_{T,ij} for the carbon dioxide family. The abscissa T_c refers to the critical temperature of the other component in the mixture with carbon dioxide. The value for water is not included.
Figure 10: Binary interaction parameters for the homologous n-alkane + n-alkane family
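The ordering convention and the linear correlation in molar-mass ratio described for the n-alkane family can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, an ordinary least-squares line is assumed for the "linear" fit, and the numbers in the usage lines are synthetic placeholders, not fitted values from this work.

```python
def order_pair(name_a, M_a, name_b, M_b, beta_T):
    """Sort a binary pair so that the lighter fluid comes first.

    Uses beta_T,ij = 1/beta_T,ji when the order is swapped, and returns the
    ordered names, the matching beta_T, and the ratio of the larger to the
    smaller molar mass (the abscissa of the plot).
    """
    if M_a <= M_b:
        return (name_a, name_b), beta_T, M_b / M_a
    return (name_b, name_a), 1.0 / beta_T, M_a / M_b


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, y_mean - slope * x_mean


def predict(slope, intercept, mass_ratio):
    """Estimate an interaction parameter for a pair without experimental data."""
    return slope * mass_ratio + intercept


# Synthetic placeholder data (NOT results of this work): gamma_T versus mass ratio.
ratios = [1.0, 2.0, 3.0]
gammas = [1.0, 1.1, 1.2]
slope, intercept = fit_line(ratios, gammas)
print(predict(slope, intercept, 4.0))
```

Sorting every pair before fitting keeps all β_{T,ij} values on the same side of the reciprocal relation, so a single line can describe the whole family.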
(2) Plocker, U.; Knapp, H.; Prausnitz, J. Calculation of high-pressure vapor-liquid equilibria from a corresponding-states correlation with emphasis on asymmetric mixtures. Industrial & Engineering Chemistry Process Design and Development 1978, 17, 324–332.

(3) Lemmon, E. W.; Bell, I. H.; Huber, M. L.; McLinden, M. O. NIST Standard Reference Database 23: Reference Fluid Thermodynamic and Transport Properties-REFPROP, Version 9.1.1, National Institute of Standards and Technology. 2016.

(4) Bell, I. H.; Wronski, J.; Quoilin, S.; Lemort, V. Pure and Pseudo-pure Fluid Thermophysical Property Evaluation and the Open-Source Thermophysical Property Library CoolProp. Ind. & Eng. Chem. Res. 2014, 53, 2498–2508, DOI: 10.1021/ie4033999.

(5) Span, R.; Eckermann, T.; Herrig, S.; Hielscher, S.; Jäger, A.; Thol, M. TREND: Thermodynamic Reference and Engineering Data 2.0. 2015.

(6) Lemmon, E. W.; Jacobsen, R. T.; Penoncello, S. G.; Friend, D. G. Thermodynamic Properties of Air and Mixtures of Nitrogen, Argon, and Oxygen from 60 to 2000 K at Pressures to 2000 MPa. J. Phys. Chem. Ref. Data 2000, 29, 331–385, DOI: 10.1063/1.1285884.

(7) Lemmon, E. W.; Jacobsen, R. T. Equations of State for Mixtures of R-32, R-125, R-134a, R-143a, and R-152a. J. Phys. Chem. Ref. Data 2004, 33, 593–620, DOI: 10.1063/1.1649997.

(8) Lemmon, E. W.; Jacobsen, R. T. A Generalized Model for the Thermodynamic Properties of Mixtures. Int. J. Thermophys. 1999, 20, 825–835, DOI: 10.1023/A:1022627001338.

(9) Akasaka, R. A Thermodynamic Property Model for the R-134a/245fa Mixtures. 15th International Refrigeration and Air Conditioning Conference at Purdue, July 14-17, 2014. 2014.

(10) Akasaka, R. Thermodynamic property models for the difluoromethane (R-32) + trans-1,3,3,3-tetrafluoropropene (R-1234ze(E)) and difluoromethane + 2,3,3,3-tetrafluoropropene (R-1234yf) mixtures. Fluid Phase Equilib. 2013, 358, 98–104, DOI: 10.1016/j.fluid.2013.07.057.

(11) Gernert, G. J. A New Helmholtz Energy Model for Humid Gases and CCS Mixtures. Ph.D. thesis, Ruhr-Universität Bochum, 2013.

(12) Kunz, O.; Klimeck, R.; Wagner, W.; Jaeschke, M. The GERG-2004 Wide-Range Equation of State for Natural Gases and Other Mixtures; VDI Verlag GmbH, 2007.

(13) Kunz, O.; Wagner, W. The GERG-2008 Wide-Range Equation of State for Natural Gases and Other Mixtures: An Expansion of GERG-2004. J. Chem. Eng. Data 2012, 57, 3032–3091, DOI: 10.1021/je300655b.

(14) Span, R. Multiparameter Equations of State - An Accurate Source of Thermodynamic Property Data; Springer, 2000.

(15) Gernert, J.; Span, R. EOS-CG: A Helmholtz energy mixture model for humid gases and CCS mixtures. J. Chem. Thermodyn. 2016, 93, 274–293, DOI: 10.1016/j.jct.2015.05.015.

(16) Diky, V.; Chirico, R. D.; Muzny, C. D.; Kazakov, A. F.; Kroenlein, K.; Magee, J. W.; Abdulagatov, I.; Kang, J. W.; Frenkel, M. ThermoData Engine (TDE) software implementation of the dynamic data evaluation concept. 7. Ternary mixtures. J. Chem. Inf. Model. 2011, 52, 260–276.

(17) Diky, V.; Chirico, R. D.; Muzny, C. D.; Kazakov, A. F.; Kroenlein, K.; Magee, J. W.; Abdulagatov, I.; Kang, J. W.; Gani, R.; Frenkel, M. ThermoData Engine (TDE): Software implementation of the dynamic data evaluation concept. 8. Properties of material streams and solvent design. J. Chem. Inf. Model. 2012, 53, 249–266.