Article history: Received 4 October 2021; Revised 31 March 2022; Accepted 24 May 2022; Available online 26 May 2022.

Keywords: Hybrid modelling; Physics-informed machine learning; Numerical analysis; Supervised learning; Surrogate modelling.

Abstract

This work introduces Neural Network Programming (NNP) as an integrated hybrid modelling approach. NNP consists in formulating a set of first principles equations that is later decomposed and transcribed into an Algorithmically Structured artificial Neural Network (ASNN). NNP leverages the advantages of the universal approximation theorem and neural network optimization algorithms in order to generate physically coherent machine learning models. Since ASNNs are not mere approximations of physics equations, it is not necessary to modify either the gradient or the performance function in order to account for errors with respect to the first principles equations. ASNNs are trained faster and more accurately than typical hybrid models because the gradient is computed through automatic differentiation instead of numerical differentiation. It is shown that the same ASNN architecture is transferable between processes with similar characteristics. In particular, a flash separator, a distillation column, and a biogas upgrading process were modelled using an identical architecture.

© 2022 The Author(s). Published by Elsevier Ltd.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
1. Introduction

The complex nature of the physics involved in engineering processes is usually reflected in highly intertwined and sophisticated mechanistic models. This complexity has always been both a challenge and a motivation for the formulation of new modelling strategies and techniques. Although quite rigorous, mechanistic models often exhibit considerable deviations from experimental measurements, perhaps due to unreasonable assumptions about the physics, or because some effects were not properly characterized. Recently, the modelling community has shifted its attention to data-driven modelling tools. Among quite a few modelling alternatives, machine learning algorithms seem to have the potential to become the dominant tool in the Industry 4.0 era (Sansana et al., 2021a; Venkatasubramanian, 2019). Of all the ML methods, Artificial Neural Networks (ANNs) are particularly interesting since they have been used to predict the unfolding mechanism of macromolecules (Torrisi et al., 2020), redesign proteins (Xu et al., 2020) or even perform activities that require complex and high-level decision making (Silver et al., 2016; Vinyals et al., 2019).

The outstanding performance of ML algorithms, together with important media coverage, has brought awareness of their capabilities not only to academia but also to the general public. However, ML has been around for decades, or even centuries if one considers that linear regression was pivotal in the development of these algorithms. Fig. 1 shows that the overall relative amount of ML research publications indexed in SCOPUS, with respect to the end of the previous decade, has been rapidly increasing. We expect that, in the same fashion as with other novel technologies, the annual research output will reach a maximum and then decline afterwards. Nevertheless, due to the advances in computer science, we expect that this technology will be around for several decades. Therefore, it is fundamental to keep improving data-driven modelling paradigms.

There might be some well-founded skepticism about the reliability and applicability of ML and ANNs in many chemical engineering subdisciplines. This might be due to the lack of transparency and the fact that purely data-driven models override first principles relationships. Some of these concerns have been partially addressed in the past with the introduction of hybrid modelling algorithms. However, from our perspective, in most cases the traditional hybrid modelling methodologies address the physics problems more from a computer science perspective rather than…

Abbreviations: ANN, artificial neural network; ASNN, algorithmically structured neural network; DNN, deep neural network; NNP, neural network programming; SNN, shallow neural network.
∗ Corresponding author. E-mail address: andres.c.abaid@ntnu.no (A. Carranza-Abaid).
https://doi.org/10.1016/j.compchemeng.2022.107858
0098-1354/© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
A. Carranza-Abaid and J.P. Jakobsen Computers and Chemical Engineering 163 (2022) 107858
2. Modelling methodologies
Fig. 5. Analogy between the thermodynamic entropy of an isolated system and the parameter entropy in a) ANNs and b) mechanistic models.
2.4. Interpretability
Fig. 6. Equation decomposition algorithm. HO: hierarchy of operations.
5
A. Carranza-Abaid and J.P. Jakobsen Computers and Chemical Engineering 163 (2022) 107858
…according to the hierarchy of operations (HO), in other words, in the same order as if the equation were being numerically evaluated on a handheld calculator. Therefore, the decomposition should be done from left to right, from the innermost to the outermost parentheses, and multiplication operations have priority over the addition/subtraction operations. Note that if function Y∗k is not encompassed in a nonlinear function, then Yk = g(Y∗k) = Y∗k. The algorithm ends when the right side of the first principles equation cannot be decomposed anymore (i.e., when Yk = f).

It must be remarked that divisions cannot be expressed as a/b in an ANN, but rather as a·(1/b). Therefore, divisions should be treated as nonlinear functions in NNP (this is valid for the Deep Learning Toolbox of Matlab).

Step 5. Construction of the ASNN. A solution algorithm should be formulated in order to solve the decomposed equations. In this work, we propose two alternatives for implementing the solution procedure: the sequential solution algorithm and the matrix inversion algorithm (both algorithms are further explained in Section 4). After the solution algorithm is proposed, the ASNN is constructed following the order of computation given by the solution algorithm. The input and output layers correspond to the inputs and outputs selected in steps 2 and 3. The inputs can be fed to the ASNN in different mathematical forms as long as they are not dependent on the computations done by the neural network (this is further discussed in Section 4.2.1).

The algorithm shown in Fig. 6 automatically generates the solution sequence of the decomposed equations. However, if more than one function f is to be decomposed, then the user should decompose each individual function and subsequently arrange all the decomposed equations.

It must be remarked that every vector/connection fed to a hidden layer must have an associated weight matrix (W) (e.g., if the decomposition of an arbitrary equation yields Yk = X1 + X2, the neural network will represent it as Yk = W1X1 + W2X2).

Constructing the ASNN architecture is the only mandatory requirement in this step. Nonetheless, minor structural modifications can be performed in order to simplify the ASNN architecture or to constrain the predictions. The first type of minor modification merges the output of a universal approximator with the equation that it feeds in the decomposition. However, this can only be done if the output of the universal approximator feeds exactly one equation. A second type of minor modification consists in adding process constraints in order to help the ASNN avoid computing unrealistic results if the model is extrapolated too far away from the training conditions (e.g., avoid negative volumes). Minor modifications are not limited to the ones listed above. These modifications are done as a function of each modelled system and are not a requirement for successfully implementing NNP.

Step 6. Training the ASNN. The parameters with low entropy are those that help to transcribe physics laws within the ASNN architecture (identified in step 2), while the high entropy parameters are the ones used for modelling semi-empirical parameters (identified in step 3). Considering this, the low entropy parameters must be fixed so that the physics equations can be properly transcribed. The fixed parameters can either be used to only transfer the signal (e.g., if the identity matrix is used), to perform summations of the input vector (e.g., if an all-ones row vector is used), or to arrange an output vector according to the solution algorithm (e.g., to set up the constant vector C for the matrix inversion algorithm). Utilizing the weight parameter matrices for quite different purposes is one of the main strengths of the NNP algorithm. Conversely, the high entropy parameters are set free so that the ASNN can adjust to the data.

Since the unfixed parameters have a high entropic behavior, it is recommended to train the model several times and pick the set of parameters that provides the best performance.

4. Results and discussion

This section presents the application of the NNP method to two study cases. The first case uses NNP to model a thermodynamic vapor-liquid equilibrium (VLE) problem and the second study case applies NNP to two-product separators.

4.1. Case study 1: ideal VLE system

4.1.1. Application of the NNP method

Step 1. Setting the modelling objective and data collection. This example develops a VLE model for a benzene (1) and toluene (2) system. A dataset containing the results of 200 simulations was utilized to train the ASNN. This dataset was generated by using the ideal thermodynamic package available in Aspen Plus v8.8. The molar fraction of component 1 (x1) and the temperature (T) are the 2 independent variables and were randomly generated according to 0 < x1 < 1 and 333 K ≤ T ≤ 473 K. The output variables taken from the Aspen Plus model were the partial pressure pi and the total pressure P.

Step 2. Definition of the assumptions, physics laws and constraints. The physics laws and constraints to be considered are Raoult's law (Eq. (3)) and Dalton's law (Eq. (5)):

ln x + ln psat(T) = ln p    (3)

p = exp(ln p)    (4)

∑i pi = P    (5)

Considering that the partial pressure in Eq. (3) is in logarithmic form and the partial pressure in Eq. (5) is not, an auxiliary equation (Eq. (4)) is needed to solve the system of equations. Whenever there is a subscript i, the equation refers to component i; otherwise, it refers to a vector (e.g., x is the vector of liquid molar compositions and xi is only for component i).

Imprinting physics laws into the ASNN inherently assumes that the theoretical framework in which they were developed is correct. Hence, the available DoF must be the same as in the mechanistic model. It was previously shown that ignoring the DoF when modelling a VLE system (Gibbs' phase rule) makes the ANN find correlations foreign to equilibrium thermodynamics (Carranza-Abaid et al., 2020). The Gibbs' phase rule determines the number of intensive variables that can be independently selected. For vapor-liquid non-reactive systems, the Gibbs' phase rule is

DoF = 2 + n − π    (6)

where n is the number of components and π is the number of phases. By evaluating Eq. (6), the DoF is 2.

Step 3. Identification of the uncertain/surrogate sections of the model. In this example, the source of entropy (or ignorance, according to the definition of entropy (Jaynes, 1957)) comes from the uncertainty about the form of the pure component saturation pressure equation. At low pressures, psat can be safely assumed to be only a function of T. Therefore, a shallow neural network function (η) with the general form

ln pisat = η(T)    (7)

can be utilized. Note that η is formed by two sequential equations.

Step 4. Equation decomposition. Since the equations utilized in this example are relatively simple, the decomposition procedure does not change the form of Eqs. (3), (4), (5), and (7). Therefore, they can be directly transcribed into an ASNN.

Step 5. Construction of the ASNN. This example applies the functionality matrix approach from Book and Ramirez (1984),…
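The transcription of Eqs. (3)–(5) into fixed-weight layers can be sketched as a plain forward pass. The following NumPy sketch is illustrative only (the paper used the Deep Learning Toolbox of Matlab): the function `snn_ln_psat` is a hypothetical stand-in, with made-up Antoine-type coefficients, for the trained shallow-network surrogate η(T) of Eq. (7); the fixed identity and all-ones weights mirror the "low entropy" parameters described in Step 6.

```python
import numpy as np

def snn_ln_psat(T):
    """Stand-in for the shallow-network surrogate eta(T) of Eq. (7).
    An Antoine-type correlation with hypothetical coefficients plays
    the role that the trained SNN would play in the real ASNN."""
    A = np.array([9.28, 9.39])
    B = np.array([2788.5, 3096.5])
    C = np.array([-52.4, -53.7])
    return A - B / (T + C)  # ln psat_i for the two components

def asnn_vle_forward(x, T):
    """Fixed-weight transcription of Raoult's and Dalton's laws.
    Eq. (3): ln p = ln x + ln psat(T)  (identity weights, fixed)
    Eq. (4): p = exp(ln p)             (nonlinear activation)
    Eq. (5): P = sum_i p_i             (fixed all-ones weight row)"""
    ln_p = np.log(x) + snn_ln_psat(T)   # Eq. (3)
    p = np.exp(ln_p)                    # Eq. (4)
    P = np.ones(len(p)) @ p             # Eq. (5)
    return p, P

x = np.array([0.4, 0.6])        # liquid molar fractions, sum to 1
p, P = asnn_vle_forward(x, T=373.15)
# By construction the partial pressures always sum to the total pressure,
# regardless of how well the eta(T) surrogate is trained.
assert np.isclose(p.sum(), P)
```

Because the physics is encoded in the fixed weights rather than learned, Dalton's law holds exactly for any input, which is the point of the ASNN construction.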
Fig. 7. Visual representation of the ASNN that represents the VLE.
Fig. 8. Shallow neural network (SNN) architecture of the benzene (1) – toluene (2) VLE system.
7
A. Carranza-Abaid and J.P. Jakobsen Computers and Chemical Engineering 163 (2022) 107858
Fig. 9. Histograms of the empirical probability of the optimized parameter values of: a) W2∗ (shallow neural network), b) WR,1, c) WR,2 and d) combined effect of βA and layer WD weights.
…characteristics imply that there is a low likelihood that the same fitting parameter values will be obtained during different training procedures. Therefore, the W2∗ parameters have high entropy and consequently low interpretability. It is not possible to obtain definite conclusions from their numerical values despite having good agreement with the dataset (the AARD of all shallow neural networks is between 0.05 and 0.25%). If more artificial neurons are utilized in the SNN, more variability of the fitting parameters is expected; hence, the chances of interpreting overfitted ANNs are even lower than for non-overfitted ANNs.

The empirical probability distributions of the optimized parameters of the ASNN are presented in Fig. 9(b–d). Fig. 9(b) shows two highly sharp distributions that correspond to WR,1. These distributions indicate that there is an 80% chance that the optimized WR,1 parameter values are either 0 or 1. It is worth mentioning that the models whose WR,1 parameters are 0 and 1 have an AARD less than 1%. This suggests that the optimized parameters which have the highest empirical probability are the ones with better predictions.

From now on, we refer to the model developed in Section 4.1.1 as the fixed parameter neural network (FNN), while the models developed in this section are called unfixed parameter neural networks (UNN).

If one analyzes the parameters found in W2, WR,2, WD, βR, and βA, it is not possible to obtain definite conclusions about the parameter behavior due to their blunt distribution. For example, Fig. 9(c) shows a broad and flat empirical probability distribution of the WR,2 parameters. The characteristics of the distribution shown in Fig. 9(c) resemble those of a random probability distribution, hence negating the possibility to draw objective conclusions from it. The results of Fig. 9(c) indicate that WR,2 does not interfere with the prediction capabilities. The accurate predictions of the UNN, together with the fact that the WR,2 and βD values agree with those of the FNN, indicate that these models, despite being different, perform equivalent calculations. For example, due to lack of information in the ASNN, the parameters WR,2 and βR become highly dependent on the parameter values of W2 and β2. In order to interpret this, let us combine Eqs. (10)–(14):

LR = WR,1 ln(x) + WR,2(W2 L1 + β2) + βR    (17)

Eq. (17) indicates that the product WR,2(W2 L1 + β2) admits an infinite number of solutions. This condition suggests that the parameter distributions of these two sets of parameters should be relatively flat (as shown in Fig. 9(c)). This signifies that the parameter WR,2 has high entropy; therefore, the terms WR,2(W2 L1 + β2) + βR in Eq. (17) become equivalent to ln(psat). The difference between the ln(psat) values and the values calculated from WR,2(W2 L1 + β2) + βR is quite low (0.03–0.06%). This suggests that the ASNN "knows" that ln(psat) must be calculated; however, it is difficult to interpret because we provided a sequence of operations with poor interpretability.

There is a similar effect in the Dalton's law section of the ASNN. The bias βA (constant) in Eq. (15) is transformed to its exponential form and multiplies the parameters in WD. If the overall interaction of these parameters is analyzed, two narrow and definite distributions can be seen (Fig. 9(d)). By only considering the UNNs with an AARD < 1%, there is a 50% probability that the optimized parameter is −1 and 50% that it is 1. This apparent disagreement with the FNN (where all parameters should be equal to +1) is caused by…
Fig. 11. ASNN architecture of the two-product separator using the exact first principles representation.
Step 6. Training the ASNN. The architecture of the ASNN is composed of 4 input layers, 6 hidden layers and 2 output layers. The hidden layers L1 and L2 contain adjustable parameters, while layers L3, L4, L5 and L6 contain fixed parameters (some of the parameters in L2 are fixed and some are adjustable). In order to represent the first principles equations, the following considerations must be made:

• The parameter weight matrices W3,2, W4,2, W4,3, W5,2, and W6,2 should be the identity matrix with 3 × 3 dimensions for this ternary mixture.
• The parameter weight matrices W5,1 and W6,1 are the negative identity matrix with 3 × 3 dimensions.
• The parameter weight vectors W2,1, W3,1 and W4,1 should be a unit vector with 3 × 1 dimensions.
• Only the inputs in layer I1 are normalized, using the "mapminmax" function from Matlab (which transforms all variables to values between −1 and 1). Conversely, the remaining layers should not be normalized.

The data used to train the model was divided into the training, validation, and test sets. The training set has 70% of the data points while the validation and test sets have 15% each. In order to distribute the error evenly across all the datapoints, it was necessary to weight the observations. The datapoint weights for l∗ and v∗ were calculated with (1/l∗)² and (1/v∗)², respectively. Due to the stochastic nature of the neural network training algorithms, the ASNN was trained 100 times and subsequently the model with the lowest AARD was selected. A learning rate of 0.0001 was utilized.

An extra validation dataset consisting of 100,000 datapoints was generated in the same fashion as the training dataset. This was done to test the generalization capabilities of the ASNN on the entire parameter space. The AARDs of important process variables are presented in Table 1 (under the column ASNN, ±0%). The molar fractions were calculated by normalizing the corresponding l∗ and v∗. The difference between the ASNN and Aspen's calculations is small considering that few datapoints (50 runs) were used for the training. It is possible to obtain better predictions if more training data is used. For example, when using 300 datapoints, the average error can be reduced to 0.2%. In view of the small AARD, it is reasonable to consider the proposed ASNN a suitable surrogate model for flash separation of multicomponent mixtures. Utilizing ASNNs to override the need for iterative calculations may speed up complex equilibrium calculations with several components.

It is remarkable that the ASNN performance is so high despite the fact that a small dataset (only 50 simulations) made with a random sampling method was used. In the context of surrogate modelling, this suggests that using NNP together with more advanced sampling methods (e.g., Eason and Cremaschi, 2014; Nuchitprasittichai and Cremaschi, 2013) can provide more extrapolable and robust models.

It is important to note that strictly applying the NNP method would yield an ASNN with 5 hidden layers instead of 6. The purpose of adding layer L6 was to verify the robustness of the NNP method. Layer L6 computes v∗ a second time (note that v∗ is computed in L4 as well) in order to ensure that vector v∗ does not provide negative values. It was observed that the results do not substantially change if the last layer is not added; in fact, if the mass balances are not solved twice in the ASNN, only 0.008% of the solutions have an error in the mass balance larger than 10⁻¹⁵, while in the case with 6 layers it is 0%. Therefore, it can be determined that the NNP method is robust and it is not necessary to substantially deviate from the core method.

As discussed earlier, ASNNs can be applied for fitting experimental or process data where noise is expected. In order to test the proposed model, randomly distributed noise was added to the training dataset (the validation dataset remains without noise). The results of training a model with a noisy data set are shown in Table 1 (ASNN, ±5% and ASNN, ±20%). The noise was added to the reduced molar flows l∗ and v∗, then the values of x, y, and λ were recalculated. It can be observed that, as expected, the average error increases when the noise in the datapoints is increased. Despite the aggregated noise, the difference between the trained ASNN and the validation dataset remains within a reasonable range.

Table 1. AARD (%) between the neural networks and the extra validation dataset when using different noise levels in the training dataset.

A hybrid model based on the serial paradigm was developed in order to compare it with the ASNN. The columns labeled "SHM, ±5%" and "SHM, ±20%" in Table 1 report the deviation of models generated by a serial hybrid model framework (Fig. 4a). The λi parameters were fitted through a shallow neural network (SHM) with 8 artificial neurons (same as in the ASNN). The reduced product flows l∗ and v∗ were calculated with the rigorous mass balances while the molar compositions (x and y) were calculated by normalizing l∗ and v∗. The NNP-based models outperform the prediction capabilities of the coupled hybrid models. In cases where the noise is high (±20%), the AARD of the ASNNs can be 40 to 50% lower than that of the serial hybrid models. We consider that there are two causes of the superior performance of the ASNNs over the serial hybrid models. The first cause is the ASNN architecture, which helps the optimization process. Secondly, the numerical differentiation error negatively affects the optimization step of the serial hybrid model.

It is worth mentioning that the proposed ASNN structure can also work for a P flash problem. The only difference would be the input layer I1, where T should be substituted by P. On the other hand, the solution of a PT flash would require a different ASNN architecture. Due to the characteristics of PT flash problems, it has been suggested in the literature (Poort et al., 2019) to use regression and classifier neural networks together to solve a PT flash problem. The authors reported that over 25 million datapoints were used to train the model for a binary mixture. This points out that using the NNP approach for a TP flash will require designing a possibly more complex architecture.

The generality and easy adaptability of ASNN structures to other processes is a practical advantage. In particular, the hidden layers with fixed parameters (layers L2 to L6) will be the same as in Fig. 11. This means that as long as the modelled system has a feed stream and two product streams, the ASNN structure can be utilized independently of the modelled process. For example, the same ASNN architecture was utilized for modelling the processes shown in Fig. 12. The ASNN architectures for modelling the flash separator and the stripping process (Fig. 12(a)) are quite similar, since the only differences are the involved components (the rich amine stream has CO2, MEA and H2O) and the input variables to layer I1 (molar fractions (z), temperature in the reboiler (TR), feed temperature (TF), and the bottoms to feed ratio).

Fig. 12. Additional processes that can be modelled by utilizing the ASNN architecture shown in Fig. 11. (a) Stripper column, and (b) Biogas upgrading process.

The process shown in Fig. 12(b) has two inlet components in the feed (the raw biogas has CH4 and CO2) and 6 process parameters, which are the reboiler temperature (TR), reboiler pressure (PR), solvent flow (S), solvent temperature (TS), absorber pressure (PA) and the biomethane to raw biogas ratio. Therefore, layer I1 has 8 input parameters and layers L2 to L7 have 2 neurons. The ASNNs were trained by utilizing data generated from models validated in previous works (Carranza-Abaid et al., 2021; Carranza-Abaid and Jakobsen, 2021, 2020). The same ASNN architecture was used for both processes and showed an AARD of 1.2% in both cases (8 artificial neurons were used in L1; however, the AARD can be lower if more parameters are included).

There are important points that must be considered when applying the two-product separator ASNN architecture to other processes:

• In cases where the process is quite complex or there is more available data, additional layers can be inserted in between or before layer L1 in Fig. 11. Adding more layers should not modify layers L2 to L7 (except for the number of neurons, which must be equal to the number of components).
• As opposed to the flash process, the λi parameters might not have a formal definition in other processes (e.g., distillation or gas stripping). Despite this, the λi parameter indicates that, independently of the process, there is always a relationship between the component compositions of the two products that is independent of the extensive properties of the system.
• If the process is modelled with the ASNN structure but the mass balances do not behave homogeneously, then the proposed ASNN architecture might not be optimal. However, knowing that the system does not behave homogeneously is also valuable modelling information that can be utilized in order to modify the ASNN to account for extensive variables.
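The role of the fixed parameter matrices listed in Step 6 (identity and negative identity matrices, all-ones unit vectors) can be illustrated with a minimal sketch. This is not the full wiring of Fig. 11; it only shows how fixed, untrained weights make the component mass balance v∗ = z − l∗ close exactly, with the layer names in the comments mapped to the bullet list above as an assumption.

```python
import numpy as np

n = 3                     # ternary mixture, as in the study case
I3 = np.eye(n)            # fixed identity weights (cf. W3,2 ... W6,2)
negI3 = -np.eye(n)        # fixed negative identity (cf. W5,1, W6,1)
ones = np.ones((n, 1))    # fixed 3x1 unit vectors (cf. W2,1, W3,1, W4,1)

def mass_balance_layer(z, l_star):
    """Illustrative fixed-weight layer computing the reduced vapour flows
    from the overall balance v* = z - l*. The exact layer wiring of
    Fig. 11 is not reproduced here."""
    return I3 @ z + negI3 @ l_star

z = np.array([[0.5], [0.3], [0.2]])        # reduced feed composition
l_star = np.array([[0.2], [0.1], [0.15]])  # reduced liquid flows (would come
                                           # from the trained part of the ASNN)
v_star = mass_balance_layer(z, l_star)

# Component balances close exactly because these weights are fixed, not fitted
assert np.allclose(l_star + v_star, z)
# Overall balance via a fixed all-ones weight vector (a summation layer)
assert np.isclose(ones.T @ (l_star + v_star), ones.T @ z)
```

This is the sense in which the "low entropy" parameters carry the physics: no matter how the trainable layers behave, the balances encoded in fixed weights hold for every input.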
Fig. 13. ASNN architecture of the two-product separator using the matrix inversion solution algorithm.
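The matrix inversion solution algorithm named in the caption of Fig. 13 can be sketched as a single linear solve per component. The sketch below is an assumption-laden illustration, not the paper's exact formulation: it takes λi to be a split parameter relating the two product flows of component i (the paper's precise definition of λ may differ), and it mimics a layer that applies A⁻¹ to a fixed constant vector C.

```python
import numpy as np

def solve_balances(z, lam):
    """Per component i, solve in one step the linear system
        l_i + v_i         = z_i   (component mass balance)
        lam_i * l_i - v_i = 0     (hypothetical two-product split relation)
    mirroring a matrix-inversion layer that evaluates A^-1 C."""
    n = len(z)
    l, v = np.empty(n), np.empty(n)
    for i in range(n):
        A = np.array([[1.0, 1.0],
                      [lam[i], -1.0]])
        C = np.array([z[i], 0.0])          # fixed constant vector C
        l[i], v[i] = np.linalg.solve(A, C)
    return l, v

z = np.array([0.5, 0.3, 0.2])     # reduced feed flows
lam = np.array([2.0, 0.5, 1.0])   # hypothetical split parameters
l, v = solve_balances(z, lam)
assert np.allclose(l + v, z)      # balances close exactly
assert np.allclose(v, lam * l)    # split relation satisfied
```

Solving all balances in one algebraic step, instead of propagating them through several fixed-weight layers, is what makes this variant attractive for large flowsheets, at the cost of the architecture carrying less explicit physics information.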
Each model was trained 20 times and the model with the low- hesion between the data, one should explicitly represent all the
est AARD was selected. The weights of the datapoints were calcu- first-principles relationships.
lated with (1/l ∗ )2 and (1/v∗ )2 in order to optimize for the AARD
instead of the mean square error. Note that the input layer I1 that 5. Conclusions
feeds the universal approximator substructure uses a normalized
input while input layers I2 and I3 do not. The models were trained This work presents the Neural Network Programming (NNP)
using the Bayesian stochastic optimization algorithm available in hybrid modelling paradigm. It integrates a first principles mod-
the Deep Learning Toolbox of Matlab 2020b. elling approach with a data-driven algorithm. The models devel-
The AARD presented in Table 2 was calculated with the same oped with NNP are Algorithmically Structured Neural Networks
extra validation dataset used in Section 4.2.1. Comparing the AARD (ASNNs). The idea is to generate a first principles model, decom-
(models 1 and 2 in Table 2) against the results of Table 1 shows pose it and transcript the mathematical equations to an ASNN. This
that utilizing the ASNN1 (Fig. 11) has, in general, better predic- allows the ASNN to be physically coherent over the entire solution
tion capabilities than the ASNN2 (Fig. 13). This is understand- space, including limit cases (e.g., ASNNs do not predict a positive
able since the ASNN1 architecture has more information regard- molar flow of a component if it is not present in the mixture). Due
ing the relationship between the input and output variables than to the features of ASNNs, there is no need to utilize sophisticated
the ASNN2. This reveals a clear tradeoff between the ASNN com- performance functions in order to account for physics constraints
plexity and its prediction capabilities. The more physics informa- and the hyperparameter tuning is not critical as in other data-
tion is provided to the ASNN architecture, the less data and fit- driven modelling techniques. Since the first principles equations
ting parameters will be needed; thus, the model will have a lower are in-built in the ASNN, the data is more efficiently utilized for
entropy. In fact, adding more parameters without including more fitting process parameters rather than trying to rediscover physics
datapoints is detrimental to the model performance since it over- concepts that must hold. This causes the ASNNs to exhibit superior
fits the model to the training data (compare models #1 and #2 performance than black-box and conventional serial hybrid model
in Table 2). One can notice that in order to have a similar AARD configurations.
between the ASNN1 and ASNN2, the number of datapoints is in- Three examples on how to model chemical engineering prob-
creased roughly tenfold (from 50 to 500) and the number of neu- lems with the NNP method are presented in this work. The first
rons in the hidden layer is increased two times (from 8 to 16). example consisted in the rigorous representation of the physics
The type of universal approximator substructure can be differ- equations and relationships in the same way as how it is done in
ent from a shallow neural network. In fact, Table 2 shows that the a mechanistic model (Section 4.1). Being able to transcript physics
model AARD can be reduced 50% if a deep neural network sub- laws within an ASNN is of utmost importance in subfields where
structure that has 2 hidden layers is utilized. Moreover, better pre- the models must comply with a large set of requirements in order
diction capabilities can be achieved than in the ASNN1 model if to be deemed as “correct” (e.g., property modelling or thermody-
more datapoints and a deep neural network substructure is uti- namics). The second example (Section 4.2.1) shows a more relaxed
lized. representation of the first principles equations. It is highlighted
It should be remarked that applying NNP with exact first prin- that through a structural analysis of the first principles model it is
ciples representations for every equipment in a large process can possible to lump several parameters onto a single parameter with-
be a time-consuming task, therefore, in many cases a compro- out disrupting the reliability of the model. The third example (sec-
mise between accuracy and practicality will emerge. For exam- tion 4.2.2) presents an alternative approach for performing mass
ple, in a process with multiple unit operations and subprocesses balances where the parameter matrix of a hidden layer is used to
and large amount of data, it is more practical to apply the in- solve mass balance systems. This approach is of fundamental im-
verse matrix approach since the mass balances will be solved in portance in the application of NNP to large processes since it can
a single step instead of multiple layers. On the other hand, for sys- be used to model processes with several process streams.
tems with limited amount of measured data or where keeping cer- There is a compromise between the complexity of the ASNN ar-
tain mathematical relationships is paramount to maintain the co- chitectures and the amount of physics knowledge embedded into
A. Carranza-Abaid and J.P. Jakobsen Computers and Chemical Engineering 163 (2022) 107858
Therefore, a careful assessment of the assumptions and of the information that is to be predicted with the model must be made before constructing the ASNN. Performing a structural analysis allows the user to identify the sections of the model that can be substituted with a universal approximator substructure and to decide how to connect them with the rest of the model. As shown in the two-product separator example, such an analysis also allows ASNNs to be effectively utilized for the formulation of surrogate models by removing the iterative loops that are commonly seen in mechanistic models.

An interesting feature of ASNNs is their transferability between processes with similar characteristics. For example, this work presented the ASNN architecture needed to model a flash separator, which was later applied to model biogas upgrading processes without substantial modifications to the original ASNN architecture.

In contrast to typical hybrid configurations, NNP relies on a single auto-differentiable model that complies with the physics laws, rather than independently utilizing a first principles model and an artificial neural network model. This allows more accurate and faster training because no numerical differentiation techniques are utilized. Moreover, and as opposed to mechanistic models, the NNP framework directs the user to develop models in matrix form, which means that multiple simulations can be run with a single call to the model function (i.e., instead of utilizing for or while loops). We expect that these features will be exploited even further in the coming decades due to the current advances in quantum computing.

Interpreting the parameters of a black-box ANN might be an overcomplicated task due to the high variability of the optimized parameters. In other words, it is unlikely to find a physical meaning for parameters whose numerical values are a consequence of the entropy associated with the ANN training. Hence, it is more effective to assume an ANN architecture based on a priori knowledge and to look for patterns in the optimized parameter distributions than to try to interpret generic ANN architectures. If the architecture of an ANN has cohesion with the description of the physical phenomenon, the parameter entropy will be lower and, therefore, there is a high likelihood that the model will provide repeatable parameters. NNP can also be utilized to discard incorrect assumptions about the physical phenomena.

Further work on NNP includes the application of this method to the development of consistent thermodynamic and transport property models. Additionally, we hypothesize that NNP can be utilized to develop NNP-based PINNs in order to guarantee the exact execution of first principles equations.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Andres Carranza-Abaid: Conceptualization, Methodology, Software, Validation, Formal analysis, Data curation, Writing – original draft, Writing – review & editing, Visualization. Jana P. Jakobsen: Resources, Supervision, Writing – review & editing, Project administration, Funding acquisition.

Acknowledgments

We want to thank Tore Haug-Warberg for the fruitful and insightful discussions in the development of this work. Additionally, we are grateful for the comments and suggestions provided by the reviewers during the peer-review process. This research was funded by the Faculty of Natural Sciences of the Norwegian University of Science and Technology (NTNU).

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.compchemeng.2022.107858.

References

Aguiar, H.C., Filho, R.M., 2001. Neural network and hybrid model: a discussion about different modeling techniques to predict pulping degree with industrial data. Chem. Eng. Sci. 56, 565–570. doi:10.1016/S0009-2509(00)00261-X.
Åkesson, J., Årzén, K.E., Gäfvert, M., Bergdahl, T., Tummescheit, H., 2010. Modeling and optimization with Optimica and JModelica.org - languages and tools for solving large-scale dynamic optimization problems. Comput. Chem. Eng. 34, 1737–1749. doi:10.1016/j.compchemeng.2009.11.011.
Azarpour, A., Borhani, T.N.G., Wan Alwi, S.R., Manan, Z.A., Abdul Mutalib, M.I., 2017. A generic hybrid model development for process analysis of industrial fixed-bed catalytic reactors. Chem. Eng. Res. Des. 117, 149–167. doi:10.1016/j.cherd.2016.10.024.
Bazaei, A., Majd, V.J., 2003. Feedback linearization of discrete-time nonlinear uncertain plants via first-principles-based serial neuro-gray-box models. J. Process Control 13, 819–830. doi:10.1016/S0959-1524(03)00027-1.
Bikmukhametov, T., Jäschke, J., 2020. Combining machine learning and process engineering physics towards enhanced accuracy and explainability of data-driven models. Comput. Chem. Eng. 138. doi:10.1016/j.compchemeng.2020.106834.
Bishop, C.M., 2006. Pattern Recognition and Machine Learning. Springer Science+Business Media LLC.
Bogusch, R., Lohmann, B., Marquardt, W., 2001. Computer-aided process modeling with ModKit. Comput. Chem. Eng. 25, 963–995. doi:10.1016/S0098-1354(01)00626-3.
Bollas, G.M., Papadokonstadakis, S., Michalopoulos, J., Arampatzis, G., Lappas, A.A., Vasalos, I.A., Lygeros, A., 2003. Using hybrid neural networks in scaling up an FCC model from a pilot plant to an industrial unit. Chem. Eng. Process. Process Intensif. 42, 697–713. doi:10.1016/S0255-2701(02)00206-4.
Book, N.L., Ramirez, W.F., 1984. Structural analysis and solution of systems of algebraic design equations. AIChE J. 30, 609–622. doi:10.1002/aic.690300412.
Carranza-Abaid, A., González-García, R., 2020. A Petlyuk distillation column dynamic analysis: hysteresis and bifurcations. Chem. Eng. Process. Process Intensif. 149. doi:10.1016/j.cep.2020.107843.
Carranza-Abaid, A., Jakobsen, J.P., 2021. A computationally efficient formulation of the governing equations for unit operation design. Comput. Chem. Eng. 154, 107500. doi:10.1016/j.compchemeng.2021.107500.
Carranza-Abaid, A., Jakobsen, J.P., 2020. A non-autonomous relativistic frame of reference for unit operation design. In: Comput. Aided Chem. Eng., pp. 151–156. doi:10.1016/B978-0-12-823377-1.50026-4.
Carranza-Abaid, A., Wanderley, R.R., Knuutila, H.K., Jakobsen, J.P., 2021. Analysis and selection of optimal solvent-based technologies for biogas upgrading. Fuel 303, 121327. doi:10.1016/j.fuel.2021.121327.
Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., Elhadad, N., 2015. Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In: Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., pp. 1721–1730. doi:10.1145/2783258.2788613.
Cellier, F.E., Elmqvist, H., 1993. Automated formula manipulation supports object-oriented continuous-system modeling. IEEE Control Syst. 13, 28–38. doi:10.1109/37.206983.
Chen, T., Guestrin, C., 2016. XGBoost. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, pp. 785–794. doi:10.1145/2939672.2939785.
Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314. doi:10.1007/BF02551274.
Daw, A., Karpatne, A., Watkins, W., Read, J., Kumar, V., 2017. Physics-guided neural networks (PGNN): an application in lake temperature modeling.
Destro, F., Facco, P., García Muñoz, S., Bezzo, F., Barolo, M., 2020. A hybrid framework for process monitoring: enhancing data-driven methodologies with state and parameter estimation. J. Process Control 92, 333–351. doi:10.1016/j.jprocont.2020.06.002.
Dhooge, A., Govaerts, W., Kuznetsov, Y.A., 2003. MATCONT: a MATLAB package for numerical bifurcation analysis of ODEs. ACM Trans. Math. Softw. 29, 141–164. doi:10.1145/779359.779362.
Duarte, B.P.M., Saraiva, P.M., 2003. Hybrid models combining mechanistic models with adaptive regression splines and local stepwise regression. Ind. Eng. Chem. Res. 42, 99–107. doi:10.1021/ie0107744.
Eason, J., Cremaschi, S., 2014. Adaptive sequential sampling for surrogate model generation with artificial neural networks. Comput. Chem. Eng. 68, 220–232. doi:10.1016/j.compchemeng.2014.05.021.
Ebrahimpour, M., Yu, W., Young, B., 2021. Artificial neural network modelling for cream cheese fermentation pH prediction at lab and industrial scales. Food Bioprod. Process. 126, 81–89. doi:10.1016/j.fbp.2020.12.006.
Elmqvist, H., 1978. A Structured Model Language for Large Continuous Systems. Lund Institute of Technology (LTH).
Faramarzi, L., Kontogeorgis, G.M., Michelsen, M.L., Thomsen, K., Stenby, E.H., 2010. Absorber model for CO2 capture by monoethanolamine. Ind. Eng. Chem. Res. 49, 3751–3759. doi:10.1021/ie901671f.
Fedorova, M., Sin, G., Gani, R., 2015. Computer-aided modelling template: concept and application. Comput. Chem. Eng. 83, 232–247. doi:10.1016/j.compchemeng.2015.02.010.
Funahashi, K.I., 1989. On the approximate realization of continuous mappings by neural networks. Neural Netw. 2, 183–192. doi:10.1016/0893-6080(89)90003-8.
Gao, F., Huang, T., Wang, J., Sun, J., Yang, E., Hussain, A., 2018. Combining deep convolutional neural network and SVM to SAR image target recognition. In: Proc. 2017 IEEE Int. Conf. Internet of Things (iThings-GreenCom-CPSCom-SmartData 2017), pp. 1082–1085. doi:10.1109/iThings-GreenCom-CPSCom-SmartData.2017.165.
Georgieva, P., Meireles, M.J., Feyo de Azevedo, S., 2003. Knowledge-based hybrid modelling of a batch crystallisation when accounting for nucleation, growth and agglomeration phenomena. Chem. Eng. Sci. 58, 3699–3713. doi:10.1016/S0009-2509(03)00260-4.
Güneş Baydin, A., Pearlmutter, B.A., Andreyevich Radul, A., Mark Siskind, J., 2018. Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res. 18, 1–43.
Hagan, M.T., Menhaj, M.B., 1994. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 5, 989–993. doi:10.1109/72.329697.
Heitzig, M., Linninger, A.A., Sin, G., Gani, R., 2014. A computer-aided framework for development, identification and management of physiologically-based pharmacokinetic models. Comput. Chem. Eng. 71, 677–698. doi:10.1016/j.compchemeng.2014.07.016.
Hornik, K., 1991. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366. doi:10.1016/0893-6080(89)90020-8.
Huang, H.L., Wu, D., Fan, D., Zhu, X., 2020. Superconducting quantum computing: a review. Sci. China Inf. Sci. 63, 1–32. doi:10.1007/s11432-020-2881-9.
Jain, B.J., Geibel, P., Wysotzki, F., 2005. SVM learning with the Schur-Hadamard inner product for graphs. Neurocomputing 64, 93–105. doi:10.1016/j.neucom.2004.11.011.
Jaynes, E.T., 1957. Information theory and statistical mechanics. Phys. Rev. 106, 620–630. doi:10.1103/PhysRev.106.620.
Kahrs, O., Marquardt, W., 2008. Incremental identification of hybrid process models. Comput. Chem. Eng. 32, 694–705. doi:10.1016/j.compchemeng.2007.02.014.
Åström, K.J., Elmqvist, H., Mattsson, S.E., 1998. Evolution of continuous-time modeling and simulation. In: Proc. European Simulation Multiconference (ESM), pp. 1–10.
Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S., Yang, L., 2021. Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440. doi:10.1038/s42254-021-00314-5.
Karpatne, A., Atluri, G., Faghmous, J.H., Steinbach, M., Banerjee, A., Ganguly, A., Shekhar, S., Samatova, N., Kumar, V., 2017. Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans. Knowl. Data Eng. 29, 2318–2331. doi:10.1109/TKDE.2017.2720168.
Kuntsche, S., Barz, T., Kraus, R., Arellano-Garcia, H., Wozny, G., 2011. MOSAIC, a web-based modeling environment for code generation. Comput. Chem. Eng. 35, 2257–2273. doi:10.1016/j.compchemeng.2011.03.022.
Leal, J.R., Romanenko, A., Santos, L.O., 2017. Daedalus modeling framework: building first-principle dynamic models. Ind. Eng. Chem. Res. 56, 3332–3346. doi:10.1021/acs.iecr.6b03110.
Letham, B., Rudin, C., McCormick, T.H., Madigan, D., 2015. Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. Ann. Appl. Stat. 9, 1350–1371. doi:10.1214/15-AOAS848.
Linninger, A.A., Krendl, H., 1999. TechTool - computer-aided generation of process models (part 1: a generic mathematical language). Comput. Chem. Eng. 23, S703–S706. doi:10.1016/S0098-1354(99)80172-0.
MacKay, D.J.C., 1992. Bayesian interpolation. Neural Comput. 4, 415–447. doi:10.1162/neco.1992.4.3.415.
Manevitz, L., Yousef, M., 2007. One-class document classification via neural networks. Neurocomputing 70, 1466–1481. doi:10.1016/j.neucom.2006.05.013.
Marquardt, D.W., 1963. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11, 431–441. doi:10.1137/0111030.
McCulloch, W.S., Pitts, W., 1943. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133.
Meng, Y., Yu, S., Zhang, J., Qin, J., Dong, Z., Lu, G., Pang, H., 2019. Hybrid modeling based on mechanistic and data-driven approaches for cane sugar crystallization. J. Food Eng. 257, 44–55. doi:10.1016/j.jfoodeng.2019.03.026.
Morbach, J., Yang, A., Marquardt, W., 2007. OntoCAPE - a large-scale ontology for chemical process engineering. Eng. Appl. Artif. Intell. 20, 147–161. doi:10.1016/j.engappai.2006.06.010.
Nagy, Z.K., Szilagyi, B., Pal, K., Tabar, I.B., 2020. A novel robust digital design of a network of industrial continuous cooling crystallizers of dextrose monohydrate: from laboratory experiments to industrial application. Ind. Eng. Chem. Res. 59, 22231–22246. doi:10.1021/acs.iecr.0c04870.
Nikolić, D.D., 2016. DAE Tools: equation-based object-oriented modelling, simulation and optimisation software. PeerJ Comput. Sci. doi:10.7717/peerj-cs.54.
Nuchitprasittichai, A., Cremaschi, S., 2013. Optimization of CO2 capture process with aqueous amines - a comparison of two simulation-optimization approaches. Ind. Eng. Chem. Res. 52, 10236–10243. doi:10.1021/ie3029366.
Oliveira, R., 2004. Combining first principles modelling and artificial neural networks: a general framework. Comput. Chem. Eng. 28, 755–766. doi:10.1016/j.compchemeng.2004.02.014.
Peres, J., Oliveira, R., Feyo de Azevedo, S., 2000. Knowledge based modular networks for process modelling and control. Comput. Aided Chem. Eng. 8, 247–252. doi:10.1016/S1570-7946(00)80043-7.
Perez, E., Strub, F., De Vries, H., Dumoulin, V., Courville, A., 2018. FiLM: visual reasoning with a general conditioning layer. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3942–3951.
Piela, P.C., Epperly, T.G., Westerberg, K.M., Westerberg, A.W., 1991. ASCEND: an object-oriented computer environment for modeling and analysis: the modeling language. Comput. Chem. Eng. 15, 53–72. doi:10.1016/0098-1354(91)87006-U.
Poort, J.P., Ramdin, M., van Kranendonk, J., Vlugt, T.J.H., 2019. Solving vapor-liquid flash problems using artificial neural networks. Fluid Phase Equilib. 490, 39–47. doi:10.1016/j.fluid.2019.02.023.
Preisig, H.A., 2015. Constructing an ontology for physical-chemical processes. Comput. Aided Chem. Eng. 37, 1001–1006. doi:10.1016/B978-0-444-63577-8.50012-7.
Psichogios, D.C., Ungar, L.H., 1992. A hybrid neural network-first principles approach to process modeling. AIChE J. 38, 1499–1511. doi:10.1002/aic.690381003.
Raissi, M., Perdikaris, P., Karniadakis, G.E., 2019. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707. doi:10.1016/j.jcp.2018.10.045.
Ramirez, W.F., 1997. Computational Methods for Process Simulation. Butterworth-Heinemann. doi:10.1016/B978-0-7506-1198-5.50126-0.
Renon, H., Prausnitz, J.M., 1968. Local compositions in thermodynamic excess functions for liquid mixtures. AIChE J. 14, 135–144.
Ribeiro, M.T., Singh, S., Guestrin, C., 2016. Model-agnostic interpretability of machine learning.
Rosenblatt, F., 1962. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
Sansana, J., Joswiak, M.N., Castillo, I., Wang, Z., Rendall, R., Chiang, L.H., Reis, M.S., 2021. Recent trends on hybrid modeling for Industry 4.0. Comput. Chem. Eng. 151, 107365. doi:10.1016/j.compchemeng.2021.107365.
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D., 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489. doi:10.1038/nature16961.
Soares, R. de P., Secchi, A.R., 2003. EMSO: a new environment for modelling, simulation and optimisation. Comput. Aided Chem. Eng. 947–952. doi:10.1016/S1570-7946(03)80239-0.
Sohlberg, B., Jacobsen, E.W., 2008. Grey box modelling - branches and experiences. In: IFAC Proceedings Volumes. IFAC. doi:10.3182/20080706-5-kr-1001.01934.
Stephanopoulos, G., Henning, G., Leone, H., 1990. MODEL.LA. A modeling language for process engineering - I. The formal framework. Comput. Chem. Eng. 14, 813–846. doi:10.1016/0098-1354(90)87040-V.
Su, H.T., Bhat, N., Minderman, P.A., McAvoy, T.J., 1992. Integrating neural networks with first principles models for dynamic modeling. IFAC Proc. Vol. 25, 327–332. doi:10.1016/S1474-6670(17)51013-7.
Tan, K.C., Li, Y., 2002. Grey-box model identification via evolutionary computing. Control Eng. Pract. 10, 673–684. doi:10.1016/S0967-0661(02)00031-X.
Thompson, M.L., Kramer, M.A., 1994. Modeling chemical processes using prior knowledge and neural networks. AIChE J. 40, 1328–1340. doi:10.1002/aic.690400806.
Tian, Y., Zhang, J., Morris, J., 2001. Modeling and optimal control of a batch polymerization reactor using a hybrid stacked recurrent neural network model. Ind. Eng. Chem. Res. 40, 4525–4535. doi:10.1021/ie0010565.
Torrisi, M., Pollastri, G., Le, Q., 2020. Deep learning methods in protein structure prediction. Comput. Struct. Biotechnol. J. 18, 1301–1310. doi:10.1016/j.csbj.2019.12.011.
Tsen, A.Y.D., Jang, S.S., Wong, D.S.H., Joseph, B., 1996. Predictive control of quality in batch polymerization using hybrid ANN models. AIChE J. 42, 455–465. doi:10.1002/aic.690420215.
Tulleken, H.J.A.F., 1993. Grey-box modelling and identification using physical knowledge and Bayesian techniques. Automatica 29, 285–308. doi:10.1016/0005-1098(93)90124-C.
Van Lith, P.F., Betlem, B.H.L., Roffel, B., 2003. Combining prior knowledge with data driven modeling of a batch distillation column including start-up. Comput. Chem. Eng. 27, 1021–1030. doi:10.1016/S0098-1354(03)00067-X.
Venkatasubramanian, V., 2019. The promise of artificial intelligence in chemical engineering: is it here, finally? AIChE J. 65, 466–478. doi:10.1002/aic.16489.
Vinyals, O., Babuschkin, I., Czarnecki, W.M., Mathieu, M., Dudzik, A., Chung, J., Choi, D.H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J.P., Jaderberg, M., Vezhnevets, A.S., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T.L., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wünsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C., Silver, D., 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354. doi:10.1038/s41586-019-1724-z.
Wang, F., Rudin, C., 2015. Falling rule lists. J. Mach. Learn. Res. 38, 1013–1022.
Wang, X., Chen, J., Liu, C., Pan, F., 2010. Hybrid modeling of penicillin fermentation process based on least square support vector machine. Chem. Eng. Res. Des. 88, 415–420. doi:10.1016/j.cherd.2009.08.010.
Westerweele, M.R., Laurens, J., 2008. Mobatec Modeller - a flexible and transparent tool for building dynamic process models. Comput. Aided Chem. Eng. 25, 1045–1050. doi:10.1016/S1570-7946(08)80180-0.
Wilson, G.M., 1964. Vapor-liquid equilibrium. XI. A new expression for the excess free energy of mixing. J. Am. Chem. Soc. 86, 127–130. doi:10.1021/ja01056a002.
Xiong, Q., Jutan, A., 2002. Grey-box modelling and control of chemical processes. Chem. Eng. Sci. 57, 1027–1039. doi:10.1016/S0009-2509(01)00439-0.
Xu, Y., Verma, D., Sheridan, R.P., Liaw, A., Ma, J., Marshall, N.M., McIntosh, J., Sherer, E.C., Svetnik, V., Johnston, J.M., 2020. Deep dive into machine learning models for protein engineering. J. Chem. Inf. Model. 60, 2773–2790. doi:10.1021/acs.jcim.0c00073.
Zorzetto, L.F.M., Maciel Filho, R., Wolf-Maciel, M.R., 2000. Process modelling development through artificial neural networks and hybrid models. Comput. Chem. Eng. 24, 1355–1360. doi:10.1016/S0098-1354(00)00419-1.