Recent advances in statistical procedures, coupled with the availability of high performance computational
resources and the large mass of data generated from high throughput screening, have enabled a new paradigm
for building mathematical models of the kinetic behavior of catalytic reactions. A Bayesian approach is used
to formulate the model building problem, estimate model parameters by Monte Carlo based methods,
discriminate rival models, and design new experiments to improve the discrimination and fidelity of the
parameter estimates. The methodology is illustrated with a typical model building problem involving three
proposed Langmuir-Hinshelwood rate expressions. The Bayesian approach gives improved discrimination
of the three models and higher quality model parameters for the best model selected as compared to the
traditional methods that employ linearized statistical tools. This paper describes the methodology and its
capabilities in sufficient detail to allow kinetic model builders to evaluate and implement its improved model
discrimination and parameter estimation features.
2. Framework for Mathematical Model Building

Our approach will follow that of Reilly and Blau,20 illustrated in Figure 1, where sequential design, experimentation, and analysis of experimental data can be holistically integrated to provide an efficient and accurate approach for model discrimination and parameter estimation of chemical reaction systems. Summarily, the first step in the model building process is model discrimination, where several physically meaningful models are postulated by the scientist to describe the reaction system being investigated. Then sets of experimental data are designed and carried out in the laboratory to discriminate among these candidate models. The abilities of the various models to match these sets of data for various model parameter values are compared until the best one is found. If none of the models is deemed adequate, additional ones are postulated and the sequential experimentation/analysis process is continued until a suitable one is found.

Once an adequate model has been selected, the next step in the model building process is parameter estimation, where the quality of the parameter estimates is quantified by point estimates and their confidence regions. We note that there is a tendency among model builders and scientists to attribute meaning to these estimates before an adequate model is obtained. Clearly, parameter estimates for invalid models are meaningless. Consequently, the quality of parameter estimates can only be determined after an adequate model has been identified. Then, if the confidence regions for the parameters for that model are unacceptable, as quantified by their shape and size, additional experiments may be needed to improve their quality. The process of designing experiments, followed by the determination of parametric uncertainty, followed by additional experimentation, is continued until acceptable quality parameters are realized. Only after this process has been completed can the model be used for reactor design and analysis. In the special case of catalyst design, where the model parameters are paramount, they may now be used as the response variables in another model building exercise that relates these parameters to the structural and chemical descriptors of the catalyst.

The model building process (discrimination, validation, and parameter estimation) as described above is conceptually well-known, although sometimes applied incorrectly because the statistical and mathematical sophistication required is not fully appreciated. What is new in the Bayesian approach presented here is the simplicity of the approach and the natural way it follows the logic of the sequential model building paradigm. Unfortunately, this simplicity is achieved at the expense of a computational burden which has only recently permitted its use on user-friendly computer software and hardware available on the desktop of a typical reaction modeling specialist. For complex models, i.e. large numbers of reactions and parameters, considerable skill with the mathematical aspects of the method is still necessary, as well as the need to resort to a server or cluster of servers to achieve the necessary computing power.

The strength of the Bayesian approach is its ability to identify an adequate model with realistic parameter estimates by incorporating the prior knowledge of the scientist/experimentalist. Using this knowledge is anathema to statisticians who correctly point out that we are "biasing" the results and not letting the data "speak for itself." This is precisely why these methods are being embraced by the engineering and scientific community, who do not want their expertise silenced by the vagaries of experimental analysis. In traditional approaches, the main value of the experience of the investigator is in the translation of that experience into initial guesses for the parameters. In the Bayesian approach, the belief of the experimentalist in parameter ranges and the plausibility of candidate models are specifically acknowledged in the implementation. It is noteworthy that as the amount of well-designed experimental data increases, the influence of the initial beliefs diminishes.

2.1. Example Problem. Before proceeding with the mathematical and statistical development of the Bayesian approach, we will first define a simple model reaction system A + B → C + D over a catalyst in a differential reactor. We assume that A, B, and C reversibly adsorb on the surface of the catalyst, D is not adsorbed, and the reaction is irreversible under the operating conditions studied. To represent typical laboratory conditions, we use a feed stream V0 = 100 cm³/min fed to a tubular reactor which contains w = 0.1 g of catalyst. The stream is composed of various input concentrations of A, B, and C, represented as partial pressures PA0, PB0, and PC0, selected from the following ranges:

0.5 atm ≤ PA0 ≤ 1.5 atm
0.5 atm ≤ PB0 ≤ 1.5 atm
0 atm ≤ PC0 ≤ 0.2 atm

The temperature of the system can be changed over the range 630 K ≤ T ≤ 670 K.
4770 Ind. Eng. Chem. Res., Vol. 48, No. 10, 2009
The temperature range is a bit narrow, but is representative of systems where complex reaction networks impose selectivity constraints. The outlet concentrations PA, PB, and PC in atmospheres are measured. From these values, the molar conversion, x, of the reaction is calculated and used to determine the rate of reaction from the design equation for a differential plug flow reactor:

r = xV0CA0/w = xV0PA0/(wRT)   (gmol/(min kg catalyst))   (1)

where CA0 is the initial concentration of A in gram moles per cubic centimeter and R is the ideal gas law constant. The following three Langmuir-Hinshelwood models are postulated to describe the reaction.

Model 1 (A, B, and C are adsorbed):

A + * ⇌ A*   (K1)
B + * ⇌ B*   (K2)
C + * ⇌ C*   (K3)
A* + B* → C* + D   (k4)

r = k4K1K2PAPB/(1 + K1PA + K2PB + K3PC)²   (2)

Model 2 (A and C are adsorbed):

A + * ⇌ A*   (K1)
C + * ⇌ C*   (K3)
A* + B → C* + D   (k4)

r = k4K1PAPB/(1 + K1PA + K3PC)   (3)

Model 3 (B and C are adsorbed):

B + * ⇌ B*   (K2)
C + * ⇌ C*   (K3)
A + B* → C* + D   (k4)

r = k4K2PAPB/(1 + K2PB + K3PC)   (4)

where S* is a species adsorbed on the surface, k4 is the kinetic constant of the rate determining step, and the Ki are the equilibrium constants with proper units. The steps shown with double ended arrows are assumed to be in quasi-equilibrium. It is known that one of the models is true because it is the one used to generate the simulated data. For simplicity, no additional models are considered. The temperature dependence of the parameters is assumed to follow a normal Arrhenius relationship:27

k4 = k40 exp[-(Ea/R)(1/T - 1/T0)]   (gmol/(min kg catalyst))
Ki = Ki0 exp[-(∆Hi/R)(1/T - 1/T0)]   (1/atm) for i = 1, 2, 3   (5)

where k40 is the rate constant at the reference temperature T0 = 650 K, which is the middle of the experimental range, and Ea is the activation energy. The Ki0 are reference equilibrium constants at T0, and the ∆Hi are the heats of adsorption for the corresponding equilibrium constants Ki. The challenge is to determine the most suitable model and obtain high quality parameter estimates from the simulated experimental data.

Reaction rate data for the A + B → C + D example problem were generated using model 2 defined by eq 3 for experimental factors T, PA0, PB0, and PC0, with K10 = 1 1/atm, K30 = 20 1/atm, k40 = 3 gmol/(min kg catalyst), ∆H1/R = -17.5 × 10³ K, ∆H3/R = -20 × 10³ K, and Ea/R = 11 × 10³ K. The values were chosen to be representative, and the heats of adsorption were adjusted to produce significant surface coverages. Experimental error was added to the rate r predicted by eq 3 and was assumed to be normally distributed with zero mean and a variance of 0.002 (gmol/(min kg catalyst))², i.e. a standard deviation of 4.5%. An intuitive experimental program used by many catalysis researchers is to change one experimental factor at a time, keeping all other factors at their nominal values. We will designate such an experimental program as the "one-variable-at-a-time" approach. The reaction rate r was simulated for the one-variable-at-a-time design with {T, PA0, PB0, PC0}. The resulting one-at-a-time data, shown in Table 1, form the initial set of experimental data D with 33 points used for analysis. It is assumed that the error in the measured values of r is constant over the entire experimental region. This assumption is rarely valid; errors tend to be related to the magnitude of the r values. However, we will assume constant error values in our example to keep the focus strictly on the Bayesian approach. We note that since the data were generated directly from the rate expression, changes in the rate determining step or most abundant surface intermediate over the data range are not possible in this example.

2.2. Model Building Formalism. The general problem, for which the above three models are specific cases, is to postulate and then select the best model M* from a set of P models {M1, M2, ..., MP} from experimental data collected in a batch, continuously stirred tank reactor (CSTR), or plug flow reactor. For simplicity, we will restrict ourselves to modeling data obtained from CSTRs or differential reactors where the rate of reaction is measured or calculated directly. In a recent paper by Blau et al.,41 the more general problem of dealing with kinetic models consisting of differential equations, used to characterize concentration versus time or space time data from batch and integral reactors, is discussed. For the general case considered here, the models are related to the N experimental data points by

Mk: ri = fk(θk, ui) + εi(φ),   i = 1, ..., N,   k = 1, ..., P   (6)

where ri is the rate of the reaction for the ith experimental condition, fk(θk, ui) is the kth model with a Qk-dimensional vector of parameters, θk, and ui is an R-dimensional vector of experimental conditions. The experimental error is described by the error model εi(φ), which will be discussed shortly. The data set is formally represented by D = {(ui, ri) | i = 1, 2, ..., N}. In our example problem P = 3, and the parameters to be estimated are

Model 1: θ1 = (k40, E4, K10, ∆H1, K20, ∆H2, K30, ∆H3)
Model 2: θ2 = (k40, E4, K10, ∆H1, K30, ∆H3)
Model 3: θ3 = (k40, E4, K20, ∆H2, K30, ∆H3)

Note that models 2 and 3 have two fewer parameters than model 1. The R = 4 dimensional vector of experimental conditions for the ith measurement is ui = (Ti, PA0i, PB0i, PC0i).

The rate of reaction ri is estimated from the conversion xi and initial concentrations via eq 1. The conversion is obtained from the measured input and output concentrations in the differential reactor; hence, the estimated rate ri has variability or error associated with it. This error is represented by the error model εi(φ) in eq 6, which is a joint probability density function for the errors associated with the reaction rate ri at experimental conditions ui. φ is a V-dimensional vector of statistical error model parameters. It is common to assume that the errors are unbiased so that the mean of the error εi(φ) is zero, while the parameters φ characterize the variability in the measured value.
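To make the data-generation recipe concrete, the following is an illustrative Python/NumPy sketch (not the authors' code) that evaluates the model 2 rate expression of eq 3 with the centered Arrhenius forms of eq 5, using the true parameter values stated above, and adds the normally distributed measurement error with variance 0.002 described in the text:

```python
import numpy as np

T0 = 650.0  # reference temperature, K (eq 5)

# "True" parameters used to generate the simulated data (model 2, eqs 3 and 5);
# energies are given directly as E/R and dH/R in K, so R never appears explicitly
k40 = 3.0                    # gmol/(min kg catalyst) at T0
Ea_R = 11.0e3                # Ea/R, K
K10, dH1_R = 1.0, -17.5e3    # 1/atm and dH1/R in K
K30, dH3_R = 20.0, -20.0e3   # 1/atm and dH3/R in K

def arrhenius(p0, e_over_R, T):
    """Centered Arrhenius form of eq 5: p0 * exp[-(E/R)(1/T - 1/T0)]."""
    return p0 * np.exp(-e_over_R * (1.0 / T - 1.0 / T0))

def rate_model2(T, PA, PB, PC):
    """Rate of eq 3: A and C adsorbed, B reacting from the gas phase."""
    k4 = arrhenius(k40, Ea_R, T)
    K1 = arrhenius(K10, dH1_R, T)
    K3 = arrhenius(K30, dH3_R, T)
    return k4 * K1 * PA * PB / (1.0 + K1 * PA + K3 * PC)

rng = np.random.default_rng(0)

def simulate(T, PA, PB, PC):
    """Add N(0, 0.002) error (variance in rate units squared) per the text."""
    return rate_model2(T, PA, PB, PC) + rng.normal(0.0, np.sqrt(0.002))
```

At the center point (T = 650 K, PA = PB = 1 atm, PC = 0.1 atm) this gives r = 3/(1 + 1 + 2) = 0.75, consistent with the replicated center-point rates in Table 1 once measurement error and the small pressure drop across the reactor are accounted for.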
Table 1. Data Set D: Experimental Design Based on One-Variable-at-a-Time Approach
expr no. T (K) PA0 (atm) PB0 (atm) PC0 (atm) PA (atm) PB (atm) PC (atm) rate (gmol/(min kg catalyst))
1 650 1 1 0.1 0.989 0.989 0.111 0.657
2 640 1 1 0.1 0.992 0.992 0.108 0.596
3 630 1 1 0.1 0.994 0.994 0.106 0.474
4 640 1 1 0.1 0.992 0.992 0.108 0.652
5 650 1 1 0.1 0.989 0.989 0.111 0.683
6 660 1 1 0.1 0.986 0.986 0.114 0.841
7 670 1 1 0.1 0.982 0.982 0.118 0.951
8 660 1 1 0.1 0.986 0.986 0.114 0.894
9 650 1 1 0.1 0.989 0.989 0.111 0.701
10 650 1.25 1 0.1 1.239 0.989 0.111 0.848
11 650 1.5 1 0.1 1.488 0.988 0.112 1.044
12 650 1.25 1 0.1 1.239 0.989 0.111 0.791
13 650 1 1 0.1 0.989 0.989 0.111 0.701
14 650 0.75 1 0.1 0.740 0.990 0.110 0.536
15 650 0.5 1 0.1 0.491 0.991 0.109 0.404
16 650 0.75 1 0.1 0.740 0.990 0.110 0.593
17 650 1 1 0.1 0.989 0.989 0.111 0.641
18 650 1 1.25 0.1 0.987 1.237 0.113 0.887
19 650 1 1.5 0.1 0.984 1.484 0.116 1.003
20 650 1 1.25 0.1 0.987 1.237 0.113 0.952
21 650 1 1 0.1 0.989 0.989 0.111 0.615
22 650 1 0.75 0.1 0.992 0.742 0.108 0.502
23 650 1 0.5 0.1 0.995 0.495 0.105 0.290
24 650 1 0.75 0.1 0.992 0.742 0.108 0.512
25 650 1 1 0.1 0.989 0.989 0.111 0.669
26 650 1 1 0.05 0.989 0.989 0.061 0.893
27 650 1 1 0 0.988 0.988 0.012 1.293
28 650 1 1 0.05 0.989 0.989 0.061 0.921
29 650 1 1 0.1 0.989 0.989 0.111 0.646
30 650 1 1 0.15 0.990 0.990 0.160 0.526
31 650 1 1 0.2 0.991 0.991 0.209 0.495
32 650 1 1 0.15 0.990 0.990 0.160 0.559
33 650 1 1 0.1 0.989 0.989 0.111 0.748
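The design equation of eq 1 converts each measured conversion in Table 1 into a rate. A minimal sketch (illustrative only; the conversion value used below is hypothetical, chosen to match the magnitude of the Table 1 rates):

```python
R_ATM = 82.057  # ideal gas constant, cm^3 atm / (gmol K)

def rate_from_conversion(x, T, PA0, V0=100.0, w_kg=1.0e-4):
    """Eq 1 for a differential plug flow reactor.

    x:    molar conversion of A (dimensionless)
    T:    temperature, K
    PA0:  inlet partial pressure of A, atm
    V0:   volumetric feed rate, cm^3/min (100 cm^3/min in the example)
    w_kg: catalyst mass, kg (0.1 g = 1e-4 kg in the example)
    Returns r in gmol/(min kg catalyst).
    """
    CA0 = PA0 / (R_ATM * T)      # ideal gas: inlet concentration, gmol/cm^3
    return x * V0 * CA0 / w_kg   # r = x*V0*CA0/w = x*V0*PA0/(w*R*T)
```

For instance, a (hypothetical) conversion of x = 0.035 at the center point (T = 650 K, PA0 = 1 atm) yields r ≈ 0.66 gmol/(min kg catalyst), the same order as experiment 1 in Table 1, confirming that the reactor operates differentially (a few percent conversion).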
The proper identification and application of the error model is a step often overlooked in the modeling process. However, the determination of the error model is every bit as important as the determination of the kinetic model, because selecting an acceptable kinetic model and estimating the kinetic model parameters depend critically on the quality of the experimental data as quantified by the error model.

It should be pointed out that eq 6 assumes that all kinetic models adequately describe the data. However, all models, including even the "best model" M*, are only an approximation of reality and are probably biased. Thus, the experimental error εi(φ) may be confused or "confounded" with kinetic modeling error. In what follows, each model will first be assumed to be true, where deviations between the model predictions and the data are unbiased estimates of experimental error.

Given a set of data, the first step in the model building problem is to use this data to discriminate between the proposed models. The conventional way this is accomplished is to find the best, in some statistical sense, point estimate of the parameters for each model candidate and then compare the models using these best estimates. Generating such estimates is a challenging task, since the kinetic model equations are nonlinear in the parameters, requiring the use of nonlinear regression techniques which may lead to local or false optima which represent incorrect parameter estimates. Even when the best or global optimum is obtained, comparing models at their best single set of parameters can sometimes give very misleading results, playing havoc with the model building process.19 We demonstrate in section 3 below the pitfalls of using point estimates for the simple kinetic problem described in section 2.1, before developing an alternative Bayesian approach in section 4.

3. Evaluation of Models by Nonlinear Regression Analysis

Nonlinear regression analysis is widely used for estimating parameters of kinetic models. Here, different values of the parameters are selected to minimize the sum of squares of the differences, or residuals, of one of the models defined by eq 6, assuming that it is correct, using an iterative search procedure (e.g., the Levenberg-Marquardt optimization method44). The parameter values in the set that minimizes this sum of squares are called the least-squares parameter estimates. If repeat experiments are available for each data point, then the method automatically accounts for experimental error. It has been generally agreed by practitioners in this field that the only assurance of achieving the "optimal" point estimate solution is to supply a starting guess sufficiently close to the optimal solution.19 Even then there is the possibility of stopping short of the best value or indeed finding a multiplicity of solutions because (1) the kinetic model may be incorrect or (2) the necessary information to provide meaningful parameter estimates may not be available due to poorly designed experiments.

To demonstrate the real plausibility of obtaining false optima with the nondesigned but conventional one-at-a-time approach to experimentation, random initial estimates were supplied to the nonlinear regression program. Different sets of parameter estimates giving the same sum of squared errors (SSE) are reported in Table 2. At first glance, these results are somewhat surprising, not only in that there are multiple minima, but also in that they have essentially the same residual sums of squares. This implies that the different sets of parameter values, even the wrong ones, would allow the models to fit the data equally well. Regardless of the statistical criteria used, it would be impossible to distinguish or discriminate the models, yet we know that the data was generated by one of the models.
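The multistart experiment described above can be sketched as follows. This is an illustration, not the authors' program: it fits model 2 (eq 3, with the log/centered-Arrhenius parameterization of eq 5) by SSE minimization from random starting guesses drawn over wide parameter ranges, using SciPy's Nelder-Mead search in place of Levenberg-Marquardt:

```python
import numpy as np
from scipy.optimize import minimize

T0 = 650.0  # reference temperature, K

def rate_model2(p, T, PA, PB, PC):
    """Model 2 rate (eq 3); p = (ln k40, Ea/R, ln K10, dH1/R, ln K30, dH3/R)."""
    lnk40, EaR, lnK10, dH1R, lnK30, dH3R = p
    k4 = np.exp(lnk40 - EaR * (1.0 / T - 1.0 / T0))
    K1 = np.exp(lnK10 - dH1R * (1.0 / T - 1.0 / T0))
    K3 = np.exp(lnK30 - dH3R * (1.0 / T - 1.0 / T0))
    return k4 * K1 * PA * PB / (1.0 + K1 * PA + K3 * PC)

def sse(p, data):
    """Sum of squared residuals for a data set (T, PA, PB, PC, r)."""
    T, PA, PB, PC, r = data
    return float(np.sum((r - rate_model2(p, T, PA, PB, PC)) ** 2))

def multistart_fit(data, n_starts=5, seed=1):
    """Local minimization from random starts; returns (SSE, params) sorted by SSE.
    Start ranges mirror the wide uniform bounds used later in Table 3."""
    rng = np.random.default_rng(seed)
    lo = np.array([-10.0, 5e3, -10.0, -50e3, -10.0, -30e3])
    hi = np.array([10.0, 40e3, 10.0, -10e3, 10.0, -10e3])
    fits = []
    for _ in range(n_starts):
        p0 = rng.uniform(lo, hi)
        res = minimize(sse, p0, args=(data,), method="Nelder-Mead",
                       options={"maxiter": 4000})
        fits.append((res.fun, res.x))
    return sorted(fits, key=lambda t: t[0])
```

Running this on the one-at-a-time data reproduces the qualitative behavior of Table 2: different starts settle into different parameter sets with nearly indistinguishable residual sums of squares.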
The solution to this conundrum is to note that only some of the parameter values are different for the different minima. Let us look at the two models that were not used to generate the data in Table 2, namely models 1 and 3. For model 1, ln(K30) and ln(K10) are the same for the four different local optima, while some of the other parameters change by orders of magnitude. Similarly, for model 3, ln(K30) is constant while the other parameters take on various values. Even for model 2, ln(K30) is well-defined while the other parameters change, but to a much lesser degree than with models 1 and 3. This implies that the data give a great deal of information about the adsorption of C on the surface of the catalyst but little else: changing the amount of C in the feed swamped out the effect of changes in the other experimental conditions. In fact, the first nine data points, which account for the temperature changes, are insufficient to provide meaningful estimates for the activation energy and adsorption coefficients defined by eq 6. This is the consequence of (1) a poorly designed experiment, i.e. the experimental conditions are changed individually (one-at-a-time approach), and (2) the use of a nonlinear regression approach where the uncertainties in the parameter estimates are not recognized. Some nonlinear parameter estimation programs attempt to estimate this uncertainty, but they fail badly because they do not properly account for the nonlinearities of the model about the optimal point estimates. As we will show, the Bayesian approach properly accommodates the uncertainty in parameter estimation, but there is no solution for a poorly designed experiment.

Before leaving this section, it is worth commenting on experimental error. Examination of Table 1 shows that the center point of the design is repeated five times and some of the points are repeated twice. These repeat points may be used to estimate experimental error at different points in the operating region. Seldom is any attempt made to model this experimental error, and repeat points are simply used to weight the data differently. We show how to do this in Appendix A. The greater the size of the operating region, the better the ability to discriminate models and improve parameter estimates, since the results are not confounded by experimental error and the different models have more operating space in which to show their divergence from one another. Consequently, the extremes of the experimental conditions are key points in any experimental program, despite the challenges of running the equipment under such conditions.

4. Bayesian Approach to Parameter Estimation

Bayesian methods have been suggested for building models of reaction systems for over thirty years.20,45 However, they have not been adopted by the catalyst model building community because of the computational challenges associated with using them properly. Fortunately, the power and cost effectiveness of high speed computation are making it possible for the researcher to exploit this important modeling tool. The Bayesian approach is fundamentally different from regression methods. Instead of finding single optimal point estimates of the model parameters to describe a given set of data, it uses methods that identify "regions" where one believes the true values of the parameters lie. Rather than review the details of the Bayesian approach reported in the literature,47 it will be sufficient to define the salient terms necessary to allow presentation of the results for our sample problem.

The first quantity that must be defined is p(εi), which is the joint probability density function for the errors in the data, D, for all N experiments. This term acknowledges the existence of error in the data and accounts for it explicitly through an error model function εi(φ) and the associated error model parameter vector φ. The second quantity is also a probability distribution, called the likelihood function, L(D|(θk,φ)), which is the "likelihood" that model k with its kinetic parameters θk generated the data D. This function is derived in Appendix A for a general model building system. Figure 2 gives some insight into this interpretation. It consists of a simple plot of a set of five (xi, yi) data points shown as solid dots on the graph, where y is the response variable and x is the independent variable. At any value of xi, many different values of yi are experimentally plausible. The probability distribution for these points is sketched as a probability distribution p(yi) on the graph. Note that these probability distributions can change from point to point on the graph. Several models could be used to "generate" the five data points. When a model is used to predict the response y for a given value of x, this prediction is not a point but a probability distribution. Consequently, we can use this probability distribution to measure the probability that the model generated the measured response value yi. For example, the simplest model for the data of Figure 2 would be a straight line of the form y = θ1 + θ2x. For any set of values of θ̂1 and θ̂2, a straight line could be drawn and the "probability assessed" that it generated the observed five points by comparing the observed and model
predicted values, assuming the model is correct and the error model is operative. This probability is called the likelihood. The dotted straight line in Figure 2 is one such instance. By finding the set of parameter values θ1* and θ2* which maximize this probability, we have point estimates called the maximum likelihood estimates. In this figure, we have shown the location of the simulated data points as dashed vertical lines on the probability distributions. A similar analysis can be performed for any additional model. In Figure 2, for example, the mean predicted values with maximum likelihood estimators for a simple curvilinear model with an additional parameter are shown as the solid line. It is clear that the curvilinear model is more apt to generate the five data points than the straight line, since the fit of the data to the calculated values is better. That is, the maximum likelihood value of generating the data is higher. The mean values predicted by the curvilinear model are shown as solid vertical lines on the probability distributions. Note that these values are closer to the mean values of the distributions than those for the linear model. In the general case, the likelihood point estimate θk*, which maximizes the likelihood function for known values of the error model parameters φ, can be obtained by the nonlinear search approach described in section 3.

There usually is some knowledge of the values of θk and φ a priori (i.e., before the data is collected), e.g., the rate constants are in a specific range, the activation energy must be greater than a minimum value, the equilibrium constant is greater than one, etc. This knowledge is captured in a third probability distribution, called the prior distribution p(θk, φ). For example, if there is only one rate constant k1 in the model, i.e., θ1 = {k1}, and if it is known a priori that k1,min < k1 < k1,max, then the prior is the uniform distribution

p(k1) = 1/(k1,max - k1,min)   for k1,min ≤ k1 ≤ k1,max
p(k1) = 0   otherwise

The bounds of the uniform prior distributions used for the parameters of the three proposed models are given in Table 3.

Table 3. Bounds of the Uniform Prior Probability Distributions of the Parameters in the Three Proposed Models

                    model 1       model 2       model 3
parameter           LB     UB     LB     UB     LB     UB
ln(k40)            -10     10    -10     10    -10     10
Ea/R (10³ K)         5     40      5     40      5     40
ln(K10)            -10     10    -10     10
∆H1/R (10³ K)      -50    -10    -50    -10
ln(K20)            -10     10                  -10     10
∆H2/R (10³ K)      -30    -10                  -30    -10
ln(K30)            -10     10    -10     10    -10     30
∆H3/R (10³ K)      -30    -10    -30    -10    -30    -10

Bayes' theorem combines the likelihood function with this prior distribution to update our knowledge of the parameters. Specifically, it quantifies the improvement in our knowledge about the parameters of the model by weighting the prior information with the new data D generated by the experimental program. This new knowledge about the parameters is captured by another probability distribution, called the "posterior" distribution, reflecting the fact that it represents our belief in the parameters "after" the data have been generated. Formally, Bayes' theorem states that this posterior distribution p((θk,φ)|D) is related to the product of the prior distribution and the likelihood function by the proportionality

p((θk, φ)|D) ∝ L(D|(θk, φ)) p(θk, φ)

This proportionality can be turned into an equation, so that the posterior distribution can be calculated directly, by normalizing the probability distribution, i.e., integrating over the allowable parameter space, to give

p((θk, φ)|D) = L(D|(θk, φ)) p(θk, φ) / ∫θk ∫φ L(D|(θk, φ)) p(θk, φ) dφ dθk   (8)
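The ingredients of eq 8 are easy to sketch in code. The following illustrative Python functions (assumptions, not the authors' implementation: iid normal errors with known variance, a uniform box prior as in Table 3, and a plain random-walk Metropolis sampler standing in for the MCMC machinery of Appendix B) show how the unnormalized posterior is evaluated and sampled:

```python
import numpy as np

def log_likelihood(r_obs, r_pred, var=0.002):
    """ln L(D|(θ,φ)) for iid N(0, var) errors, the error model assumed in the text."""
    resid = np.asarray(r_obs) - np.asarray(r_pred)
    n = resid.size
    return -0.5 * np.sum(resid ** 2) / var - 0.5 * n * np.log(2.0 * np.pi * var)

def log_prior_uniform(p, lb, ub):
    """Uniform prior over the box [lb, ub] (Table 3 style); -inf outside the box."""
    p = np.asarray(p, float)
    return 0.0 if np.all((p >= lb) & (p <= ub)) else -np.inf

def metropolis(log_posterior, p0, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for the (unnormalized) posterior of eq 8:
    accept a proposed move with probability min(1, posterior ratio)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, float)
    lp = log_posterior(p)
    chain = []
    for _ in range(n_samples):
        q = p + step * rng.standard_normal(p.size)
        lq = log_posterior(q)
        if np.log(rng.uniform()) < lq - lp:  # -inf proposals are always rejected
            p, lp = q, lq
        chain.append(p.copy())
    return np.array(chain)
```

Because only the posterior *ratio* enters the accept/reject step, the normalizing integral in the denominator of eq 8 never has to be computed explicitly, which is what makes sampling-based Bayesian computation tractable for nonlinear kinetic models.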
Figure 3. Marginal posterior probability density functions of the parameters in the different models, given data set D: (dashed line) θj,max; (dash-dotted line) MPPDE. The units of Ea/R, ∆H1/R, ∆H2/R, and ∆H3/R are 10³ K.
It is quite conceivable that the three point estimates we have defined, namely, the MPPDE, E(θk,j), and θk,j,max, will be significantly different. Which one should be used and reported as "the" point estimate? The answer is ambiguous and is being argued among statisticians and scientists alike. Because of this controversy, it would be preferable to simply report the HPD confidence limits without specifying some arbitrary point estimate, although this would be a radical departure from current practice.

The MPPDE and the upper and lower bounds at 95% confidence are given in Table 4 for the three models using the nondesigned one-variable-at-a-time data given in Table 1. These results are quite different from the nonlinear regression analysis shown in Table 2. The HPD confidence region is quite large for a number of parameters, and the θk,j,max and MPPDE of the posterior probability distribution are different. As before, the poorly designed data set accounts for much of the uncertainty in the activation energies. However, we shall not agonize over any interpretation of the model parameters until discrimination between the various candidate models has been completed and an adequate model selected.

Since the posterior distribution, which is calculated in the foregoing by Bayes' theorem, is biased or influenced by the prior distribution, it is natural to question the impact of the form and quality of this distribution. In Appendix C, we address this question for the simple problem by quantifying the effects of different prior distributions on the posterior distributions for the same data set, D.

5. Model Discrimination and Validation

Now that Monte Carlo based Bayesian methods have been defined to generate confidence regions where we believe the true values of the parameters lie, the candidate models can be compared through their posterior probabilities. Here the likelihood function L(D|Mk) is interpreted as the likelihood of the model generating the data set. Since ∑k Pr(Mk|D) = 1, the proportionality can be turned into an equation by the use of another normalization factor, so that

Pr(Mk|D) = L(D|Mk) Pr(Mk) / ∑k L(D|Mk) Pr(Mk)   (15)

Once the posterior probability is known, the preference for the different models is quantified directly. However, what is L(D|Mk)? It is simply E(L(D|(θk,φ))), i.e. the expected value of the likelihood function for model k, where the dependence upon the (θk,φ) model parameters has been integrated out so that the relationships are valid over all parameter space. Specifically, the expected value of the likelihood function over the posterior probability distribution of the parameters is given by

L(D|Mk) = E(L(D|(θk, φ))) = ∫θk ∫φ L(D|(θk, φ)) p((θk, φ)|D) dφ dθk   (16)

This expected value of L(D|Mk) is computed via sampling from the posterior probability distribution for the parameters, determined earlier using the MCMC process discussed in Appendix B. Thus, only minimal additional computational effort is required.

We emphasize that the Bayesian approach to model discrimination is fundamentally different from the more traditional single point regression based approach. In the regression-based approach, the likelihood value for each model is determined at a single point, the maximum likelihood estimate for the model,
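Equations 15 and 16 reduce to a short numerical recipe once posterior samples are in hand. A hedged sketch (not the paper's code; it assumes each model's MCMC run has produced an array of log-likelihood values, and works in log space to avoid underflow when likelihoods are tiny):

```python
import numpy as np

def log_model_evidence(loglik_samples):
    """ln E[L(D|(θ,φ))] over posterior samples (eq 16), computed with the
    log-sum-exp trick so that very small likelihoods do not underflow."""
    ll = np.asarray(loglik_samples, float)
    m = np.max(ll)
    return m + np.log(np.mean(np.exp(ll - m)))

def model_probabilities(log_evidences, priors=None):
    """Posterior model probabilities Pr(Mk|D) of eq 15, normalized to sum to 1.
    `priors` are the Pr(Mk); equal priors are assumed if none are given."""
    log_ev = np.asarray(log_evidences, float)
    if priors is None:
        priors = np.full(log_ev.size, 1.0 / log_ev.size)
    w = np.log(np.asarray(priors, float)) + log_ev
    w -= np.max(w)               # shift in log space before exponentiating
    p = np.exp(w)
    return p / p.sum()
```

For example, two models whose expected likelihoods are in a 3:1 ratio under equal priors receive posterior probabilities 0.75 and 0.25; the comparison uses the whole posterior cloud of each model rather than a single optimal parameter point.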
improved models and deleting inadequate ones should continue until one or more candidate models emerge that have a high probability of generating the data. If there are several candidate models which cannot be distinguished using the posterior probabilities Pr(Mk) for the given data set, it is then necessary to design additional experiments to discriminate the rival models. It is important to point out that before an adequate model is found no effort should be placed on interpretation of the physical nature or magnitude of the individual parameter estimates themselves. The parameter estimates are biased when the models are inadequate and/or the data is inadequate to discriminate between different candidate models.

6. Design of Experiments for Model Discrimination

If two or more models emerge as candidates for describing the data in D, it is necessary to conduct additional experiments to discriminate between them. The basic concept of design of experiments for model discrimination is to locate new experiments in a region which has not already been investigated and which predict the greatest expected difference between the candidate models based on D. In the traditional point estimate approach, the criterion that has been used14,18,20 is to locate experimental conditions, u*, at a point that maximizes the absolute differences in the predicted values for each pair of two or more models. The traditional criterion is formally stated as:

max over u*: |f̂k(θk*, u) − f̂m(θm*, u)|, k ≠ m, for all k, m, over the region umin ≤ u ≤ umax (17)

where f̂k(θk*, u) is the predicted rate of reaction r at u using model k at some point estimate of the model parameters θk*, such as the maximum likelihood estimators.48-51 For the example problem consider the range of operating conditions u

630 ≤ T (K) ≤ 670
0.5 ≤ PA0 (atm) ≤ 1.5
0.5 ≤ PB0 (atm) ≤ 1.5
0.0 ≤ PC0 (atm) ≤ 0.2

in which the upper and lower bounds correspond to the extremes of the one-at-a-time experiment of Table 1. A series of 3^4 = 81 experiments was run over the region selecting all combinations of 3 values of each operating condition: the lowest point, the center point, and the highest point. Individuals familiar with statistical design of experiments will recognize this is a full factorial experiment for four factors at three levels. Predicted values for these experimental runs were calculated using the 4 × 4 = 16 sets of "optimal" parameter point estimates from Table 2. The interested user can readily produce this grid of experiments with an associated set of predicted values with a simple spreadsheet, and it will not be reproduced here. The results are quite instructive for demonstrating the pitfalls both of using one-at-a-time experimentation and of using point estimates. First, because of the uncertainty in the parameter estimates, the differences between the model predictions are small regardless of the location of u over the above region. In fact, the maximum absolute difference between the predicted values given by eq 18 for models 1 and 2 for all 16 combinations of local point estimates is only 6%. Second, the location of the maximum difference is highly dependent on the actual set of point estimates used. They include the upper and lower bound on T and PC0, the upper bound on PB0, and any value of the PA0 operating condition. The most frequent value of u* is (T = 670 K, PA0 = 1.5 atm, PB0 = 1.5 atm, PC0 = 0 atm), at which the average difference between the models is predicted to be about 3%.
4778 Ind. Eng. Chem. Res., Vol. 48, No. 10, 2009
Table 5. Optimal Parameters for Model Candidates Determined by Nonlinear Optimization, Including the Additional Experiment Designed by Traditional Techniques

parameter model 1 model 2
ln(k40) 70.27 46.61
Ea/R (10^3 K) 11.70 11.08
ln(K10) -23.76 -46.20
∆H1/R (10^3 K) -0.60 -0.63
ln(K20) -46.14
∆H2/R (10^3 K) -0.79
ln(K30) 1.34 2.30
∆H3/R (10^3 K) 0.00 0.00
SSE 0.0849 0.0861

What is the "true" value at this point? The true value from the simulator used to generate the data in Table 1 with model 2, and ignoring experimental error, is 2.470 gmol/(min kg of catalyst), while the average predicted rates at the new u* by the two model candidates are 4.194 and 4.053 gmol/(min kg catalyst) for models 1 and 2, respectively. Since neither of the models describes the rates at the new data point (2.470, T = 670, PA0 = 1.5, PB0 = 1.5, PC0 = 0), this one data point and the 33 other one-at-a-time points in Table 1 were reanalyzed using traditional nonlinear regression to generate new maximum likelihood estimates for models 1 and 2, which are reported in Table 5. Note that these new maximum likelihood estimates are "very different" from the original ones in Table 2. What is even more surprising is that not only is it impossible to discriminate the rival models with the additional point, but the fit of the model to the data is better for model 1 than for model 2, which is simply wrong! Further, different optima can be obtained depending on the starting point selected from Table 2 to start the regression algorithm, which suggests that the entire approach is futile if nondesigned data and the associated point estimates for the parameters are used.

The Bayesian approach provides a more statistically compelling criterion to discriminate between any two models by locating the point u* in experimental space that minimizes the probability that the model candidates will predict the same reaction rates. This idea is schematically illustrated in Figure 7. In Figure 7a, the probability density pk{r(u1*; θk, φ)} for rate predictions of model k with additional experiment u1* overlaps considerably with pm{r(u1*; θm, φ)} for model m. If a new experiment were performed at these conditions and the observed rate was at A, it would indicate that model k would be much more likely than model m; and conversely, if the experimental rate with error occurred at C, model m would be more likely. However, there is a considerable region in the neighborhood of B, where both models predict the same values. If the experimental rate occurred close to B, model discrimination would not be possible. In contrast, in Figure 7b there is very little region of overlap between pk{r(u2*; θk, φ)} and pm{r(u2*; θm, φ)} for the candidate experimental condition u2*, providing clear discrimination between models k and m. Moreover, if the experimentally observed rate were in the region of B, it would indicate that neither of the models is probable and a new candidate model would need to be generated. Thus, experiment u2* is a better choice for model discrimination. The formal statement of the design of experiment for the model discrimination illustrated in Figure 7 is the following: choose u* to minimize the overlap of pk{r(u*; θk, φ)} and pm{r(u*; θm, φ)}. The same concept can be applied for discriminating more than two model candidates by calculating some weighted pairwise overlap of the probability density functions for all the adequate models.41

This experimental design criterion for model discrimination using the minimum overlap criterion as described above is intuitively appealing but is computationally challenging. The steps in the calculation are as follows: (1) samples of new experimental conditions ui that span the allowable experimental space are taken; (2) pk{r(ui; θk, φ)}, the probability density function for the reaction rate at ui, must be computed for all of the k candidate models; (3) the joint probability distribution for each pair of models must be determined; and, finally, (4) the joint probability distributions with the minimal overlap area must be determined to finally arrive at ui*, the optimal experiment for model discrimination. Rather than resorting to an optimization procedure for locating ui*, it should be possible to simply identify regions in experimental space which are the most attractive. This follows the spirit of the Bayesian approach of replacing points with regions. One approach to finding these regions is to use a factorial or fractional factorial experiment analogous to the approach in the point estimate case described earlier. Such an approach is attractive when the dimension of u is small, which is typically the case. In our sample problem, there are four experimental variables and, if we select three values of each of these, for example,

T = {630, 650, 670} (K)
PA0 = {0.5, 1.0, 1.5} (atm)
PB0 = {0.5, 1.0, 1.5} (atm)
PC0 = {0, 0.1, 0.2} (atm)

there are 3^4 = 81 possible new experiments in a full factorial, which is computationally feasible. For u with larger dimensions, there may be too many experimental variables to examine with factorial methods, and it may be necessary to resort to optimization procedures. Rather than searching for better regions along a promising direction from a starting point, which is the conventional approach to optimization, a simpler Monte Carlo or Latin hypercube sampling procedure could be used to find attractive regions. However, unless a complete grid search is done over the entire operating region, the user is not guaranteed to find the most promising operating region. Before resorting to such an approach, the experimentalist/modeler should use his or her "prior" knowledge of how he or she would anticipate the model behaving in these as yet unexplored regions and use this information to guide the search procedure. We acknowledge, however, that beyond knowledge of physical limitations on temperature or reactant concentration ratios, or cases where residuals point to model inadequacies, "anticipation" of the model response is often virtually impossible if the system complexity is high. That makes efficient, guided searches of these spaces an interesting area for further development.

For the example problem with data given in Table 1, models 1 and 2 are still both viable following both conventional and Bayesian model discrimination procedures. Using the Bayesian methodology outlined above, 81 candidate ui values were chosen by factorial design at three levels for four different factors (T, PA0, PB0, PC0); the pk{r(ui; θk, φ)} were computed for both models for all candidate ui values; the overlap area of p(r(ui; θ1, φ)) and p(r(ui; θ2, φ)) was determined for all ui values; and u* was determined. Table 7 lists the overlap area for each candidate experiment and identifies the best data point as u* = (T = 630 K, PA0 = 1.5 atm, PB0 = 1.5 atm, PC0 = 0 atm). Figure 8 presents the probability distributions of the expected rates at u* and shows that there is still considerable overlap between the two distributions.

Using this augmented experimental data set DAUG = D + {u*, r*} (r* is the experimental rate under operating
Table 6. Estimated Parameters and Corresponding 95% Confidence Intervals of Model Candidates (Including the Designed Experiment for Model Discrimination)

Data Set DAUG and Uniform Prior Probability Distribution
model 1 model 2
parameter MPPDE θj,max 95% LB 95% UB MPPDE θj,max 95% LB 95% UB
ln(k40) 8.41 9.12 6.05 9.94 1.58 1.90 0.99 2.72
Ea/R (10^3 K) 39.7 34.9 22.6 39.8 14.6 13.1 10.2 30.0
ln(K10) -2.14 -2.04 -3.64 -1.37 -0.83 -1.17 -2.29 0.09
∆H1/R (10^3 K) -16.1 -13.4 -23.1 -10.2 -13.3 -20.9 -28.1 -10.0
ln(K20) -5.70 -5.35 -7.28 -2.53
∆H2/R (10^3 K) -23.2 -12.6 -26.6 -10.2
ln(K30) 1.40 1.38 1.18 1.58 2.64 2.50 2.20 3.02
∆H3/R (10^3 K) -10.1 -10.5 -16.2 -10.1 -10.2 -10.7 -19.1 -10.0
σ2 × 10^3 1.86 2.26 1.48 4.27 1.70 1.89 1.17 3.64
ln[E{L(Mk|D)}] 54.03 58.10
Pr(Mk|D) 0.0027 0.9973
condition u*), a new posterior probability distribution p((θk, φ)|DAUG) was computed using p((θk, φ)|D) as the prior distribution. The whole procedure of Bayesian parameter estimation described earlier was repeated, although an efficient algorithm has been developed that makes use of the previous MCMC calculations of p((θk, φ)|D) for the calculation of the new posterior distribution p((θk, φ)|DAUG) (see Appendix B). The results of the parameter estimates, as well as the confidence regions determined from the marginal probability distribution p(θj) or p(φj), are shown in Table 6. Finally, Pr(Mk|DAUG) was determined using the procedures described in section 5, where again the priors of the two models were assumed to be Pr(M1) = 0.137 and Pr(M2) = 0.863, which were obtained from the analysis in section 5. Note that we normalized the two probabilities to keep Pr(M1) + Pr(M2) = 1 since model 3 is no longer considered. The relative probabilities Pr(Mk|DAUG) for the two models are reported in Table 6 and show clearly that model 1 can be eliminated. This is a remarkable result. With only one additional point and using the posterior probabilities from the data in Table 1, model 2 is identified as the preferred model even though the difference between the marginal probability distributions shown in Figure 8 is relatively small. Just to be sure, the probability of the various experimental data points as shown in Figure 4 for D was recomputed for DAUG, and it was found that all experimental points were within the 95% confidence region.

In summary, the addition of just one well-designed experiment was able to unambiguously discriminate the models using Bayesian methods, while conventional regression methods failed to do so. Of course, this investigation has been illustrated using the one-variable-at-a-time data set, which is not properly designed. The impact of a properly designed initial data set on model discrimination will be discussed before moving on to the important topic of improving the parameter estimates after an adequate model has been selected.

6.1. Impact of Design of Initial Data Set D. The one-at-a-time approach to experimentation is practiced widely. It has the advantage of allowing the researchers to plot the results of changing one operating condition without confusion by the other operating conditions. This is precisely its limitation, because it fails to accommodate interactions among the operating conditions and generates large unexplored regions of the operating space. It is interesting to compare the single point regression methods with the Bayesian approach to decide if the latter approach is still superior to the former despite the use of a well designed initial data set.

A new set of designed experimental data, Dc, is presented in Table 8. It was generated over the same operating region as Table 1 using a slightly modified central composite design, which includes both a 2-level full factorial design (2^4 = 16 experiments) and a 2-level one-variable-at-a-time design (2 × 4 = 8 experiments). The one-at-a-time points are taken at the extremes of the data in Table 1. Nine replicates of the center point are included not only to test reproducibility but also to keep the number of data points the same at N = 33. Basically, the only difference between the two sets is the location of the experiments. Nonlinear parameter estimation starting with the same randomly generated starting point as used for the nondesigned data set was used to fit the data of Table 8, and the results are shown in Table 9 for the three model candidates. The first observation is that once again models 1 and 3 have multiple optimal parameter sets, but model 2 has only one. Some of the parameters in models 1 and 3 are at their upper bounds. Once again, by simply examining the sum of squared error, it is not
Table 7. Design of Experiments for Model Discrimination: Overlapped Area for Each Candidate Experiment^a
expr no. T (K) PA0 (atm) PB0 (atm) PC0 (atm) area expr no. T (K) PA0 (atm) PB0 (atm) PC0 (atm) area
1 630 1.5 1.5 0 0.602 42 670 1 1.5 0.1 0.813
2 670 1 1.5 0 0.603 43 630 0.5 1 0.2 0.814
3 650 1.5 1.5 0.2 0.610 44 670 1 1 0.2 0.818
4 670 0.5 1.5 0 0.617 45 670 0.5 1.5 0.1 0.832
5 650 1.5 1.5 0 0.618 46 650 1.5 0.5 0.2 0.846
6 630 1 1 0.2 0.628 47 670 1.5 1.5 0.1 0.852
7 650 1 1.5 0.2 0.631 48 630 1 0.5 0 0.852
8 650 1 1.5 0 0.651 49 650 1.5 0.5 0 0.853
9 630 1 1.5 0 0.657 50 630 1 1 0.1 0.855
10 630 1.5 1.5 0.1 0.657 51 650 1 0.5 0 0.857
11 650 0.5 1.5 0 0.664 52 630 1.5 0.5 0.1 0.861
12 670 1 1 0 0.664 53 650 0.5 0.5 0 0.865
13 670 1.5 1.5 0 0.670 54 670 0.5 1 0.2 0.877
14 630 1.5 1 0.2 0.671 55 650 0.5 1 0.2 0.882
15 650 1.5 1 0.2 0.672 56 630 0.5 0.5 0 0.893
16 630 0.5 1.5 0 0.678 57 650 1 0.5 0.2 0.897
17 630 1 1.5 0.2 0.680 58 670 1 1 0.1 0.897
18 650 1 1 0 0.682 59 650 1.5 1.5 0.1 0.899
19 670 0.5 1 0 0.690 60 670 0.5 1 0.1 0.900
20 630 1.5 0.5 0.2 0.698 61 670 1.5 0.5 0.2 0.902
21 630 1.5 1 0 0.701 62 650 1.5 0.5 0.1 0.906
22 650 1.5 1 0 0.703 63 630 0.5 0.5 0.2 0.908
23 670 1.5 1.5 0.2 0.716 64 670 1 0.5 0.2 0.912
24 630 0.5 1.5 0.2 0.722 65 630 0.5 1.5 0.1 0.925
25 670 1 1.5 0.2 0.729 66 650 1 0.5 0.1 0.925
26 670 1.5 1 0 0.730 67 670 1.5 1 0.1 0.926
27 630 1.5 1.5 0.2 0.732 68 650 1 1 0.1 0.927
28 630 1 1 0 0.734 69 650 1.5 1 0.1 0.927
29 630 1 1.5 0.1 0.746 70 630 1 0.5 0.1 0.941
30 650 0.5 1 0 0.747 71 670 0.5 0.5 0.2 0.942
31 630 1.5 1 0.1 0.752 72 650 0.5 0.5 0.2 0.946
32 630 0.5 1 0 0.763 73 650 1 1.5 0.1 0.946
33 670 1 0.5 0 0.763 74 670 0.5 0.5 0.1 0.958
34 650 1 1 0.2 0.770 75 670 1 0.5 0.1 0.960
35 630 1.5 0.5 0 0.790 76 650 0.5 0.5 0.1 0.962
36 670 1.5 1 0.2 0.799 77 650 0.5 1 0.1 0.968
37 670 0.5 0.5 0 0.800 78 630 0.5 1 0.1 0.971
38 670 0.5 1.5 0.2 0.801 79 630 0.5 0.5 0.1 0.976
39 630 1 0.5 0.2 0.804 80 650 0.5 1.5 0.1 0.979
40 650 0.5 1.5 0.2 0.807 81 670 1.5 0.5 0.1 0.989
41 670 1.5 0.5 0 0.809
^a Experiment no. 1 is the suggested experiment for model discrimination.
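The overlap areas tabulated above come from comparing the two posterior-predictive rate densities at each candidate condition. One minimal way to estimate such an area from MCMC rate samples (our sketch, not the authors' implementation) is to histogram both sample sets on a common grid and integrate the pointwise minimum of the two densities:

```python
import random

def overlap_area(samples_k, samples_m, n_bins=50):
    """Overlap of two predictive densities estimated from samples, in [0, 1]:
    histogram both on a common grid and integrate min(p_k, p_m)."""
    lo = min(min(samples_k), min(samples_m))
    hi = max(max(samples_k), max(samples_m))
    width = (hi - lo) / n_bins

    def density(samples):
        counts = [0] * n_bins
        for x in samples:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
        return [c / (len(samples) * width) for c in counts]

    pk, pm = density(samples_k), density(samples_m)
    return sum(min(a, b) for a, b in zip(pk, pm)) * width

# Synthetic posterior-predictive draws: a well-separated pair (easy to
# discriminate, cf. Figure 7b) and a strongly overlapping pair (Figure 7a).
random.seed(0)
separated = overlap_area([random.gauss(1.0, 0.1) for _ in range(5000)],
                         [random.gauss(2.0, 0.1) for _ in range(5000)])
entangled = overlap_area([random.gauss(1.0, 0.3) for _ in range(5000)],
                         [random.gauss(1.1, 0.3) for _ in range(5000)])
```

Ranking all 81 candidate u values by this area and taking the minimum reproduces the selection logic behind Table 7.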
p(θj) and p(φj) defined in eqs 9 and 10 is a good way to visualize the probability distribution for the individual kinetic and error model parameters. The (θk, φ) parameter estimates are shown in Table 6 for the Bayesian analysis of model 2 with the DAUG data set. The associated probability distributions with DAUG are slightly narrower than the distributions with D reported in Table 4.

Assuming that the p(θj) and p(φj) distributions are normal, Var(θi) or Var(φi) may be used to produce another set of confidence limits. However, the normality assumption is highly suspect for nonlinear models, which usually result in nonsymmetric marginal probability distributions like those shown in Figure 3.
Table 8. Designed Data Set Dc Generated by a Modified Central Composite Design
T (K) PA0 (atm) PB0 (atm) PC0 (atm) PA (atm) PB (atm) PC (atm) rate (gmol/(min kg catalyst))
650 1 1 0.1 0.989 0.989 0.111 0.692
670 1.5 1.5 0.2 1.472 1.472 0.228 1.321
670 1.5 1.5 0 1.467 1.467 0.033 2.515
670 1.5 0.5 0.2 1.491 0.491 0.209 0.492
670 1.5 0.5 0 1.489 0.489 0.011 0.869
650 1 1 0.1 0.989 0.989 0.111 0.711
670 0.5 1.5 0.2 0.482 1.482 0.218 0.556
670 0.5 1.5 0 0.474 1.474 0.026 1.218
670 0.5 0.5 0.2 0.494 0.494 0.206 0.172
670 0.5 0.5 0 0.491 0.491 0.009 0.373
650 1 1 0.1 0.989 0.989 0.111 0.687
630 1.5 1.5 0.2 1.492 1.492 0.208 0.568
630 1.5 1.5 0 1.490 1.490 0.010 1.853
630 1.5 0.5 0.2 1.497 0.497 0.203 0.213
630 1.5 0.5 0 1.497 0.497 0.003 0.588
650 1 1 0.1 0.989 0.989 0.111 0.747
630 0.5 1.5 0.2 0.494 1.494 0.206 0.224
630 0.5 1.5 0 0.491 1.491 0.009 1.123
630 0.5 0.5 0.2 0.498 0.498 0.202 0.066
630 0.5 0.5 0 0.497 0.497 0.003 0.426
650 1 1 0.1 0.989 0.989 0.111 0.689
670 1 1 0.1 0.982 0.982 0.118 0.864
630 1 1 0.1 0.994 0.994 0.106 0.396
650 1 1 0.1 0.989 0.989 0.111 0.733
650 1.5 1 0.1 1.488 0.988 0.112 0.980
650 0.5 1 0.1 0.491 0.991 0.109 0.357
650 1 1 0.1 0.989 0.989 0.111 0.706
650 1 1.5 0.1 0.984 1.484 0.116 1.036
650 1 0.5 0.1 0.995 0.495 0.105 0.347
650 1 1 0.1 0.989 0.989 0.111 0.727
650 1 1 0.2 0.991 0.991 0.209 0.511
650 1 1 0 0.988 0.988 0.012 1.334
650 1 1 0.1 0.989 0.989 0.111 0.661
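The structure of this design (though of course not the measured rates) can be reconstructed mechanically: a 2-level full factorial on the four factors, the eight one-at-a-time extremes, and nine center-point replicates. A sketch:

```python
import itertools

def modified_central_composite():
    """Rebuild the layout of Table 8: 2^4 = 16 factorial points,
    2 x 4 = 8 one-at-a-time (axial) points, and 9 center replicates."""
    names = ["T", "PA0", "PB0", "PC0"]
    lo = {"T": 630, "PA0": 0.5, "PB0": 0.5, "PC0": 0.0}
    hi = {"T": 670, "PA0": 1.5, "PB0": 1.5, "PC0": 0.2}
    center = {"T": 650, "PA0": 1.0, "PB0": 1.0, "PC0": 0.1}

    factorial = [dict(zip(names, point)) for point in
                 itertools.product(*[(lo[n], hi[n]) for n in names])]
    axial = []
    for n in names:                     # vary one factor, hold rest at center
        for extreme in (lo[n], hi[n]):
            run = dict(center)
            run[n] = extreme
            axial.append(run)
    replicates = [dict(center) for _ in range(9)]
    return factorial + axial + replicates

design = modified_central_composite()   # N = 16 + 8 + 9 = 33 runs
```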
Table 9. Local Optimal Parameter Estimates for Model Candidates Using Nonlinear Optimization, Fitted against Data Set Dc
Local Optimal Parameter Sets for Model 1
ln(k40) E4/R (10^3 K) ln(K10) ∆H1/R (10^3 K) ln(K20) ∆H2/R (10^3 K) ln(K30) ∆H3/R (10^3 K) SSE
15.7 7.16 -1.19 -8.57 -13.7 0.00 1.54 -11.0 0.0556
16.1 15.3 -1.19 -8.57 -14.1 -8.11 1.54 -11.0 0.0556
17.2 7.16 -1.19 -8.57 -15.2 0.00 1.54 -11.0 0.0556
18.5 7.16 -1.19 -8.57 -16.4 0.00 1.54 -11.0 0.0556
19.0 14.4 -1.19 -8.57 -16.9 -7.21 1.54 -11.0 0.0556
19.1 7.16 -1.19 -8.57 -17.0 0.00 1.54 -11.0 0.0556
19.6 17.9 -1.19 -8.57 -17.5 -10.8 1.54 -11.0 0.0556
20.0 37.1 -1.19 -8.57 -17.9 -30.0 1.54 -11.0 0.0556
24.1 7.2 -1.19 -8.57 -22.1 0.00 1.54 -11.0 0.0556
28.4 26.1 -1.19 -8.57 -26.4 -18.9 1.54 -11.0 0.0556
29.0 34.9 -1.19 -8.57 -26.9 -27.7 1.54 -11.0 0.0556
39.6 37.0 -1.19 -8.57 -37.6 -29.9 1.54 -11.0 0.0556
47.3 11.4 -1.19 -8.57 -45.3 -4.27 1.54 -11.0 0.0556
92.7 16.3 -1.19 -8.57 -90.6 -9.09 1.54 -11.0 0.0556
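The 14 "different" optima for model 1 are not truly distinct fits. A quick arithmetic check on the printed values (our observation, not a claim made in the text) shows that only the combinations ln(k40) + ln(K20) and E4/R + ∆H2/R are pinned down by the data; every row gives the same two sums to the table's precision, so the optimizer is wandering along a ridge of statistically equivalent parameter sets:

```python
# (ln(k40), E4/R, ln(K20), dH2/R) for each local optimum of model 1 in
# Table 9; the other four parameters and the SSE are identical in all rows.
optima = [
    (15.7,  7.16, -13.7,   0.00), (16.1, 15.3,  -14.1,  -8.11),
    (17.2,  7.16, -15.2,   0.00), (18.5,  7.16, -16.4,   0.00),
    (19.0, 14.4,  -16.9,  -7.21), (19.1,  7.16, -17.0,   0.00),
    (19.6, 17.9,  -17.5, -10.8),  (20.0, 37.1,  -17.9, -30.0),
    (24.1,  7.2,  -22.1,   0.00), (28.4, 26.1,  -26.4, -18.9),
    (29.0, 34.9,  -26.9, -27.7),  (39.6, 37.0,  -37.6, -29.9),
    (47.3, 11.4,  -45.3,  -4.27), (92.7, 16.3,  -90.6,  -9.09),
]
sum_ln = [lnk40 + lnK20 for lnk40, _, lnK20, _ in optima]  # all ~2.0-2.1
sum_e = [e4 + dh2 for _, e4, _, dh2 in optima]             # all ~7.1-7.2
```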
The marginal probability density integrates out any correlation between the various model parameters; however, the model parameter estimates are rarely independent of one another. For any pair of parameters, θi and θj, the covariance among these parameters is given by

Cov(θi, θj) = ∫θi ∫θj (θi − E(θi))(θj − E(θj)) p(θi, θj) dθj dθi (18)

where the joint probability density function between a pair of parameters θi and θj is obtained by integrating out all the other parameters in the joint posterior probability distribution:

p(θi, θj) = ∫θ′i,j ∫φ p({θ, φ}|DAUG) dφ dθ′i,j, θ′i,j = {θl | l ≠ i, l ≠ j} (19)

p(θi, θj) is the two-dimensional analog to a marginal distribution. The correlation coefficient ρij for these two parameters is simply

ρij = Cov(θi, θj)/(√Var(θi) √Var(θj)) (20)

The correlation coefficient has the conventional interpretation: values close to +1 or −1 imply that the parameters are highly positively or negatively correlated, respectively. However, ρij is a linearized point estimate and may indicate spurious results.52

A more useful way to evaluate these pairwise relationships among the parameters is to plot confidence region contours for the marginal joint probability density function p(θi, θj). As was the case for a single parameter, there is not a unique way to specify a 100(1 − α)% confidence region for a parameter pair; however, the HPD confidence region defined in section 4 can again be used since it yields the smallest area in hyperspace. Let Ω1−α = {θi, θj} be the points in the HPD region and Ωα be the complement, or points not in the HPD region. Then, in an analogous fashion to the approach described for determining the confidence limits of the one-dimensional marginal probability distribution, the two-dimensional 100(1 − α)% confidence contours can be generated. For a linear model, the confidence regions will be ellipses, and if there is no correlation between a particular pair of parameters, the major/minor axes of the ellipse will be parallel to the x- and y-axes of the contour plot. The confidence region contours are shown in Figure 10 for all pairs of the parameters in model 2 as determined from
Figure 9. Marginal probability density functions of the parameters in different model candidates, fitted against data set Dc.
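Eqs 18-20 are straightforward to evaluate from MCMC output: because each draw is a joint sample, keeping only the (θi, θj) coordinates of every draw already samples the marginal p(θi, θj) of eq 19, and eqs 18 and 20 reduce to sample moments. A sketch with synthetic correlated draws:

```python
import math
import random

def cov(xs, ys):
    # Sample analog of eq 18.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def corr(xs, ys):
    # Sample analog of eq 20: rho = Cov / (sqrt(Var_i) * sqrt(Var_j)).
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

# Synthetic "posterior draws" for a strongly anticorrelated parameter pair,
# similar in spirit to the elongated contours of Figure 10.
random.seed(2)
theta_i = [random.gauss(0.0, 1.0) for _ in range(10000)]
theta_j = [-0.9 * x + random.gauss(0.0, 0.3) for x in theta_i]
rho = corr(theta_i, theta_j)    # close to -0.95
```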
Table 10. 95% Confidence Intervals of the Parameters in Three Model Candidates

Dc and Uniform Prior Probability Distribution
model 1 model 2 model 3
parameter MPPDE θj,max 95% LB 95% UB MPPDE θj,max 95% LB 95% UB MPPDE θj,max 95% LB 95% UB
ln(k40) 9.83 7.31 6.36 9.95 1.03 1.05 0.92 1.17 3.32 3.07 1.98 9.91
E4/R (10^3 K) 29.8 28.5 19.8 36.3 9.54 10.1 8.24 12.9 29.7 16.0 12.1 31.2
ln(K10) -1.20 -1.26 -1.46 -1.06 0.13 0.13 -0.11 0.41
∆H1/R (10^3 K) -10.1 -10.1 -14.0 -10.1 -14.7 -16.8 -19.9 -14.2
ln(K20) -7.74 -5.15 -7.77 -4.18 -3.00 -2.61 -9.63 -1.64
∆H2/R (10^3 K) -21.9 -19.8 -29.8 -11.4 -28.6 -11.0 -29.8 -10.2
ln(K30) 1.54 1.55 1.45 1.64 3.06 3.08 2.93 3.26 2.13 2.08 1.70 2.42
∆H3/R (10^3 K) -11.5 -12.0 -14.2 -10.1 -18.1 -19.7 -22.1 -18.0 -11.0 -10.2 -19.8 -10.2
σ2 × 10^3 1.80 2.14 1.29 4.01 1.38 1.66 1.07 3.08 16.3 18.8 13.3 33.9
ln[E{L(Mk|D)}] 54.7419 59.4249 18.0566
Pr(Mk|D) 0.009 0.991 0.000
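The Pr(Mk|D) rows of Tables 6 and 10 follow from Bayes' rule: each model's prior probability is multiplied by its expected likelihood E{L(Mk|D)} and the products are normalized. Because the ln E{L} values differ by several units, the arithmetic is best done in log space; the sketch below reproduces both rows (0.0027/0.9973 from Table 6 with the section 5 priors, and 0.009/0.991/0.000 from Table 10 with uniform priors):

```python
import math

def posterior_model_probs(log_evidences, priors):
    # Pr(Mk|D) proportional to Pr(Mk) * E{L(Mk|D)}, normalized in log space
    # (subtracting the max avoids overflow of exp for large ln E{L}).
    logs = [math.log(p) + le for p, le in zip(priors, log_evidences)]
    m = max(logs)
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Table 6: two models, priors Pr(M1) = 0.137, Pr(M2) = 0.863 from section 5.
two = posterior_model_probs([54.03, 58.10], [0.137, 0.863])
# -> two[0] ≈ 0.0027, two[1] ≈ 0.9973

# Table 10: three models, uniform prior.
three = posterior_model_probs([54.7419, 59.4249, 18.0566], [1.0 / 3] * 3)
# -> three ≈ [0.009, 0.991, 0.000]
```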
Figure 10. Pairwise marginal confidence regions of the parameters in model 2 with data set DAUG. The contours are 50%, 70%, 90%, and 95% confidence regions from inside to outside. The units of Ea/R, ∆H1/R, and ∆H3/R are 10^3 K.
Figure 12. Marginal posterior probability distribution of model 2 for different data sets: (bold solid line) DAUG+6; (dotted line) DAUG; (thin solid line) D. The units of Ea/R, ∆H1/R, and ∆H3/R are 10^3 K.
for the six new experiments are also given in the same table. The new data set DAUG+6 includes the original data set D and the experiment for model discrimination (one experiment) as well as those for parameter estimation (six experiments), and thus DAUG+6 has 40 points. The complete Bayesian analysis of this augmented data set was performed for model 2.

The probability distributions for the model parameters using the augmented data set DAUG+6 are shown in Figure 12, where the distributions are significantly tighter as compared to the DAUG data set employed for model discrimination. The pairwise marginal confidence regions are shown in Figure 13, where the confidence regions are considerably smaller than the ones in Figure 10 determined by the data set DAUG. The correlations were not completely removed by the additional experiments, but they are less important. The confidence limits for the parameters using DAUG+6 are given in Table 12, where the range has decreased significantly for a number of the parameters, although the ln(K30) and ∆H3/R ranges have not changed much with the addition of the new experiments. The confidence interval of ln(K30) was relatively tight before adding the selected six experiments to the data set. ∆H3/R is still not well estimated even after the new experiments were added. Designing further new experiments would help to improve the confidence limits of this parameter.

The criterion for stopping this parameter estimation improvement process is subjective. A typical criterion might be to ensure that the univariate 95% confidence interval is less than 10% of a selected point estimate, such as the MPPDE or the mean value. When all the parameters satisfy the specification, the lack of fit test should be performed again to ensure the model adequacy if replicated data points are available in the final data set. Also, if the variance parameter φ is significantly larger than the measurement error, then the model may need to be improved, which is the next step of the model building process.

9. Discussion

A Bayesian framework for building kinetic models by means of a sequential experimentation-analysis program has been described. It allows the knowledge of the catalyst researcher to drive the model building both by postulating viable models and by assessing the quality of model parameter estimates. Procedures are available for designing experiments to discriminate model candidates, for assessing their suitability against experimental error, and for designing experiments to improve the parameter estimates of the best model selected. The key features of this modeling approach are (i) the use of distributions for the parameters rather than point estimates so that regression procedures can be avoided, (ii) the ability to incorporate the knowledge of the researcher, and (iii) the ability to more accurately predict the behavior of a validated model under different operating conditions.

Probability densities are the most appropriate way to describe a model's predictive capabilities since they fully acknowledge the consequences of unavoidable experimental error in real catalytic systems. Specifically, for a given model and a given experimental data set that includes error, there will be a distribution of predicted outcomes from that model. Thus, both the analysis of a specific model and the comparison of different models need to explicitly acknowledge this distribution of predictions from the models rather than just analyzing/comparing unique predictions coming from single point estimates. In contrast, traditional linear analysis and nonlinear optimization only provide point estimates of model parameters, often luring the researcher into the erroneous impression that there is a unique prediction for a given model.
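The stopping rule sketched above (every univariate 95% interval narrower than 10% of a chosen point estimate) is mechanical to check. Feeding it three of the model 2 entries from Table 12 correctly reports that estimation is not yet finished, consistent with the remark that ∆H3/R remains poorly determined:

```python
def estimation_done(intervals, point_estimates, tol=0.10):
    # True when every 95% interval is narrower than tol * |point estimate|.
    return all((ub - lb) <= tol * abs(pt)
               for (lb, ub), pt in zip(intervals, point_estimates))

# (95% LB, 95% UB) and MPPDE for three model 2 parameters from Table 12.
intervals = [(0.97, 1.35),      # ln(k40)
             (7.41, 13.6),      # Ea/R (10^3 K)
             (-22.2, -11.0)]    # dH3/R (10^3 K)
mppde = [1.17, 10.6, -14.7]

done = estimation_done(intervals, mppde)   # -> False: keep designing
```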
Table 11. Six Experiments Added to the Data Set DAUG for Improving Parameter Estimates
expr no. T (K) PA0 (atm) PB0 (atm) PC0 (atm) ln(det(Ψ)) PA (atm) PB (atm) PC (atm) rate (gmol/(min kg catalyst))
1 670 1.5 1.5 0.1 -24.9 1.470 1.470 0.130 1.727
2 630 1.5 1 0 -24.3 1.493 0.993 0.007 1.284
3 650 1.5 1.5 0 -23.4 1.481 1.481 0.019 2.324
4 650 0.5 1.5 0 -23.3 0.483 1.483 0.017 1.150
5 670 0.5 1.5 0 -23.2 0.474 1.474 0.026 1.125
6 670 1 1 0 -23.2 0.979 0.979 0.021 1.365
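Ψ itself is defined outside this excerpt, but the ln(det(Ψ)) column of Table 11 suggests a D-optimality-style comparison: if Ψ is (something like) the posterior parameter covariance expected after running a candidate experiment, the preferred candidate minimizes ln(det(Ψ)), i.e., the volume of the joint confidence region. A sketch with invented 2 × 2 covariance matrices:

```python
import math

def log_det(matrix):
    # ln(det) of a small symmetric positive-definite matrix via Cholesky.
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

# Hypothetical candidates with their (toy) post-experiment covariances.
candidates = {
    "u_a": [[4.0, 2.0], [2.0, 3.0]],   # det = 8
    "u_b": [[1.0, 0.5], [0.5, 2.0]],   # det = 1.75 -> tighter region
}
best = min(candidates, key=lambda u: log_det(candidates[u]))   # -> "u_b"
```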
Figure 13. Pairwise marginal confidence regions of the parameters in model 2 with data set DAUG+6. The contours are 50%, 70%, 90%, and 95% confidence regions from inside to outside. The units of Ea/R, ∆H1/R, and ∆H3/R are 10^3 K.
Table 12. Estimated Parameters and Corresponding 95% Confidence Intervals for Model 2

Data Set DAUG+6 and Uniform Prior
parameter MPPDE θj,max 95% LB 95% UB
ln(k40) 1.17 1.18 0.97 1.35
Ea/R (10^3 K) 10.6 10.2 7.41 13.6
ln(K10) -0.156 -0.139 -0.512 0.234
∆H1/R (10^3 K) -14.2 -12.2 -19.1 -10.2
ln(K30) 2.91 2.93 2.73 3.16
∆H3/R (10^3 K) -14.7 -15.0 -22.2 -11.0
σ2 × 10^3 1.91 2.24 1.43 3.85

Another key feature of the methods discussed here is the importance of modeling the error as well as modeling the kinetic relationships. The error associated with any specific experiment is assumed to be normally distributed around the average value for that experiment. Typically, the error is assumed to be constant or proportional to the magnitude of the response (the latter is the case when the logarithm of the data is fit). However, much more complex error structures are more typical, e.g., the error is proportional to the magnitude of the experimentally measured response except for very small responses, lower than the detector sensitivity. A simple three-parameter model for capturing this "heteroscedasticity" in the experimental error has been described in Appendix A. The key issue is that although the experimental error is normally distributed, when that error is propagated through the nonlinear models that are used in kinetic analysis, the resulting errors in the parameter estimates are often anything but Gaussian. The parameter distribution estimates given in Figure 3 for the three candidate models for the test data set given in Table 1 are a good example. Considering all the parameters for the three models, the only distribution that is even remotely Gaussian is ln(K30) for models 1 and 3. When an augmented data set with seven additional experiments was employed, the parameter distributions, as shown in Figure 12, became much smoother and nearly Gaussian. However, the additional data set was designed to provide these highly improved (and now nearly Gaussian) parameter estimate distributions using the Bayesian methods. The need for incorporating the nonlinear error structure of the models was demonstrated for a relatively simple kinetic model with data that was of reasonable quality. We expect the need for directly including the nonlinearity will become even more important as the complexity of the models increases or the data becomes noisier.54 It is possible that, with an overwhelming amount of data, nonlinear optimization with linear error analysis may also be able to determine reasonable parameter estimates and improve model discrimination, but even in an era of high throughput experimentation, maximizing the impact of each experiment still has value.

A distinguishing feature of the Bayesian method is that it takes full advantage of the catalyst researcher's expertise prior to the determination of the parameters. This is in contrast to regression methods, which only deal with the data and the model without any subjective input from the expert other than specification of candidate models and supply of initial guesses.19,43,55 Although both model building approaches can be used, we believe that the Bayesian approach is of particular value for kinetic problems, where the potential rate and equilibrium constants can vary by orders of magnitude. Any information that the expert can provide can significantly reduce the amount of experimental data needed for model discrimination and parameter estimation. Moreover, in studies of a series of catalytic structures (e.g., the systematic change of ligand molecular structure, the stoichiometry of a mixed metal surface,
etc.) and/or a series of different reactants, one might anticipate that the rate/equilibrium constants and activation energies for one member of a catalyst family would provide a good point of departure for analysis of other members of the family. Bayesian methods can take full advantage of such expert prior knowledge of related kinetic systems. The expert knowledge is captured via the prior probability distribution p(θk, φ). In the simple example presented here, the prior knowledge was used only to place limits on various parameters (see Table 7). The assumption of a uniform distribution is called an uninformative prior, although this is a misnomer: there is often considerable expert knowledge in specifying the parameter limits. Sometimes additional information is available from screening studies, a linear analysis, etc. If an initial point estimate is available, an alternative is a triangular prior distribution that has a maximum at the initial point estimate and goes to zero at the expert-defined limits on the parameters. The triangular prior distribution incorporates more knowledge in the prior than the uninformative prior. The relationship between the information content of the prior probability distribution and the final parameter confidence region will be an interesting one for further study.

New criteria for model discrimination and for parameter estimation have been proposed in this work. First, additional experiments for model discrimination are designed in the region where the probability distributions of the model predictions are the most different. This is a general criterion without any assumptions. The conventional approach of experimental design for model discrimination compares the model predictions calculated at the maximum likelihood estimates only. By using probability distributions, our proposed criterion for model

out the computational challenges imposed by its proper implementation. The MCMC sampling methodology outlined in Appendix B was implemented in two different programs. A PC-based program called MODQUEST was written in MATLAB and used to solve the example problem described in this paper. About 10 min were required on a Dell Precision 6300 Intel Core 2 Extreme CPU X7900 2.8 GHz with 4 GB RAM running 32-bit Microsoft Windows Vista to obtain convergence for an eight-parameter model such as model 1 on the original data set D. Another software program, written as a combination of C++ and Fortran, was used to handle larger real-world microkinetic problems consisting of systems of differential algebraic equations. In this case, convergence for a 25-parameter microkinetic problem was achieved in less than 48 h on a large data set using an Intel Xeon 3.2 GHz CPU with 4 GB RAM running 64-bit Red Hat Enterprise Linux 4.0. Experience has shown that the computational time is dramatically reduced with the quality of the proposal distribution for the posterior and the starting guess, as well as the suitability of the model. Conversely, computational times increase with the size of the systems of differential algebraic equations to be solved and the stiffness of the system. It is also evident that the computational times could be dramatically reduced by using parallel processors or computing clusters, which are ideal for Monte Carlo-like formulations. The latter approach is being vigorously pursued by the authors.

In summary, the power of Bayesian methods for model discrimination and parameter estimation, including design of experiments, has been shown for a simple, model kinetics problem. The traditional tool for analyzing kinetics is nonlinear optimization with a linear statistical analysis around the optimal
discrimination incorporates the uncertainty of the model predic- solution, which can provide good results if there are sufficient
tion as well, and this appears to be its source of improved ability data that have been well designed and the potential models are
to discriminate models. Second, to improve the parameter quite different. The Bayesian approach outperformed these
estimates, the additional experiments are designed to minimize methods in this comparison, however. The ability of this
the volume of the confidence region, which is approximated approach to treat nonlinearity without approximation gives it
by the determinant of the covariance matrix. We can calculate high potential for a wide variety of problems in catalytic kinetics
the covariance matrix from the samples of the Markov Chain and thus provides a new set of tools to be added to the arsenal
Monte Carlo process and then use it to search for new of any researcher who is developing models of catalytic systems.
experiments that narrow the parameter probability distribution.
These two criteria are intuitive but have not been discussed in Acknowledgment
available literature, perhaps due to the computational complex- The authors would like to acknowledge the financial support
ity. Since the era of high-speed computation has arrived, they of the Department of Energy of the United States (DOE-BES
should now be exploited because they are both powerful and Catalysis Science Grant DE-FG02-03ER15466) and ExxonMo-
general. We have not compared these approaches directly to bil through collaboration on the design of hydrotreating catalysts.
the conventional D-optimal design for parameter estimation, but
note that D-optimal design uses the linearized model around Appendix A: Likelihood Function and Error Model
the maximum likelihood estimate, a step avoided by the
Bayesian approach. It is necessary to specify a form of the probability distributions
for the likelihood function L(D|{θk, φ}) and the prior joint
The entire Bayesian framework was demonstrated on a very
probability distributions p(θk, φ) for the model parameters θk
simple catalyst kinetics test problem, where data were generated
and the error model parameters φ before eq 8 can be solved.
from one of the models, including reasonable amounts of
Selecting the joint prior probability distributions for the
experimental error. The results of this exercise show that linear
parameters will be discussed in Appendix C. In this section,
methods and nonlinear optimization were unable to identify the
we will develop the form of the likelihood function. It is
correct model from the three candidate models, even with an
reasonable to assume that the experimental errors εi(φ) for each
additional experiment. In contrast, Bayesian methods were able
of the N data points are independent and normally distributed
to robustly eliminate one of the models and suggest a single
random variables with mean zero and variance σi2. On the basis
additional experiment that was able to discriminate between the
of this assumption, the joint probability density function for the
remaining models. The importance of appropriate experimental
N data points in the data set D is
design is also indicated, in that the one-variable-at-a-time
approach did not provide a robust parameter estimates. However,
Bayesian-based design of experiments suggested the optimal
set of new experiments needed to improve parameter estimates.
p(ε) )
N
∏ p(ε ) ) ∏
i)1
i
N
i)1 { 1
(2π)1⁄2σi ( )}εi2
exp - 2
2σi
(A.1)
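Eq A.1 is simply a product of independent zero-mean normal densities. The sketch below evaluates it numerically; this is an illustrative Python fragment of our own (the paper's implementations were MATLAB and C++/Fortran), and in practice the log form is preferred because the product underflows for large N:

```python
import math

def joint_error_density(errors, sigmas):
    """Joint density of independent N(0, sigma_i^2) errors, eq A.1:
    p(eps) = prod_i (2*pi)^(-1/2) / sigma_i * exp(-eps_i^2 / (2*sigma_i^2))."""
    p = 1.0
    for eps, sigma in zip(errors, sigmas):
        p *= math.exp(-eps ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
    return p

def log_joint_error_density(errors, sigmas):
    """Log of eq A.1; numerically safer than the raw product for large N."""
    return sum(-0.5 * math.log(2.0 * math.pi) - math.log(s) - e ** 2 / (2.0 * s ** 2)
               for e, s in zip(errors, sigmas))

# A single zero error with unit variance recovers the standard normal mode height.
density = joint_error_density([0.0], [1.0])
```

The same two functions serve for eq A.3 once the errors are replaced by the residuals ri − fk(θk, ui).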
For any set of values of the model parameters θk, the differences between the measured rates ri and the values predicted by model k, fk(θk, ui), are the residuals

eik = ri − fk(θk, ui)    (A.2)

Assuming a high probability that the kth model generated D with the θk parameters, i.e., that the model is valid, the residuals are estimates of experimental error and may be substituted for the errors in A.1 to give the joint probability distribution function

L(D|θk, φ) = ∏_{i=1}^{N} [1/((2π)^{1/2} σi)] exp(−(ri − fk(θk, ui))²/(2σi²))    (A.3)

which is called the likelihood function for model k. When the model is incorrect, the residuals are biased, so they are not estimates of experimental error, but the substitution is made anyway so that a likelihood function is defined for all k models.

If replicates are available at each set of experimental conditions ui, it is possible to estimate the variance σi² of the normal distribution. In this case, it is not necessary to define a statistical error model or to define the parameters φ. When replicates are not available, however, it is convenient to model the error. Since we have assumed that the errors are normally distributed with mean zero, the error model represents the statistical parameters that characterize uncertainties in the experimental setup and the response variable measurements. In statistical terms, we are going to use a model to characterize the heteroscedasticity in the data. The following three-parameter model has been found to be very useful for representing the variability in reaction systems20

σi² = ω² r̂i^γ + η²,   i = 1, ..., N    (A.4)

where ω, γ, and η are independent statistical model parameters, i.e., φ = {ω, γ, η}, and r̂i is the predicted reaction rate at the current values of ui and θk. This model has physically meaningful boundary conditions. If the measurement errors are constant over the entire experimental region, then γ = 0 and the variance is constant. If the measurement errors are directly proportional to the measurements, so that there is a constant percent error in the data, then γ = 2. Finally, if the error in the measurements goes from being proportional to the measurements down to the limit of detection, all three terms are needed in the model, with the limit of detection being η². In the analysis of the simple model presented in this paper, the mathematical model parameters θk and the statistical model parameters φ were estimated simultaneously. This is particularly challenging in the point estimate approach but very natural in the Bayesian approach, affording insights into the quality of, and the interaction between, the mathematical and statistical model parameters.
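The limiting cases of eq A.4 (γ = 0, constant variance; γ = 2, constant percent error; η² as a detection-limit floor) can be checked numerically. This is an illustrative Python sketch with arbitrary parameter values, not the paper's fitted estimates:

```python
def error_variance(r_hat, omega, gamma, eta):
    """Three-parameter heteroscedastic error model, eq A.4:
    sigma_i^2 = omega^2 * r_hat_i^gamma + eta^2."""
    return omega ** 2 * r_hat ** gamma + eta ** 2

rates = [0.1, 1.0, 10.0]   # hypothetical predicted rates r_hat_i

# gamma = 0: constant variance, independent of the predicted rate.
const_var = [error_variance(r, omega=0.3, gamma=0.0, eta=0.0) for r in rates]

# gamma = 2: standard deviation proportional to the rate, i.e., constant percent error.
prop_sd = [error_variance(r, omega=0.05, gamma=2.0, eta=0.0) ** 0.5 for r in rates]
rel_err = [sd / r for sd, r in zip(prop_sd, rates)]   # constant relative error

# eta^2 sets a variance floor (the limit of detection) as r_hat -> 0.
floor = error_variance(0.0, omega=0.05, gamma=2.0, eta=0.01)
```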
Appendix B. Posterior Probability Distribution Evaluation Methods

In order to determine the joint posterior probability distribution of the parameters p({θk, φ}|D) after the data have been collected, it is necessary to integrate over the entire parameter space to determine the normalization factor in the denominator of eq 8. Once this factor has been determined, the posterior probability distribution is obtained by multiplying the likelihood function with samples taken from the prior distribution function for the parameters. This is computationally feasible when the parameter space is small, i.e., Qk + V ≤ 10. One approach to obtaining this integral is to recognize that the integral to be evaluated is simply the expected value of the likelihood function E(L(D|θk, φ)). This value can be approximated by the relationship

E{L(D|θk, φ)} = ∫_{θk} ∫_{φ} L(D|{θk, φ}) p(θk, φ) dφ dθk ≈ (1/T) ∑_{j=1}^{T} L(D|{θk,j, φj})    (B.1)

where {θk,j, φj} for j = 1, ..., T is the discrete set of values of the model and statistical parameters selected from the prior distribution p(θk, φ) using a Monte Carlo sampling process.56 The size of T required to give a good approximation to the integral depends on the nonlinearity of the model, the dimension and size of the parameter space Qk + V, and the accuracy required. If evaluating fk(θk,j, ui) is fast and the dimensionality of the parameter space is small, such a standardized Monte Carlo sampling procedure is an effective way to evaluate the integral.

A more efficient sampling approach, called the Markov Chain Monte Carlo (MCMC) method, was developed by Metropolis et al.46 and later modified by Hastings.57 In this procedure, it is not necessary to evaluate the integral directly. Rather, the MCMC process converges to a sampling procedure which randomly generates samples from the posterior probability distribution. By collecting a sufficient number of samples from this converged process, it is possible to calculate moments of the distribution (i.e., means, variances), confidence regions, and predictions made with the model itself. The interested reader is referred to the literature for the mechanics of this process.47 Basically, the form of the posterior distribution is proposed, a series of samples is selected from the proposal distribution, and a decision rule is defined involving the prior distribution and the previous point in the series. By repeated application of this rule, the proposal distribution is gradually modified until samples from this modified distribution evolve into a sampling scheme for the desired posterior distribution.

The efficiency of the MCMC method compared to the MC scheme will now be shown. Consider the situation when the prior probability distributions of the parameters in the three models from the sample problem are all uniform between the expert-chosen bounds given in Table 3. Also assume that the error is normally distributed with a constant but unknown variance φ = σ² (this is the situation where γ = 0, η² = 0, and ω² = σ²). Using the data set D, the expected value of the likelihood function for model 2 can be calculated from eq B.1 using a simple Monte Carlo approach. Because of the size of the numbers, the log of the likelihood function is calculated and shown in Figure B.1 as the number of samples T increases from 10³ to 10⁷. We have also reported the maximum value of the log of the likelihood function achieved by this sampling scheme, since it is the point which defines the maximum likelihood point estimators. For comparison purposes, the MCMC method is used to generate samples from the posterior probability distribution, and the samples are used to calculate the logarithm of the posterior probability distribution using eq 8. The results, plotted in Figure B.1, are quite dramatic. Note that significantly more trials are needed using Monte Carlo than with MCMC to obtain the same degree of accuracy. The MCMC converges after approximately 5 × 10⁴ simulations, while more than 10⁶ trials are needed for the simple Monte Carlo method, a 20-fold increase in efficiency. This is an isolated example, but comparable results can be seen for the other models.

When the experimental data set used in the MCMC method is expanded for model discrimination D (see section 6) or for improving the quality of the parameter estimates (see section 8), it is desirable to take advantage of MCMC calculations that…
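The MC-versus-MCMC comparison described above can be reproduced in miniature. The sketch below substitutes a hypothetical one-parameter linear rate model for the paper's Langmuir-Hinshelwood expressions and uses a textbook Metropolis rule with a uniform prior (which cancels in the acceptance ratio); everything else, including the eq B.1 averaging, follows the setup of this appendix. All names and values are illustrative:

```python
import math
import random

random.seed(0)

# Hypothetical one-parameter rate model standing in for f_k(theta, u);
# the paper's actual models are Langmuir-Hinshelwood rate expressions.
def f(theta, u):
    return theta * u

u_data = [0.5, 1.0, 1.5, 2.0]
sigma = 0.1
r_data = [f(2.0, u) + random.gauss(0.0, sigma) for u in u_data]   # synthetic data D

def log_likelihood(theta):
    """Log of eq A.3 for the toy model with known, constant sigma."""
    return sum(-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
               - (r - f(theta, u)) ** 2 / (2.0 * sigma ** 2)
               for r, u in zip(r_data, u_data))

lo, hi = 0.0, 5.0   # expert-chosen uniform prior bounds on theta

# Plain Monte Carlo estimate of E{L(D|theta)} over the prior, eq B.1.
T = 20000
mc_estimate = sum(math.exp(log_likelihood(random.uniform(lo, hi))) for _ in range(T)) / T

# Metropolis MCMC: the chain concentrates its draws where the posterior
# mass is, instead of wasting most prior draws on low-likelihood regions.
theta, logp = 2.5, log_likelihood(2.5)
samples, accepted = [], 0
for _ in range(5000):
    prop = theta + random.gauss(0.0, 0.1)
    if lo <= prop <= hi:                                  # stay inside the prior bounds
        logp_prop = log_likelihood(prop)
        if math.log(random.random()) < logp_prop - logp:  # uniform prior cancels here
            theta, logp, accepted = prop, logp_prop, accepted + 1
    samples.append(theta)
posterior_mean = sum(samples[1000:]) / len(samples[1000:])   # discard burn-in
```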
Ψ = [ Σθ    Σθ,φ
      Σφ,θ  Σφ  ]    (D.1)

where

Σθ = [ Var(θ1)      Cov(θ1, θ2)  ...  Cov(θ1, θp)
       Cov(θ2, θ1)  Var(θ2)      ...  Cov(θ2, θp)
       ...
       Cov(θp, θ1)  Cov(θp, θ2)  ...  Var(θp)    ]

Σφ = [ Var(φ1)      Cov(φ1, φ2)  ...  Cov(φ1, φn)
       Cov(φ2, φ1)  Var(φ2)      ...  Cov(φ2, φn)
       ...
       Cov(φn, φ1)  Cov(φn, φ2)  ...  Var(φn)    ]

Σφ,θ = [ Cov(φ1, θ1)  Cov(φ1, θ2)  ...  Cov(φ1, θp)
         Cov(φ2, θ1)  Cov(φ2, θ2)  ...  Cov(φ2, θp)
         ...
         Cov(φn, θ1)  Cov(φn, θ2)  ...  Cov(φn, θp) ]

Σθ,φ = [ Cov(θ1, φ1)  Cov(θ1, φ2)  ...  Cov(θ1, φn)
         Cov(θ2, φ1)  Cov(θ2, φ2)  ...  Cov(θ2, φn)
         ...
         Cov(θp, φ1)  Cov(θp, φ2)  ...  Cov(θp, φn) ]

The diagonal elements of Ψ are the variances of the individual parameters, and the off-diagonal elements are their covariances. Ψ is a function of both the parameters and the experimental conditions u and can be readily evaluated for the data set DAUG using suitable point estimates of the parameters. The quality of the parameter estimates is related to the elements of Ψ. For example, if the off-diagonal elements are small, the contours of the posterior probability density function will be spherical and the parameter estimates uncorrelated. Also, the smaller the variance estimates, the smaller the uncertainty in the parameter estimates. It may be shown8 that det(Ψ) is proportional to the size of the confidence regions for linear models.
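The construction of Ψ from MCMC output can be sketched as follows, on synthetic draws standing in for two correlated parameter chains (an illustrative Python fragment; the names are ours, not MODQUEST's). The final lines check the stated link between det(Ψ) and confidence-region size: halving the spread of every draw scales each covariance entry by 1/4 and, for two parameters, scales det(Ψ) by 1/16:

```python
import random

random.seed(1)

# Two correlated "parameter" chains standing in for MCMC posterior samples.
theta1 = [random.gauss(1.0, 0.2) for _ in range(4000)]
theta2 = [0.5 * t + random.gauss(0.0, 0.1) for t in theta1]

def covariance_matrix(chains):
    """Sample covariance matrix of equal-length parameter chains (eq D.1 blocks)."""
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
             for cb, mb in zip(chains, means)]
            for ca, ma in zip(chains, means)]

def det2(m):
    """Determinant of a 2x2 matrix; det(Psi) tracks the confidence-region volume."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

psi = covariance_matrix([theta1, theta2])
volume_proxy = det2(psi)

# Halving the spread of the draws shrinks det(Psi) by a factor of 16 (two parameters).
shrunk = covariance_matrix([[0.5 * t for t in theta1], [0.5 * t for t in theta2]])
```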
The hypervolumes of the confidence regions can be numerically calculated from the volume inside the contours of the full probability density distribution, i.e., eq 8; however, this is a difficult calculation. The linearized volume, i.e., det(Ψ), computed from one million realizations from the posterior probability distribution of the parameters, can be used to approximate the confidence region. To improve the confidence region, q additional experiments ui, i = 1, ..., q, should be selected to minimize this determinant. The design procedure to select these q experiments is the following:

1. Generate a new candidate experiment, ui.
2. Calculate the expected model predictions E{yi} for ui by

E{yi|DAUG} = ∫_{θ,φ} f(θ, ui) p({θ, φ}|DAUG) dφ dθ ≈ (1/T) ∑_{j=1}^{T} f(θj, ui)    (E.2)

where {θj} are sampled from the posterior probability distribution p({θ, φ}|DAUG).
3. Use E{yi} to form a modified data set DMD = (ui, E{yi|D}).
4. Generate the posterior probability distribution from DAUG+1 = DAUG + DMD.
5. Calculate the variances and covariances of the parameters and form the matrix Ψ.
6. Calculate det(Ψ).
7. Go to step 1 until det(Ψ) is minimized.

The computational burden required to implement this procedure is enormous. Optimum seeking methods can readily be substituted for finding the best point u1(t) when q = 1. In practice, however, it is more useful to find regions in the space of experimental conditions that show lower values of det(Ψ) rather than focusing on finding a single new experimental condition that minimizes det(Ψ). Similar to the approach for the experimental design for model discrimination, described in section 6, a factorial or fractional factorial design can be applied to determine the best experiments which accomplish this goal. Each additional experiment can provide insight into improvement of the parameter estimates. Consequently, once an additional designed experiment is available, a new posterior probability distribution should be calculated for DAUG+1. Then, the confidence limits and contour regions can be calculated and a new experiment determined using ΨAUG+1. This iterative process (experimentation and analysis) is continued until suitably refined parameter estimates are obtained.

This is a viable approach if the experiments are expensive and time-consuming to generate. If high-throughput experimentation, where data can be generated rapidly, is available, then the computational challenges associated with implementing a one-experiment-at-a-time sequential approach dominate. In this case it is more efficient to propose a set of designed experiments ui, i = 1, ..., q: det(ΨAUG+1) is estimated for each individual experiment, the posterior distribution is determined by MCMC for the augmented data set DAUG+q, the joint marginal probability distributions of the parameters are determined, and the confidence estimates of the parameters are examined. This time the iterative process will involve q experiments.

In either case, the sequential process will continue until the catalyst researcher is comfortable with the quality of the estimates or additional experimentation does not improve the quality of the results. If the researcher is still not comfortable with the quality of the parameter estimates, it is necessary to revisit the experimental apparatus and attempt to improve the overall quality of the data itself.
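The q = 1 design loop (steps 1-7) can be caricatured on a one-parameter linear model for which the posterior variance, and hence det(Ψ), is available in closed form under a flat prior, so no MCMC is needed inside the loop; in the actual procedure Ψ would be estimated from MCMC samples for each augmented data set. All names and values below are illustrative:

```python
# Toy model r = theta * u + error with known sigma; under a flat prior,
#   Var(theta | D) = sigma^2 / sum_i u_i^2,
# which plays the role of det(Psi) in the scalar case.
sigma = 0.1
u_existing = [0.5, 1.0, 1.5]   # experiments already in D_AUG
theta_hat = 2.0                # current point estimate, standing in for E{theta|D}

def posterior_variance(us):
    return sigma ** 2 / sum(u * u for u in us)

candidates = [0.25, 0.75, 1.25, 2.0]   # step 1: candidate experiments u_i

best_u, best_det = None, float("inf")
for u in candidates:
    expected_y = theta_hat * u                  # step 2: E{y|D_AUG}, eq E.2 (closed form here)
    augmented = u_existing + [u]                # steps 3-4: D_AUG+1 = D_AUG + D_MD
    det_psi = posterior_variance(augmented)     # steps 5-6: det(Psi), scalar case
    if det_psi < best_det:                      # step 7: keep the minimizer
        best_u, best_det = u, det_psi
```

For this linear toy model the loop simply picks the largest |u|, which is the familiar result that informative experiments sit at the extremes of the feasible region; nonlinear models make the search genuinely nontrivial.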
Literature Cited

(1) Caruthers, J. M.; Lauterbach, J. A.; Thomson, K. T.; Venkatasubramanian, V.; Snively, C. M.; Bhan, A.; Katare, S.; Oskarsdottir, G. Catalyst design: knowledge extraction from high-throughput experimentation. J. Catal. 2003, 216 (1-2), 98–109.
(2) Dumesic, J. A.; Milligan, B. A.; Greppi, L. A.; Balse, V. R.; Sarnowski, K. T.; Beall, C. E.; Kataoka, T.; Rudd, D. F.; Trevino, A. A. A Kinetic Modeling Approach to the Design of Catalysts: Formulation of a Catalyst Design Advisory Program. Ind. Eng. Chem. Res. 1987, 26 (7), 1399–1407.
(3) Banares-Alcantara, R.; Westerberg, A. W.; Ko, E. I.; Rychener, M. D. DECADE: A Hybrid Expert System for Catalyst Selection. 1. Expert System Consideration. Comput. Chem. Eng. 1987, 11 (3), 265–277.
(4) Banares-Alcantara, R.; Ko, E. I.; Westerberg, A. W.; Rychener, M. D. DECADE: A Hybrid Expert System for Catalyst Selection. 2. Final Architecture and Results. Comput. Chem. Eng. 1988, 12 (9-10), 923–938.
(5) Ammal, S. S. C.; Takami, S.; Kubo, M.; Miyamoto, A. Integrated computational chemistry system for catalysts design. Bull. Mater. Sci. 1999, 22 (5), 851–861.
(6) Burello, E.; Rothenberg, G. In silico design in homogeneous catalysis using descriptor modelling. Int. J. Mol. Sci. 2006, 7 (9), 375–404.
(7) Dumesic, J. A.; Rudd, D. F.; Aparicio, L. M.; Rekoske, J. E.; Trevino, A. A. The Microkinetics of Heterogeneous Catalysis; American Chemical Society: Washington, D.C., 1993; p 316.
(8) Box, G. E. P.; Lucas, H. L. Design of Experiments in Non-Linear Situations. Biometrika 1959, 46 (1/2), 77–90.
(9) Chernoff, H. Sequential design of experiments. Ann. Math. Statist. 1959, 30, 755–770.
(10) Franckaerts, J.; Froment, G. F. Kinetic study of the dehydrogenation of ethanol. Chem. Eng. Sci. 1964, 19 (10), 807–818.
(11) Box, G. E. P.; Draper, N. R. The Bayesian Estimation of Common Parameters from Several Responses. Biometrika 1965, 52 (3), 355–365.
(12) Hunter, W. G.; Reiner, A. M. Designs for discriminating between two rival models. Technometrics 1965, 7 (3), 307–323.
(13) Kittrell, J. R.; Hunter, W. G.; Watson, C. C. Nonlinear Least Squares Analysis of Catalytic Rate Models. AIChE J. 1965, 11 (6), 1051–1057.
(14) Box, G. E. P.; Hill, W. J. Discrimination among mechanistic models. Technometrics 1967, 9 (1), 57–71.
(15) Hunter, W. G.; Mezaki, R. An experimental design strategy for distinguishing among rival mechanistic models. An application to the catalytic hydrogenation of propylene. Can. J. Chem. Eng. 1967, 45, 247–249.
(16) Froment, G. F.; Mezaki, R. Sequential Discrimination and Estimation Procedures for Rate Modeling in Heterogeneous Catalysis. Chem. Eng. Sci. 1970, 25, 293–301.
(17) Van Welsenaere, R. J.; Froment, G. F. Parametric Sensitivity and Runaway in Fixed Bed Catalytic Reactors. Chem. Eng. Sci. 1970, 25, 1503–1516.
(18) Reilly, P. M. Statistical methods in model discrimination. Can. J. Chem. Eng. 1970, 48, 168–173.
(19) Bard, Y. Nonlinear Parameter Estimation; Academic Press: New York, 1974.
(20) Reilly, P. M.; Blau, G. E. The Use of Statistical Methods to Build Mathematical Models of Chemical Reaction Systems. Can. J. Chem. Eng. 1974, 52, 289–299.
(21) Reilly, P. M.; Bajramovic, R.; Blau, G. E.; Branson, D. R.; Sauerhoff, M. W. Guidelines for the optimal design of experiments to estimate parameters in first order kinetic models. Can. J. Chem. Eng. 1977, 55, 614.
(22) Stewart, W. E.; Sorensen, J. P. Bayesian Estimation of Common Parameters From Multiresponse Data With Missing Observations. Technometrics 1981, 23 (2), 131–141.
(23) Rabitz, H.; Kramer, M.; Dacol, D. Sensitivity Analysis in Chemical Kinetics. Annu. Rev. Phys. Chem. 1983, 34, 419–461.
(24) Froment, G. F. The kinetics of complex catalytic reactions. Chem. Eng. Sci. 1987, 42 (5), 1073–1087.
(25) Stewart, W. E.; Caracotsios, M.; Sorensen, J. P. Parameter estimation from multiresponse data. AIChE J. 1992, 38 (5), 641–650.
(26) Stewart, W. E.; Shon, Y.; Box, G. E. P. Discrimination and goodness of fit of multiresponse mechanistic models. AIChE J. 1998, 44 (6), 1404–1412.
(27) Asprey, S. P.; Naka, Y. Mathematical Problems in Fitting Kinetic Models: Some New Perspectives. J. Chem. Eng. Jpn. 1999, 32 (3), 328–337.
(28) Stewart, W. E.; Henson, T. L.; Box, G. E. P. Model discrimination and criticism with single-response data. AIChE J. 1996, 42 (11), 3055–3062.
(29) Park, T.-Y.; Froment, G. F. A Hybrid Genetic Algorithm for the Estimation of Parameters in Detailed Kinetic Models. Comput. Chem. Eng. 1998, 22 (Suppl.), S103–S110.
(30) Petzold, L.; Zhu, W. Model reduction for chemical kinetics: an optimization approach. AIChE J. 1999, 45 (4), 869–886.
(31) Ross, J.; Vlad, M. O. Nonlinear Kinetics and New Approaches to Complex Reaction Mechanisms. Annu. Rev. Phys. Chem. 1999, 50, 51–78.
(32) Atkinson, A. C. Non-constant variance and the design of experiments for chemical kinetic models. In Dynamic Model Development: Methods, Theory and Applications; Asprey, S. P., Macchietto, S., Eds.; Elsevier: Amsterdam, 2000; pp 141–158.
(33) Cortright, R. D.; Dumesic, J. A. Kinetics of Heterogeneous Catalytic Reactions: Analysis of Reaction Schemes. Adv. Catal. 2001, 46, 161–264.
(34) Song, J.; Stephanopoulos, G.; Green, W. H. Valid Parameter Range Analyses for Chemical Reaction Kinetic Models. Chem. Eng. Sci. 2002, 57, 4475–4491.
(35) Sirdeshpande, A. R.; Ierapetritou, M. G.; Androulakis, I. P. Design of Flexible Reduced Kinetic Mechanisms. AIChE J. 2001, 47 (11), 2461–2473.
(36) Katare, S.; Bhan, A.; Caruthers, J. M.; Delgass, W. N.; Venkatasubramanian, V. A hybrid genetic algorithm for efficient parameter estimation of large kinetic models. Comput. Chem. Eng. 2004, 28 (12), 2569–2581.
(37) Bhan, A.; Hsu, S.-H.; Blau, G.; Caruthers, J. M.; Venkatasubramanian, V.; Delgass, W. N. Microkinetic Modeling of Propane Aromatization over HZSM-5. J. Catal. 2005, 235 (1), 35–51.
(38) Bogacha, B.; Wright, F. Non-linear design problem in a chemical kinetic model with non-constant error variance. J. Stat. Plan. Inference 2005, 128, 633–648.
(39) Ucinski, D.; Bogacha, B. T-optimum design for discrimination between two multiresponse dynamic models. J. R. Stat. Soc. Ser. B: Stat. Method. 2005, 67, 3–18.
(40) Englezos, P. J.; Kalogerakis, N. Applied Parameter Estimation for Chemical Engineers; Marcel Dekker, Inc.: New York, 2001.
(41) Blau, G. E.; Lasinski, M.; Orcun, S.; Hsu, S.-H.; Caruthers, J. M.; Delgass, W. N.; Venkatasubramanian, V. High Fidelity Mathematical Model Building with Experimental Data: A Bayesian Approach. Comput. Chem. Eng. 2008, 32 (4-5), 971–989.
(42) Draper, N. R.; Hunter, W. G. Design of experiments for parameter estimation in multiresponse situations. Biometrika 1966, 53 (3/4), 525–533.
(43) Draper, D. Bayesian Hierarchical Modeling; Springer-Verlag: New York, 2000.
(44) Nocedal, J.; Wright, S. J. Numerical Optimization; Springer: New York, 1999; p 636.
(45) Froment, G. F. Model discrimination and parameter estimation in heterogeneous catalysis. AIChE J. 1975, 21 (6), 1041–1057.
(46) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. Equations of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087–1092.
(47) Gilks, W. R.; Richardson, S.; Spiegelhalter, D. J. Markov Chain Monte Carlo in Practice; Chapman & Hall/CRC: New York, 1996.
(48) Atkinson, A. C.; Cox, D. R. Planning Experiments for Discriminating between Models. J. R. Stat. Soc. Ser. B: Stat. Method. 1974, 36 (3), 321–348.
(49) Atkinson, A. C.; Fedorov, V. V. Optimal design: Experiments for discriminating between several models. Biometrika 1975, 62 (2), 289–303.
(50) Fedorov, V. V.; Pazman, A. Design of physical experiments. Fortschr. Phys. 1968, 16, 325–355.
(51) Hsiang, T.; Reilly, P. M. A practical method for discriminating among mechanistic models. Can. J. Chem. Eng. 1971, 49, 865–871.
(52) Montgomery, D. C.; Runger, G. C. Applied Statistics and Probability for Engineers, 3rd ed.; Wiley: Hoboken, NJ, 2002; p 720.
(53) Prasad, V.; Vlachos, D. G. Multiscale Model and Informatics Based Optimal Design of Experiments: Application to the Catalytic Decomposition of Ammonia on Ruthenium. Ind. Eng. Chem. Res. 2008, 47, 6555–6567.
(54) Hsu, S.-H. Bayesian Model Building Strategy and Chemistry Knowledge Compilation for Kinetic Behaviors of Catalytic Systems. Ph.D. Thesis, Purdue University, West Lafayette, IN, 2006.
(55) Bates, D. M.; Watts, D. G. Nonlinear Regression Analysis and Its Applications; Wiley and Sons: New York, 1988.
(56) Fishman, G. S. A First Course in Monte Carlo; Thomson Brooks/Cole: Belmont, CA, 2005.
(57) Hastings, W. K. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 1970, 57, 97–109.

Received for review October 30, 2008
Revised manuscript received February 24, 2009
Accepted March 13, 2009

IE801651Y