
4768 Ind. Eng. Chem. Res. 2009, 48, 4768–4790

Bayesian Framework for Building Kinetic Models of Catalytic Systems


Shuo-Huan Hsu, Stephen D. Stamatis, James M. Caruthers, W. Nicholas Delgass,* and
Venkat Venkatasubramanian
Center for Catalyst Design, School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47907

Gary E. Blau, Mike Lasinski, and Seza Orcun


E-enterprise Center, Discovery Park, Purdue University, West Lafayette, Indiana 47907

Recent advances in statistical procedures, coupled with the availability of high performance computational
resources and the large mass of data generated from high throughput screening, have enabled a new paradigm
for building mathematical models of the kinetic behavior of catalytic reactions. A Bayesian approach is used
to formulate the model building problem, estimate model parameters by Monte Carlo based methods,
discriminate rival models, and design new experiments to improve the discrimination and fidelity of the
parameter estimates. The methodology is illustrated with a typical, model building problem involving three
proposed Langmuir-Hinshelwood rate expressions. The Bayesian approach gives improved discrimination
of the three models and higher quality model parameters for the best model selected as compared to the
traditional methods that employ linearized statistical tools. This paper describes the methodology and its
capabilities in sufficient detail to allow kinetic model builders to evaluate and implement its improved model
discrimination and parameter estimation features.

1. Introduction

A validated kinetic model is a critical and well-recognized tool for understanding the fundamental behavior of catalytic systems. Caruthers et al.1 and others2-6 have suggested that the fundamental understanding captured by the model can also be used to guide catalyst design, where a microkinetic7 analysis is a key component of any fundamental model that intends to predict catalytic performance. While the building of such kinetic models has seen many advances,8-40 the complexity of real reaction systems can still overwhelm the capabilities of even the most recent existing optimization-based model building procedures. The difficulty in discriminating rival models and in determining kinetic parameters even from designed experimental data is exacerbated by kinetic complexity as well as the fact that all experimental data contain error. Most approaches assume that the proposed model is true and that the data have a certain error structure, e.g., constant error or constant percentage error over the entire experimental range. Linear or nonlinear regression techniques are used to generate point estimates of the parameters. The uncertainty of the parameters is subsequently computed under the assumption that the model is linear in the neighborhood of these point estimates. There are several limitations with this approach. First, the model may be wrong, and thus parameter estimates are corrupted by model bias. Even when the model is adequate, linearization of nonlinear models can lead to spurious confidence and joint confidence regions.41 An additional complication is the potential for generating multiple local optimal parameter sets depending on the starting "guesses" of the parameters supplied to the regression algorithm. The bottom line is that the existing point estimation methods frequently give the wrong parameters for the wrong model. In addition, they do not take advantage of the considerable a priori knowledge about the reaction system to design the experimental program.

In contrast to traditional point estimate methods, Bayesian approaches make full use of the prior knowledge of the experimenter, do not require linearization of nonlinear models, and naturally develop probability distributions of the parameter estimates that are a more informative description of parameter values than point estimates. When combined with effective modeling of experimental error, Bayesian methods are ideally suited to kinetic modeling. The value of Bayesian approaches has been known for some time,11,12,14,16,18,22,42,43 but they require significant computational resources. However, recent advances in computational power have made the implementation viable.41 The purpose of this paper is to introduce the application of Bayesian methods for the development of microkinetic models. The emphasis is on identification of the most appropriate description of the reaction chemistry, i.e. the sequence of elementary steps, and on obtaining the best quantification of the associated kinetic parameters. While this level of detail may not be essential for some levels of reactor design, it is critical for catalyst design, where the objective of the research is to connect the various kinetic parameters in a microkinetic model with the molecular structure of the catalyst. In order to illustrate the principles and capabilities of the Bayesian approach and its differences from currently practiced point estimate approaches, we will analyze simulated sets of experimental data generated from known reaction equations. In an attempt to separate the capabilities of the method from the mathematical details of its execution, we will first present the modeling framework, introduce the models to be discriminated, and summarize the results. Then, when the advantages of the Bayesian approach are clear, we will describe the details of its utility. Its ability to treat nonlinearity without approximation, its inherent recognition of error, and the effect it has on the probability distribution of kinetic parameters are the reasons it can differentiate models and improve confidence intervals for the parameters. The price for these advantages is a higher, but manageable, computational load. Our intent is to make the computational requirements clear, but to emphasize the benefits in analysis of catalytic kinetics that are the return on the investment in the mathematics.

* To whom correspondence should be addressed. E-mail: delgass@ecn.purdue.edu.
10.1021/ie801651y CCC: $40.75 © 2009 American Chemical Society
Published on Web 04/23/2009

Figure 1. Model building procedure.

2. Framework for Mathematical Model Building

Our approach will follow that of Reilly and Blau,20 illustrated in Figure 1, where sequential design, experimentation, and analysis of experimental data can be holistically integrated to provide an efficient and accurate approach for model discrimination and parameter estimation of chemical reaction systems. Summarily, the first step in the model building process is model discrimination, where several physically meaningful models are postulated by the scientist to describe the reaction system being investigated. Then sets of experimental data are designed and carried out in the laboratory to discriminate among these candidate models. The abilities of the various models to match these sets of data for various model parameter values are compared until the best one is found. If none of the models is deemed adequate, additional ones are postulated and the sequential experimentation/analysis process is continued until a suitable one is found.

Once an adequate model has been selected, the next step in the model building process is parameter estimation, where the quality of the parameter estimates is quantified by point estimates and their confidence regions. We note that there is a tendency among model builders and scientists to attribute meaning to these estimates before an adequate model is obtained. Clearly, parameter estimates for invalid models are meaningless. Consequently, the quality of parameter estimates can only be determined after an adequate model has been identified. Then, if the confidence regions for the parameters for that model are unacceptable, as quantified by their shape and size, additional experiments may be needed to improve their quality. The process of designing experiments followed by the determination of parametric uncertainty followed by additional experimentation is continued until acceptable quality parameters are realized. Only after this process has been completed can the model be used for reactor design and analysis. In the special case of catalyst design, where the model parameters are paramount, they may now be used as the response variables in another model building exercise that relates these parameters to the structural and chemical descriptors of the catalyst.

The model building process (discrimination, validation, and parameter estimation) as described above is conceptually well-known, although sometimes applied incorrectly because the statistical and mathematical sophistication required is not fully appreciated. What is new in the Bayesian approach presented here is the simplicity of the approach and the natural way it follows the logic of the sequential model building paradigm. Unfortunately, this simplicity is achieved at the expense of a computational burden which has only recently permitted its use with user-friendly computer software and hardware available on the desktop of a typical reaction modeling specialist. For complex models, i.e. large numbers of reactions and parameters, considerable skill with the mathematical aspects of the method is still necessary, as well as the need to resort to a server or cluster of servers to achieve the necessary computing power.

The strength of the Bayesian approach is its ability to identify an adequate model with realistic parameter estimates by incorporating the prior knowledge of the scientist/experimentalist. Using this knowledge is anathema to statisticians who correctly point out that we are "biasing" the results and not letting the data "speak for themselves." This is precisely why these methods are being embraced by the engineering and scientific community, who do not want their expertise silenced by the vagaries of experimental analysis. In traditional approaches, the main value of the experience of the investigator is in the translation of that experience into initial guesses for the parameters. In the Bayesian approach, the belief of the experimentalist in parameter ranges and the plausibility of candidate models are specifically acknowledged in the implementation. It is noteworthy that as the amount of well-designed experimental data increases, the influence of the initial beliefs diminishes.

2.1. Example Problem. Before proceeding with the mathematical and statistical development of the Bayesian approach, we will first define a simple model reaction system A + B → C + D over a catalyst in a differential reactor. We assume that A, B, and C reversibly adsorb on the surface of the catalyst, D is not adsorbed, and the reaction is irreversible under the operating conditions studied. To represent typical laboratory conditions, we use a feed stream V0 = 100 cm3/min fed to a tubular reactor which contains w = 0.1 g of catalyst. The stream is composed of various input concentrations of A, B, and C represented as partial pressures PA0, PB0, and PC0 selected from the following ranges:

0.5 atm ≤ PA0 ≤ 1.5 atm
0.5 atm ≤ PB0 ≤ 1.5 atm
0 atm ≤ PC0 ≤ 0.2 atm

The temperature of the system can be changed over the range

630 K ≤ T ≤ 670 K

The temperature range is a bit narrow, but it is representative of systems where complex reaction networks impose selectivity constraints. The outlet concentrations PA, PB, and PC in atmospheres are measured. From these values, the molar conversion, x, of the reaction is calculated and used to determine the rate of reaction from the design equation for a differential plug flow reactor:

r = xV0CA0/w = xV0PA0/(wRT)   (gmol/(min kg catalyst))   (1)

where CA0 is the initial concentration of A in gram moles per cubic centimeter and R is the ideal gas law constant. The following three Langmuir-Hinshelwood models are postulated to describe the reaction.

Model 1 (A, B, and C are adsorbed):

A + * ⇄ A*   (K1)
B + * ⇄ B*   (K2)
C + * ⇄ C*   (K3)
A* + B* → C* + D   (k4)

r = k4K1K2PAPB/(1 + K1PA + K2PB + K3PC)^2   (2)

Model 2 (A and C are adsorbed):

A + * ⇄ A*   (K1)
C + * ⇄ C*   (K3)
A* + B → C* + D   (k4)

r = k4K1PAPB/(1 + K1PA + K3PC)   (3)

Model 3 (B and C are adsorbed):

B + * ⇄ B*   (K2)
C + * ⇄ C*   (K3)
A + B* → C* + D   (k4)

r = k4K2PAPB/(1 + K2PB + K3PC)   (4)

where S* denotes a species S adsorbed on the surface, k4 is the kinetic constant of the rate determining step, and the Ki are the equilibrium constants with proper units. The steps shown with double-ended arrows are assumed to be in quasi-equilibrium. It is known that one of the models is true because it is the one used to generate the simulated data. For simplicity, no additional models are considered. The temperature dependence of the parameters is assumed to follow a normal Arrhenius relationship:27

k4 = k40 exp[-(Ea/R)(1/T - 1/T0)]   (gmol/(min kg catalyst))
Ki = Ki0 exp[-(ΔHi/R)(1/T - 1/T0)]   (1/atm)   for i = 1, 2, 3   (5)

where k40 is the rate constant at the reference temperature T0 = 650 K, which is the middle of the experimental range, and Ea is the activation energy. The Ki0 are reference equilibrium constants at T0, and the ΔHi are the heats of adsorption for the corresponding equilibrium constants, Ki. The challenge is to determine the most suitable model and obtain high quality parameter estimates from the simulated experimental data.

Reaction rate data for the A + B → C + D example problem were generated using model 2, defined by eq 3, for experimental factors T, PA0, PB0, and PC0, with K10 = 1 1/atm, K30 = 20 1/atm, k40 = 3 gmol/(min kg catalyst), ΔH1/R = -17.5 × 10^3 K, ΔH3/R = -20 × 10^3 K, and Ea/R = 11 × 10^3 K. The values were chosen to be representative, and the heats of adsorption were adjusted to produce significant surface coverages. Experimental error was added to the rate r predicted by eq 3 and was assumed to be normally distributed with zero mean and a variance of 0.002 (gmol/(min kg catalyst))^2, i.e. a standard deviation of 4.5%. An intuitive experimental program used by many catalysis researchers is to change one experimental factor at a time, keeping all other factors at their nominal values. We will designate such an experimental program as the "one-variable-at-a-time" approach. The reaction rate r was simulated for the one-variable-at-a-time design with {T, PA0, PB0, PC0}. The resulting one-at-a-time data, shown in Table 1, is the initial set of experimental data D with 33 points, used for analysis. It is assumed that the error in the measured values of r is constant over the entire experimental region. This assumption is rarely valid, i.e., errors tend to be related to the magnitude of the r values. However, we will assume constant error values in our example to keep the focus strictly on the Bayesian approach. We note that since the data were generated directly from the rate expression, changes in the rate determining step or most abundant surface intermediate over the data range are not possible in this example.

2.2. Model Building Formalism. The general problem, for which the above three models are specific cases, is to postulate and then select the best model M* from a set of P = {M1, M2, ..., MP} models from experimental data collected in a batch, continuously stirred tank reactor (CSTR), or plug flow reactor. For simplicity, we will restrict ourselves to modeling data obtained from CSTRs or differential reactors where the rate of reaction is measured or calculated directly. In a recent paper by Blau et al.,41 the more general problem of dealing with kinetic models consisting of differential equations used to characterize concentration versus time or space time data from batch and integral reactors is discussed. For the general case considered here, the models are related to the N experimental data points by

Mk: ri = fk(θk, ui) + εi(φ)   i = 1, ..., N;  k = 1, ..., P   (6)

where ri is the rate of the reaction for the ith experimental condition, fk(θk, ui) is the kth model with a Qk-dimensional vector of parameters, θk, and ui is an R-dimensional vector of experimental conditions. The experimental error is described by the error model εi(φ), which will be discussed shortly. The data set is formally represented by D = {(ui, ri) | i = 1, 2, ..., N}. In our example problem P = 3, and the parameters to be estimated are

Model 1: θ1 = (k40, E4, K10, ΔH1, K20, ΔH2, K30, ΔH3)
Model 2: θ2 = (k40, E4, K10, ΔH1, K30, ΔH3)
Model 3: θ3 = (k40, E4, K20, ΔH2, K30, ΔH3)

Note that models 2 and 3 have two fewer parameters than model 1. The R = 4 dimensional vector of experimental conditions for the ith measurement is ui = (Ti, PA0i, PB0i, PC0i).

The rate of reaction ri is estimated from the conversion xi and initial concentrations via eq 1. The conversion is obtained from the measured input and output concentrations in the differential reactor; hence, the estimated rate ri has variability or error associated with it. This error is represented by an error model εi(φ) in eq 6, which is a joint probability density function for the errors associated with the reaction rate ri at experimental conditions ui. Here φ is a V-dimensional vector of statistical error model parameters. It is common to assume that the errors are unbiased so that the mean of the error εi(φ) is zero, while the parameters φ characterize the variability in the measured value.
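As a concrete illustration, the rate surface and simulated measurements described above can be reproduced with a short script (a sketch: the random seed, and hence the exact noise realization of Table 1, will differ):

```python
import math
import random

# "True" parameter values used to generate the simulated data (model 2, eqs 3 and 5)
T0 = 650.0                       # reference temperature, K
k40 = 3.0                        # gmol/(min kg catalyst)
K10, K30 = 1.0, 20.0             # 1/atm
Ea_R = 11.0e3                    # Ea/R, K
dH1_R, dH3_R = -17.5e3, -20.0e3  # ΔHi/R, K

def arrhenius(p0, e_over_R, T):
    """Eq 5: parameter value at T relative to its value p0 at T0."""
    return p0 * math.exp(-e_over_R * (1.0 / T - 1.0 / T0))

def rate_model2(T, PA, PB, PC):
    """Langmuir-Hinshelwood rate expression of eq 3 (model 2)."""
    k4 = arrhenius(k40, Ea_R, T)
    K1 = arrhenius(K10, dH1_R, T)
    K3 = arrhenius(K30, dH3_R, T)
    return k4 * K1 * PA * PB / (1.0 + K1 * PA + K3 * PC)

def simulate_rate(T, PA, PB, PC, rng, var=0.002):
    """One simulated measurement: eq 3 plus N(0, 0.002) error."""
    return rate_model2(T, PA, PB, PC) + rng.gauss(0.0, math.sqrt(var))

rng = random.Random(1)
print(rate_model2(650.0, 1.0, 1.0, 0.1))        # noiseless rate at the design center
print(simulate_rate(650.0, 1.0, 1.0, 0.1, rng)) # one noisy "measurement"
```

At the center point (650 K, PA = PB = 1 atm, PC = 0.1 atm) the noiseless rate is 3·1·1/(1 + 1 + 2) = 0.75 gmol/(min kg catalyst), consistent with the replicated center-point rates in Table 1 once the measurement error is taken into account.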
Table 1. Data Set D: Experimental Design Based on One-Variable-at-a-Time Approach
expr no. T (K) PA0 (atm) PB0 (atm) PC0 (atm) PA (atm) PB (atm) PC (atm) rate (gmol/(min kg catalyst))
1 650 1 1 0.1 0.989 0.989 0.111 0.657
2 640 1 1 0.1 0.992 0.992 0.108 0.596
3 630 1 1 0.1 0.994 0.994 0.106 0.474
4 640 1 1 0.1 0.992 0.992 0.108 0.652
5 650 1 1 0.1 0.989 0.989 0.111 0.683
6 660 1 1 0.1 0.986 0.986 0.114 0.841
7 670 1 1 0.1 0.982 0.982 0.118 0.951
8 660 1 1 0.1 0.986 0.986 0.114 0.894
9 650 1 1 0.1 0.989 0.989 0.111 0.701
10 650 1.25 1 0.1 1.239 0.989 0.111 0.848
11 650 1.5 1 0.1 1.488 0.988 0.112 1.044
12 650 1.25 1 0.1 1.239 0.989 0.111 0.791
13 650 1 1 0.1 0.989 0.989 0.111 0.701
14 650 0.75 1 0.1 0.740 0.990 0.110 0.536
15 650 0.5 1 0.1 0.491 0.991 0.109 0.404
16 650 0.75 1 0.1 0.740 0.990 0.110 0.593
17 650 1 1 0.1 0.989 0.989 0.111 0.641
18 650 1 1.25 0.1 0.987 1.237 0.113 0.887
19 650 1 1.5 0.1 0.984 1.484 0.116 1.003
20 650 1 1.25 0.1 0.987 1.237 0.113 0.952
21 650 1 1 0.1 0.989 0.989 0.111 0.615
22 650 1 0.75 0.1 0.992 0.742 0.108 0.502
23 650 1 0.5 0.1 0.995 0.495 0.105 0.290
24 650 1 0.75 0.1 0.992 0.742 0.108 0.512
25 650 1 1 0.1 0.989 0.989 0.111 0.669
26 650 1 1 0.05 0.989 0.989 0.061 0.893
27 650 1 1 0 0.988 0.988 0.012 1.293
28 650 1 1 0.05 0.989 0.989 0.061 0.921
29 650 1 1 0.1 0.989 0.989 0.111 0.646
30 650 1 1 0.15 0.990 0.990 0.160 0.526
31 650 1 1 0.2 0.991 0.991 0.209 0.495
32 650 1 1 0.15 0.990 0.990 0.160 0.559
33 650 1 1 0.1 0.989 0.989 0.111 0.748
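The replicate structure of Table 1 can be exploited directly: pooling the repeated center-point runs gives a model-free estimate of the measurement variance. A minimal check (rate values transcribed from Table 1):

```python
# Rates of the replicated center-point experiments in Table 1
# (T = 650 K, PA0 = PB0 = 1 atm, PC0 = 0.1 atm): expts 1, 5, 9, 13, 17, 21, 25, 29, 33
center = [0.657, 0.683, 0.701, 0.701, 0.641, 0.615, 0.669, 0.646, 0.748]

mean = sum(center) / len(center)
# unbiased sample variance of the replicates
s2 = sum((r - mean) ** 2 for r in center) / (len(center) - 1)

print(mean, s2)
```

The sample variance (about 0.0016) is consistent with the variance of 0.002 used to generate the data, illustrating how repeat points quantify experimental error before any kinetic model is fit.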

The proper identification and application of the error model is a step often overlooked in the modeling process. However, the determination of the error model is every bit as important as the determination of the kinetic model, because selecting an acceptable kinetic model and estimating the kinetic model parameters depend critically on the quality of the experimental data as quantified by the error model.

It should be pointed out that eq 6 assumes that all kinetic models adequately describe the data. However, all models, including even the "best model" M*, are only an approximation of reality and are probably biased. Thus, the experimental error εi(φ) may be confused or "confounded" with kinetic modeling error. In what follows, each model will first be assumed to be true, where deviations between the model predictions and the data are unbiased estimates of experimental error.

Given a set of data, the first step in the model building problem is to use these data to discriminate between the proposed models. The conventional way this is accomplished is to find the best, in some statistical sense, point estimate of the parameters for each model candidate and then compare the models using these best estimates. Generating such estimates is a challenging task since the kinetic model equations are nonlinear in the parameters, requiring the use of nonlinear regression techniques which may converge to local or false optima representing incorrect parameter estimates. Even when the best or global optimum is obtained, comparing models at their best single set of parameters can sometimes give very misleading results, playing havoc with the model building process.19 We demonstrate in section 3 below the pitfalls of using point estimates for the simple kinetic problem described in section 2.1, before developing an alternative Bayesian approach in section 4.

3. Evaluation of Models by Nonlinear Regression Analysis

Nonlinear regression analysis is widely used for estimating the parameters of kinetic models. Here, different values of the parameters are selected to minimize the sum of squares of the differences, or residuals, between the data and one of the models defined by eq 6, assuming that it is correct, using an iterative search procedure (e.g., the Levenberg-Marquardt optimization method44). The parameter values in the set that minimizes this sum of squares are called the least-squares parameter estimates. If repeat experiments are available for each data point, then the method automatically accounts for experimental error. It is generally agreed by practitioners in this field that the only assurance of achieving the "optimal" point estimate solution is to supply a starting guess sufficiently close to the optimal solution.19 Even then there is the possibility of stopping short of the best value or indeed finding a multiplicity of solutions because (1) the kinetic model may be incorrect or (2) the information necessary to provide meaningful parameter estimates may not be available due to poorly designed experiments.

To demonstrate the real plausibility of obtaining false optima with the nondesigned but conventional one-at-a-time approach to experimentation, random initial estimates were supplied to the nonlinear regression program. Different sets of parameter estimates giving the same sum of squared errors (SSE) are reported in Table 2. At first glance, these results are somewhat surprising, not only in that there are multiple minima, but also in that they have essentially the same residual sums of squares. This implies that the different sets of parameter values, even the wrong ones, would allow the models to fit the data equally well. Regardless of the statistical criteria used, it would be impossible to distinguish or discriminate the models, yet we know that the data were generated by one of the models.

Table 2. Parameter Estimates from Nonlinear Regression

Local Optimal Parameter Sets for Model 1

ln(k40)  Ea/R (10^3 K)  ln(K10)  ΔH1/R (10^3 K)  ln(K20)  ΔH2/R (10^3 K)  ln(K30)  ΔH3/R (10^3 K)  SSE
25.42    34.04          -2.46    0.00            -22.45   -26.34          1.39     -0.80           0.0585
15.61    29.28          -2.46    0.00            -12.64   -21.54          1.39     -0.75           0.0585
25.18    17.26          -2.46    0.00            -22.21   -9.55           1.39     -0.80           0.0585
13.79    27.69          -2.46    0.00            -10.82   -20.17          1.39     -1.11           0.0585

Local Optimal Parameter Sets for Model 2

ln(k40)  Ea/R (10^3 K)  ln(K10)  ΔH1/R (10^3 K)  ln(K30)  ΔH3/R (10^3 K)  SSE
1.85     7.97           -1.20    -1.71           2.53     -3.25           0.0558
1.84     11.09          -1.18    -5.57           2.54     -3.88           0.0558
1.85     7.57           -1.20    -0.69           2.53     -2.27           0.0558
1.97     8.03           -1.35    0.00            2.50     -0.24           0.0558

Local Optimal Parameter Sets for Model 3

ln(k40)  Ea/R (10^3 K)  ln(K20)  ΔH2/R (10^3 K)  ln(K30)  ΔH3/R (10^3 K)  SSE
45.38    30.26          -44.99   -22.58          2.28     -1.01           0.0600
41.46    34.66          -41.06   -26.97          2.28     -1.01           0.0600
28.93    40.00          -28.53   -32.42          2.28     -1.21           0.0600
29.62    8.02           -29.22   0.00            2.28     -0.38           0.0600
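That the distinct local optima in Table 2 are practically indistinguishable can be verified directly: evaluating eq 3 (with the Arrhenius form of eq 5) at the first two model 2 parameter sets gives nearly identical rates across the one-variable-at-a-time design region. A quick check, with parameter values taken from Table 2:

```python
import math

T0 = 650.0  # reference temperature, K

def rate_model2(params, T, PA, PB, PC):
    """Eq 3 with the Arrhenius temperature dependence of eq 5.
    params = (ln k40, Ea/R, ln K10, ΔH1/R, ln K30, ΔH3/R), energies in K."""
    lnk40, Ea_R, lnK10, dH1_R, lnK30, dH3_R = params
    dT = 1.0 / T - 1.0 / T0
    k4 = math.exp(lnk40 - Ea_R * dT)
    K1 = math.exp(lnK10 - dH1_R * dT)
    K3 = math.exp(lnK30 - dH3_R * dT)
    return k4 * K1 * PA * PB / (1.0 + K1 * PA + K3 * PC)

# first two local optima for model 2 from Table 2 (energies converted from 10^3 K)
setA = (1.85, 7.97e3, -1.20, -1.71e3, 2.53, -3.25e3)
setB = (1.84, 11.09e3, -1.18, -5.57e3, 2.54, -3.88e3)

conditions = [(650, 1.0, 1.0, 0.1), (630, 1.0, 1.0, 0.1), (670, 1.0, 1.0, 0.1),
              (650, 0.5, 1.0, 0.1), (650, 1.5, 1.0, 0.1), (650, 1.0, 1.0, 0.0)]
gap = max(abs(rate_model2(setA, *c) - rate_model2(setB, *c)) for c in conditions)
print(gap)  # far smaller than the measurement noise
```

The predicted rates agree to well within the experimental standard deviation everywhere in the design region, which is exactly why the two optima cannot be distinguished by their residual sums of squares.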

The solution to this conundrum is to note that only some of the parameter values are different for the different minima. Let us look at the two models that were not used to generate the data in Table 2, namely models 1 and 3. For model 1, ln(K30) and ln(K10) are the same for the four different local optima, while some of the other parameters change by orders of magnitude. Similarly, for model 3, ln(K30) is constant while the other parameters take on various values. Even for model 2, ln(K30) is well-defined while the other parameters change, but to a much lesser degree than with models 1 and 3. This implies that the data give a great deal of information about the adsorption of C on the surface of the catalyst but little else: changing the amount of C in the feed swamped out the effect of changes in the other experimental conditions. In fact, the first nine data points, which account for the temperature changes, are insufficient to provide meaningful estimates of the activation energy and adsorption coefficients defined by eq 6. This is the consequence of (1) a poorly designed experiment, i.e. the experimental conditions are changed individually (one-at-a-time approach), and (2) the use of a nonlinear regression approach where the uncertainties in the parameter estimates are not recognized. Some nonlinear parameter estimation programs attempt to estimate this uncertainty, but they fail badly because they do not properly account for the nonlinearities of the model about the optimal point estimates. As we will show, the Bayesian approach properly accommodates the uncertainty in parameter estimation, but there is no solution for a poorly designed experiment.

Before leaving this section, it is worth commenting on experimental error. Examination of Table 1 shows that the center point of the design is repeated five times and some of the points are repeated twice. These repeat points may be used to estimate experimental error at different points in the operating region. Seldom is any attempt made to model this experimental error, and repeat points are simply used to weight the data differently. We show how to do this in Appendix A. The greater the size of the operating region, the better the ability to discriminate models and improve parameter estimates, since the results are not confounded by experimental error and the different models have more operating space in which to show their divergence from one another. Consequently, the extremes of the experimental conditions are key points in any experimental program, despite the challenges of running the equipment under such conditions.

4. Bayesian Approach to Parameter Estimation

Bayesian methods have been suggested for building models of reaction systems for over thirty years.20,45 However, they have not been adopted by the catalyst model building community because of the computational challenges associated with using them properly. Fortunately, the power and cost effectiveness of high speed computation are making it possible for the researcher to exploit this important modeling tool. The Bayesian approach is fundamentally different from regression methods. Instead of finding single optimal point estimates of the model parameters to describe a given set of data, it uses methods that identify "regions" where one believes the true values of the parameters lie. Rather than review the details of the Bayesian approach reported in the literature,47 it will be sufficient to define the salient terms necessary to allow presentation of the results for our sample problem.

The first quantity that must be defined is p(εi), which is the joint probability density function for the errors in the data, D, for all N experiments. This term acknowledges the existence of error in the data and accounts for it explicitly through an error model function εi(φ) and the associated error model parameter vector φ. The second quantity is also a probability distribution, called the likelihood function, L(D|(θk, φ)), which is the "likelihood" that model k with its kinetic parameters θk generated the data D. This function is derived in Appendix A for a general model building system. Figure 2 gives some insight into this interpretation. It consists of a simple plot of a set of five (yi, xi) data points shown as solid dots on the graph, where y is the response variable and x is the independent variable. At any value of xi, many different values of yi are experimentally plausible. The probability distribution for these points is sketched as a probability distribution p(yi) on the graph. Note that these probability distributions can change from point to point on the graph. Several models could be used to "generate" the five data points. When a model is used to predict the response y for a given value of x, this prediction is not a point but a probability distribution. Consequently, we can use this probability distribution to measure the probability that the model generated the measured response value yi. For example, the simplest model for the data of Figure 2 would be a straight line of the form y = θ1 + θ2x. For any set of values of θ̂1 and θ̂2, a straight line could be drawn and the "probability assessed" that it generated the observed five points by comparing the observed and model
predicted values, assuming the model is correct and the error model is operative. This probability is called the likelihood. The dotted straight line in Figure 2 is one such instance. By finding the set of parameter values θ1* and θ2* which maximize this probability, we obtain point estimates called the maximum likelihood estimates. In this figure, we have shown the location of the simulated data points as dashed vertical lines on the probability distributions. A similar analysis can be performed for any additional model. In Figure 2, for example, the mean predicted values with maximum likelihood estimators for a simple curvilinear model with an additional parameter are shown as the solid line. It is clear that the curvilinear model is more apt to generate the five data points than the straight line, since the fit of the data to the calculated values is better. That is, the maximum likelihood value of generating the data is higher. The mean values predicted by the curvilinear model are shown as solid vertical lines on the probability distributions. Note that these values are closer to the mean values of the distributions than those for the linear model. In the general case, the likelihood point estimate θk*, which maximizes the likelihood function for known values of the error model parameters φ, can be obtained by the nonlinear search approach described in section 2.

There usually is some knowledge of the values of θk and φ a priori (i.e., before the data are collected), e.g., the rate constants are in a specific range, the activation energy must be greater than a minimum value, the equilibrium constant is greater than one, etc. This knowledge is captured in a third probability distribution called the prior distribution p(θk, φ). For example, if there is only one rate constant k1 in the model, i.e., θ1 = {k1}, and if it is known a priori that k1,min < k1 < k1,max, then

p(θ1) = 0   for k1 < k1,min
p(θ1) = 1/(k1,max - k1,min)   for k1,min ≤ k1 ≤ k1,max   (7)
p(θ1) = 0   for k1,max < k1

Table 3 specifies a set of uniform prior probability distributions for the parameters of the three models in our example problem from section 2.1. Note that UB stands for upper bound and LB stands for lower bound.

Table 3. Bounds of the Uniform Prior Probability Distributions of the Parameters in the Three Proposed Models

                    model 1      model 2      model 3
parameter           LB    UB     LB    UB     LB    UB
ln(k40)            -10    10    -10    10    -10    10
Ea/R (10^3 K)        5    40      5    40      5    40
ln(K10)            -10    10    -10    10
ΔH1/R (10^3 K)     -50   -10    -50   -10
ln(K20)            -10    10                 -10    10
ΔH2/R (10^3 K)     -30   -10                 -30   -10
ln(K30)            -10    10    -10    10    -10    30
ΔH3/R (10^3 K)     -30   -10    -30   -10    -30   -10

We are now ready to apply Bayes' theorem for parameter estimation. The Bayesian approach is a natural way of thinking for scientists and engineers since it mimics the scientific method. In the parameter estimation application, the model builder postulates that the parameters follow some prior distribution. A new data set D is then collected. These data are then used to update our knowledge about the parameters. Specifically, Bayes' theorem quantifies the improvement in our knowledge about the parameters of the model by weighting the prior information with the new data D generated by the experimental program. This new knowledge about the parameters is captured by another probability distribution called the "posterior" distribution, reflecting the fact that it represents our belief in the parameters "after" the data have been generated. Formally, Bayes' theorem states that this posterior distribution p((θk, φ)|D) is related to the product of the prior distribution and the likelihood function by the proportionality

p((θk, φ)|D) ∝ L(D|(θk, φ)) p(θk, φ)

This proportionality can be turned into an equation, so that the posterior distribution can be calculated directly, by normalizing the probability distribution, i.e., integrating over the allowable parameter space to give

p((θk, φ)|D) = L(D|(θk, φ)) p(θk, φ) / ∫θk ∫φ L(D|(θk, φ)) p(θk, φ) dφ dθk   (8)

The Bayesian approach directly incorporates both knowledge about the error in the data (either via repeats or via an error model which is an inherent part of the likelihood function; see Appendix A) as well as prior information about the model parameters. It produces not a single value or point estimate for a parameter, as in regression analysis, but rather a joint probability distribution for the parameter set, which can define a confidence region that can be used to assess the quality of the parameter estimates, including their interactions. Another advantage of the Bayesian approach is that the parameters φi in the error model can be determined simultaneously with the kinetic model parameters once the error model has been specified. However, with high throughput experimentation, repeat experiments at
then used to confirm or modify the prior belief about the different experimental conditions should be included in D so
that the error model parameters can be estimated “before” the
kinetic model parameters reducing the computational burden
and validating the form of the error model.
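To make eq 8 concrete, the normalization can be carried out directly on a grid when there is a single parameter, as in the k1 example of eq 7. The sketch below is illustrative only: the first-order rate form, the concentrations, and the Gaussian error model are assumptions for the sketch, not the paper's data set D.

```python
import numpy as np

# Sketch of eq 8 for a single rate constant k1 with the uniform prior of eq 7.
# The rate form, concentrations, and error model are illustrative assumptions.
rng = np.random.default_rng(0)
k1_true, sigma = 2.0, 0.05
C = np.linspace(0.1, 1.0, 8)                          # hypothetical concentrations
r_obs = k1_true * C + rng.normal(0.0, sigma, C.size)  # synthetic rate "data"

k1_min, k1_max = 0.0, 5.0
k1_grid = np.linspace(k1_min, k1_max, 2001)
dk = k1_grid[1] - k1_grid[0]

# Gaussian error model -> log likelihood ln L(D|k1); uniform prior p(k1), eq 7.
log_like = np.array([-0.5 * np.sum((r_obs - k * C) ** 2) / sigma**2
                     for k in k1_grid])
prior = np.full_like(k1_grid, 1.0 / (k1_max - k1_min))

# Normalize L * p over the allowable parameter space (denominator of eq 8).
post = np.exp(log_like - log_like.max()) * prior
post /= post.sum() * dk
k1_mean = np.sum(k1_grid * post) * dk                 # posterior expectation
```

The grid sum plays the role of the integral in the denominator of eq 8; for more than a few parameters this direct quadrature becomes infeasible, which motivates the sampling methods discussed next.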
Before illustrating the advantages of the Bayesian approach
over the point estimate regression method, it is necessary to
point out the major limitation of the former and the reason
it has not been generally applied despite the obvious
limitations of the latter. It is a nontrivial exercise to integrate
the denominator of eq 8. The conventional approach is to
use Monte Carlo sampling procedures which will converge
to the value of the definite integral if a sufficiently large
number of parameter values is sampled. The problem is
the dimensionality of the parameter space which is equal to
the sum of the number of parameters in the kth model (Qk)
plus the number of parameters in the error model (V). For
small models, Qk + V e 10, this is computationally feasible
for a well-defined prior distribution (e.g., uniform or normal).
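The conventional Monte Carlo evaluation described above can be sketched as follows: draw parameter values from the prior and average the likelihood, which converges to the denominator of eq 8 as the number of samples grows. The one-parameter setup below is a hypothetical stand-in for a small Qk + V problem, not the paper's models.

```python
import numpy as np

# Simple Monte Carlo estimate of the normalizing integral in eq 8:
# E_prior[L] = integral of L(D|k1) p(k1) dk1, estimated by averaging the
# likelihood over draws from the prior. Rate form and error model are
# assumptions for the sketch.
rng = np.random.default_rng(1)
sigma = 0.05
C = np.linspace(0.1, 1.0, 8)
r_obs = 2.0 * C                                   # noise-free synthetic "data"

k1_min, k1_max = 0.0, 5.0
n = 200_000
k1_samples = rng.uniform(k1_min, k1_max, n)       # draws from the uniform prior

resid = r_obs - k1_samples[:, None] * C           # (n, n_data) residual matrix
like = np.exp(-0.5 * (resid**2).sum(axis=1) / sigma**2)
evidence = like.mean()                            # converges to the integral
```

Note how inefficient this is even in one dimension: only the small fraction of prior draws near the likelihood peak contributes to the average, which is why the approach degrades rapidly as the dimensionality grows.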
Figure 2. Schematic of likelihood function L(D|(θk, φ)).

Figure 3. Marginal posterior probability density functions of the parameters in different models, given data set D: (dashed line) θj,max; (dash-dotted line) MPPDE. The units of Ea/R, ∆H1/R, ∆H2/R, and ∆H3/R are 10³ K.

However, for larger model systems, simple Monte Carlo is too slow. Fortunately, the pioneering work of Metropolis et al.46 and its recent rediscovery by Gilks47 have led to the
implementation of Markov Chain Monte Carlo (MCMC)
sampling procedures which guide the selection of sampling
points by conditioning each selection on the current value
of the point available. The description of this sampling
procedure has been relegated to Appendix B so as not to
distract the experimentalist from application issues. However,
it should be pointed out that it is different conceptually from
a simple Monte Carlo procedure. Rather than actually
evaluating the integral of the denominator of eq 8, the MCMC
scheme converges to a process which samples from the
posterior probability distribution. These samples can be used
to calculate statistical moments of the posterior distribution
such as mean values as well as determine “shapes” of the
posterior distribution from which joint confidence regions for the parameter values can be calculated. One particularly useful aid to the modeler/experimentalist in visualizing what is happening is the marginal distribution for a single parameter. It can be readily extracted from the posterior distribution p((θk, φ)|D) by integrating out the dependence on all the other parameters as follows:

p(θk,j) = ∫θ′k,j ∫φ p({θk, φ}|D) dθ′k,j dφ,   θ′k,j = {θk,l | l ≠ j}    (9)

p(φj) = ∫θk ∫φ′j p({θk, φ}|D) dθk dφ′j,   φ′j = {φl | l ≠ j}    (10)

It must be remembered that by integrating out the effects of all other parameters, one hides any correlation with the other parameters.

We are now ready to demonstrate these advantages of the Bayesian approach for estimating the parameters for the three models defining the simple A + B → C + D example introduced in section 2.1. The marginal probability distributions defined by eqs 9 and 10 for the various parameters for all three models are shown in Figure 3. Some of the marginal distributions, like those for K30 and σ², are symmetrical with a single well-defined peak; however, other distributions, like those for ∆H2 and ∆H3, are highly skewed. The traditional approach misses all of this information since it assumes that the distributions of the parameters are Gaussian with the maximum likelihood estimates being the mean values. The skewed nature of the parameter estimates demonstrates the need for the Bayesian approach and should be no surprise considering the inherently nonlinear features of chemical kinetics. Figure 3 also shows the maximum posterior probability density estimate (MPPDE), which is the value of that particular parameter at its maximum probability density value in the (θk, φ) hyperspace. This is not necessarily the mean of the marginal distribution but represents the most likely value for the parameter if a single point estimate needs to be selected.

Figure 4. 100(1 - α)% HPD confidence interval for a parameter.

Specification of confidence limits for these non-Gaussian marginal distributions is more involved. There exist an infinite number of confidence limits that can be placed on the parameters. A commonly used confidence limit, called the highest probability density (HPD) confidence limit, is the smallest of all confidence regions that enclose the desired percentage of the parameter space. This is shown schematically in Figure 4. Note that (i) the expectation of p(θk,j) or p(φj) is not necessarily at the peak of the p(θk,j) or p(φj) distribution and (ii) the upper and lower bounds, i.e., θk,j(u) and θk,j(l), are not symmetric around either the peak or the expectation of the distribution.
Table 4. Estimated Parameters and Corresponding 95% Confidence Intervals for Model Candidates (Data Set D and Uniform Prior Probability Distribution)

For each model, the columns are MPPDE, θj,max, 95% LB, and 95% UB (model 1, model 2, model 3 in order).
ln(k40) 8.98 9.94 5.93 9.94 | 1.74 1.81 0.69 9.04 | 8.03 4.48 2.43 10.0
Ea/R (10³ K) 32.4 29.0 25.2 39.8 | 12.5 19.7 9.4 27.5 | 20.9 22.9 19.9 26.1
ln(K10) -2.75 -2.38 -5.46 -1.31 | -1.05 -1.57 -8.82 0.70
∆H1/R (10³ K) -11.0 -12.6 -25.5 -10.2 | -10.7 -18.7 -26.4 -10.0
ln(K20) -5.75 -5.13 -7.34 -2.66 | -7.65 -4.12 -9.76 -1.06
∆H2/R (10³ K) -19.9 -17.0 -29.8 -10.2 | -22.3 -27.6 -30.0 -21.3
ln(K30) 1.37 1.36 1.19 1.56 | 2.57 2.30 2.00 3.06 | 2.25 2.36 1.54 2.95
∆H3/R (10³ K) -10.1 -14.5 -21.1 -10.2 | -10.1 -10.3 -23.0 -10.0 | -20.3 -20.8 -29.2 -19.3
σ² × 10³ 1.84 2.42 1.47 4.48 | 1.67 2.13 1.31 4.12 | 2.14 5.79 1.02 90.0
ln[E{L(Mk|D)}] 53.63 | 55.47 | 51.60
Pr(Mk|D) 0.135 | 0.848 | 0.018
The expectation of p(x) is calculated by the average value of all the samples of x. The 100(1 - α)% highest probability confidence limit is defined by

Pr(θk,j(l) ≤ θk,j ≤ θk,j(u)) = ∫Ω1-α p(θk,j) dθk,j = 1 - α    (11)

Pr(φj(l) ≤ φj ≤ φj(u)) = ∫Ω1-α p(φj) dφj = 1 - α    (12)

where Ω1-α is the HPD region, in which the probability density of any point is larger than that of any point in the complement of Ω1-α. Once α is specified, the upper and lower values on the parameters are found by simply examining points from the marginal distribution p(θk,j) or p(φj). If α is sufficiently small and the distribution is unimodal, the HPD confidence limits of eq 11 will contain various point estimates such as the expected value of p(θk,j) and θk,j,max, i.e., the value of θk,j at the peak in the marginal probability distribution. By definition, the expected value of θk,j is the first moment of the marginal distribution

E(θk,j) = ∫ θk,j p(θk,j) dθk,j ≅ Σ θk,j p(θk,j)    (13)

It is quite conceivable that the three point estimates we have defined, namely, MPPDE, E(θk,j), and θk,j,max, will be significantly different. Which one should be used and reported as "the" point estimate? The answer is ambiguous and is being argued among statisticians and scientists alike. Because of this controversy, it would be preferable to simply report the HPD confidence limits without specifying some arbitrary point estimate, although this would be a radical departure from current practice.

The MPPDE and the upper and lower bounds at 95% confidence are given in Table 4 for the three models using the nondesigned one-variable-at-a-time data given in Table 1. These results are quite different from the nonlinear regression analysis shown in Table 2. The HPD confidence region is quite large for a number of parameters, and the θk,j,max and MPPDE of the posterior probability distribution are different. As before, the poorly designed data set accounts for much of the uncertainty in the activation energies. However, we shall not agonize over any interpretation of the model parameters until discrimination between the various candidate models has been completed and an adequate model selected.

Since the posterior distribution, which is calculated in the foregoing by Bayes' theorem, is biased or influenced by the prior distribution, it is natural to question the impact of the form and quality of this distribution. In Appendix C, we address this question for the simple problem by quantifying the effects of different prior distributions on the posterior distributions for the same data set, D.

5. Model Discrimination and Validation

Now that Monte Carlo based Bayesian methods have been defined to generate confidence regions where we believe the parameters are located for each of the candidate models, the next task is to determine which one of the candidate models best describes the data, i.e., model discrimination. Model discrimination is possible once the posterior probability distribution for the parameters p({θk, φ}|D) is available for each model or, equivalently, once it is possible to sample from the posterior probability distribution. Once again, Bayes' theorem can be applied. The first step is to assign discrete prior probabilities Pr(M1), Pr(M2), ..., Pr(MP) to each of the models such that Σk Pr(Mk) = 1. These prior probabilities can be based on subjective opinions of the experimentalist/scientist, literature data, or simple exploratory analysis of any available data. It is important to select these prior probabilities "before" the actual experimental program, such as the one that produced the data shown in Table 1, is conducted. Once this data set D becomes available, the posterior probability of the models Pr(Mk|D) is determined by applying Bayes' theorem, this time on the models themselves, using a different likelihood function, L(D|Mk):

Pr(Mk|D) ∝ L(D|Mk) Pr(Mk)    (14)

Here the likelihood function L(D|Mk) is interpreted as the likelihood of the model generating the data set. Since Σk Pr(Mk|D) = 1, the proportionality can be turned into an equation by the use of another normalization factor so that

Pr(Mk|D) = L(D|Mk) Pr(Mk) / [Σk L(D|Mk) Pr(Mk)]    (15)

Once the posterior probability is known, the preference for the different models is quantified directly. However, what is L(D|Mk)? It is simply E(L(D|{θk, φ})), i.e., the expected value of the likelihood function for model k, where the dependence upon the (θk, φ) model parameters has been integrated out so that the relationships are valid over all parameter space. Specifically, the expected value of the likelihood function from the posterior probability distribution of the parameters is given by

L(D|Mk) = E(L(D|{θk, φ})) = ∫θk ∫φ L(D|{θk, φ}) p({θk, φ}|D) dφ dθk    (16)

This expected value of L(D|Mk) is computed via sampling from the posterior probability distribution for the parameters determined earlier using the MCMC process discussed in Appendix B. Thus, only minimal additional computational effort is required.

We emphasize that the Bayesian approach to model discrimination is fundamentally different from the more traditional single point regression based approach. In the regression-based approach, the likelihood value for each model is determined at a single point, the maximum likelihood estimate for the model,
and error model parameters. The models are compared in a


pairwise fashion using the ratio of the likelihood of each model
versus the model with the highest likelihood. This likelihood
ratio method20 may be wildly misleading as seen earlier because
it suggests preference for one model over the other but only at
a single point for each model. Intuitively the model that is
preferred over the others is selected as the best candidate. It is
then compared with the experimental error using a conventional
lack of fit F-test to see if it is adequate to describe the data.
This entire procedure is mentioned in passing only. It is flawed
for the same reason alluded to earlier: it compares the models
at one point only, and in the case of the data set from Table 1,
multiple parameter sets can generate the same values of the
likelihood ratios (see Table 2) leading to the question of which
point estimate set should be utilized. Alternatively, the Bayesian
approach compares models by sampling from the entire prior distribution hyperspace in the determination of the likelihood function and the associated posterior model probability Pr(Mk|D).

Figure 5. Schematic illustration of how to compute the confidence that at a given set of conditions the experimentally observed rate r′ could be generated by the model. The confidence is quantified as the probability whose numeric value is equal to the area of the unshaded region.
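The Bayesian discrimination of eqs 14-16 can be sketched as follows: draw posterior samples for each model with a short Markov chain, estimate L(D|Mk) as the average likelihood over those draws (eq 16), and normalize with the model priors (eq 15). The two rival rate forms, the random-walk Metropolis settings, and the data below are illustrative assumptions, not the paper's Langmuir-Hinshelwood models.

```python
import numpy as np

# Sketch of eqs 14-16 with two toy rival models and a minimal Metropolis
# sampler; everything here is an illustrative assumption.
rng = np.random.default_rng(2)
sigma = 0.05
C = np.linspace(0.1, 1.0, 8)
r_obs = 2.0 * C + rng.normal(0.0, sigma, C.size)  # data generated by "model 1"

models = {1: lambda k: k * C,                     # rival toy rate expressions
          2: lambda k: k * C**2}

def log_like(pred):
    return -0.5 * np.sum((r_obs - pred) ** 2) / sigma**2

def metropolis(f, n=20_000, step=0.1, k0=1.0):
    # Random-walk Metropolis for one parameter k, uniform prior on [0, 5].
    k, ll, chain = k0, log_like(f(k0)), []
    for _ in range(n):
        kp = k + rng.normal(0.0, step)
        if 0.0 <= kp <= 5.0:
            llp = log_like(f(kp))
            if np.log(rng.random()) < llp - ll:   # Metropolis acceptance rule
                k, ll = kp, llp
        chain.append(k)
    return np.array(chain[n // 2:])               # discard burn-in half

# eq 16: expected likelihood over the posterior samples of each model
L = {m: np.mean([np.exp(log_like(f(k))) for k in metropolis(f)])
     for m, f in models.items()}
prior = {1: 0.5, 2: 0.5}                          # discrete model priors, eq 14
norm = sum(L[m] * prior[m] for m in models)
post_prob = {m: L[m] * prior[m] / norm for m in models}  # eq 15
```

Because the data here are generated by the first rate form, the posterior probability concentrates on model 1; the expected likelihood is computed from samples already produced for parameter estimation, so the extra cost is minimal.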
We are now ready to revisit the A + B → C + D example to perform model discrimination, but still using the nondesigned one-variable-at-a-time data set of Table 1. Assuming a uniform prior distribution for the three models, Pr(M1) = Pr(M2) = Pr(M3) = 1/3, Pr(Mk|D) was determined via eq 15, using the procedure described above, and the results are shown in Table 4. Note that for model 3, Pr(M3|D) = 0.018, which suggests there is a very low probability of this being the model that generated the data of Table 1. Although model 2 is strongly preferred to model 1, it is not possible to reject the latter with the existing data. It is interesting that even with this nondesigned data set the analysis shows a strong preference for model 2 over model 1. This is definitely not the case for the traditional approach, where all three models fit the data equally well; see SSE in Table 2.

The discrete posterior probabilities for the model candidates Pr(Mk|D) only show the preference among the candidate models. However, they do not indicate if any of the models, even the one with the highest probability, adequately describes the data. If the entire experiment has been replicated, the conventional approach to this question is the lack-of-fit test, comparing the experimental error and the prediction error based on the "best" point estimate of the parameters. If replicates are not available, the Bayesian methods can still measure the adequacy of a model, since an error model has been assumed and estimates of the error model parameters are available. Since the model parameters are represented as the joint posterior probability distribution p({θk, φ}|D), the model prediction at any point uj will be a probability distribution as well. By sampling from this distribution with the MCMC process, the distribution for the predicted rate of reaction r̂k,j for model k for the jth experiment, including experimental error, is determined using eq 6. Recall that ε(φ) is the error model and is assumed to follow a normal distribution with zero mean and variance σ²(φ). A schematic illustrating how to compute the confidence that a model could generate a specific observed rate is shown in Figure 5. The probability density for the predicted rate r̂ at a particular set of conditions is represented by the curve. We quantify our confidence in terms of the probability that the model could generate the observed rate r′ by taking one minus the area under the density that is marked off from the observed rate to the rate with equal density on the other side of the mode. If one imagines an experimental rate occurring that is equal to the mode, the confidence that the model could generate it would be equal to one. Whereas, if one imagines a point far into a tail of the density, the confidence would approach zero. A bar chart of the confidence values for all of the experiments given in Table 1 for models 1 and 2 is given in Figure 6, where low values mean a low probability that the model was able to generate that specific experimental point. For all the experimental measurements, 1 - α > 0.05 for both models 1 and 2, so both adequately describe the data in Table 1. Note that α is a positive number less than 1 and that 100(1 - α) is the percent confidence limit.

By using the probabilities of generating the individual data points, an overall probability of the ability of the model to generate all the data can be used as a quantitative index of model adequacy. Since the individual data points are assumed to be statistically independent, the overall index of model adequacy for the mth model, pm(D), is calculated by multiplying together the probabilities of generating each data point. For example, the pm(D) for models 1 and 2, according to Figure 6, are p1(D) = 1.59 × 10⁻¹¹ and p2(D) = 1.85 × 10⁻¹¹, respectively. If sufficient replicate data are available, the probability of a specific data point r can be determined using the same methodology as described in the previous paragraph, and the overall probability pd(D) for a specific data set D is just the product of the probabilities of the individual p{r(uj)}. Theoretically, pm(D) must be less than pd(D), but if pm(D) is approximately pd(D), we can conclude that there is no significant lack of fit for the model at that point. We will not implement this procedure in this paper, since there are insufficient replicate points in the one-at-a-time data set given in Table 1. However, if significant replicated data are available via high throughput experimentation, this approach has value. With the data set D, neither model 1 nor model 2 has a lack of fit at the 95% confidence level, since all points can be generated with 5% or higher probability.

If the preceding analysis does not identify an adequate model, a conventional residual analysis may be used to identify specific model limitations and suggest possible modifications. Using the probability distribution of the model prediction r̂i, as determined via the procedure described above, calculate (i) the expected value of the predicted rate of reaction, E{r̂i} = r̄i, and (ii) the residual ei = ri - r̄i for each of the data points. Then two types of residual plots should be prepared: (i) ei versus r̄i, to see if there is a pattern with the magnitude of the measured value, e.g., the deviation of the model from the data is larger for larger/smaller values of the rate, and (ii) ei versus the various experimental conditions, e.g., temperature, feed composition, etc., to determine if a pattern emerges which will suggest modifications to the model.

When new models are postulated, they should then be fit to the data set D.

Figure 6. Probability for generating the experimental data D by models 1 and 2.
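The Figure 5 construction can be sketched as follows: given posterior-predictive samples of the rate at one experimental condition, quantify the confidence that the model could generate an observed rate r′. For this sketch the predictive density is approximated as Gaussian (mean and standard deviation taken from the samples), so the equal-density point is the mirror image about the mode and the confidence reduces to a two-sided tail probability; this simplification, the function name, and the sample values are assumptions, not the paper's calculation.

```python
import numpy as np
from math import erf, sqrt

# Confidence that a model could generate an observed rate r_obs, per the
# Figure 5 idea, under an assumed Gaussian predictive density. A kernel
# density estimate would drop the Gaussian assumption.
def generation_confidence(pred_samples, r_obs):
    mu, sd = pred_samples.mean(), pred_samples.std(ddof=1)
    z = abs(r_obs - mu) / sd
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))  # 2 P(Z > |z|)

rng = np.random.default_rng(3)
pred = rng.normal(2.4, 0.1, 5000)   # hypothetical predictive draws at one u_j
conf_near_mode = generation_confidence(pred, 2.45)  # near the mode: high
conf_far_tail = generation_confidence(pred, 3.0)    # deep in the tail: ~0
```

An observed rate at the mode gives a confidence of one, and a rate far into a tail gives a confidence near zero, matching the behavior described in the text; multiplying these per-point values yields the overall adequacy index pm(D).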

This process of successively adding new improved models and deleting inadequate ones should continue until one or more candidate models emerge that have a high probability of generating the data. If there are several candidate models which cannot be distinguished using the posterior probabilities Pr(Mk) for the given data set, it is then necessary to design additional experiments to discriminate the rival models. It is important to point out that before an adequate model is found no effort should be placed on interpretation of the physical nature or magnitude of the individual parameter estimates themselves. The parameter estimates are biased when the models are inadequate and/or the data are inadequate to discriminate between different candidate models.

6. Design of Experiments for Model Discrimination

If two or more models emerge as candidates for describing the data in D, it is necessary to conduct additional experiments to discriminate between them. The basic concept of design of experiments for model discrimination is to locate new experiments in a region which has not already been investigated and which predicts the greatest expected difference between the candidate models based on D. In the traditional point estimate approach, the criterion that has been used14,18,20 is to locate experimental conditions, u*, at a point that maximizes the absolute differences in the predicted values for each pair of two or more models. The traditional criterion is formally stated as: choose u* to achieve

max over u* of |f̂k(θk*, u) - f̂m(θm*, u)|,   k ≠ m for all k, m,   over the region umin ≤ u ≤ umax    (17)

where f̂k(θk*, u) is the predicted rate of reaction r at u using model k at some point estimate of the model parameters θk*, such as the maximum likelihood estimators.48-51 For the example problem, consider the range of operating conditions u as

630 ≤ T (K) ≤ 670
0.5 ≤ PA0 (atm) ≤ 1.5
0.5 ≤ PB0 (atm) ≤ 1.5
0.0 ≤ PC0 (atm) ≤ 0.2

in which the upper and lower bounds correspond to the extremes of the one-at-a-time experiment of Table 1. A series of 3⁴ = 81 experiments was run over the region selecting all combinations of 3 values of each operating condition: the lowest point, the center point, and the highest point. Individuals familiar with statistical design of experiments will recognize this is a full factorial experiment for four factors at three levels. Predicted values for these experimental runs were calculated using the 4 × 4 = 16 sets of "optimal" parameter point estimates from Table 2. The interested user can readily produce this grid of experiments with an associated set of predicted values with a simple spreadsheet, and it will not be reproduced here. The results are quite instructive for demonstrating the pitfalls both of using one-at-a-time experimentation and of using point estimates. First, because of the uncertainty in the parameter estimates, the differences between the model predictions are small regardless of the location of u over the above region. In fact, the maximum absolute difference between the predicted values given by eq 18 for models 1 and 2 for all 16 combinations of local point estimates is only 6%. Second, the location of the maximum difference is highly dependent on the actual set of point estimates used. They include the upper and lower bounds on T and PC0, the upper bound on PB0, and any value for the PA0 operating condition. The most frequent value of u* is (T = 670 K, PA0 = 1.5 atm, PB0 = 1.5 atm, PC0 = 0 atm), at which the average difference between the models is predicted to be about 3%.

Table 5. Optimal Parameters for Model Candidates Determined by Nonlinear Optimization, Including the Additional Experiment Designed by Traditional Techniques

parameter model 1 model 2
ln(k40) 70.27 46.61
Ea/R (10³ K) 11.70 11.08
ln(K10) -23.76 -46.20
∆H1/R (10³ K) -0.60 -0.63
ln(K20) -46.14
∆H2/R (10³ K) -0.79
ln(K30) 1.34 2.30
∆H3/R (10³ K) 0.00 0.00
SSE 0.0849 0.0861

What is the "true" value at this point? The true value from the simulator used to generate the data in Table 1 with model 2, and ignoring experimental error, is 2.470 gmol/(min kg of catalyst), while the average predicted rates at the new u* by the two model candidates are 4.194 and 4.053 gmol/(min kg of catalyst) for models 1 and 2, respectively. Since neither of the models describes the rate at the new data point (2.470, T = 670, PA0 = 1.5, PB0 = 1.5, PC0 = 0), this one data point and the 33 other one-at-a-time points in Table 1 were reanalyzed using traditional nonlinear regression to generate new maximum likelihood estimates for models 1 and 2, which are reported in Table 5. Note that these new maximum likelihood estimates are "very different" from the original ones in Table 2. What is even more surprising is that it is not only impossible to discriminate the rival models with the additional point, but the fit of the model to the data is better for model 1 than for model 2, which is simply wrong! Further, different optima can be obtained depending on the starting point selected from Table 2 to start the regression algorithm, which suggests that the entire approach is futile if nondesigned data and the associated point estimates for the parameters are used.

The Bayesian approach provides a more statistically compelling criterion to discriminate between any two models by locating the point u* in experimental space that minimizes the probability that the model candidates will predict the same reaction rates. This idea is schematically illustrated in Figure 7. In Figure 7a, the probability density pk{r(u1*; θk, φ)} for rate predictions of model k with additional experiment u1* overlaps considerably with pm{r(u1*; θm, φ)} for model m. If a new experiment were performed at these conditions and the observed rate was at A, it would indicate that model k is much more likely than model m; conversely, if the experimental rate with error occurred at C, model m would be more likely. However, there is a considerable region in the neighborhood of B where both models predict the same values. If the experimental rate occurred close to B, model discrimination would not be possible. In contrast, in Figure 7b there is very little overlap between pk{r(u2*; θk, φ)} and pm{r(u2*; θm, φ)} for the candidate experimental condition u2*, providing clear discrimination between models k and m. Moreover, if the experimentally observed rate were in the region of B, it would indicate that neither of the models is probable and a new candidate model would need to be generated. Thus, experiment u2* is a better choice for model discrimination. The formal statement of the design of experiments for the model discrimination illustrated in Figure 7 is the following: choose u* to minimize the overlap of pk{r(u*; θk, φ)} and pm{r(u*; θm, φ)}. The same concept can be applied for discriminating more than two model candidates by calculating some weighted pairwise overlap of the probability density functions for all the adequate models.41

This experimental design criterion for model discrimination using the minimum overlap criterion as described above is intuitively appealing but is computationally challenging. The steps in the calculation are as follows: (1) samples of new experimental conditions ui that span the allowable experimental space are taken; (2) pk{r(ui; θk, φ)}, the probability density function for the reaction rate at ui, must be computed for all of the k candidate models; (3) the joint probability distribution for each pair of models must be determined; and, finally, (4) the joint probability distributions with the minimal area must be determined to finally arrive at ui*, the optimal experiment for model discrimination. Rather than resorting to an optimization procedure for locating ui*, it should be possible to simply identify regions in experimental space which are the most attractive. This follows in the spirit of the Bayesian approach, replacing points with regions. One approach to finding these regions is to use a factorial or fractional factorial experiment analogous to the approach in the point estimate case described earlier. Such an approach is attractive when the dimension of u is small, which is typically the case. In our sample problem, there are four experimental variables and, if we select three values of each of these, for example,

T = {630, 650, 670} (K)
PA0 = {0.5, 1.0, 1.5} (atm)
PB0 = {0.5, 1.0, 1.5} (atm)
PC0 = {0, 0.1, 0.2} (atm)

there are 3⁴ = 81 possible new experiments in a full factorial, which is computationally feasible. For u with larger dimensions, there may be too many experimental variables to examine with factorial methods and it may be necessary to resort to optimization procedures. Rather than searching for better regions along a promising direction from a starting point, which is the conventional approach to optimization, a simpler Monte Carlo or Latin hypercube sampling procedure could be used to find attractive regions. However, unless a complete grid search is done over the entire operating region, the user is not guaranteed to find the most promising operating region. Before resorting to such an approach, the experimentalist/modeler should use his or her "prior" knowledge of how he or she would anticipate the model behaving in these as yet unexplored regions and use this information to guide the search procedure. We acknowledge, however, that beyond knowledge of physical limitations on temperature or reactant concentration ratios, or cases where residuals point to model inadequacies, "anticipation" of the model response is often virtually impossible if the system complexity is high. That makes efficient, guided searches of these spaces an interesting area for further development.

For the example problem with the data given in Table 1, models 1 and 2 are still both viable following both the conventional and the Bayesian model discrimination procedures. Using the Bayesian methodology outlined above, 81 candidate ui values were chosen by factorial design at three levels for four different factors (T, PA0, PB0, PC0); the p{r(ui; θk, φ)} were computed for both models for all candidate ui values; the overlap area of p(r(ui; θ1, φ)) and p(r(ui; θ2, φ)) was determined for all ui values; and u* was determined. Table 7 lists the overlap area for each candidate experiment and identifies the best data point as u* = (T = 630 K, PA0 = 1.5 atm, PB0 = 1.5 atm, PC0 = 0 atm). Figure 8 presents the probability distributions of the expected rates at u* and shows that there is still considerable overlap between the two distributions.

Figure 7. Experimental design for model discrimination: (b) is preferable to (a).
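The minimum-overlap criterion of Figure 7 can be sketched as follows: estimate the overlap area of two predictive rate densities from samples via a shared histogram, and prefer the candidate experiment with the smaller overlap. The normal predictive samples below are synthetic placeholders for MCMC-propagated model predictions.

```python
import numpy as np

# Overlap area of two predictive densities estimated from samples on a
# shared histogram; the candidate experiment minimizing the overlap is the
# preferred discriminating design. Sample values are illustrative.
def overlap_area(samples_k, samples_m, bins=100):
    lo = min(samples_k.min(), samples_m.min())
    hi = max(samples_k.max(), samples_m.max())
    pk, edges = np.histogram(samples_k, bins=bins, range=(lo, hi), density=True)
    pm, _ = np.histogram(samples_m, bins=bins, range=(lo, hi), density=True)
    return float(np.sum(np.minimum(pk, pm)) * (edges[1] - edges[0]))

rng = np.random.default_rng(4)
# Candidate u1: near-coincident predictions (as in Figure 7a);
# candidate u2: well-separated predictions (as in Figure 7b).
u1_k, u1_m = rng.normal(2.0, 0.2, 20_000), rng.normal(2.1, 0.2, 20_000)
u2_k, u2_m = rng.normal(1.5, 0.2, 20_000), rng.normal(3.0, 0.2, 20_000)

overlaps = {"u1": overlap_area(u1_k, u1_m), "u2": overlap_area(u2_k, u2_m)}
u_star = min(overlaps, key=overlaps.get)          # best discriminating design
```

The overlap is the area under the pointwise minimum of the two densities, which is near one for coincident predictions and near zero for well-separated ones; in the full procedure this quantity is tabulated for all 81 factorial candidates, as in Table 7.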

Table 6. Estimated Parameters and Corresponding 95% Confidence Intervals of Model Candidates (Including the Designed Experiment for
Model Discrimination)
Data Set DAUG and Uniform Prior Probability Distribution
model 1 model 2
parameter MPPDE θj,max 95% LB 95% UB MPPDE θj,max 95% LB 95% UB
ln(k40) 8.41 9.12 6.05 9.94 1.58 1.90 0.99 2.72
Ea/R (103 K) 39.7 34.9 22.6 39.8 14.6 13.1 10.2 30.0
ln(K10) -2.14 -2.04 -3.64 -1.37 -0.83 -1.17 -2.29 0.09
∆H1/R (103 K) -16.1 -13.4 -23.1 -10.2 -13.3 -20.9 -28.1 -10.0
ln(K20) -5.70 -5.35 -7.28 -2.53
∆H2/R (103 K) -23.2 -12.6 -26.6 -10.2
ln(K30) 1.40 1.38 1.18 1.58 2.64 2.50 2.20 3.02
∆H3/R (103 K) -10.1 -10.5 -16.2 -10.1 -10.2 -10.7 -19.1 -10.0
σ2 × 103 1.86 2.26 1.48 4.27 1.70 1.89 1.17 3.64
ln[E{L(Mk|D)}] 54.03 58.10
Pr(Mk|D) 0.0027 0.9973
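The last two rows of the table are linked by Bayes' rule: Pr(Mk|DAUG) is proportional to Pr(Mk)·E{L(Mk|DAUG)}, normalized over the surviving models. A small sketch (working in log space so the exponentials cannot overflow) reproduces the tabulated probabilities from ln E{L} = 54.03 and 58.10 with the priors 0.137 and 0.863:

```python
import math

def model_probabilities(log_evidence, priors):
    # Normalize Pr(Mk) * E{L(Mk|D)} in log space (log-sum-exp trick)
    logw = [math.log(p) + le for p, le in zip(priors, log_evidence)]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    return [wi / total for wi in w]

# ln E{L(Mk|D)} and the prior model probabilities carried over from section 5
post = model_probabilities([54.03, 58.10], [0.137, 0.863])
# post ≈ [0.0027, 0.9973], matching the last row of Table 6
```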

condition u*), a new posterior probability distribution p((θk, φ)|DAUG) was computed using p((θk, φ)|D) as the prior distribution. The whole procedure of Bayesian parameter estimation described earlier was repeated, although an efficient algorithm has been developed that makes use of the previous MCMC calculations of p((θk, φ)|D) for the calculation of the new posterior distribution p((θk, φ)|DAUG) (see Appendix B). The results of the parameter estimates as well as the confidence regions determined from the marginal probability distribution p(θj) or p(φj) are shown in Table 6. Finally, the Pr(Mk|DAUG) was determined using the procedures described in section 5, where again the priors of the two models were assumed to be Pr(M1) = 0.137 and Pr(M2) = 0.863, which were obtained from the analysis in section 5. Note that we normalized the two probabilities to keep Pr(M1) + Pr(M2) = 1 since model 3 is no longer considered. The relative probabilities Pr(Mk|DAUG) for the two models are reported in Table 6 and show clearly that model 1 can be eliminated. This is a remarkable result. With only one additional point and using the posterior probabilities from the data in Table 1, model 2 is identified as the preferred model even though the difference between the marginal probability distributions shown in Figure 8 is relatively small. Just to be sure, the probability of the various experimental data points as shown in Figure 4 for D was recomputed for DAUG, and it was found that all experimental points were within the 95% confidence region.

In summary, the addition of just one well-designed experiment was able to unambiguously discriminate the models using Bayesian methods, while conventional regression methods failed to do so. Of course, this investigation has been illustrated using the one-variable-at-a-time data set, which is not properly designed. The impact of a properly designed initial data set on model discrimination will be discussed before moving on to the important topic of improving the parameter estimates after an adequate model has been selected.

6.1. Impact of Design of Initial Data Set D. The one-at-a-time approach to experimentation is practiced widely. It has the advantage of allowing the researchers to plot the results of changing one operating condition without confusion by the other operating conditions. This is precisely its limitation, because it fails to accommodate interactions among the operating conditions and generates large unexplored regions of the operating space. It is interesting to compare the single point regression methods with the Bayesian approach to decide if the latter approach is still superior to the former despite the use of a well designed initial data set.

A new set of designed experimental data, Dc, is presented in Table 8. It was generated over the same operating region as Table 1 using a slightly modified central composite design, which includes both the 2-level full factorial design (2^4 = 16 experiments) and the 2-level one-variable-at-a-time design (2 × 4 = 8 experiments). The one-at-a-time points are taken at the extremes of the data in Table 1. Nine replicates of the center point are included not only to test reproducibility but also to keep the number of data points the same at N = 33. Basically, the only difference between the two sets is the location of the experiments. Nonlinear parameter estimation starting with the same randomly generated starting point as used for the nondesigned data set was used to fit the data of Table 8, and the results are shown in Table 9 for the three model candidates. The first observation is that once again models 1 and 3 have multiple optimal parameter sets, but model 2 has only one. Some of the parameters in models 1 and 3 are at their upper bounds. Once again, by simply examining the sum of squared error, it is not

Table 7. Design of Experiments for Model Discrimination: Overlapped Area for Each Candidate Experimenta
expr no. T (K) PA0 (atm) PB0 (atm) PC0 (atm) area expr no. T (K) PA0 (atm) PB0 (atm) PC0 (atm) area
1 630 1.5 1.5 0 0.602 42 670 1 1.5 0.1 0.813
2 670 1 1.5 0 0.603 43 630 0.5 1 0.2 0.814
3 650 1.5 1.5 0.2 0.610 44 670 1 1 0.2 0.818
4 670 0.5 1.5 0 0.617 45 670 0.5 1.5 0.1 0.832
5 650 1.5 1.5 0 0.618 46 650 1.5 0.5 0.2 0.846
6 630 1 1 0.2 0.628 47 670 1.5 1.5 0.1 0.852
7 650 1 1.5 0.2 0.631 48 630 1 0.5 0 0.852
8 650 1 1.5 0 0.651 49 650 1.5 0.5 0 0.853
9 630 1 1.5 0 0.657 50 630 1 1 0.1 0.855
10 630 1.5 1.5 0.1 0.657 51 650 1 0.5 0 0.857
11 650 0.5 1.5 0 0.664 52 630 1.5 0.5 0.1 0.861
12 670 1 1 0 0.664 53 650 0.5 0.5 0 0.865
13 670 1.5 1.5 0 0.670 54 670 0.5 1 0.2 0.877
14 630 1.5 1 0.2 0.671 55 650 0.5 1 0.2 0.882
15 650 1.5 1 0.2 0.672 56 630 0.5 0.5 0 0.893
16 630 0.5 1.5 0 0.678 57 650 1 0.5 0.2 0.897
17 630 1 1.5 0.2 0.680 58 670 1 1 0.1 0.897
18 650 1 1 0 0.682 59 650 1.5 1.5 0.1 0.899
19 670 0.5 1 0 0.690 60 670 0.5 1 0.1 0.900
20 630 1.5 0.5 0.2 0.698 61 670 1.5 0.5 0.2 0.902
21 630 1.5 1 0 0.701 62 650 1.5 0.5 0.1 0.906
22 650 1.5 1 0 0.703 63 630 0.5 0.5 0.2 0.908
23 670 1.5 1.5 0.2 0.716 64 670 1 0.5 0.2 0.912
24 630 0.5 1.5 0.2 0.722 65 630 0.5 1.5 0.1 0.925
25 670 1 1.5 0.2 0.729 66 650 1 0.5 0.1 0.925
26 670 1.5 1 0 0.730 67 670 1.5 1 0.1 0.926
27 630 1.5 1.5 0.2 0.732 68 650 1 1 0.1 0.927
28 630 1 1 0 0.734 69 650 1.5 1 0.1 0.927
29 630 1 1.5 0.1 0.746 70 630 1 0.5 0.1 0.941
30 650 0.5 1 0 0.747 71 670 0.5 0.5 0.2 0.942
31 630 1.5 1 0.1 0.752 72 650 0.5 0.5 0.2 0.946
32 630 0.5 1 0 0.763 73 650 1 1.5 0.1 0.946
33 670 1 0.5 0 0.763 74 670 0.5 0.5 0.1 0.958
34 650 1 1 0.2 0.770 75 670 1 0.5 0.1 0.960
35 630 1.5 0.5 0 0.790 76 650 0.5 0.5 0.1 0.962
36 670 1.5 1 0.2 0.799 77 650 0.5 1 0.1 0.968
37 670 0.5 0.5 0 0.800 78 630 0.5 1 0.1 0.971
38 670 0.5 1.5 0.2 0.801 79 630 0.5 0.5 0.1 0.976
39 630 1 0.5 0.2 0.804 80 650 0.5 1.5 0.1 0.979
40 650 0.5 1.5 0.2 0.807 81 670 1.5 0.5 0.1 0.989
41 670 1.5 0.5 0 0.809
a Experiment no. 1 is the suggested experiment for model discrimination.

clear whether models 1 and 3 can be rejected, although model 2 has a lower value. What is clear, however, is the presence of the local optima for the incorrect models despite the use of a designed data set.

The Bayesian approach was then used to analyze the data set Dc using the same prior probability distributions as with the nondesigned data set. The marginal posterior probability density functions of the parameters for the different model candidates are shown in Figure 9, and the 95% confidence intervals are reported in Table 10. Figure 9 shows that the parameters in model 2, which is the correct model, are better defined than those of models 1 and 3. It was observed that for both models 1 and 3, the parameter distributions for E4/R and ∆H2/R are both wide and shallow. In the case of ∆H2/R, the entire prior distribution bounds are enclosed in the 95% confidence region. However, as seen in Figure 9, model 2 does not have any parameter distributions that are both wide and shallow. One could consider these shallow distributions as uniform and attribute the slight variation to incomplete sampling. To understand this observation, remember that models 1 and 3 assumed that species B was adsorbing while model 2 did not. Since Dc was generated with model 2, it does not contain information about B adsorbing.

Figure 8. Posterior probability distribution density of the predicted rate r from models 1 (dashed line) and 2 (solid line) at the suggested experimental condition u* = (T = 630 K, PA0 = 1.5 atm, PB0 = 1.5 atm, PC0 = 0 atm).

By comparing Table 10 with the analogous Table 6, the 95% confidence regions for the correct model 2 are significantly smaller with the designed data set than with the undesigned data, whereas the uncertainty in the parameter estimates for the incorrect models 1 and 3 emphasizes the futility of attempting to interpret parameter estimates for an inadequate or false model.

Comparing the posterior probabilities of the model candidates, one sees that model 2 is clearly superior to the others. There is less than a 1% chance of generating Dc with model 1 and essentially no probability with model 3. Hence, we are able to conclude that model 2 is superior to the others and are ready to test for model adequacy without additional discrimination experiments.
7. Quality of the Parameter Estimates

Now that a single, acceptable model has been identified using the DAUG experimental data set, we are in a position to examine the quality of the parameter estimates. As discussed earlier in section 4, the marginal distribution of each individual parameter p(θj) and p(φj) defined in eqs 9 and 10 is a good way to visualize the probability distribution for the individual kinetic and error model parameters. The (θk, φ) parameter estimates are shown in Table 6 for the Bayesian analysis of model 2 with the DAUG data set. The associated probability distributions with DAUG are slightly narrower than the distributions with D reported in Table 4.

The second moment or variance Var(θi) of the individual parameters may also be determined from the marginal distribution:

Var(θi) = ∫θi (θi − E(θi))² p(θi) dθi

Assuming that the p(θj) and p(φj) distributions are normal, Var(θi) or Var(φi) may be used to produce another set of confidence limits. However, the normality assumption is highly suspect for nonlinear models, which usually result in nonsymmetric marginal probability distributions like those shown in Figure 3.
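Both moments are computed directly from the MCMC sample of the marginal, and comparing a normal-approximation interval against a sample-percentile interval shows why the normality assumption is risky. A sketch with a skewed (lognormal) sample standing in for a nonsymmetric marginal:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.lognormal(mean=0.0, sigma=0.75, size=100_000)  # skewed stand-in marginal

e_theta = theta.mean()            # E(theta_i) from the sample
var_theta = theta.var()           # Var(theta_i) as in the text
sd = var_theta ** 0.5

normal_ci = (e_theta - 1.96 * sd, e_theta + 1.96 * sd)   # assumes normality
sample_ci = tuple(np.percentile(theta, [2.5, 97.5]))     # from the sample itself

# The normal-approximation interval extends below zero even though the
# marginal has no support there; the sample-based interval respects the skew.
```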
Table 8. Designed Data Set Dc Generated by a Modified Central Composite Design
T (K) PA0 (atm) PB0 (atm) PC0 (atm) PA (atm) PB (atm) PC (atm) rate (gmol/(min kg catalyst))
650 1 1 0.1 0.989 0.989 0.111 0.692
670 1.5 1.5 0.2 1.472 1.472 0.228 1.321
670 1.5 1.5 0 1.467 1.467 0.033 2.515
670 1.5 0.5 0.2 1.491 0.491 0.209 0.492
670 1.5 0.5 0 1.489 0.489 0.011 0.869
650 1 1 0.1 0.989 0.989 0.111 0.711
670 0.5 1.5 0.2 0.482 1.482 0.218 0.556
670 0.5 1.5 0 0.474 1.474 0.026 1.218
670 0.5 0.5 0.2 0.494 0.494 0.206 0.172
670 0.5 0.5 0 0.491 0.491 0.009 0.373
650 1 1 0.1 0.989 0.989 0.111 0.687
630 1.5 1.5 0.2 1.492 1.492 0.208 0.568
630 1.5 1.5 0 1.490 1.490 0.010 1.853
630 1.5 0.5 0.2 1.497 0.497 0.203 0.213
630 1.5 0.5 0 1.497 0.497 0.003 0.588
650 1 1 0.1 0.989 0.989 0.111 0.747
630 0.5 1.5 0.2 0.494 1.494 0.206 0.224
630 0.5 1.5 0 0.491 1.491 0.009 1.123
630 0.5 0.5 0.2 0.498 0.498 0.202 0.066
630 0.5 0.5 0 0.497 0.497 0.003 0.426
650 1 1 0.1 0.989 0.989 0.111 0.689
670 1 1 0.1 0.982 0.982 0.118 0.864
630 1 1 0.1 0.994 0.994 0.106 0.396
650 1 1 0.1 0.989 0.989 0.111 0.733
650 1.5 1 0.1 1.488 0.988 0.112 0.980
650 0.5 1 0.1 0.491 0.991 0.109 0.357
650 1 1 0.1 0.989 0.989 0.111 0.706
650 1 1.5 0.1 0.984 1.484 0.116 1.036
650 1 0.5 0.1 0.995 0.495 0.105 0.347
650 1 1 0.1 0.989 0.989 0.111 0.727
650 1 1 0.2 0.991 0.991 0.209 0.511
650 1 1 0 0.988 0.988 0.012 1.334
650 1 1 0.1 0.989 0.989 0.111 0.661
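The structure of Dc can be reproduced from the factor levels in Table 8: a 2-level full factorial over the four factors, one-at-a-time excursions to each factor's extremes, and nine center-point replicates, giving N = 33. A sketch (design points only; the rates would come from experiment):

```python
from itertools import product

center = (650, 1.0, 1.0, 0.1)                      # (T, PA0, PB0, PC0)
levels = [(630, 670), (0.5, 1.5), (0.5, 1.5), (0.0, 0.2)]

factorial = list(product(*levels))                  # 2^4 = 16 corner points
one_at_a_time = [center[:i] + (lvl,) + center[i + 1:]
                 for i in range(4) for lvl in levels[i]]  # 2 x 4 = 8 excursions
replicates = [center] * 9                           # center-point replicates

design = factorial + one_at_a_time + replicates     # the 33 runs of Table 8
```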

Table 9. Local Optimal Parameter Estimates for Model Candidates Using Nonlinear Optimization, Fitted against Data Set Dc
Local Optimal Parameter Sets for Model 1
ln(k40) E4/R (103 K) ln(K10) ∆H1/R (103 K) ln(K20) ∆H2/R (103 K) ln(K30) ∆H3/R (103 K) SSE
15.7 7.16 -1.19 -8.57 -13.7 0.00 1.54 -11.0 0.0556
16.1 15.3 -1.19 -8.57 -14.1 -8.11 1.54 -11.0 0.0556
17.2 7.16 -1.19 -8.57 -15.2 0.00 1.54 -11.0 0.0556
18.5 7.16 -1.19 -8.57 -16.4 0.00 1.54 -11.0 0.0556
19.0 14.4 -1.19 -8.57 -16.9 -7.21 1.54 -11.0 0.0556
19.1 7.16 -1.19 -8.57 -17.0 0.00 1.54 -11.0 0.0556
19.6 17.9 -1.19 -8.57 -17.5 -10.8 1.54 -11.0 0.0556
20.0 37.1 -1.19 -8.57 -17.9 -30.0 1.54 -11.0 0.0556
24.1 7.2 -1.19 -8.57 -22.1 0.00 1.54 -11.0 0.0556
28.4 26.1 -1.19 -8.57 -26.4 -18.9 1.54 -11.0 0.0556
29.0 34.9 -1.19 -8.57 -26.9 -27.7 1.54 -11.0 0.0556
39.6 37.0 -1.19 -8.57 -37.6 -29.9 1.54 -11.0 0.0556
47.3 11.4 -1.19 -8.57 -45.3 -4.27 1.54 -11.0 0.0556
92.7 16.3 -1.19 -8.57 -90.6 -9.09 1.54 -11.0 0.0556

Local Optimal Parameter Sets for Model 2


ln(k40) E4/R (103 K) ln(K10) ∆H1/R (103 K) ln(K30) ∆H3/R (103 K) SSE
1.02 9.54 0.145 -14.9 3.06 -18.4 0.0446

Local Optimal Parameter Sets for Model 3


ln(k40) E4/R (103 K) ln(K20) ∆H2/R (103 K) ln(K30) ∆H3/R (103 K) SSE
3.40 30.0 -3.08 -28.5 2.14 -10.5 0.545
3.40 30.0 -3.09 -28.6 2.15 -11.0 0.545

The marginal probability density integrates out any correlation between the various model parameters; however, the model parameter estimates are rarely independent of one another. For any pair of parameters, θi and θj, the covariance among these parameters is given by

Cov(θi, θj) = ∫θi ∫θj (θi − E(θi))(θj − E(θj)) p(θi, θj) dθj dθi   (18)

where the joint probability density function between a pair of parameters θi and θj is obtained by integrating out all the other parameters in the joint posterior probability distribution p(θi, θj):

p(θi, θj) = ∫θ′i,j ∫φ p({θ, φ}|DAUG) dφ dθ′i,j,   θ′i,j = {θl | l ≠ i, l ≠ j}   (19)

p(θi, θj) is the two-dimensional analog to a marginal distribution. The correlation coefficient ρij for these two parameters is simply

ρij = Cov(θi, θj)/(√Var(θi) √Var(θj))   (20)

The correlation coefficient has the conventional interpretation: values close to +1 or −1 imply that the parameters are highly positively or negatively correlated, respectively. However, ρij is a linearized point estimate and may indicate spurious results.52

A more useful way to evaluate these pairwise relationships among the parameters is to plot confidence region contours for the marginal joint probability density function p(θi, θj). As was the case for a single parameter, there is not a unique way to specify a 100(1 − α)% confidence region for a parameter pair; however, the HPD confidence region defined in section 4 can again be used since it yields the smallest area in hyperspace. Let Ω1−α = {θi, θj} be the points in the HPD region and Ωα be the complement, or points not in the HPD region. Then, in an analogous fashion to the approach described for determining the confidence limits of the one-dimensional marginal probability distribution, the two-dimensional 100(1 − α)% confidence contours can be generated. For a linear model, the confidence regions will be ellipses, and if there is no correlation between a particular pair of parameters, the major/minor axes of the ellipse will be parallel to the x- and y-axes of the contour plot. The confidence region contours are shown in Figure 10 for all pairs of the parameters in model 2 as determined from

Figure 9. Marginal probability density functions of the parameters in different model candidates, fitted against data set Dc.

Table 10. 95% Confidence Intervals of the Parameters in Three Model Candidates
Dc and Uniform Prior Probability Distribution
model 1 model 2 model 3
parameter MPPDE θj,max 95% LB 95% UB MPPDE θj,max 95% LB 95% UB MPPDE θj,max 95% LB 95% UB
ln(k40) 9.83 7.31 6.36 9.95 1.03 1.05 0.92 1.17 3.32 3.07 1.98 9.91
E4/R (103 K) 29.8 28.5 19.8 36.3 9.54 10.1 8.24 12.9 29.7 16.0 12.1 31.2
ln(K10) -1.20 -1.26 -1.46 -1.06 0.13 0.13 -0.11 0.41
∆H1/R (103 K) -10.1 -10.1 -14.0 -10.1 -14.7 -16.8 -19.9 -14.2
ln(K20) -7.74 -5.15 -7.77 -4.18 -3.00 -2.61 -9.63 -1.64
∆H2/R (103 K) -21.9 -19.8 -29.8 -11.4 -28.6 -11.0 -29.8 -10.2
ln(K30) 1.54 1.55 1.45 1.64 3.06 3.08 2.93 3.26 2.13 2.08 1.70 2.42
∆H3/R (103 K) -11.5 -12.0 -14.2 -10.1 -18.1 -19.7 -22.1 -18.0 -11.0 -10.2 -19.8 -10.2
σ2 × 103 1.80 2.14 1.29 4.01 1.38 1.66 1.07 3.08 16.3 18.8 13.3 33.9
ln[E{L(Mk|D)}] 54.7419 59.4249 18.0566
Pr(Mk|D) 0.009 0.991 0.000
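In practice, the integrals in eqs 18-20 are evaluated from the MCMC sample itself rather than analytically. A sketch with synthetic draws in which two parameters are strongly coupled and a third is independent (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 'posterior draws': theta_j tracks theta_i, theta_k is independent
theta_i = rng.normal(0.0, 1.0, 50_000)
theta_j = theta_i + rng.normal(0.0, 0.3, 50_000)
theta_k = rng.normal(0.0, 1.0, 50_000)

draws = np.column_stack([theta_i, theta_j, theta_k])
cov = np.cov(draws, rowvar=False)       # sample analog of eq 18
sd = np.sqrt(np.diag(cov))
rho = cov / np.outer(sd, sd)            # eq 20 for every parameter pair at once
```

The off-diagonal entries of rho flag the coupled pair, just as the elongated contours in Figure 10 do for k40 and K10.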

Figure 10. Pairwise marginal confidence regions of the parameters in model 2 with data set DAUG. The contours are 50%, 70%, 90%, and 95% confidence regions from inside to outside, respectively. The units of Ea/R, ∆H1/R, and ∆H3/R are 103 K.

the DAUG data set. Correlations are observed between (i) k40 and K30, (ii) K10 and K30, (iii) Ea/R and ∆H1/R, and (iv) especially between k40 and K10. For example, the error of estimating K10 can be compensated by the estimation error of k40. Figure 11 shows that the experimental error σ2 is independent of the other parameters. Notice that the contours are not always ellipses as would be expected from linear analysis, e.g. the contour surface for K30 vs k40. In many applications the authors have observed significant departure from elliptical behavior.

Figure 11. Pairwise confidence regions of the model parameters and error parameters in model 2. The units of Ea/R, ∆H1/R, and ∆H3/R are 103 K.

Correlation between pre-exponential factors and activation energies is known to be a problem in parameter estimation; however, for the K10 vs ∆H1/R, K30 vs ∆H3/R, and k40 vs Ea pairs in model 2 the correlations are relatively weak. This weak correlation is observed because we have already compensated for this correlation in eq 5 by using a reference temperature T0 in the data range rather than the traditional pre-exponential factor defined at T = ∞. If there is a strong correlation between two parameters, often there is a way to transform the relationship so that the correlation can be reduced significantly.

8. Design of Experiments for Improving Parameter Estimates

Even after the parameters have been transformed, there is still the very real possibility that pairwise correlation will exist and the HPD marginal confidence limits will be too large. In this situation, it is necessary to set up another sequential experimental program designed to improve the quality of the parameter estimates by minimizing the volume of their region of uncertainty in hyperspace, or to design groups of experiments with a similar aim and iterate until the quality of the parameter estimates is acceptable. Prasad and Vlachos have recently described an effective information-based optimal design of experiments applied to a kinetic model describing the catalytic decomposition of ammonia on ruthenium.53 An extension of these methods to the Bayesian point of view is presented in Appendix D. The procedure has been applied to design new experiments for the example problem, with particular focus on the energy parameters, which are poorly defined. Another three-level factorial design for the four experimental variables was used, and thus 81 possible experimental candidates were investigated. Table 11 shows the six experiments with the smallest det(Ψ), the quantitative measure of the region of uncertainty described in Appendix D. The corresponding rates simulated for model 2 including error

Figure 12. Marginal posterior probability distribution of model 2 for different data sets: (bold solid line) DAUG+6; (dotted line) DAUG; (thin solid line) D. The
units of Ea/R, ∆H1/R, and ∆H3/R are 103 K.

for the six new experiments are also given in the same table. The new data set DAUG+6 includes the original data set D and the experiment for model discrimination (one experiment) as well as those for parameter estimation (six experiments), and thus DAUG+6 has 40 points. The complete Bayesian analysis of this augmented data set was performed for model 2.

The probability distributions for the model parameters using the augmented data set DAUG+6 are shown in Figure 12, where the distributions are significantly tighter as compared to the DAUG data set employed for model discrimination. The pairwise marginal confidence regions are shown in Figure 13, where the confidence regions are considerably smaller than the ones in Figure 10 determined by the data set DAUG. The correlations were not completely removed by the additional experiments, but they are less important. The confidence limits for the parameters using DAUG+6 are given in Table 12, where the range has decreased significantly for a number of the parameters, although the ln(K30) and ∆H3/R ranges have not changed much with the addition of the new experiments. The confidence interval of ln(K30) was relatively tight before adding the selected six experiments to the data set. ∆H3/R is still not well estimated even after the new experiments were added. Designing further new experiments would help to improve the confidence limits of this parameter.

The criterion for stopping this parameter estimation improvement process is subjective. A typical criterion might be to ensure that the univariate 95% confidence interval is less than 10% of a selected point estimate, such as the MPPDE or the mean value. When all the parameters satisfy the specification, the lack-of-fit test should be performed again to ensure the model adequacy if replicated data points are available in the final data set. Also, if the variance parameter φ is significantly larger than the measurement error, then the model may need to be improved, which is the next step of the model building process.

9. Discussion

A Bayesian framework for building kinetic models by means of a sequential experimentation-analysis program has been described. It allows the knowledge of the catalyst researcher to drive the model building both by postulating viable models and assessing the quality of model parameter estimates. Procedures are available for designing experiments to discriminate model candidates, assess their suitability against experimental error, and design experiments for the best model selected. The key features of this modeling approach are (i) the use of distributions for the parameters rather than point estimates so that regression procedures can be avoided, (ii) the ability to incorporate the knowledge of the researcher, and (iii) the ability to more accurately predict the behavior of a validated model under different operating conditions.

Probability densities are the most appropriate way to describe a model's predictive capabilities since they fully acknowledge the consequences of unavoidable experimental error in real catalytic systems. Specifically, for a given model and a given experimental data set that includes error, there will be a distribution of predicted outcomes from that model. Thus, both the analysis of a specific model and the comparison of different models need to explicitly acknowledge this distribution of predictions from the models rather than just analyzing/comparing unique predictions coming from single point estimates. In contrast, traditional linear analysis and nonlinear optimization only provide point estimates of model parameters, often luring the researcher into the erroneous impression that there is a unique prediction for a given model.

Table 11. Six Experiments Added to the Data Set DAUG for Improving Parameter Estimates
expr no. T (K) PA0 (atm) PB0 (atm) PC0 (atm) ln(det(Ψ)) PA (atm) PB (atm) PC (atm) rate (gmol/(min kg catalyst))
1 670 1.5 1.5 0.1 -24.9 1.470 1.470 0.130 1.727
2 630 1.5 1 0 -24.3 1.493 0.993 0.007 1.284
3 650 1.5 1.5 0 -23.4 1.481 1.481 0.019 2.324
4 650 0.5 1.5 0 -23.3 0.483 1.483 0.017 1.150
5 670 0.5 1.5 0 -23.2 0.474 1.474 0.026 1.125
6 670 1 1 0 -23.2 0.979 0.979 0.021 1.365
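The det(Ψ) criterion used to rank the candidates in Table 11 treats the determinant of the parameter covariance matrix as a proxy for the volume of the joint uncertainty region (Appendix D), with ln det serving as the log-volume measure. An illustrative sketch comparing a broad posterior with a tightened one (synthetic draws, not the paper's Ψ update):

```python
import numpy as np

def ln_det_cov(draws):
    # ln(det) of the sample covariance: log-volume proxy for the
    # joint parameter confidence region
    _, logdet = np.linalg.slogdet(np.cov(draws, rowvar=False))
    return logdet

rng = np.random.default_rng(3)
broad = rng.normal(0.0, 1.0, size=(20_000, 3))   # e.g. posterior given DAUG
tight = rng.normal(0.0, 0.5, size=(20_000, 3))   # e.g. posterior given DAUG+6

# Halving each standard deviation lowers ln det by about 3 ln 4 ≈ 4.16,
# so the tighter posterior scores a smaller (better) value.
```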

Figure 13. Pairwise marginal confidence regions of the parameters in model 2 with data set DAUG+6. The contours are 50%, 70%, 90%, and 95% confidence regions from inside to outside, respectively. The units of Ea/R, ∆H1/R, and ∆H3/R are 103 K.

Table 12. Estimated Parameters and Corresponding 95% Confidence Intervals for Model 2
Data Set DAUG+6 and Uniform Prior
parameter MPPDE θj,max 95% LB 95% UB
ln(k40) 1.17 1.18 0.97 1.35
Ea/R 10.6 10.2 7.41 13.6
ln(K10) -0.156 -0.139 -0.512 0.234
∆H1/R -14.2 -12.2 -19.1 -10.2
ln(K30) 2.91 2.93 2.73 3.16
∆H3/R -14.7 -15.0 -22.2 -11.0
σ2 × 103 1.91 2.24 1.43 3.85

Another key feature of the methods discussed here is the importance of modeling the error as well as modeling the kinetic relationships. The error associated with any specific experiment is assumed to be normally distributed around the average value for that experiment. Typically, the error is assumed to be constant or proportional to the magnitude of the response (i.e., the latter is the case when the logarithm of the data is fit). However, much more complex error structures are more typical, e.g. the error is proportional to the magnitude of the experimentally measured response except for very small responses, lower than the detector sensitivity. A simple three-parameter model for capturing this "heteroscedasticity" in the experimental error has been described in Appendix A. The key issue is that although the experimental error is normally distributed, when that error is propagated through the nonlinear models that are used in kinetic analysis, the resulting errors in the parameter estimates are often anything but Gaussian. The parameter distribution estimates given in Figure 3 for the three candidate models for the test data set given in Table 1 are a good example. Considering all the parameters for the three models, the only distribution that is even remotely Gaussian is ln(K30) for models 1 and 3. When an augmented data set with seven additional experiments was employed, the parameter distributions, as shown in Figure 12, became much smoother and nearly Gaussian. However, the additional data set was designed to provide these highly improved (and now nearly Gaussian) parameter estimate distributions using the Bayesian methods.

The need for incorporating the nonlinear error structure of the models was demonstrated for a relatively simple kinetic model with data that was of reasonable quality. We expect the need for directly including the nonlinearity will become even more important as the complexity of the models increases or the data becomes noisier.54 It is possible that, with an overwhelming amount of data, nonlinear optimization with linear error analysis may also be able to determine reasonable parameter estimates and improve model discrimination, but even in an era of high-throughput experimentation, maximizing the impact of each experiment still has value.

A distinguishing feature of the Bayesian method is that it takes full advantage of the catalyst researcher's expertise prior to the determination of the parameters. This is in contrast to regression methods, which only deal with the data and the model without any subjective input from the expert other than specification of candidate models and supply of initial guesses.19,43,55 Although both model building approaches can be used, we believe that the Bayesian approach is of particular value to kinetic problems, where the potential rate and equilibrium constants can vary by orders of magnitude. Any information that the expert can provide can significantly reduce the amount of experimental data needed for model discrimination and parameter estimation. Moreover, in studies of a series of catalytic structures (e.g., the systematic change of ligand molecular structure, the stoichiometry of a mixed metal surface,

etc.) and/or a series of different reactants, one might anticipate that the rate/equilibrium constants and activation energies for one member of a catalyst family would provide a good point of departure for analysis of other members of the family. Bayesian methods can take full advantage of such expert prior knowledge of related kinetic systems. The expert knowledge is captured via the prior probability distribution p(θk, φ). In the simple example presented here, the prior knowledge was used only to place limits on various parameters (see Table 7). The assumption of a uniform distribution is called an uninformative prior, although this is a misnomer: there is often considerable expert knowledge in specifying the parameter limits. Sometimes additional information is available from screening studies, a linear analysis, etc. If an initial point estimate is available, an alternative is a triangular prior distribution that has a maximum at the initial point estimate and goes to zero at the expert-defined limits on the parameters. The triangular prior distribution incorporates more knowledge in the prior than the uninformative prior. The relationship between the information content of the prior probability distribution and the final parameter confidence region will be an interesting one for further study.

New criteria for model discrimination and for parameter estimation have been proposed in this work. First, additional experiments for model discrimination are designed in the region where the probability distributions of the model predictions are the most different. This is a general criterion without any assumptions. The conventional approach of experimental design for model discrimination compares the model predictions calculated at the maximum likelihood estimates only. By using probability distributions, our proposed criterion for model discrimination incorporates the uncertainty of the model prediction as well, and this appears to be its source of improved ability to discriminate models. Second, to improve the parameter estimates, the additional experiments are designed to minimize the volume of the confidence region, which is approximated by the determinant of the covariance matrix. We can calculate the covariance matrix from the samples of the Markov Chain Monte Carlo process and then use it to search for new experiments that narrow the parameter probability distribution. These two criteria are intuitive but have not been discussed in the available literature, perhaps due to the computational complexity. Since the era of high-speed computation has arrived, they should now be exploited because they are both powerful and general. We have not compared these approaches directly to the conventional D-optimal design for parameter estimation, but note that D-optimal design uses the linearized model around the maximum likelihood estimate, a step avoided by the Bayesian approach.

The entire Bayesian framework was demonstrated on a very simple catalyst kinetics test problem, where data were generated from one of the models, including reasonable amounts of experimental error. The results of this exercise show that linear methods and nonlinear optimization were unable to identify the correct model from the three candidate models, even with an additional experiment. In contrast, Bayesian methods were able to robustly eliminate one of the models and suggest a single additional experiment that was able to discriminate between the remaining models. The importance of appropriate experimental design is also indicated, in that the one-variable-at-a-time

out the computational challenges imposed by its proper implementation. The MCMC sampling methodology outlined in Appendix B was implemented in two different programs. A PC-based program called MODQUEST was written in MATLAB and used to solve the example problem described in this paper. About 10 min were required on a Dell Precision 6300 Intel Core 2 Extreme CPU X7900 2.8 GHz with 4 GB RAM running 32-bit Microsoft Windows Vista to obtain convergence for an eight-parameter model such as model 1 on the original data set D. Another software program written as a combination of C++ and Fortran was used to handle larger real-world microkinetic problems consisting of systems of differential algebraic equations. In this case, convergence for a 25-parameter microkinetic problem was achieved in less than 48 h on a large data set using an Intel Xeon 3.2 GHz CPU with 4 GB RAM running 64-bit Red Hat Enterprise Linux 4.0. Experience has shown that the computational time is dramatically reduced with the quality of the proposal distribution for the posterior and the starting guess as well as the suitability of the model. Conversely, computational times increase with the size of the systems of differential algebraic equations to be solved and the stiffness of the system. It is also evident that the computational times could be dramatically reduced by using parallel processors or computing clusters, which are ideal for Monte Carlo like formulations. The latter approach is being vigorously pursued by the authors.

In summary, the power of Bayesian methods for model discrimination and parameter estimation, including design of experiments, has been shown for a simple, model kinetics problem. The traditional tool for analyzing kinetics is nonlinear optimization with a linear statistical analysis around the optimal solution, which can provide good results if there are sufficient data that have been well designed and the potential models are quite different. The Bayesian approach outperformed these methods in this comparison, however. The ability of this approach to treat nonlinearity without approximation gives it high potential for a wide variety of problems in catalytic kinetics and thus provides a new set of tools to be added to the arsenal of any researcher who is developing models of catalytic systems.

Acknowledgment

The authors would like to acknowledge the financial support of the Department of Energy of the United States (DOE-BES Catalysis Science Grant DE-FG02-03ER15466) and ExxonMobil through collaboration on the design of hydrotreating catalysts.

Appendix A: Likelihood Function and Error Model

It is necessary to specify a form of the probability distributions for the likelihood function L(D|{θk, φ}) and the prior joint probability distributions p(θk, φ) for the model parameters θk and the error model parameters φ before eq 8 can be solved. Selecting the joint prior probability distributions for the parameters will be discussed in Appendix C. In this section, we will develop the form of the likelihood function. It is reasonable to assume that the experimental errors εi(φ) for each of the N data points are independent and normally distributed random variables with mean zero and variance σi2. On the basis of this assumption, the joint probability density function for the N data points in the data set D is
approach did not provide a robust parameter estimates. However,
Bayesian-based design of experiments suggested the optimal
set of new experiments needed to improve parameter estimates.
p(ε) )
N

∏ p(ε ) ) ∏
i)1
i
N

i)1 { 1
(2π)1⁄2σi ( )}εi2
exp - 2
2σi
(A.1)

Although the Bayesian approach is much preferred over the For any set of values of the model parameters θk, the difference
conventional approach to model building, the paper has pointed between the measured rates ri and the values predicted by model
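In computation, the product in eq A.1 is evaluated as a sum of logarithms to avoid numerical underflow when N is large. A minimal sketch (our illustration, not code from the paper; NumPy is assumed):

```python
import numpy as np

def log_joint_density(errors, sigma):
    """Log of eq A.1: sum of independent N(0, sigma_i^2) log densities."""
    errors = np.asarray(errors, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.sum(-0.5 * np.log(2.0 * np.pi) - np.log(sigma)
                  - errors ** 2 / (2.0 * sigma ** 2))

# Zero errors with unit variances: each point contributes log(1/sqrt(2*pi))
print(log_joint_density(np.zeros(3), np.ones(3)))  # ≈ -2.7568
```

Working in log space also makes the likelihood comparisons of Appendix B (where the log of the likelihood is plotted in Figure B.1) numerically stable.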
For any set of values of the model parameters θk, the differences between the measured rates ri and the values predicted by model k, fk(θk, ui), are the residuals

e_ik = r_i − f_k(θ_k, u_i)    (A.2)

Assuming a high probability that the kth model generated D with the θk parameters, i.e., that the model is valid, the residuals are estimates of the experimental error and may be substituted for the errors in A.1 to give the joint probability distribution function

L(D|θ_k, φ) = ∏_{i=1}^{N} [1/((2π)^{1/2} σ_i)] exp[−(r_i − f_k(θ_k, u_i))² / (2σ_i²)]    (A.3)

which is called the likelihood function for model k. When the model is incorrect, the residuals are biased, so they are not estimates of the experimental error, but the substitution is made anyway so that a likelihood function is defined for all k models.

If replicates are available at each set of experimental conditions ui, it is possible to estimate the variance of the normal distribution σi². In this case, it is not necessary to define a statistical error model or the parameters φ. However, when replicates are not available, it is convenient to model the error. Since we have assumed that the errors are normally distributed with mean zero, the error model represents the statistical parameters that characterize uncertainties in the experimental setup and the response variable measurements. In statistical terms, we are going to use a model to characterize the heteroscedasticity in the data. The following three-parameter model has been found to be very useful for representing the variability in reaction systems20

σ_i² = ω² r̂_i^γ + η²,   i = 1, ..., N    (A.4)

where ω, γ, and η are independent statistical model parameters, i.e. φ = {ω, γ, η}, and r̂i is the predicted reaction rate at the current values of ui and θk. This model has physically meaningful boundary conditions. If the measurement errors are constant over the entire experimental region, then γ = 0 and the variance is constant. If the measurement errors are directly proportional to the measurements, so that there is a constant percent error in the data, then γ = 2. Finally, if the error in the measurements goes from being proportional to the measurements until the limit of detection is reached, all three terms are needed in the model, with the limit of detection being η². In the analysis of the simple model presented in this paper, the mathematical model parameters θk and the statistical model parameters φ were estimated simultaneously. This is particularly challenging in the point estimate approach but very natural in the Bayesian approach, affording insights into the quality of, and interaction between, the mathematical and statistical model parameters.

Appendix B: Posterior Probability Distribution Evaluation Methods

In order to determine the joint posterior probability distribution of the parameters p({θk, φ}|D) after the data have been collected, it is necessary to integrate over the entire parameter space to determine the normalization factor for the denominator of eq 8. Once this factor has been determined, the posterior probability distribution is obtained by multiplying the likelihood function with samples taken from the prior distribution function for the parameters. This is computationally feasible when the space of the parameters is small, i.e. Qk + V ≤ 10. One approach to obtaining this integral is to recognize that the integral to be evaluated is simply the expected value of the likelihood function E(L(D|θk, φ)). This value can be approximated by the relationship

E{L(D|θ_k, φ)} = ∫_{θ_k} ∫_{φ} L(D|{θ_k, φ}) p(θ_k, φ) dφ dθ_k ≈ (1/T) ∑_{j=1}^{T} L(D|{θ_{k,j}, φ_j})    (B.1)

where {θ_{k,j}, φ_j} for j = 1, ..., T is the discrete set of values of the model and statistical parameters selected from the prior distribution p(θk, φ) using a Monte Carlo sampling process.56 The size of T required to give a good approximation to the integral depends on the nonlinearity of the model, the dimension and size of the parameter space Qk + V, and the accuracy required. If evaluating fk(θk,j, ui) is fast and the dimensionality of the parameter space is small, such a standardized Monte Carlo sampling procedure is an effective way to evaluate the integral.

A more efficient sampling approach, the Markov Chain Monte Carlo (MCMC) method, was developed by Metropolis et al.46 and later modified by Hastings.57 In this procedure, it is not necessary to evaluate the integral directly. Rather, the MCMC process converges to a sampling procedure which randomly generates samples from the posterior probability distribution. By collecting a sufficient number of samples from this converged process, it is possible to calculate moments of the distribution (i.e., means, variances), confidence regions, and predictions made with the model itself. The interested reader is referred to the literature for the mechanics of this process.47 Basically, the form of the posterior distribution is proposed, a series of samples is selected from the proposal distribution, and a decision rule is defined involving the prior distribution and the previous point in the series. By repeated application of this rule, the proposal distribution is gradually modified until samples from this modified distribution evolve into a sampling scheme for the desired posterior distribution.

The efficiency of the MCMC method compared to the simple MC scheme will now be shown. Consider the situation in which the prior probability distributions of the parameters in the three models from the sample problem are all uniform between the expert-chosen bounds given in Table 3. Also assume that the error is normally distributed with a constant but unknown variance φ = σ² (this is the situation where γ = 0, η² = 0, and ω² = σ²). Using the data set D, the expected value of the likelihood function for model 2 can be calculated from eq B.1 using a simple Monte Carlo approach. Because of the size of the numbers, the log of the likelihood function is calculated and shown in Figure B.1 as the number of samples T increases from 10³ to 10⁷. We have also reported the maximum value of the log of the likelihood function achieved by this sampling scheme, since it is the point which defines the maximum likelihood point estimators. For comparison purposes, the MCMC method is used to generate samples from the posterior probability distribution, and the samples are used to calculate the logarithm of the posterior probability distribution using eq 8. The results, plotted in Figure B.1, are quite dramatic. Note that significantly more trials are needed using simple Monte Carlo than with MCMC to obtain the same degree of accuracy. The MCMC converges after approximately 5 × 10⁴ simulations, while more than 10⁶ trials are needed for the simple Monte Carlo method, a 20-fold increase in efficiency. This is an isolated example, but comparable results can be seen for the other models.
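The Metropolis-Hastings decision rule described above can be sketched generically. This is an illustrative random-walk sampler on a toy one-dimensional target, not the authors' implementation; the Gaussian "posterior" and all names are our assumptions:

```python
import numpy as np

def metropolis(log_post, x0, step, n_samples, rng):
    """Random-walk Metropolis: accept a proposal with probability
    min(1, post(proposal)/post(current)); the chain's long-run samples
    follow the (unnormalized) posterior exp(log_post)."""
    x, lp = x0, log_post(x0)
    chain = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal()
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:  # acceptance rule
            x, lp = proposal, lp_prop
        chain[i] = x
    return chain

rng = np.random.default_rng(0)
log_post = lambda k: -0.5 * ((k - 2.0) / 0.5) ** 2  # toy N(2, 0.5^2) target
chain = metropolis(log_post, x0=0.0, step=0.5, n_samples=50_000, rng=rng)
print(chain[5_000:].mean())  # close to the target mean of 2.0 after burn-in
```

Note that only ratios of posterior densities appear in the acceptance rule, which is why the normalization integral in the denominator of eq 8 never has to be evaluated explicitly.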

Figure B.1. Expected log likelihood value obtained by the Monte Carlo method with different numbers of samples.

When the experimental data set used in the MCMC method is expanded for model discrimination (see section 6) or for improving the quality of the parameter estimates (see section 8), it is desirable to take advantage of the MCMC calculations that were employed for the initial data set during the MCMC calculations for the new, augmented data set DAUG = D + DMD. First, Bayes' theorem is applied using the posterior p({θk, φ}|D) of the first data set D as the prior and p(DAUG|{θk, φ}) as the likelihood function to calculate a new posterior probability distribution p({θk, φ}|DAUG) for the augmented data set DAUG. This posterior probability distribution for the augmented data will be calculated using the same MCMC methods as described above. If effective discrimination or sufficient quality of parameter estimation has still not been achieved with the first augmentation of the data set, it may be necessary to repeat the process and collect additional data sets. At each stage of the iterative process, the augmented data set becomes

D_AUG,j = D + ∑_{l=1}^{j} D_MD,l    (B.2)

With each addition to the data set, i.e. DMD,l, the posterior from the previous set of data is used as the prior in the determination of the new augmented data set. To the best of our knowledge, this method for more efficient MCMC calculations as the data set is extended is novel.

Appendix C: Determination of the Prior Distribution

In Bayesian analysis, any knowledge about the model should be captured before the data are collected. This knowledge is quantified via the prior probabilities Pr(Mk) during model discrimination and p(θk, φ) during parameter estimation. The prior probability distribution for the models may be obtained from analysis of similar catalytic systems or from expert opinion; however, for new catalytic systems, their unique character makes it difficult to indicate a preference between models before data are collected. In these cases one can only use the uninformative prior Pr(Mk) = 1/P, where P is the number of postulated models at the start of the model building exercise. However, if data are available from exploratory experiments or the literature, they should be used to help estimate the prior probabilities. Although exploratory experiments are not in the true spirit of Bayesian statistics (i.e., every experiment should be designed based upon currently available knowledge), in practice, catalysis researchers nearly always conduct a number of "exploratory" runs on the apparatus to discern the idiosyncrasies of the experimental setup, such as the feasible operating range of u, the expected analytical and sampling variability, and the reproducibility of the catalytic system, as well as catalyst deactivation. In the body of the paper an uninformative prior was employed during model discrimination (section 5). In parameter estimation, a set of independent uniform distributions was assumed for the parameters, with arbitrary but realistic upper and lower bounds for the various parameters. The formalism is described in eq 7 and the actual values used in Table 3. The purpose of this appendix is to show how information from exploratory experiments can be used via an exploratory data analysis to gain prior information to start the model building process.

Linear or nonlinear regression analysis (see section 3) is a common technique to obtain point estimates of the model parameters as well as approximations of the confidence limits. Unfortunately, in this type of analysis, the model parameters are confounded with the error model parameters unless the error model parameters are determined separately. Such an approach using replicate sampling is highly desirable and gives an assessment of the variability in the system to help design future experiments. If the rate expression is linearized so that linear regression can be used, then it is convenient to use a triangular distribution where the most likely values are the point estimates from the linear analysis and the maximum and minimum are at values suggested by the experimentalist/modeler. If nonlinear regression is used to analyze the exploratory data with the model directly, then the optimal point estimates of the parameters are generated as well as the covariance matrix Σ̂k of the model parameters. In most nonlinear regression packages, this value is directly calculated from the inverse of the Hessian matrix, Hk, that results from a Taylor series approximation of the objective function about the optimal point estimates of the parameters θ*k. Therefore, the prior probability distribution of θk can be formulated as a normal distribution with mean θ*k (the nonlinear least-squares parameter point estimates) and covariance Σ̂k:

p(θ_k) = [1/((2π)^{m/2} |Σ̂_k|^{1/2})] exp[−(1/2)(θ_k − θ*_k)ᵀ Σ̂_k⁻¹ (θ_k − θ*_k)]
       = [|H_k|^{1/2} / (2π)^{m/2}] exp[−(1/2)(θ_k − θ*_k)ᵀ H_k (θ_k − θ*_k)]    (C.1)

In linear regression it is unnecessary to specify a starting point, since the method guarantees a unique solution. When using nonlinear regression, however, a starting point for the algorithm is required. The convergence to a solution is strongly influenced by this starting point, and convergence to different optima is possible from different starting points, especially with sparse, nondesigned data. We have already demonstrated the presence of multiple optima in the body of the paper, even with the large data set D. It is absolutely critical that the variability around the optimum not be too constrained, so that the true optima can be considered in the Bayesian approach. Note that one of the advantages of the Bayesian approach is the ability to search the entire region of the prior distribution and generate an entire posterior distribution instead of meaningless point estimates.

Rather than using a linear or nonlinear point estimation approach, it is also possible to use a Bayesian approach to generate a posterior distribution, pretending that the exploratory data are new. However, this requires assuming a prior without any data, which defeats the purpose of the exploratory data analysis.
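The construction in eq C.1 (turning a nonlinear least-squares optimum and its Hessian into a normal prior) can be sketched as follows. The two-parameter fit below is hypothetical, and Σ̂k = Hk⁻¹ is assumed, as in the text:

```python
import numpy as np

def normal_prior_logpdf(theta, theta_star, hessian):
    """Log of eq C.1: N(theta_star, H^-1) evaluated at theta,
    using |Sigma_hat|^(-1/2) = |H|^(1/2)."""
    d = np.asarray(theta, dtype=float) - np.asarray(theta_star, dtype=float)
    m = d.size
    sign, logdet = np.linalg.slogdet(hessian)  # |H| > 0 at a proper minimum
    return 0.5 * (logdet - m * np.log(2.0 * np.pi)) - 0.5 * d @ hessian @ d

# Hypothetical exploratory fit: two mildly correlated parameters
theta_star = np.array([1.2, 0.4])
H = np.array([[40.0, 5.0],
              [5.0, 10.0]])
print(normal_prior_logpdf(theta_star, theta_star, H))  # density peak at theta*
```

Sampling from such a prior amounts to drawing from a multivariate normal with covariance Hk⁻¹, so it slots directly into the Monte Carlo machinery of Appendix B.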

Appendix D: Design of Experiments to Improve Parameter Estimation

Given the augmented data set DAUG, eqs 19 and 20 can be used to calculate the variance-covariance matrix Ψ. It can then be used to design a set of experiments to efficiently improve the quality of the parameter estimates. Ψ is defined by

Ψ = [ Σ_θ    Σ_θ,φ
      Σ_φ,θ  Σ_φ  ]    (D.1)

where

Σ_θ = [ Var(θ_1)       Cov(θ_1, θ_2)  ···  Cov(θ_1, θ_p)
        Cov(θ_2, θ_1)  Var(θ_2)       ···  Cov(θ_2, θ_p)
        ⋮               ⋮                    ⋮
        Cov(θ_p, θ_1)  Cov(θ_p, θ_2)  ···  Var(θ_p)      ]

Σ_φ = [ Var(φ_1)       Cov(φ_1, φ_2)  ···  Cov(φ_1, φ_n)
        Cov(φ_2, φ_1)  Var(φ_2)       ···  Cov(φ_2, φ_n)
        ⋮               ⋮                    ⋮
        Cov(φ_n, φ_1)  Cov(φ_n, φ_2)  ···  Var(φ_n)      ]

Σ_φ,θ = [ Cov(φ_1, θ_1)  Cov(φ_1, θ_2)  ···  Cov(φ_1, θ_p)
          Cov(φ_2, θ_1)  Cov(φ_2, θ_2)  ···  Cov(φ_2, θ_p)
          ⋮                ⋮                    ⋮
          Cov(φ_n, θ_1)  Cov(φ_n, θ_2)  ···  Cov(φ_n, θ_p) ]

Σ_θ,φ = [ Cov(θ_1, φ_1)  Cov(θ_1, φ_2)  ···  Cov(θ_1, φ_n)
          Cov(θ_2, φ_1)  Cov(θ_2, φ_2)  ···  Cov(θ_2, φ_n)
          ⋮                ⋮                    ⋮
          Cov(θ_p, φ_1)  Cov(θ_p, φ_2)  ···  Cov(θ_p, φ_n) ]

The diagonal elements of Ψ are the variances of the individual parameters, and the off-diagonal elements are their covariances. Ψ is a function of both the parameters and the experimental conditions u and can be readily evaluated for the data set DAUG using suitable point estimates of the parameters. The quality of the parameter estimates is related to the elements of Ψ. For example, if the off-diagonal elements are small, the contours of the posterior probability density function will be spherical and the parameter estimates uncorrelated. Also, the smaller the variance estimates, the smaller the uncertainty in the parameter estimates. It may be shown8 that det(Ψ) is proportional to the size of the confidence regions for linear models.

The hypervolumes of the confidence regions can be numerically calculated from the volume inside the contours of the full probability density distribution, i.e. eq 8; however, this is a difficult calculation. The linearized volume, i.e. det(Ψ), computed from one million realizations from the posterior probability distribution of the parameters, can be used to approximate the confidence region. To improve the confidence region, q additional experiments u_i, i = 1, ..., q, should be selected to minimize this determinant. The design procedure to select these q experiments is the following:

1. Generate a new candidate experiment, u_i.
2. Calculate the expected model predictions E{y_i} for u_i by

E{y_i|D_AUG} = ∫_{θ,φ} f(θ, u_i) p({θ, φ}|D_AUG) dφ dθ ≈ (1/T) ∑_{j=1}^{T} f(θ_j, u_i)    (D.2)

where {θ_j} are sampled from the posterior probability distribution p({θ, φ}|D_AUG).
3. Use E{y_i} to form a modified data set D_MD = (u_i, E{y_i|D_AUG}).
4. Generate the posterior probability distribution from D_AUG+1 = D_AUG + D_MD.
5. Calculate the variances and covariances of the parameters and form the matrix Ψ.
6. Calculate det(Ψ).
7. Go to step 1 until det(Ψ) is minimized.

The computational burden required to implement this procedure is enormous. Optimum seeking methods can readily be substituted for finding the best point u_1 when q = 1. In practice, however, it is more useful to find regions in the space of experimental conditions that show lower values for det(Ψ) rather than focusing on finding a single new experimental condition that minimizes det(Ψ). Similar to the approach for the experimental design for model discrimination, described in section 6, a factorial or fractional factorial design can be applied to determine the best experiments which accomplish this goal. Each additional experiment can provide insight into the improvement of the parameter estimates. Consequently, once an additional designed experiment is available, a new posterior probability distribution should be calculated for D_AUG+1. Then, the confidence limits and contour regions can be calculated and a new experiment determined using Ψ_AUG+1. This iterative process (experimentation and analysis) is continued until suitably refined parameter estimates are obtained.

This is a viable approach if the experiments are expensive and time-consuming to generate. If high throughput experimentation, where data can be generated rapidly, is available, then the computational challenges associated with implementing a one-experiment-at-a-time sequential approach dominate. In this case it is more efficient to propose a set of designed experiments u_i, i = 1, ..., q, at once: rather than estimating det(Ψ_AUG+1) for each individual experiment, the posterior distribution is determined by MCMC for the augmented data set D_AUG+q, the joint marginal probability distribution of the parameters is determined, and the confidence estimates of the parameters are examined. This time the iterative process will involve q experiments.

In either case, the sequential process will continue until the catalyst researcher is comfortable with the quality of the estimates or additional experimentation does not improve the quality of the results. If the researcher is still not comfortable with the quality of the parameter estimates, it is necessary to revisit the experimental apparatus and attempt to improve the overall quality of the data itself.

Literature Cited

(1) Caruthers, J. M.; Lauterbach, J. A.; Thomson, K. T.; Venkatasubramanian, V.; Snively, C. M.; Bhan, A.; Katare, S.; Oskarsdottir, G. Catalyst design: knowledge extraction from high-throughput experimentation. J. Catal. 2003, 216 (1-2), 98–109.
(2) Dumesic, J. A.; Milligan, B. A.; Greppi, L. A.; Balse, V. R.; Sarnowski, K. T.; Beall, C. E.; Kataoka, T.; Rudd, D. F.; Trevino, A. A. A Kinetic Modeling Approach to the Design of Catalysts: Formulation of a Catalyst Design Advisory Program. Ind. Eng. Chem. Res. 1987, 26 (7), 1399–1407.
(3) Banares-Alcantara, R.; Westerberg, A. W.; Ko, E. I.; Rychener, M. D. DECADE: A Hybrid Expert System for Catalyst Selection. 1. Expert System Consideration. Comput. Chem. Eng. 1987, 11 (3), 265–277.
(4) Banares-Alcantara, R.; Ko, E. I.; Westerberg, A. W.; Rychener, M. D. DECADE: A Hybrid Expert System for Catalyst Selection. 2. Final Architecture and Results. Comput. Chem. Eng. 1988, 12 (9-10), 923–938.
(5) Ammal, S. S. C.; Takami, S.; Kubo, M.; Miyamoto, A. Integrated computational chemistry system for catalysts design. Bull. Mater. Sci. 1999, 22 (5), 851–861.
(6) Burello, E.; Rothenberg, G. In silico design in homogeneous catalysis using descriptor modelling. Int. J. Mol. Sci. 2006, 7 (9), 375–404.
(7) Dumesic, J. A.; Rudd, D. F.; Aparicio, L. M.; Rekoske, J. E.; Trevino, A. A. The Microkinetics of Heterogeneous Catalysis; American Chemical Society: Washington, D.C., 1993; p 316.
(8) Box, G. E. P.; Lucas, H. L. Design of Experiments in Non-Linear Situations. Biometrika 1959, 46 (1/2), 77–90.
(9) Chernoff, H. Sequential design of experiments. Ann. Math. Statist. 1959, 30, 755–770.
(10) Franckaerts, J.; Froment, G. F. Kinetic study of the dehydrogenation of ethanol. Chem. Eng. Sci. 1964, 19 (10), 807–818.
(11) Box, G. E. P.; Draper, N. R. The Bayesian Estimation of Common Parameters from Several Responses. Biometrika 1965, 52 (3), 355–365.
(12) Hunter, W. G.; Reiner, A. M. Designs for discriminating between two rival models. Technometrics 1965, 7 (3), 307–323.

(13) Kittrell, J. R.; Hunter, W. G.; Watson, C. C. Nonlinear Least Squares Analysis of Catalytic Rate Models. AIChE J. 1965, 11 (6), 1051–1057.
(14) Box, G. E. P.; Hill, W. J. Discrimination among mechanistic models. Technometrics 1967, 9 (1), 57–71.
(15) Hunter, W. G.; Mezaki, R. An experimental design strategy for distinguishing among rival mechanistic models. An application to the catalytic hydrogenation of propylene. Can. J. Chem. Eng. 1967, 45, 247–249.
(16) Froment, G. F.; Mezaki, R. Sequential Discrimination and Estimation Procedures for Rate Modeling in Heterogeneous Catalysis. Chem. Eng. Sci. 1970, 25, 293–301.
(17) Van Welsenaere, R. J.; Froment, G. F. Parametric Sensitivity and Runaway in Fixed Bed Catalytic Reactors. Chem. Eng. Sci. 1970, 25, 1503–1516.
(18) Reilly, P. M. Statistical methods in model discrimination. Can. J. Chem. Eng. 1970, 48, 168–173.
(19) Bard, Y. Nonlinear Parameter Estimation; Academic Press: New York, 1974.
(20) Reilly, P. M.; Blau, G. E. The Use of Statistical Methods to Build Mathematical Models of Chemical Reaction Systems. Can. J. Chem. Eng. 1974, 52, 289–299.
(21) Reilly, P. M.; Bajramovic, R.; Blau, G. E.; Branson, D. R.; Sauerhoff, M. W. Guidelines for the optimal design of experiments to estimate parameters in first order kinetic models. Can. J. Chem. Eng. 1977, 55, 614.
(22) Stewart, W. E.; Sorensen, J. P. Bayesian Estimation of Common Parameters From Multiresponse Data With Missing Observations. Technometrics 1981, 23 (2), 131–141.
(23) Rabitz, H.; Kramer, M.; Dacol, D. Sensitivity Analysis in Chemical Kinetics. Annu. Rev. Phys. Chem. 1983, 34, 419–461.
(24) Froment, G. F. The kinetics of complex catalytic reactions. Chem. Eng. Sci. 1987, 42 (5), 1073–1087.
(25) Stewart, W. E.; Caracotsios, M.; Sorensen, J. P. Parameter estimation from multiresponse data. AIChE J. 1992, 38 (5), 641–650.
(26) Stewart, W. E.; Shon, Y.; Box, G. E. P. Discrimination and goodness of fit of multiresponse mechanistic models. AIChE J. 1998, 44 (6), 1404–1412.
(27) Asprey, S. P.; Naka, Y. Mathematical Problems in Fitting Kinetic Models: Some New Perspectives. J. Chem. Eng. Jpn. 1999, 32 (3), 328–337.
(28) Stewart, W. E.; Henson, T. L.; Box, G. E. P. Model discrimination and criticism with single-response data. AIChE J. 1996, 42 (11), 3055–3062.
(29) Park, T.-Y.; Froment, G. F. A Hybrid Genetic Algorithm for the Estimation of Parameters in Detailed Kinetic Models. Comput. Chem. Eng. 1998, 22 (Suppl.), S103–S110.
(30) Petzold, L.; Zhu, W. Model reduction for chemical kinetics: an optimization approach. AIChE J. 1999, 45 (4), 869–886.
(31) Ross, J.; Vlad, M. O. Nonlinear Kinetics and New Approaches to Complex Reaction Mechanisms. Annu. Rev. Phys. Chem. 1999, 50, 51–78.
(32) Atkinson, A. C. Non-constant variance and the design of experiments for chemical kinetic models. In Dynamic Model Development: Methods, Theory and Applications; Asprey, S. P., Macchietto, S., Eds.; Elsevier: Amsterdam, 2000; pp 141–158.
(33) Cortright, R. D.; Dumesic, J. A. Kinetics of Heterogeneous Catalytic Reactions: Analysis of Reaction Schemes. Adv. Catal. 2001, 46, 161–264.
(34) Song, J.; Stephanopoulos, G.; Green, W. H. Valid Parameter Range Analyses for Chemical Reaction Kinetic Models. Chem. Eng. Sci. 2002, 57, 4475–4491.
(35) Sirdeshpande, A. R.; Ierapetritou, M. G.; Androulakis, I. P. Design of Flexible Reduced Kinetic Mechanisms. AIChE J. 2001, 47 (11), 2461–2473.
(36) Katare, S.; Bhan, A.; Caruthers, J. M.; Delgass, W. N.; Venkatasubramanian, V. A hybrid genetic algorithm for efficient parameter estimation of large kinetic models. Comput. Chem. Eng. 2004, 28 (12), 2569–2581.
(37) Bhan, A.; Hsu, S.-H.; Blau, G.; Caruthers, J. M.; Venkatasubramanian, V.; Delgass, W. N. Microkinetic Modeling of Propane Aromatization over HZSM-5. J. Catal. 2005, 235 (1), 35–51.
(38) Bogacha, B.; Wright, F. Non-linear design problem in a chemical kinetic model with non-constant error variance. J. Stat. Plan. Inference 2005, 128, 633–648.
(39) Ucinski, D.; Bogacha, B. T-optimum design for discrimination between two multiresponse dynamic models. J. R. Stat. Soc. Ser. B: Stat. Method. 2005, 67, 3–18.
(40) Englezos, P. J.; Kalogerakis, N. Applied Parameter Estimation for Chemical Engineers; Marcel-Decker, Inc.: New York, 2001.
(41) Blau, G. E.; Lasinski, M.; Orcun, S.; Hsu, S.-H.; Caruthers, J. M.; Delgass, W. N.; Venkatasubramanian, V. High Fidelity Mathematical Model Building with Experimental Data: A Bayesian Approach. Comput. Chem. Eng. 2008, 32 (4-5), 971–989.
(42) Draper, N. R.; Hunter, W. G. Design of experiments for parameter estimation in multiresponse situations. Biometrika 1966, 53 (3/4), 525–533.
(43) Draper, D. Bayesian Hierarchical Modeling; Springer-Verlag: New York, 2000.
(44) Nocedal, J.; Wright, S. J. Numerical Optimization; Springer: New York, 1999; p 636.
(45) Froment, G. F. Model discrimination and parameter estimation in heterogeneous catalysis. AIChE J. 1975, 21 (6), 1041–1057.
(46) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. Equations of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087–1092.
(47) Gilks, W. R.; Richardson, S.; Spiegelhalter, D. J. Markov Chain Monte Carlo in Practice; Chapman & Hall/CRC: New York, 1996.
(48) Atkinson, A. C.; Cox, D. R. Planning Experiments for Discriminating between Models. J. R. Stat. Soc. Ser. B: Stat. Method. 1974, 36 (3), 321–348.
(49) Atkinson, A. C.; Fedorov, V. V. Optimal design: Experiments for discriminating between several models. Biometrika 1975, 62 (2), 289–303.
(50) Fedorov, V. V.; Pazman, A. Design of physical experiments. Fortschr. Phys. 1968, 16, 325–355.
(51) Hsiang, T.; Reilly, P. M. A practical method for discriminating among mechanistic models. Can. J. Chem. Eng. 1971, 49, 865–871.
(52) Montgomery, D. C.; Runger, G. C. Applied Statistics and Probability for Engineers, 3rd ed.; Wiley: Hoboken, NJ, 2002; p 720.
(53) Prasad, V.; Vlachos, D. G. Multiscale Model and Informatics Based Optimal Design of Experiments: Application to the Catalytic Decomposition of Ammonia on Ruthenium. Ind. Eng. Chem. Res. 2008, 47, 6555–6567.
(54) Hsu, S.-H. Bayesian Model Building Strategy and Chemistry Knowledge Compilation for Kinetic Behaviors of Catalytic Systems. Ph.D. Thesis, Purdue University, West Lafayette, IN, 2006.
(55) Bates, D. M.; Watts, D. G. Nonlinear Regression Analysis and Its Applications; Wiley and Sons: New York, 1988.
(56) Fishman, G. S. A First Course in Monte Carlo; Thomson Brooks/Cole: Belmont, CA, 2005.
(57) Hastings, W. K. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 1970, 57, 97–109.

Received for review October 30, 2008
Revised manuscript received February 24, 2009
Accepted March 13, 2009

IE801651Y
