Thesis - Data Reconc and Params Estimation

FRAMEWORK FOR JOINT DATA RECONCILIATION AND
PARAMETER ESTIMATION
JOE YEN YEN

(B.Eng.(Hons.), NUS)
A THESIS SUBMITTED
FOR THE DEGREE OF MASTER OF ENGINEERING
DEPARTMENT OF ELECTRICAL
AND COMPUTER ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2004
ACKNOWLEDGMENTS
I would like to express my gratitude to my supervisors Dr Arthur Tay and A/Professor
Ho Weng Khuen of the NUS ECE Department and Professor Ching Chi Bun of the
Institute of Chemical and Engineering Sciences (ICES), for their advice and guidance,
and for having confidence in me in carrying out this research.
The support of ICES in financing the research and providing the research environment is
greatly acknowledged. The guidance and assistance from Dr Liu Jun of ICES is also
appreciated.
Most of the research work is done in the Process Systems Engineering (PSE) Group of
the Department of Chemical Engineering in University of Sydney. The guidance of
Professor Jose Romagnoli and Dr David Wang is valuable in directing and improving the
quality of this work. I would also like to thank them for accommodating me with such
hospitality that my stay in the group is not only fruitful, but also enjoyable. Thanks also
go to the other members of the PSE for sharing their research knowledge and experience
and for making me feel part of the group.
My friend Zhao Sumin has seen to my well being in Sydney, and most importantly, has
been a like-minded confidante from whom I obtain inspiration in carrying out my
research work. I would like to dedicate this thesis to her, as with her persistent
encouragements, I feel that the effort in completing this thesis is partly hers.
i
TABLE OF CONTENTS
ACKNOWLEDGMENTS............................................................................................................................. I
TABLE OF CONTENTS.............................................................................................................................II
SUMMARY ................................................................................................................................................ IV
LIST OF TABLES ..................................................................................................................................... VI
LIST OF FIGURES ..................................................................................................................................VII
CHAPTER 1: INTRODUCTION ................................................................................................................1
1.1. MOTIVATION ........................................................................................................................................1
1.2. CONTRIBUTION .....................................................................................................................................3
1.3. THESIS ORGANIZATION ........................................................................................................................4
CHAPTER 2: THEORY & LITERATURE REVIEW..............................................................................6
2.2. DATA RECONCILIATION (DR)...............................................................................................................6
2.3. JOINT DATA RECONCILIATION – PARAMETER ESTIMATION (DRPE) ..................................................11
2.4. ROBUST ESTIMATION .........................................................................................................................15
2.5. PARTIALLY ADAPTIVE ESTIMATION ...................................................................................................22
2.6. CONCLUSION ......................................................................................................................................24
CHAPTER 3: JOINT DRPE BASED ON THE GENERALIZED T DISTRIBUTION .......................26
3.1. INTRODUCTION ...................................................................................................................................26
3.2. THE GENERALIZED T (GT) DISTRIBUTION..........................................................................................26
3.3. ROBUSTNESS OF THE GT DISTRIBUTION .............................................................................................29
3.4. PARTIALLY ADAPTIVE GT-BASED ESTIMATOR...................................................................................30
3.5. THE GENERAL ALGORITHM ................................................................................................................32
3.6. CONCLUSION ......................................................................................................................................35
CHAPTER 4: CASE STUDY .....................................................................................................................36
4.1. INTRODUCTION ...................................................................................................................................36
4.2. THE GENERAL-PURPOSE CHEMICAL PLANT .......................................................................................36
4.3. VARIABLES AND MEASUREMENTS ......................................................................................................40
4.4. SYSTEM DECOMPOSITION: VARIABLE CLASSIFICATION .....................................................................43
4.4.1. Derivation of Reduced Equations ..............................................................................................43
4.4.2. Classification of Measurements .................................................................................................45
4.5. DATA GENERATION ............................................................................................................................47
4.5.1. Simulation Data .........................................................................................................................47
4.5.2. Real Data ...................................................................................................................................48
4.6. METHODS COMPARED ........................................................................................................................48
4.7. PERFORMANCE MEASURES .................................................................................................................50
4.8. CONCLUSION ......................................................................................................................................51
CHAPTER 5: RESULTS & DISCUSSION ..............................................................................................53
5.1. INTRODUCTION ...................................................................................................................................53
5.2. PARTIAL ADAPTIVENESS & EFFICIENCY .............................................................................................53
5.3. EFFECT OF OUTLIERS ON EFFICIENCY .................................................................................................64
5.4. EFFECT OF DATA SIZE ON EFFICIENCY ...............................................................................................67
5.5. ESTIMATION OF ADAPTIVE ESTIMATOR PARAMETERS: PRELIMINARY ESTIMATORS AND ITERATIONS
..................................................................................................................................................................71
5.6. REAL DATA APPLICATION ..................................................................................................................76
5.7. CONCLUSION ......................................................................................................................................79
ii
CHAPTER 6: CONCLUSION ...................................................................................................................80
6.1. FINDINGS ............................................................................................................................................80
6.2. FUTURE WORKS .................................................................................................................................82
REFERENCES ............................................................................................................................................83
AUTHOR’S PUBLICATIONS...................................................................................................................85
APPENDIX A ..............................................................................................................................................86
APPENDIX B...............................................................................................................................................95
iii
SUMMARY
Objective knowledge about a process is essential for process monitoring, optimization,
identification and other general management planning. Since measurement of process
states always contain some type of errors, it is necessary to correct these measurement
data to obtain more accurate information about the process. Data reconciliation is such an
error correction procedure that utilizes estimation theory and the conservation laws
within the process to improve the accuracy of the measurement data and estimates the
values of unmeasured variables such that reliable and complete information about the
process is obtained.
Conventional data reconciliation, and other procedures that involve estimation, have
relied on the assumption that observation errors are normally distributed. The inevitable
presence of gross errors and outliers violates this assumption. In addition, the actual
underlying distribution is not known exactly and may not be normal. Various robust
approaches such as the M-estimators have been proposed, but most assumed, in a priori,
yet other forms of distribution, although with thicker tails than that of normal distribution
in order to suppress gross errors / outliers. To address the issue of the suitability of the
actual distribution to the assumed one, posteriori estimation of the actual distribution,
based on non-parametric methods such as kernel, wavelet and elliptical basis function, is
then proposed. However, these fully adaptive methods are complex and computationally
demanding. An alternative is to strike a balance between the simplicity of the parametric
approach and the flexibility of the non-parametric approach, i.e. by adopting a
iv
generalized objective function that covers a wide variety of distributions. The parameters
of the generalized distribution can be estimated posteriori to ensure its suitability to the
data.
This thesis proposes the use of a generalized distribution, namely the Generalized T (GT)
distribution in the joint estimation of process states and model parameters. The desirable
properties of the GT-based estimator are its robustness, simplicity, flexibility and
efficiency for the wide range of commonly encountered distributions (including Box-Tiao
and t-distributions) that belong to the GT distribution family. To achieve estimation
efficiency, the parameters of the GT distribution are adapted from the data through
preliminary estimation. The strategy is applied to data from both the virtual version and a
trial run of a chemical engineering pilot plant. The results confirm the robustness and
efficiency of the estimator.
v
LIST OF TABLES
Table 5.1. MSE of Measurements..................................................................................... 57

Table 5.2. Estimated parameters of the partially adaptive estimators used to generate
Figure 5.5 .................................................................................................................. 63
Table 5.3. MSE of Measurements with Outliers............................................................... 64
Table 5.4. Reconciled Data and Estimated Parameter Values Using Different DRPE
Methods..................................................................................................................... 78
vi
LIST OF FIGURES
Figure 2.1. Plots of Influence Function for Weighted Least Square Estimator (dashed
line) and the Robust Estimator based on Bivariate Normal Distribution (solid line)21
Figure 2.2 Partially Adaptive Estimation Scheme............................................................ 24
Figure 3.1. Plot of GT density functions for various settings of distribution parameters p
and q.......................................................................................................................... 27
Figure 3.2. GT Distribution Family Tree, Depicting the Relationships among Some
Special Cases of the GT Distribution........................................................................ 28
Figure 3.3. Plots of Influence Function for GT-based Estimator with different parameter
settings ...................................................................................................................... 30
Figure 3.4. General Algorithm for Joint DRPE using partially adaptive GT-based
estimator.................................................................................................................... 33
Figure 4.1. Flow Diagram of the General Purpose Plant for Application Case Study ..... 37
Figure 4.2. Simulink Model of the General Purpose Plant in Figure 4.1. ........................ 38
Figure 4.3. Configuration of the General Purpose Plant for Trial Run............................. 39
Figure 4.4. Reactor 1 Configuration Details and Measurements...................................... 39
Figure 4.5. Reactor 2 Configuration Details and Measurements...................................... 40
Figure 5.1. MSE Comparison of GT-based with Weighted Least Squares and
Contaminated Normal Estimators............................................................................. 57
Figure 5.2. Percentage of Relative MSE........................................................................... 58
Figure 5.3. Comparison of the Accuracy of Estimates for the overall heat transfer
coefficient of Reactor 1 cooling coil......................................................................... 60
coefficient of Reactor 2 cooling coil......................................................................... 61
Figure 5.5. Adaptation to Data: Fitting the Relative Frequency of Residuals with GT,
Contaminated Normal, and Normal distributions..................................................... 62
Figure 5.6. MSE of Variable Estimates for Data with Outliers ........................................ 66
Figure 5.7. Comparison of MSE with and without outliers for GT and Contaminated
Normal Estimators .................................................................................................... 66
Figure 5.8. MSE Results of WLS, Contaminated Normal and GT-based estoimators for
Different Data Sizes.................................................................................................. 68
Figure 5.9. Improvement in MSE Efficiency when Data Size is increased...................... 70
Figure 5.10. Iterative Joint DRPE with Preliminary Estimation ...................................... 72
Figure 5.11. Final MSE Comparison for GT-based DRPE method Using GT, Median and
WLS as preliminary estimators................................................................................. 74
Figure 5.12. MSE throughout iterations ........................................................................... 75
Figure 5.13. Scaled Histogram of Data and Density Plots of GT, Contaminated Normal
and Normal Distributions.......................................................................................... 77
vii
CHAPTER 1: INTRODUCTION
1.1. Motivation
The continuously increasing demand for higher product quality and stricter compliance to
environmental and safety regulations requires the performance of a process to be
continuously improved through process modifications (Romagnoli and Sanchez, 2000).
Decision making associated with these process modifications requires accurate and
objective knowledge of the process state. This knowledge of process state is obtained
from interpretation of data generated by the process control systems. The modern-day
Distributed Control System (DCS) is capable of high-frequency sampling, resulting in
vast amount of data to be interpreted, be it for the purpose of process monitoring,
optimization or other general management planning. Since measurement data always
contain some type of error, it is necessary to correct their values in order to obtain
accurate information about the process.
Data reconciliation (DR) is such an error-correction procedure that improves the accuracy
of measurement data, and estimates the values of unmeasured variables, such that reliable
and complete information about the process is obtained. It makes use of conservation
equations and other system/model equations to correct the measurement data, i.e. by
adjusting the measurements such that the adjusted data are consistent with respect to the
equations. The conventional data reconciliation approach is the least squares
minimization, whereby the (square of) adjustments to the measurements are minimized,
1
while at the same time subjecting the measurements to satisfy the system/model
equations. The least squares method is simple and reasonably efficient; in fact, it is the
best linear unbiased, the most efficient in terms of minimum variance, and also the
maximum likelihood estimator when the measurement errors are distributed according to
the Normal (Gaussian) distribution.
However, measurement error is made up of random and gross error. Gross errors are
often present in the measurements and these large deviations are not accounted for in the
normal distribution. In this case, the least squares method can produce heavily biased
estimates. Attempts to deal with gross errors can be grouped in two classes. The first
includes methods that still keep the least squares approach, but incorporate additional
statistical tests to the residuals of either the constraints (which can be done pre-
reconciliation) or the measurements (which must be done post-reconciliation). The
drawback of these approaches is that there is a need for separate gross-error processing
step. Also, most importantly, normality is still assumed for the data, while the data may
not be best represented by the Normal distribution. Furthermore, the statistical tests are
theoretically valid only for linear system/model equations, which is a rather constricting
restriction in chemical processes where most relationships are nonlinear.
The second class of gross-error handling approaches comprises the more recent
approaches to suppress gross error, i.e. by making use of the so-called robust estimators.
These estimators can suppress gross error while performing reconciliation, so there is no
need for a separate procedure to remove the gross errors. Most of these approaches are
2
based on the concept of statistical robustness, and they can be further grouped as either
parametric or non-parametric approach. The parametric approach either represents the
data with a certain distribution that has thicker tails to account for gross errors, or uses a
certain form of estimator that does not assume normality and gives small weights for
largely deviating observations. The non-parametric group consists of estimators that do
not assume any fixed form of distributions, but adjust their forms to the data distribution
through non-parametric density estimation instead. The resulting estimator will be
efficient as it is fitted to the data. However, these fully flexible estimators are prone to the
data size for the preliminary fitting and often do not perform well for data size often
encountered in practice (Butler et al, 1990).
A strategy is proposed to improve the efficiency of the parametric estimation by allowing
the parameters of the estimator to vary to suit the data. This is called partially adaptive
estimation. In this thesis, a robust partially adaptive data reconciliation procedure using
the generalized-T (GT) distribution will be studied and applied to the virtual version of a
chemical engineering pilot plant. The strategy is extended to the joint DRPE, which gives
both parameter and variable estimates that are consistent with the system/model
equations.
1.2. Contribution
In this thesis, a robust and efficient strategy for joint data reconciliation and parameter
estimation is studied. The strategy makes use of the generalized T (GT) distribution, a
robust and versatile general distribution family that is originally proposed in the field of
3
statistics by McDonald and Newey (1988). The GT distribution is first used in data
reconciliation by Wang and Romagnoli (2003) in a comparison case study.
In the present work, the strategy is extended to incorporate parameter estimation in the
joint data reconciliation and parameter estimation scheme. The properties of the GT-
based partially adaptive estimators are comprehensively studied through various
simulation cases.
A comprehensive literature review of the data reconciliation and joint data reconciliation
– parameter estimation, and the technical aspects associated with them is conducted.
As an application case study, the virtual version of a real lab-scale general-purpose
chemical plant is developed in Matlab/ Simulink. Besides simulation studies, steady-state
experimental data are also obtained from a trial run of the pilot plant. Full system
decomposition based on formal transformation methods and symbolic manipulation is
then conducted on the plant to facilitate accurate and complete estimation of the process
states and parameters by the joint data reconciliation – parameter estimation procedure.
1.3. Thesis Organization
This thesis is organized as follows. The theory on relevant topics in data reconciliation
and parameter estimation, along with existing works in the literature, is given in Chapter
2. Chapter 3 starts with an introduction to the Generalized T (GT) distribution, and goes
on to describe the proposed GT-based joint data reconciliation – parameter estimation
4
strategy. Chapter 4 describes the application case study and gives overview of some of
the settings used in the case studies in Chapter 5, where the results of the case studies are
presented and discussed in detail. The thesis is then concluded with Chapter 6.
5
CHAPTER 2: THEORY & LITERATURE REVIEW
2.1. Introduction
Data reconciliation aims to improve the accuracy of measurement data by enforcing the
data consistency. It uses estimation theory and subjects the optimization to model balance
equations. Two important estimation criteria are robustness and efficiency. In this
chapter, an introduction to data reconciliation is provided along with its important
aspects, including the incorporation of robustness and variable classification. Relevant to
the estimation criteria, the concept of robustness in statistical sense and an approach to
improve estimation efficiency is presented. A section is also devoted to joint data
reconciliation and parameter estimation, a strategy that combines the two estimation
procedures to simultaneously obtain consistent variable and parameter estimates.
2.2. Data Reconciliation (DR)
Measurements always contain some form of errors. Errors can be classified into random
error and gross error. Random errors are caused by natural fluctuations and variability
inherent in the process; they occur randomly and are typically small in magnitude. Gross
errors, on the other hand, are large in magnitude but occur less frequently; their
occurrences can be attributed to incorrect calibration or malfunction of instruments,
process leaks, and other unnatural causes.
In order to obtain objective knowledge of the actual state of the process, accurate data
must be used, requiring erroneous measurements to be first corrected before used. Data
6
reconciliation is such error correction technique that makes use of simple, well-known
and indubitable process relationships that should be satisfied regardless of the
measurement accuracy, i.e. the multicomponent mass and energy balances (Romagnoli
and Sanchez, 2000).
The presence of errors in the measurement of process variables gives rise to discrepancies
in the mass and energy balances. Data reconciliation adjusts or reconciles the
measurements to obtain estimates of the corresponding process variables that are more
accurate and consistent with the process mass and energy balances. The adjustment of
measurements is such that certain optimality regarding the characteristic of the error is
achieved. Mathematically, the general data reconciliation translates into the following
constrained optimization problem:
(reconciled variables) = arg min (optimality criteria)
subject to
(mass and energy balances; variable bounds)
To illustrate more clearly, the data reconciliation problem using the weighted least
squares method will be formulated in the following.
Denote as
y, an (mx1) vector of measurements;
x, an (mx1) vector of corresponding true values of the variables with measurements y;
u, a (px1) vector of unmeasured variables;
7
and g(x,u)=0, the multicomponent mass and energy balance equations of the process; the
data reconciliation problem for weighted least square estimator can then be formulated
as:
Min ( y − x) T Ψ −1 ( y − x)
x, u
s.t.
g ( x, u ) = 0 (2.1)
x L ≤ x ≤ xu
u L ≤ u ≤ uu
where Ψ is the measurement error covariance matrix, x L and u L the lower bounds on x
and u, respectively, and xU and uU the upper bounds on x and u, respectively.
Three features are observed from the above formulation:
(1) Firstly, the objective function of the optimization is the square of the adjustments
made to the measurements y, weighted by the inverse of the error covariance matrix.
This corresponds to the weighted least square estimator used in the problem. The
objective function of the data reconciliation optimization problem is in fact
determined by the estimator applied to the problem. The choice of estimator is, in
turn, usually dependent on the assumption regarding the error characteristics. For
example, the use of weighted least square reflects the assumption that the error is
small relative to its standard deviation such that the measurement must lie somewhere
within very few standard deviations from the true value of the variable. In fact, if the
weighted least square is chosen based on maximum likelihood consideration, the error
is assumed to follow multivariate normal distribution with mean zero and covariance
8
Ψ . To demonstrate this, consider the likelihood function of the multivariate normal
distribution
f (ε ) = (2π det(Ψ )) −1 exp(−ε T Ψ −1ε ) , (2.2)
where ε = y − x ; the maximum of f (ε ) is obtained by minimizing ε T Ψ −1ε , i.e. the
weighted least square of adjustments. As will be discussed in Section 2.4 on
robustness, the adequacy of the assumption regarding the error and hence, the choice
of estimator plays an important role in ensuring the accuracy of the reconciled data in
all situations.
(2) Secondly, the constraint g(x,u) comprises the mass and energy balances of the
process. Together with the form of the objective function, the constraint equation
determines the difficulty of the DR optimization. If only total mass balances are
considered and the weighted least square objective function is used, the optimization
problem will have quadratic objective function with linear constraints. This kind of
optimization can be solved analytically, i.e. closed-form solution can be obtained.
However, as using merely mass balances limits the reconciliation to only flow
measurements, component and energy balances are usually also considered. This
results in non-linear optimization problem for which analytical solution usually does
not exist. Several optimization methods are proposed for this case, including the QR
orthogonal factorization for bilinear systems (Crowe, 1986), successive linearization
(Swartz, 1989), and nonlinear numerical optimization methods such as sequential
9
quadratic programming (Gill et al., 1986; Tjoa and Biegler, 1991). In this thesis, the
sequential quadratic programming (SQP) is used, as it is not restricted to linear or
bilinear systems, and is flexible in terms of the form of objective function. Although
convexity is essential to guarantee convergence to the true optimum, the algorithm
also converges to satisfactory solutions for many non-convex problems, as will be
demonstrated by the estimation results in Chapter 5.
(3) Thirdly, the optimization problem involves measured and unmeasured variables, x
and u, respectively. A very important procedure must be taken before formulating the
data reconciliation problem. This procedure is called variable classification. Given the
knowledge of which variables are measured, the balance equations can be analysed to
identify which measured variables are redundant or non-redundant, and which
unmeasured variables are determinable or undeterminable. The value of a
determinable variable can be determined from the value of measured variables
through the model equations, whereas an undeterminable variable is not involved in
such equations, and hence its value cannot be determined from the values of other
variables. A redundant variable is a variable whose value can still be determined from
measurements of other variables through the model equations, even if its own
measurement is deleted. On the contrary, if the measurement of a non-redundant
variable is deleted, its value becomes undeterminable.
In the actual optimization step of data reconciliation, the decision variables will
consist of only the measured variables, as adjustments can only be made to
10
measurements. The calculation of determinable unmeasured variables is performed
using the reconciled values of the measured variables, and hence, this calculation can
only be carried out after the optimization is completed. Therefore it can be said that
the problem in equation (2.1) is decomposed into two steps, the optimization to obtain
reconciled measurements, and the calculation of determinable variables.
Various methods have been proposed for this variable classification / problem
decomposition (Romagnoli and Stephanopoulos, 1980; Sanchez et al, 1992; Crowe,
1989; Joris and Kalitventzeff, 1987). However, any formal method is restricted to
linear or linearized model equations.
2.3. Joint Data Reconciliation – Parameter Estimation (DRPE)
Parameters of a process are often not known and have to be estimated from the
measurements of the process variables. These estimated parameters are often important
for design, evaluation, optimization and control of the process. As measurements of
process variables are corrupted by errors, the measurements are usually reconciled first
before being used for parameter estimation. This results in two separate estimations with
two sets of variable estimates, i.e. the reconciled data satisfying the process constraints
and the data fitted with the model parameter estimate. It is most likely that these two sets
of data are not similar, albeit representing the same physical quantities. In this thesis, the
two estimation steps corresponding to data reconciliation and parameter estimation are
merged into a single joint data reconciliation – parameter estimation (DRPE) step.
11
The problem formulation, taking the weighted least square objective function as an
example, is as follows.
Min ( y − x) T Ψ −1 ( y − x)
x, u,θ
s.t.
g ( x, u , θ ) = 0 (2.3)
x L ≤ x ≤ xu
u L ≤ u ≤ uu
θ L ≤θ ≤θu
θ is the model parameter to be estimated, while the meaning of the other symbols are as
in equation (2.1). It should, however, be noted that the vector of measurements y may
now contain non-redundant measured variables which are involved in the equations to
estimate θ now included in g. Because all measurements, both redundant and non-
redundant are subject to the constraints which now includes data reconciliation process
constraints and parameter estimation model equations, the resulting estimates of both
variables and model parameters are now consistent with the whole set of constraints.
The joint data reconciliation – parameter estimation is also a general formulation of the
error-in-variables method (EVM) in parameter estimation, where there is no distinction
between independent and dependent variables and all variables are subject to
measurement errors. Main aspects of the joint DRPE include the general algorithm for the
solution strategy, the optimization strategy and the robustness of the estimation. The
optimization and robustness issues are similar to those in data reconciliation; therefore, it
12
will not be discussed here. The optimization strategy is discussed in Section 2.2, while
the estimation robustness is discussed in Section 2.4.
In error-in-variables method (EVM), the need for efficient general algorithm for the
solution strategy arises due to the large optimization problem that results from the
aggregation of independent and dependent variables in the estimation. From the point of
view of data reconciliation, the computation complexity is due to the addition of non-
redundant variables and the unknown model parameters to be estimated. The general
algorithm can be distinguished into three main approaches (Romagnoli and Sanchez,
2000):
(1) Simultaneous solution methods
This is the most straightforward approach, i.e. solving the joint estimation of
variables and model parameters simultaneously. This approach relies on efficient
optimization method that is able to handle large-scale problems involving large
amount of decision variables. This is because considering there are p model
parameters to be estimated and N set of measurements of m variables, the number of
decision variables will be (p+Nm).
(2) Nested EVM
Reilly and Patino-Leal (1981) was the first to propose the idea of nested EVM, i.e.
decoupling the parameter estimation problem from the data reconciliation problem,
where the data reconciliation problem is optimized at each iteration of the parameter
estimation problem. While they used successive linearization for the constraints, Kim
13
et al (1990) later replaced the linearization with the more general non-linear
programming. The algorithm due to Kim et al is as follows (Romagnoli and Sanchez,
2000):
Step 1:
At the first iteration, x = y, and θ = θ 0 (initial guess of the parameter)
Step 2:
Find the minimum of the function for x and θ :
Min ( y − x) T Ψ −1 ( y − x)
θ
s.t.
Min ( y − x) T Ψ −1 ( y − x)
x
s.t.
g ( x, θ ) = 0
x L ≤ x ≤ xu
θ L ≤θ ≤θu
The optimization size is reduced to the same order as the number of parameters to be
estimated.
(3) Two-stage EVM
Valko and Vadja (1987) proposed an algorithm where the data reconciliation and
parameter estimation steps are essentially also decoupled. By linearizing the
constraints and solving the weighted least square optimization analytically, they
14
utilizes the analytical solution to manipulate the problem such that it can be re-
formulated into two optimization stages: the first stage to optimize for the model
parameters, keeping the variable estimates fixed at the results from previous iteration,
and the second to optimize for the variable estimates, keeping the parameter values
fixed at the values obtained from the preceding step. The resulting algorithm is
compact yet flexible. However, the ability to decouple the problem into two stages
depends heavily on whether the optimization problem can be solved analytically.
Therefore, it is restricted to only few very simple estimators such as the weighted
least squares.
2.4. Robust Estimation
The conventional and most prevalent form of estimator is the weighted least squares
formulation. It has been shown in Section 2.1 that if maximum likelihood estimation is
considered, the weighted least square estimates are the maximum likelihood estimates
when the measurement errors follow the multivariate normal (Gaussian) distribution in
equation (2.2). However, the normality assumption is rather restrictive; it assumes that a
measurement will lie within a small range around the true variable value, that is, the error
consists of natural variability of the measurement process. The presence of gross errors,
whose magnitudes are considerably large compared to the standard deviation of the
assumed normal distribution, presents the risk of the weighted least square estimates
becoming heavily biased. Besides, the natural variability of the measurement process
may also be better characterized by distributions other than normal.
15
The usual approach to deal with departure from normality is by detecting the presence of
gross errors through statistical tests. Some typical gross error detection schemes are the
global test, nodal test, and measurement test (Serth and Heenan, 1986; Narasimhan and
Mah, 1987). The global and nodal tests are based on residuals of model constraints, while
the measurement test is based on Neyman-Pearson hypothesis testing using the residuals
between the measurements and the estimates (Wang, 2003). The limitations of these
complementary tests are as follows. Firstly, the statistical tests are still based on the
assumption that the error is normally distributed; they detect deviations from normal
distribution. When the empirical character of the error differs significantly from the
normal distribution, the results of the statistical test might be very misleading. The
second limitation is the restrictions of the tests to linear or linearized constraint equations.
In a practical setting, the model equations are usually nonlinear and linearization will
introduce some approximation errors that will confound the statistical gross error tests.
An alternative approach to deal with gross errors is to reformulate the objective function
to take into account the presence of gross errors from the beginning, such that gross
errors can be suppressed while performing the estimation. In this case, there is no need
for a separate procedure such as the previously mentioned statistical tests to recover from
gross errors. A seminal work that proposed this approach is the maximum likelihood
estimator based on the contaminated normal density proposed by Tjoa and Biegler
(1991). Instead of assuming purely normal error distribution, Tjoa and Biegler combined
two normal distributions: a narrow Gaussian with the same standard deviation as that of
the weighted least squares method, and a wide Gaussian with considerably larger
16
standard deviation to represent gross errors. The density function of the contaminated
normal distribution can be expressed as
1 u2 1 u2
f (u) = (1− p) exp(− 2 ) + p exp(− 2 2 ) (2.4)
2πσ 2σ 2πσb 2σ b
where u is the measurement residual, p the probability of gross errors, b the ratio of the
standard deviation of the wide Gaussian to the narrow one, and σ the standard deviation
of the narrow Gaussian. As illustrated in Figure 2.1, this distribution has heavier tails
than the uncontaminated normal distribution, which means that it recognizes the
possibility of gross error occurring (i.e. with a probability of p as in equation 2.4). In their
paper, Tjoa and Biegler showed that the estimator is able to detect gross errors and to
recover from them in most of the cases studied.
To study the robustness of an estimator, a unifying theoretical framework has been
proposed by Huber (1981) and Hampel (1986). The analysis based on Hampel et al’s
influence function (IF) will be adopted here. To simplify presentation, the derivation of
formulae will be omitted here; details can be found in Hampel et al (1986). The influence
function aims to describe the behaviour of an estimator in the neighbourhood of the
parametric distribution assumed by the estimator. If the residual u is drawn from a
distribution with density f(u), and if T[f(u)] is the unbiased estimate corresponding to u,
then the influence function of a residual u0 is given by
17
T [(1 − t ) f + tδ (u − u 0 )] − T [ f ]
IF = ψ (u 0 ) = lim (2.5)
t →0 t
where δ (u - u 0 ) is the delta function centred about u0. The following properties are
desirable for the influence function (McDonald and Newey, 1988):
(1) The influence function should be bounded, so that the any single residual cannot
dominate and distort the estimation.
(2) The influence function should be descending to very small values as the residuals
get large, so that a single large deviating residual has negligible effect on the
estimation.
(3) The influence function should be continuous in the residuals, such that grouping
or rounding of data has minimal effect on the estimation.
The robustness of the contaminated normal estimator above and a few other robust
estimators will be studied using the influence function in the following discussion. Before
that, however, it is appropriate to introduce the class of estimators called M-estimators.
M-estimators are a slight generalization of maximum likelihood estimators proposed by
Huber (1964). The maximum likelihood estimators have the following objective function
form:
Min - Σ log f (u ) = 0
x ,θ
18
where x and θ are the variables and model parameters to be estimated, u the residuals,
and f (u ) the density function of u. This is also equivalent to solving
Σ f ′/ f = 0
where f ′ = ∂f / ∂u . Huber generalized the estimation problem to
Min Σ ρ (u ) = 0 ,
x ,θ
or equivalently,
Σ ψ (u) = 0 ,
where ρ and ψ are not necessarily of the form - log f (u ) and f ′ / f , respectively.
The reason for introducing M-estimators here is that most of the robust estimators that
have been proposed in the literature, including those going to be discussed here, fall
under this class. Furthermore, the influence function of the M-estimators is proportional
to the first derivative of the objective function, i.e.
IF ∝ ∂ρ / ∂u = ψ(u)
or for maximum likelihood estimators:
19
IF ∝ ∂ ln( f (u )) / ∂u .
In the following discussions, the term influence function and ψ function will be use
interchangeably.
Having introduced the concept of influence function and shown the simple derivation of
the influence function for M-estimators, the robustness of several M-estimators can now
be analysed. It is useful to first examine the influence function of the weighted least
squares estimator, to explain why it is not robust to outliers and to compare it with robust
estimators. Since ρ = − log f (u ) , the ρ function for the weighted least squares estimator
is ρ = u T Ψ −1u . For simplicity but without losing generality, let u be a 1x1 vector, and
Ψ = σ so that ρ = u 2 / σ . Taking the derivative of ρ , the influence function will be
proportional to the magnitude of the residual, i.e.
u2 2
IF ∝ ∂ ( ) / ∂u = u.
σ σ
Consequently, a large outlying measurement will have a proportionally large influence on
the estimation; this means that a gross error can bias the final estimates. The dotted line
in Figure 2.1 is the plot of the influence function; it is seen that the function is not
bounded and increases infinitely.
20
For comparison, the influence function of the contaminated normal estimator is also
plotted in Figure 2.1. This function can be expressed as
u p u
(1 - p)exp(- )+
exp(- 2 2 )
IF ∝ 2σ 2
b 3
2b σ u = w(u )u = ψ (u )
u p u
(1 - p)exp(- ) + exp(- 2 2 )
2σ 2 b 2b σ
For very large values of u, the weighting function w(u) can be approximated by
w(u ) ≈ 1 / b 2 ; whereas for small values of u, w(u ) ≈ 1 . This means that large residuals are
suppressed by weighting them down; while for small residuals, the estimator behaves like
a weighted least squares estimator, which can also be observed from Figure 2.1.
5
IF value
-5
-5 0 5
Normalised Residual, u
Figure 2.1. Plots of Influence Function for Weighted Least Square Estimator (dashed
line) and the Robust Estimator based on Contaminated Normal Distribution (solid
line)
21
2.5. Partially Adaptive Estimation
Besides robustness, an important goal of any estimation procedure is the estimation
efficiency. Fully adaptive estimators are, from a theoretical point of view, ideal in that
respect. The concept of adaptive estimation was proposed by Stein in 1956, who
suggested the incorporation of preliminary testing of data to determine the most likely
distribution that they are drawn from, and the use of a set of estimators corresponding to
the set of likely distributions. The idea is to use the estimator that is best suited for the
data. This is desirable in practice for obvious reasons. However, Bickel (1982) states in
his seminal work on adaptive estimation:
‘The difficulty of nonparametric estimation of score functions suggests
that a more practical goal is partial adaptation, the construction of
estimates which are (i) always n-consistent, and (ii) efficient over a large
parametric subfamily of F [the space of distributions]. Our results
indicate that . . . this goal should be achievable by using a one-step Newton
approximation to the maximum likelihood estimate for the parametric
subfamily by starting with an estimate which is n-consistent for all
of F.’
In other words, partially adaptive estimators can be constructed by adopting a family of
distributions that have a general form which depends on a number of parameters, and
each member of the family can be derived by a particular set of values of the parameters.
22
For example, Potscher and Prucha (1986) used a family of t-distributions, and McDonald
and Newey (1988) used a generalized t-distribution, the idea of which will be adopted in
this thesis. Adopting the family of distribution is the first step; the most important step is
to then estimate or adapt the distribution parameters to the data, in order to characterize
the data better to improve estimation efficiency. Of course, the partially adaptive
estimators should also be reasonably robust.
A rather similar yet different approach to the partially adaptive concept above is the
tuning of the estimator parameters to the data by means other than statistical criteria. A
good example is the joint data reconciliation and parameter estimation strategy by Arora
and Biegler (2001), who use the Akaike Information Criterion (AIC) to tune the
parameters of their redescending estimator. The difference is that the redescending
estimator is constructed based on the robustness criteria (influence function) and there are
no statistical distribution corresponding to the influence function. Therefore, no statement
about the efficiency of the estimator for any distributions can be made. However, it is a
practical approach that has been shown in the paper to perform well for all the cases
considered.
The motivation for partially adaptive estimation is to include the information about the
error characteristics into the estimation. This inclusion of prior information also
corresponds to the formulation of posterior density which is a Bayes estimation. As stated
above, this inclusion of prior information can be achieved through preliminary testing of
the data, followed by selection of most suitable estimators according to the data
23
characteristics obtained from the preliminary testing. As such the selected estimators can
be said to have been adapted to the data. The general steps of this scheme are illustrated
in Figure 2.2below.
Preliminary
Data testing of data Data
characteristics
Adaptation:
Determination of most Adapted
estimator
suitable estimators
Final Estimation
Figure 2.2 Partially Adaptive Estimation Scheme
2.6. Conclusion
It is noticed that three main issues prevail in the general joint data reconciliation –
parameter estimation scheme. The first two are none other than the estimation criteria: it
is essential for the estimator to be robust against outliers, and it is desirable to attain
higher estimation efficiency. The analysis of statistical robustness by means of the
influence function facilitates the identification and construction of robust estimators.
Meanwhile, estimation efficiency can be achieved by partially adaptive estimation where
the estimator is constructed such that it is best suited to the data. The third issue is that of
variable classification. Parameter estimation involves unknown model parameters, which
24
means that some non-redundant measurements may be present. The identification of non-
redundant measurements is essential as such measurements may bias the estimation if
gross error is present in them.
25
CHAPTER 3: JOINT DRPE BASED ON THE GENERALIZED T
DISTRIBUTION
3.1. Introduction
The Generalized T (GT) distribution is the generalization of a family of statistical
distributions comprising many important distributions such as the exponential and the t
distribution. It has the potential to be adaptive and robust, and hence, in this thesis it is
proposed as an estimator for the joint data reconciliation – parameter estimation strategy.
An introduction to the GT distribution is presented in this chapter, along with its
robustness and adaptiveness properties. At the same time, techniques used in the strategy
proposed in this thesis are also described when the relevant topic is being discussed.
These techniques are then summarized in the last section of this chapter, along with the
outline of the algorithm for the proposed strategy.
3.2. The Generalized T (GT) Distribution
The Generalized T (GT) distribution has the following density function:
p (3.1)
f GT (u;σ , p, q) = q +1 p
; -∞<u <∞
⎛ |u| ⎞ p
2σq1/ p B(1 p , q)⎜⎜1 + ⎟

p ⎟
⎝ qσ ⎠
It is symmetric about zero and unimodal. The density function is characterized by the
distribution parameters { p, q,σ } : p and q determine the shape of the distribution,
26
while σ determines the scale of the distribution spread. Figure 3.1 plots the density
function for several settings of p and q.
0.4 0.4
PDF of GT with PDF of GT with p=1
p=1
p = 1, 1.25, 2, 3, 4, 5;
0.35 p=1.25 0.35 p = 1, 1.25, 2, 3, 4, 5; p=1.25
q = 0.5; q = 50; p=2
p=2
sigma = sqrt(2) sigma = sqrt(2) p=3
0.3 p=3 0.3
p=4 p=4
p=5 p=5
0.25 0.25
fGT
fGT
0.2 0.2
0.15 0.15
0.1 0.1
0.05 0.05
0 0
-6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6
Residual, u Residual, u
(a) (b)
0.4 0.4
PDF of GT with PDF of GT with
0.35 q = 0.5, 1, 5, 10, 20, 50; p=1 0.35 q = 0.5, 1, 5, 10, 20, 50; p=1
p = 1; p=1.25 p = 5; p=1.25
sigma = sqrt(2) p=2 sigma = sqrt(2) p=2
0.3 0.3
p=3 p=3
p=4 p=4
0.25 p=5 0.25 p=5
fGT
fGT
0.2 0.2
0.15 0.15
0.1 0.1
0.05 0.05
0 0
-6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6
(c) (d)
Figure 3.1. Plot of GT density functions for various settings of distribution parameters p
and q
(a) various p; q = 0.5; (b) various p; q = 50;
(c) various q; p = 1; (d) various q; p = 5.
From Figure 3.1, the role of the distribution parameters p and q can be observed as
follows. The parameter p determines the type of shape of the distribution in the area
around its center of symmetry. Depending on this type of shape, the thickness of the
27
distribution tail also varies. For lower value of p, the density around the symmetry has
narrower shape (Figure 3.1 a or b), and hence the distribution has thicker tails. For the
same type of shape, i.e. for the same values of p, varying q will vary the thickness of the
tails (Figure 3.1 a and b; Figure 3.1 c or d).
It is also apparent from Figure 3.1 that the GT distribution has high flexibility to assume
various distribution shapes. In fact, the Generalized T defines a very general family of
density functions and combines two general forms that include as special cases most of
the stochastic specifications encountered in practice. Some of the more commonly known
special cases, along with the particular values of { p, q,σ } for each case, are depicted in the
GT family tree in Figure 3.2.
GT
p=2
q →∞
σ =α 2
Power exponential/ T-distribution,
Box-Tiao df = 2q
p=2
q →∞
p =1 σ =α 2 q = 12
Double Normal Cauchy

exponential/
Laplacian
Figure 3.2. GT Distribution Family Tree, Depicting the Relationships among Some
Special Cases of the GT Distribution (McDonald and Newey, 1989)
28
3.3. Robustness of the GT Distribution
The influence function (IF) of the GT-based estimator can be obtained as:
p −1
( pq + 1) sign ( u ) u (3.2)
ψ GT ( u ; σ , p , q ) = p
qσ 1/ p
+ u
Figure 3.3 shows the influence function with several different sets of values of { p, q, σ } ;
these set of values correspond to those in Figure 3.1. It shows that generally the IF is
bounded and actually descending when the residuals get large. However, it is also
observed that as p increases, the IF for large residuals increases and as q increases, the IF
becomes less bounded. In fact, notice that one special case where q → ∞ ,
with p = 2,σ = α 2 (Figure 3.2; α is the standard deviation) is none other than the
normal distribution. To ensure that the GT-based estimator is insensitive to large
residuals, therefore, upper bounds must be imposed on the values that p and q can take.
Finally, it is also noted that for any given plant, the effect of p and q on the influence
function will be the same, because the variations from one plant to another (or from one
measurement to another) will be taken into account by the value of sigma. The expression
of the GT distribution (equation 3.1) further affirms this, i.e. the estimation residue
(measurement error) is always scaled by sigma.
29
2.5 2.5
IF of GT with IF of GT with
2 p = 1, 1.25, 2, 3, 4, 5; 2 p = 1, 1.25, 2, 3, 4, 5;
q = 0.5; sigma = sqrt(2) q = 50; sigma = sqrt(2)
1.5 1.5
1 1
0.5 0.5
0 0
IF
IF
-0.5 -0.5
p=1 p=1
-1 p=1.25 -1 p=1.25
p=2 p=2
-1.5 p=3 -1.5 p=3
p=4 p=4
-2 p=5 -2 p=5
-2.5 -2.5
-6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6
Residue, u Residue, u
(a) (b)
2.5 2.5
IF of GT with IF of GT with
2 q = 0.5, 1, 5, 10, 20, 50; q = 0.5, 1, 5, 10, 20, 50;
2
p = 1; sigma = sqrt(2) p = 5; sigma = sqrt(2)
1.5 1.5
1 1
0.5 0.5
0
IF
0
IF
-0.5 -0.5
q=0.5 q=0.5
-1 q=1 -1 q=1
q=5 q=5
-1.5 q=10 -1.5 q=10
q=20 q=20
-2 -2
q=50 q=50
-2.5 -2.5
-6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6
Residue, u Residue, u
(c) (d)
Figure 3.3. Plots of Influence Function for GT-based Estimator with different parameter
settings
3.4. Partially Adaptive GT-based Estimator
For the GT-based approach, the distributional parameters { p, q, σ } can be estimated using
the maximum likelihood estimator (McDonald and Newey, 1988):
30
{ p, q, σ } = arg max ∑ log f GT (u0 ; { p, q, σ }) (3.3)
where u0 is the residual from the preliminary estimates. The values of { p, q,σ } are then
obtained as the parameters of a GT member from which the data are most likely sampled.
The GT estimator using such estimated values of { p, q, σ } has been shown to be
asymptotically efficient among all estimators, when the error distribution is within the
GT family. Rigorous mathematical proof can be found in McDonald and Newey (1988).
Nothing can be said about the efficiency in the case of non-GT distributed errors, and
some loss of efficiency is possible. However, since the GT family includes a wide range
of commonly encountered distributions, the fact that it is asymptotically efficient for all
distributions within this wide range makes its application very appealing.
In the current work, the maximum likelihood estimator in equation 3.3 is used to
estimate { p, q, σ } . To obtain the preliminary residuals u0, a preliminary estimation is
necessary. The choice of preliminary estimator is not limited to robust estimators. As will
be shown in Chapter 5, regardless of the robustness of the preliminary estimator, the final
estimates always converge to highly similar values. It is found that the crucial
components in this (robust and) partially adaptive estimation scheme are the use of
iteration and the robustness of the range of GT distributions considered. The iteration
scheme will be discussed in Section 3.5 about the general algorithm. The robustness of
the range of the GT distributions, on the other hand, depends on the range of values that p
and q can assume, i.e. the bounds of p and q. This will also be discussed in Section 3.5.
31
In this thesis, the median of the data set is taken as the preliminary estimated values.
Taking the median as the estimate corresponds to the use of robust L-estimator
(Albuquerque and Biegler, 1996; Hampel et al., 1986). It is feasible in the current work
as only steady state case is considered, i.e. the true values of the variables are assumed to
be constant over the time horizon considered. This method of estimating the distribution
parameters is simple, robust and computationally more convenient, as compared to
performing a full preliminary DRPE to obtain u0. It will therefore give a good initial
estimate in the iterative scheme. Moreover, since the asymptotic distribution of the
estimates depend only on the limit of the distribution parameters, and not on the
particular way by which they are estimated (McDonald and Newey, 1988), this approach
is justifiable. For cases where steady state cannot be assumed, the full preliminary DRPE
must be performed to obtain u0.
3.5. The General Algorithm
Figure 3.4 outlines the general algorithm steps for the joint data reconciliation –
parameter estimation (DRPE) based on the partially adaptive GT-based estimator.
32
Data Preliminary Preliminary
Estimation Residuals
Adaptation: Estimator
Estimation of estimator parameters Parameters
{ p , q , σ } = arg max log (f GT (u 0 ; p , q , σ )) {p,q,ó}
{ p , q ,σ }
Final Estimation
{params,vars} = arg max f (u)
{params,vars}
= arg min- log(fGT(u | p, q,σ))

{params,vars}
s.t. modelequations
Figure 3.4. General Algorithm for Joint DRPE using partially adaptive GT-based
estimator
There are three main steps in the algorithm:
1. Preliminary estimation
The preliminary estimation is necessary in the partially adaptive scheme, to provide
the initial residuals from which prior information regarding the data is first obtained.
As discussed in the previous section, the median of the data is used as the preliminary
estimates. Taking the median provides estimates that do not necessarily satisfy the
model equations; however, the constraint residues are usually small. Moreover,
compared to performing a full data reconciliation – parameter estimation, taking the
median in a steady-state case is much more computationally efficient and the median
estimates run less risk of being biased due to errors in parameter estimation because it
33
does not impose any constraints and thus is not affected by constraints which include
unknown parameters.
2. Adaptation: Estimation of the Estimator Parameters
Using the preliminary residuals obtained from the previous step, the GT parameters
{ p, q, σ } are estimated as the maximum likelihood estimates of the GT objective
function (equation 3.3). Bounds on the GT parameters values are imposed as the only
constraints in this optimization. The values of these bounds are set as p ≤ 5 , q ≤ 50
and σ ≤ 50 . These values are chosen based on the analysis of the GT influence
function, i.e. p and q are the parameters that result in a reasonably robust GT
distribution without sacrificing the efficiency of the estimation. The meaning of
reasonably robust has been discussed in Section 3.3 regarding influence function, i.e.
where upper bounds are necessary to ensure sufficiently effective suppression of large
outliers. On the other hand, not sacrificing the efficiency of the estimation means that
the range of values of p and q within these bounds must sufficiently spans the shapes
of distribution that might possibly be encountered.
3. Final Estimation: Joint DRPE using the adapted estimator parameters
The joint data reconciliation – parameter estimation is then performed using the
adapted GT estimator, i.e. GT with the estimated { p, q, σ } . The term “final” for
“final estimation” is used very loosely here and only to differentiate this DRPE step
form the other two estimation steps in the whole framework, i.e. preliminary
estimation and the estimation of GT parameters. When improvement in the efficiency
34
of the preliminary estimates is considered necessary, iterations can be performed. In
this case, the estimates from the joint DRPE are used to obtain the residuals for the
next estimation of GT parameters. This will be further discussed and investigated in
Chapter 5.
3.6. Conclusion
The Generalized T (GT) distribution is a potentially robust and efficient estimator. Its
robustness stems from the flexibility to adjust its tail weight to produce thicker tail
distributions. The flexibility to adjust its shape makes it potentially efficient as it can
adapt to data profiles that resemble the GT family of distributions. Furthermore, the GT
family of distributions is wide-ranging and includes many important distributions as
special cases. Considering these beneficial properties, the GT-based estimator for a joint
data reconciliation – parameter estimation strategy is formulated. The algorithm consists
of three main steps: the preliminary estimation and the estimation of the GT parameters
to adapt the estimator to the data, and the joint DRPE itself. The performance of this
strategy is investigated in a case study in Chapter 4, and the results are discussed in
Chapter 5.
35
CHAPTER 4: CASE STUDY
4.1. Introduction
To study the performance of the GT-based joint data reconciliation – parameter
estimation strategy, it is applied to a general-purpose chemical engineering plant. This
chapter serves to present preparatory information for the studies in Chapter 5. The
description of the plant and its virtual version is given in the first section of this chapter,
followed by the decomposition of the model equations to classify the variables. An
overview of the data generation and the DRPE strategies compared in Chapter 5 is also
given.
4.2. The General-Purpose Chemical Plant
To study the performance of the proposed strategy, a case study of the pilot-scale setting
of a general-purpose plant containing two CSTRs, a mixer and a number of heat
exchangers (Figure 4.1) is performed. Material feed from the feed tank is heated before
being fed to the first reactor and the mixer. The effluent from the first reactor is then
mixed with the material feed in the mixer and fed to the second reactor. The effluent from
the second reactor is, in turn, fed back to the feed tank and the cycle continues. Each
reactor has a steam jacket and a cooling coil for temperature control.
36
HX3
Figure 4.1. Flow Diagram of the General Purpose Plant for Application Case Study
Associated with the pilot-scale plant, a virtual environment that mimics the plant
behaviour has been developed within the Matlab/Simulink framework and will be used to
generate simulation data. Since the equations governing the operations of the reactors,
heat exchangers, mixer and feedtank are nonlinear differential algebraic equations, the
simulation for these units is implemented using the Matlab S-functions. There are
altogether 26 equations, consisting of 18 nonlinear differential equations and 8 nonlinear
algebraic equations. The model equations and descriptions can be found in Appendix A,
while the overview diagram of the simulation model is shown in Figure 4.2 below.
37
Fin,Tin,Cain
Fin,Tin,Cain Out1 F,T,Ca
F,T,Ca In
Out2
530 Tcin
530 Tcin splitter 2
T cin1 V
Tcin V U
Y Fsin
U
Y Fsin
0 C
0 C
Xsin1 FC5 Tc
U
Xsin FC2 Tc Y Fcin
U
Y Fcin
49.9 C
Fcin3
49.9 C
Fcin1 Fcin2 FC3
U Ts Terminator2
Fcin FC1 T erminator Y F
U Ts F1, T1, Ca1 40 C
Y F
40 C F, T, Ca In1
Out Fout1 FC6 REACTOR2
Fout FC4 REACTOR1 In2
F2, T2, Ca2 junction RX1_LV1

RX1_LV Selector2
SP 40
SP 40 U
X
U Vrx1_sp1
X Vrx1_sp V
Selector U
Y F RX1_FC1
40 C
RX1_FC
Fout2 FC8 SP 600
SP 600 U
MIXER X Trx_sp1
U
X Trx_sp RX1_XS1
RX1_XS MX_LV SP
SP U
U
SP 40 X
X U
X Vrx1_sp2
Out1
600 600
In
Out2
T 2_sp T2_sp1
Selector1 Selector3 splitter 3
X
X
SP
SP
ITC1 ITC2
U
U
Vft_sp 50
F1, T1, Ca1
F2, T2, Ca2
F
Selector4
600
T2_sp2
X
SP
IT C3
U
Steam hx1 Steam hx2
U U FEEDT ANK
Y In1 Out1 Y In1 Out1
[40, 670, 0] C [40, 670, 0] C U
ST1 ST2 Y
steam in FC steam in1 FC7 [40, 530, 0] C
steam in2 FC9
In2 Out2 In2 Out2
F, T, Ca
HEAT_EX1 HEAT _EX2 Out1 In1
V
ST3
Steam hx3
Terminator1
Out1
In Out2 In2
Out2
splitter 1
HEAT_EX3
Figure 4.2. Simulink Model of the General Purpose Plant in Figure 4.1.
Besides simulation data, a data set is also collected from the trial run of the real plant.
The configuration for this trial run is shown in Figure 4.3. The simulation data studied in
Chapter 5 are also generated by running the Simulink model according to this
configuration. Appendix B lists the full set of steady-state equations according to this
configuration.
38
Vrx1
Trx1
F5
T5 Trx2
F9
F12
T9
F2 F7 T12
T2 T7
HEAT EX 1 HEAT EX 2
HEAT EX 3
Vft
Tft
T13a
Figure 4.3. Configuration of the General Purpose Plant for Trial Run
Measurements are indicated in the text boxes.
Fcin
Tc1 Tcin
T2
Trx1
Vrx1
F2
T13
T5
F5
Figure 4.4. Reactor 1 Configuration Details and Measurements
39
Fcin
Tc2 Tcin
T9
Trx2
F9
Vrx2
T12
F12
Figure 4.5. Reactor 2 Configuration Details and Measurements
4.3. Variables and Measurements
The variables consist of the volume flow rates and temperatures of streams and the
temperatures and levels of vessels; however, as the current study is performed for steady-
state case, only flow rates and temperatures will be considered. Following the
configuration in Figures 4.3, 4.4 and 4.5, the steady-state model equations involve a total
of 34 variables, consisting of 13 flow variables and 21 temperature variables. Out of the
total 13 flow variables, 7 are measured; while for temperature variables, 12 out of the
total 21 are measured. The locations of measurements are indicated in the text boxes in
Figures 4.3, 4.4 and 4.5. The following lists the measurements.
40
The measured flow rates are:
(i) Feed flow to reactor 1, F2
(ii) Effluent flow from reactor 1, F5
(iii) Feed flow to mixer, before mixing with reactor 1 effluent, F7
(iv) Output flow of mixer / Feed flow to reactor 2, F9
(v) Effluent flow of reactor 2, F12
(vi) Cooling water flow to reactor 1, Fc1
(vii) Cooling water flow to reactor 2, Fc2
The measured temperatures are:
(i) Feed temperature of reactor 1, T2
(ii) Effluent temperature of reactor 1, T5
(iii) Feed temperature to mixer, before mixing with reactor 1 effluent, T7
(iv) Output temperature of mixer/ Feed temperature of reactor 2, F9
(v) Effluent temperature of reactor 2, T12
(vi) Reactor 1 vessel temperature, Trx1
(vii) Reactor 2 vessel temperature, Trx2
(viii) Inlet cooling water temperature, Tcin
(ix) Reactor 1 outlet cooling water temperature, Tc1
(x) Reactor 2 outlet cooling water temperature, Tc2
(xi) Temperature of feed tank, Tft
(xii) Temperature of stream out of heat exchanger 3, T13a
41
The measured levels are:
(i) Reactor 1 level, Vrx1
(ii) Reactor 2 level, Vrx2
(iii) Feed tank level, Vft
The unmeasured flow rates are:
(i) Output flow of feedtank
(ii) Flow of cooling water in heat exchanger 3
(iii) Flow of heating steam into heat exchanger 2
(iv) Flow of condensate out of heat exchanger 2
(v) Flow of heating steam into heat exchanger 1
(vi) Flow of condensate out of heat exchanger 1
The unmeasured temperatures are:
(i) Output temperature of feedtank
(ii) Temperature of cooling water into heat exchanger 3
(iii) Temperature of cooling water out of heat exchanger 3
(iv) Temperature of material into heat exchanger 2
(v) Temperature of steam into heat exchanger 2
(vi) Temperature of condensate out of heat exchanger 2
(vii) Temperature of material into heat exchanger 1
(viii) Temperature of steam into heat exchanger 1
(ix) Temperature of condensate out of heat exchanger 1
42
The unmeasured level is:
(i) Mixer level
4.4. System Decomposition: Variable Classification
For reasons discussed in Chapter 2, it is beneficial to first simplify the equations to the
reduced form that contains only measured variables, before subjecting them as constraints
in the DRPE procedure. Next, it is important to identify non-redundant measurements
from the set of measurements included in the reduced equations. There could also be non-
redundant measurements not included in the reduced equations. The importance of
identifying non-redundant measurements is also discussed in Chapter 2. Finally, from the
set of unmeasured variables, the determinable variables can be distinguished from the
undeterminable ones. These steps are described in the following subsections. The final
classification is summarized in Table 4.1.
4.4.1. Derivation of Reduced Equations
The reduced set of equations in this case is straightforward to obtain by observations of
the complete set of steady-state equations, because all of the variables around most units
are measured, so that equations with measured variables and those with unmeasured ones
are clearly distinguished from each other. It is therefore not necessary to use formal
variable classification methods such as the ones mentioned in Chapter 2. The reduced
equations are as follows.
43
Reactor 1
Mass Balance: F2 − F5 = 0 (4.1)
U c1 Ac1
Energy Balance: F2T2 − F5T5 − (T − T ) = 0 (4.2)
ρC p rx1 c1
U c1 Ac1
Cooling Coil Energy Balance: Fc1 (Tc ,in − Tc1 ) + (T − T ) = 0 (4.3)
ρ c C pc rx1 c1
Mixer
Mass Balance: F5 + F7 − F9 = 0 (4.4)
Energy Balance: F5T5 + F7T7 − F9T9 = 0 (4.5)
Reactor 2
Mass Balance: F9 − F12 = 0 (4.6)
U c 2 Ac 2
Energy Balance: F9T9 − F12T12 − (T − T ) = 0 (4.7)
ρC p rx 2 c 2
U c 2 Ac 2
Cooling Coil Energy Balance: Fc 2 (Tc ,in − Tc 2 ) + (T − T ) = 0 (4.8)
ρ c C pc rx 2 c 2
F indicates flowrates and T indicates temperatures. The indices refer to the same entities
as listed previously in Section 4.2. Uc is the overall heat transfer coefficient of the
cooling coil, while Ac is the effective cooling area. In the present work, the model
parameters to be estimated are the product of the overall heat transfer (Uc) with the
effective cooling area (Ac) for both reactors, i.e. Uc1Ac1 and Uc2Ac2.
44
4.4.2. Classification of Measurements
Following the definition of redundant measurements, it can be deduced that redundant
measurements can only appear in the reduced set of equations. However, as there are
reduced equations with unknown model parameters to be estimated, some measurements
in the reduced equations may be non-redundant. The formal classification procedures can
of course be performed to determine these non-redundant measurements, but as the
number of variables is not too large, the classification can be done using symbolic
manipulation. This way, linearization can be avoided.
Firstly, all measured variables involved in reduced equations without unknown
parameters are redundant. This means that only the variables that appear only in
equations with unknown parameters need to be tested for redundancy. To test a variable
for its redundancy, it is assumed as unmeasured. If its value can be determined from other
measurements through symbolic manipulation of the other reduced equations, then it is
redundant, and vice versa. Another alternative is if the unknown parameters in the
reduced equations can be eliminated through algebraic manipulation of the equations that
contain them, then the measured variables that are also eliminated from the resulting set
of equations are non-redundant. For example, from the set of reduced equations above, it
is straightforward to see that if equations 4.2 and 4.3 are summed together, the term
containing the unknown parameter UcAc1 can be eliminated. However, the measured
variable Trx1 is also eliminated in the process, and it does not appear in the other
equations 4.1 and 4.4 to 4.8. Therefore, the measurement of Trx1 is non-redundant. The
45
same can be performed to equation 4.7 and 4.8, leading to the conclusion that Trx2 is
non-redundant. Since the remaining equations other than 4.2, 4.3, 4.7 and 4.8 do not
contain any unknown parameters, the measurements of all the other measured variables
in those equations are redundant.
Nevertheless, since the classification methods based on linearized and bilinear model are
more formal, these two methods are also performed in this thesis to classify the variables
in the reduced set of equations above. The results of the classification are then used as a
check for that obtained using symbolic manipulation. It is found that the results of
classification using symbolic manipulation, orthogonal transformation with linearized
model and orthogonal transformation with bilinear model agree with one another.
Finally, it is also straightforward to deduce that the measurements of variables that do not
appear in the reduced set of equations are non-redundant. As such, referring to the
complete set of steady-state equations, T13a is classified as non-redundant. The results of
the variable classification are summarized in Table 4.1.
46
Measured Variables Unmeasured Variables
Redundant Non-redundant Determinable Undeterminable
Flow rates F2, F5, F7, F9, None F13 F1, F6, Fc3
F12, Fc1, Fc2
Temperatures T2, T5, T7, T9, Trx1, Trx2, T13 T1, T16, T17,
T12, Tcin, Tc1, T13a T6, T14, T15,
Tc2 Tcin3, Tc3
Table 4.1. Result of Variable Classification for the General Purpose Chemical Plant
4.5. Data Generation
4.5.1. Simulation Data
Simulation data are generated by random number generators in Matlab, for several
different distributions, namely Normal, Laplacian, uniform and t distribution. The
different error distributions can be considered outliers if their magnitudes are large. From
optimization point of view, when an outlier is detected by the robust estimator, the
variable is then regarded as unmeasured. Consequently, depending on the structure of the
constraint equations, there are certain variables that when regarded as unmeasured,
become indeterminable. In fact, these variables are the non-redundant measurements.
This issue will cause difficulty in the optimization as there is no unique solution
available. These situations are avoided in the data generation in this case study.
47
4.5.2. Real Data
A data set was collected from a trial run of the pilot plant. The data collection setup
consists of OPCTM (Object Linking and Embedding for Process Control) server
connection to the DCS and a Microsoft® Excel interface connected to the OPCTM server.
A Matlab® interface is then developed to sample the data from Excel at fixed sampling
intervals. The joint data-reconciliation and parameter estimation is then performed on the
data set. As mentioned before, the plant configuration and location of measurements
during the trial run is the same as those of the simulation model used to generate the
simulation data. Therefore the reduced set of equations and the classification of variables
in Section 4.1 also apply to this data set.
4.6. Methods Compared
Three DRPE methods are compared, i.e. the weighted least squares (WLS), the
contaminated normal estimator and the GT distribution. The weighted least squares
(WLS) is chosen as a comparison because it is the best representation of non-robust
methods, one that is the most efficient if the conventional normal assumption is satisfied.
It will be shown that deviations from normal assumption result in the deterioration of the
performance of WLS. The contaminated normal estimator is chosen because of its
robustness, and also because it is always used with pre-assigned parameter values,
namely the amount and magnitude of the Normal contamination. In practice, the amount
and magnitude of contamination are unknown, so these assigned values are just
heuristics. However, it will be shown that by estimating the parameter values from the
48
data, i.e. by partially adaptive estimation, the efficiency of the contaminated Normal
estimator can be improved. Nevertheless, due to its pre-assigned distribution shape, the
profiles of data that this estimator can adapt to are rather limited. On the other hand, the
GT distribution can accommodate many distributions within its distribution family,
which covers many different distribution profiles, and therefore has more advantage in
the partially adaptive scheme.
The detailed configurations of the methods are as follows:
(i) Weighted Least Squares (WLS)
The objective function of the WLS is:
ρ = ( y − x)T Ψ −1 ( y − x)
where y is the measurement, x is the estimate and Ψ is the covariance matrix of the
measurement errors.
The covariance matrix is estimated from the data set using the following formula:
1 r
cov( yi , y j ) = ∑ ( yik − yi )( y jk − y j )
r − 1 k =1
(4.1)
where
49
1 r
yi = ∑ yik ,
r k =1
(4.2)
r is the number of sampling points in the data set and y i is the measurement of the
i -th variable.
(ii) Estimator based on the contaminated Normal distribution
The parameters p, b and σ of the contaminated Normal are estimated from
preliminary residuals. The preliminary residuals and the distribution parameters are
estimated in the same way as they are for the partially adaptive GT-based method,
i.e. as described in Chapter 3.
(iii) Partially adaptive GT-based method
The configuration is the same as set forth in Chapter 3, i.e. the GT-based estimator
with distribution parameters p, q and σ estimated from preliminary residuals.
4.7. Performance Measures
Two measures are used to quantify the efficiency of the DRPE methods, i.e. the mean
square error (MSE) and the percentage of model parameter estimation accuracy (%-
discrepancy). The MSE is calculated as:
50
)
1 K m ( xi, j − xi , j )
2
MSE = ∑∑ σ 2
mK j =1 i =1
(4.3)
i
where m is the number of measured variables and K is the number of data sets used for
)
the DRPE (m=17, K=20x100 in this study). x i , j and x i , j are the estimates of the reconciled
data and the actual value of the variable, respectively, while σ i is the standard deviation
of the Gaussian noise on sensor i.
The %-accuracy is calculated as:
| parameter estimate - actual value|

% discrepancy = 100% × (4.4)
actual value
More explanation on these measures will be given in Chapter 5, in relation to the cases
studied.
4.8. Conclusion
A general purpose chemical plant is used to generate data to investigate the performance
of the GT-based joint DRPE strategy. Based on the location of measurements, the
variables are classified as redundant/ non-redundant and determinable/ undeterminable
accordingly. Both real and simulation data are generated; for simulation data, different
error distributions with and without outliers are generated. Two other methods, i.e. the
weighted least squares and the contaminated Normal estimators are used for DRPE as
51
comparison. The weighted least squares method represents non-adaptive, non-robust
methods while the contaminated Normal method represents other alternatives for partially
adaptive methods. The results of comparison are presented in Chapter 5.
52
CHAPTER 5: RESULTS & DISCUSSION
5.1. Introduction
Several case studies are conducted based on the data generated from the general-purpose
plant described in Chapter 4. Firstly, data with various error distributions are input to the
joint DRPE strategy to study the efficiency due to its adaptiveness. Secondly, the data
with various error distributions are contaminated with outliers to investigate the
robustness of the DRPE strategies. With regard to the partially adaptive scheme, the
effects of data size and choice of preliminary estimators are then studied.
5.2. Partial Adaptiveness & Efficiency
To investigate the efficiency of the partially adaptive GT-based DRPE strategy, the
strategy is used to reconcile variables and estimate model parameters for data with
various error distributions. For each distribution, 10 sets of 100 data samples each are
generated.
The generated distributions are:
(i) Normal (Gaussian) distribution
(denote the standard deviation of this distribution as σ)
(ii) Laplace distribution,
a. With standard deviation σ
b. With standard deviation 3σ
c. With standard deviation 5σ
53
(iii) Uniform distribution
a. With range from -3σ to +3σ
b. With range from -5σ to +5σ
(iv) t distribution
a. With 1 degree of freedom (dof)
b. With 2 degrees of freedom
c. With 3 degrees of freedom
These are some of the more common distributions that might represent the empirical
character of the actual physical measurement processes. The Normal distribution is the
most commonly encountered; its density function has a bell shape and 99.73% of the
errors occur within -3σ to +3σ from the true value of the variable, where σ is the standard
deviation. The Laplace distribution is the double-sided exponential distribution; it has
thicker tails than Normal, which means that the possibility of large errors occurring is
higher than Normal. The t distribution also has thicker tails than Normal, especially for
small degrees of freedom. The uniform distribution is different from the other three
distributions above, as it does not assume that smaller errors are more likely to occur than
larger ones; instead, the magnitudes of errors are within a certain range with any value
within the range having the same likeliness to occur.
As measures of efficiency, the mean squared error (MSE) is used for the estimates of
variables and the percent of absolute discrepancy is used for the estimates of model
parameters. The mean squared error (MSE) is calculated as:
54
)
1 K m ( xi, j − xi , j )
2
MSE = ∑∑ σ 2
mK j =1 i =1
(5.1)
i
where m is the number of measured variables and K is the number of data samples used
)
for the DRPE (i.e. m=17 variables, and K = 20 sets x 100 samples). x i , j and x i , j are the
estimates of the reconciled data and the actual value of the variable, respectively, while
σ i is the standard deviation of the Normal (Gaussian) noise on sensor i. Meanwhile, the
accuracy is calculated as:
| parameter estimate - actual value|

% discrepancy = 100% × (5.2)
actual value
The MSE can be thought of as a kind of empirical measure of the variance of the
estimates; therefore it is suitable for measuring efficiency, as it can be recalled from
estimation theory that the estimation efficiency is usually defined in terms of variance
(e.g. the Cramer-Rao lower bound for asymptotic variance). The same value of “standard
deviation” σ i , that is, the standard deviation used to generate the Normal noise, is used in
calculating the MSE for all cases of noise distributions, instead of using the actual or
estimated standard deviation of every respective distribution. The purpose is to establish
the same standard to visualize and compare the performance of the method across
different noise distributions with different spreads.
Figure 5.1 illustrates the MSE of the resulting variable estimates. It is observed that the
weighted least squares (WLS) method performs efficiently for Normal error compared to
55
the other two methods. This is expected, as it is theoretically the most asymptotically
efficient estimator for Normal error. However, for other error distributions, the WLS is
becoming less and less efficient in comparison to the other two methods, and in
comparison to its performance for Normal error. It is seen that the MSE of WLS
estimates is highly sensitive to the spread of the error distributions. The most obvious
manifestation is in the case of t errors with one and two degree of freedoms, as these
distributions have considerably thicker tails than the others (from Table 5.1, the MSE of
the measurements for t error with one and two degrees of freedoms are the highest and far
higher than the other error distributions). For further illustration, consider the plot of the
percentage of relative MSE, i.e. the percentage of MSE of variable estimates with respect
to the MSE of measurements, in Figure 5.4. The relative MSE for WLS estimates are
observed to be fairly similar across the different error distributions, whereas the MSE of
measurements are actually vastly different as seen in Table 5.1. This means that the
efficiency of WLS estimates is proportional to the measurement error magnitudes,
regardless of the extent of the magnitude. The sensitivity of the WLS estimates to the
error magnitudes can be explained by recalling the influence function of WLS estimator
described in Chapter 2. The influence function has, indeed, a linear relationship with the
magnitude of the error.
56
MSE of Variable Estimates
0.35
0.3
0.25
0.2 WLS
Cont_Norm
0.15
GT
0.1
0.05
0
e1
e3
e5
t1
t2
t3
al
rm
rm
m
ac
ac
ac
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
Figure 5.1. MSE Comparison of GT-based with Weighted Least Squares and
Contaminated Normal Estimators
Table 5.1. MSE of Measurements
Error Distribution MSE of Measurements
Normal, σ 0.9962
Laplace, σ 1.0145
Laplace, 3σ 8.0945
Laplace, 5σ 22.3552
Uniform, [-3σ,3σ] 2.7740
Uniform, [-5σ,5σ] 7.4459
t, dof = 1 1323.8090
t, dof = 2 41.4721
t, dof = 3 9.5355
57
%MSE of Estimates w.r.t. MSE of Measurements
0.01
0.009
0.008
0.007
WLS
0.006
Cont_Norm
0.005
GT
0.004
0.003
0.002
0.001
t1
t2
t3
e1
e3
e5
al
5
rm
rm
m
ac
ac
ac
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
Figure 5.2. Percentage of Relative MSE
Turning the discussion to the contaminated normal estimates, the MSE results in Figure
5.1 show that this estimator performs more efficiently than the WLS in most cases,
particularly in the case of t errors. The percentage of relative MSE in Figure 5.4 also
shows that the larger the measurement errors, the smaller the relative MSE for this
estimator is. This reflects the descending trend of the influence function of this estimator
for large values of errors. An exception of the efficient performance is for the case of
uniform error, where the performance is just comparable to the WLS estimator. This can
be explained by looking at the behaviour of the influence function around zero, i.e. for
small residuals. Theoretically the uniformly distributed errors are concentrated with
uniform probability around the true values; therefore, residuals should also have uniform
influence regardless of their proximity to the true values, as long as they are within the
58
range. The contaminated normal estimator, however, behaves very similarly to the WLS
for small residuals, i.e. the smaller the residual, the smaller the influence is. The other
distributions (except the uniform distribution) have kurtosis that can be more or less fitted
to that of the Normal distribution by adjusting the spread of the Normal distribution,
which is why for the other distributions the partially adaptive contaminated Normal
estimator performs with considerable efficiency.
On the other hand, the partially adaptive GT-based estimator performs well for all cases
of error distribution; its efficiency is comparable to or better than that of the partially
adaptive contaminated estimator (Figures 5.1 and 5.2). This can be attributed to its
adaptability to many different error profiles, and also to its robustness against large
errors. The most evident example of how the adaptability results in more efficient
estimates is the MSE performance in case of the uniform distribution. As discussed
before, the Normal and contaminated Normal estimator are not able to follow closely the
kurtosis of the uniform distribution. This is not the case for the GT distribution as
uniform distribution is actually one of its limiting cases, i.e. with the distribution
parameter p and q very large (pÆ~, qÆ~).
Observation of the relative MSE in Figure 5.2 also reveals the robustness the GT-based
estimator, where it suppresses large errors and therefore is rather insensitive to the
magnitude of these errors.
59
It is also noticed that the generated error distributions are all the limiting cases in the GT
family tree in Figure 3.3. This confirms the assertion in Chapter 3 that although the
bounds placed on the GT parameters p and q excludes such limiting cases as the uniform
(pÆ~, qÆ~) and the Normal and Laplace distributions (qÆ~), the efficiency of the
estimates is not much compromised.
Figures 5.3 and 5.4 show the percentage of the parameter estimation errors for the overall
heat transfer coefficients of Reactor 1 and Reactor 2 cooling coils, respectively. As
expected, these results mirror the MSE performance in Figure 5.1. This is because the
variables are the source of information for the value of the parameter, and hence, the
accuracy of the variable estimates will be reflected in the accuracy of the parameter
estimates.
%Discrepancy of Param1 Estimates
1.4
1.2
1
WLS
0.8
Cont_Norm
0.6
GT
0.4
0.2
0
al
t1
t2
t3
e1
e3
e5
rm
rm
m
ac
ac
ac
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
coefficient of Reactor 1 cooling coil
60
%Discrepancy for Param2 Estimates
2.5
2
1.5 WLS
Cont_Norm
1
GT
0.5
0
al
t1
t2
t3
e1
e3
e5
rm
rm
m
ac
ac
ac
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
coefficient of Reactor 2 cooling coil
Figure 5.5 is presented to help visualize the preceding discussion on how the GT-based
estimator is flexible enough to adapt to the different error profiles, while the
contaminated Normal and weighted least squares (which assumes Normal error) may not
be as flexible. The gray vertical bars in the figure are the histogram of the measurement
errors, scaled so that its total area is one; therefore, it is a kind of relative frequency plot.
The distribution parameters of the GT (i.e. {p,q,σ}), contaminated Normal (i.e. {p,b,σ})
and the weighted least squares (i.e. σ) that are used to generate the plots in Figure 5.5 are
listed in Table 5.2. These parameters are the results of the estimation of the estimator
parameters in the partially adaptive scheme. Along with these parameters, the “true”
underlying GT parameters of the distributions from which the errors are generated are
also included as a comparison. These “true” parameter values are, of course, the ones
shown in the GT family tree in Figure 3.3 (Chapter 3) for each respective distribution.
61
1.4 0.5
GT GT
ContNorm 0.45 ContNorm
1.2
WLS WLS
0.4
Relative Frequency / PDF(u)
Relative Freq / PDF(u)

0.35
0.8 0.3
0.25
0.6
0.2
0.4 0.15
0.1
0.2
0.05
0 0
-1.5 -1 -0.5 0 0.5 1 1.5 2 -5 0 5
(a) Normal, σ = 0.5 (b) Laplace, std dev = 3σ = 1.5

0.35
GT 0.25 GT
ContNorm ContNorm
0.3
WLS WLS
0.2
0.25
0.2 0.15
0.15
0.1
0.1
0.05
0.05
0 0
-3 -2 -1 0 1 2 3 -20 -15 -10 -5 0 5 10 15 20
(c) Uniform, [-5σ, 5σ] = [-2.5, 2.5] (d) t, degree of freedom = 1
Figure 5.5. Adaptation to Data: Fitting the Relative Frequency of Residuals with GT,
Contaminated Normal, and Normal distributions
62
Table 5.2. Estimated parameters of the partially adaptive estimators used to generate
Figure 5.5
GT
Cont. Normal WLS
Estimated “True”
(a) Normal, p = 1.8983 p=2 p = 0.05 σ = 0.4977
σ = 0.5 q = 10.0478 qÆ~ b=5
σ = 0.6431 σ = 0.5x√2 = 0.7071 σ = 0.4577
(b) Laplace, p = 1.1866 p=1 p = 0.05 σ = 1.2819
std dev = q = 47.1412 qÆ~ b=5
3σ = 1.5 σ = 1.1367 (σ = 1.5/√2 = 1.0607) σ = 1.0929
(c) Uniform, p=5 pÆ~ p = 0.05 σ = 1.4891
[-5σ, 5σ] = q = 50 qÆ~ b=5
[-2.5, 2.5] σ = 2.4701 (σ = range / 2 = 2.5) σ = 1.466
(d) t, p = 1.9022 p=2 p = 0.3322 σ = 3.7444
dof = 1 q = 0.8315 q = 0.5 b = 5.3714
σ = 1.9043 σ = 1.1638
63
5.3. Effect of Outliers on Efficiency
To investigate the effect of outliers on the efficiency of the partial adaptive methods, 5%
of the data in Section 5.1 above are replaced with contaminations of randomly signed
errors of fixed sizes. The magnitude of this contamination is 10σ, where σ is the standard
deviation of the Normal error before being contaminated with outliers. The MSE of the
resulting measurements are shown in Table 5.3. Compared to Table 5.1, the MSE is now
generally larger than the MSE of the measurements without outliers.
Table 5.3. MSE of Measurements with Outliers

Error Distribution,
with outliers MSE of Measurements
(5% + 10σ)
Normal, σ 5.3616
Laplace, σ 5.3861
Uniform, [-3σ,3σ] 7.0368
Uniform, [-5σ,5σ] 11.4773
t, dof = 1 1980.0869
t, dof = 2 29.0566
t, dof = 3 14.4804
64
Figure 5.6 charts the MSE of the resulting variable estimates. Compared to Figure 5.1,
the performance of the weighted least squares here deteriorates considerably, even for the
Normal error. On the other hand, the contaminated normal and GT estimators are able to
maintain rather similar MSEs as those in Figure 5.1. This is because of both the
robustness and adaptability of both estimators. The adaptability of the estimators allows
for the change in data profiles to be accommodated by change of the estimator
parameters, while the robustness ensures that the adaptation to the data profile does not
compromise the estimation efficiency.
Figure 5.7 compares the performance for data with and without outliers, of the
contaminated Normal and the GT estimator. It is observed that the GT estimator is
generally less sensitive to outliers than is the contaminated Normal estimator. The
exception is for the errors with uniform distribution, as this distribution has originally
very thin tails compared to the others. When outliers are present in the data, the GT has to
adapt by assuming the PDF with thicker tails. This makes the GT estimator robust to the
outliers; however, the trade-off is the decrease of efficiency for the data within the
uncontaminated range. Nevertheless, the performance of the GT estimators in the
uniform error cases is still best compared to the other two estimators, which means that
the trade-offs in efficiency made by the other two estimators are greater than that of GT.
65
MSE for Various Error Distributions
0.3
0.25
0.2
WLS
MSE
0.15 Cont_Norm
0.1 GT
0.05
0
La e1
La e3
e5
t1
t2
t3
al
rm
rm
m
ac
ac
ac
or
fo
fo
pl
pl
pl
N
ni
ni
La
Figure 5.6. MSE of Variable Estimates for Data with Outliers
Cont_Norm GT
0.16 0.16
0.14 0.14
0.12 0.12
0.1 0.1 Outliers
0.08 0.08
0.06 0.06
No Outliers
0.04 0.04
0.02 0.02
0 0
1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
Figure 5.7. Comparison of MSE with and without outliers for GT and Contaminated
Normal Estimators
66
5.4. Effect of Data Size on Efficiency
Since the efficiency of the partially adaptive estimators is achieved through its
adaptability to the data, it is natural to question the effect of the data size on the
estimation efficiency. Intuitively, the more data samples are used, the more accurate the
estimates will be. Particularly for partially adaptive estimators, where it is important to
follow the data profile closely, more data samples will mean that closer fit is possible,
and less data samples will lead to the question as to whether the partially adaptive
estimators are still viable compared to other non-adaptive estimators. To investigate this,
the joint DRPE is performed using three different data sizes, i.e. data sets with 20, 50 and
100 samples. The MSE results for the weighted least squares, contaminated Normal and
GT estimators are shown in Figure 5.8.
67
WLS Performance for Different Data Sizes
4.0000
3.5000
3.0000
2.5000 ns = 100
2.0000 ns = 50
1.5000
ns = 20
1.0000
0.5000
0.0000
t2
t3
al
5
e1
e3
e5
m
m
ac
ac
ac
r
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
U
Cont. Normal Performance for Different Data Sizes
5.0000
4.0000
ns = 100
3.0000
ns = 50
2.0000
ns = 20
1.0000
0.0000 t2
t3
al
5
e1
e3
e5
m
m
ac
ac
ac
r
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
GT Performance for Different Data Sizes
5.00
4.00
ns = 100
3.00
ns = 50
2.00
ns = 20
1.00
0.00
t2
t3
al
5
e1
e3
e5
m
m
ac
ac
ac
r
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
Figure 5.8. MSE Results of WLS, Contaminated Normal and GT-based estimators for
Different Data Sizes
68
Although the scale of the MSE is not the same for each of the method, the same pattern is
observed when comparing the MSE across the different data sizes, that is, the relative
scale of the MSE from one data size to another looks similar for each of the three
methods. To show this, Figure 5.9 is plotted. Each vertical bar in the figure is the
percentage of the difference between MSE for the larger data size and MSE for smaller
data size, with respect to the MSE for the smaller data size; hence, it illustrates the
improvement in efficiency when data size is increased. It is seen that the improvement for
each error case is comparable among the three DRPE methods. This shows that although
when the data size is small, the efficiency of the partially adaptive estimators (in this
case, GT and the contaminated Normal) decreases, so does that of the non-partially
adaptive estimators (in this case, the weighted least squares). Therefore, it can be
concluded that the partially adaptive estimators, are in fact still quite as viable as other
estimators, even for relatively small data sizes (ns=20).
69
Improvement of Relative MSE from ns=50 to ns=100
100
80 WLS
60 Cont_Norm
40 GT
20
0
t2
t3
al
e1
e3
e5
5
rm
rm
m
ac
ac
ac
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
U
Improvement of Relative MSE from ns=20 to ns=100
100
WLS
80
Cont_Norm
60
GT
40
20
0
t2
t3
al
e1
e3
e5
5
rm
rm
m
ac
ac
ac
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
Figure 5.9. Improvement in MSE Efficiency when Data Size is increased
70
5.5. Estimation of Adaptive Estimator Parameters: Preliminary Estimators and
Iterations
A main feature of the partially adaptive estimation is the preliminary estimation to extract
information about the data from the data themselves. At the preliminary estimation stage
itself, however, the information about the data is yet to be obtained. The most suitable
estimator for the data is therefore not known at this stage. The best option at this point is
to first employ a pre-assigned estimator to produce the preliminary residuals. The
efficiency of these robust preliminary estimators may not be the best; however, by means
of iterative estimation of the adaptation parameters, the efficiency can be improved. This
preliminary estimation – iteration scheme is summarized in the flowchart in Figure 5.10.
71
Data Preliminary Estimation
Preliminary
Residuals
{u0}
Adaptation:
Estimation of estimator parameters
{ p , q , σ } = arg max log (f GT (u 0 ; p , q , σ ))

{ p , q ,σ }
Estimator
Parameters Residuals =
{p,q,ó} (Measurement Data) – (Reconciled Data)
GT-based Joint DRPE

{ params, vars } = arg max f ( u )
{ params, vars }
= arg min - log (f GT (u | p , q , σ ))

{ params, vars }
s.t. model equations
Reconciled Data,
Model Parameters
Maximum
iterations
reached?
RETURN
Reconciled Data,
Model Parameters
Figure 5.10. Iterative Joint DRPE with Preliminary Estimation
72
The joint DRPE using two different preliminary estimators: the median and the weighted
least squares estimator, is performed to study the role and effect of the preliminary
estimation. The results for a set of trial runs for each error distributions are shown in the
final MSE comparison in Figure 5.11, and the plot of MSE from one iteration to the
other, for one particular run for each error distribution, in Figure 5.12.
The results lead to the conclusion that the choice of the preliminary estimator is not really
important. From Figure 5.12, it is seen that regardless of the preliminary estimator, the
final MSE all converge to the same value. The question arises as to how the
unsatisfactory residuals produced by the weighted least squares, especially in case of
non-Normal and large errors, can be improved through iterations. To answer this
question, the evolution of the estimated GT parameters is investigated. If the residuals are
satisfactory, the estimated GT parameters will be representative of the underlying error
distribution. On the other hand, if the residuals are not satisfactory, it is likely that the
estimated GT parameters are far from the actual error distribution. This is confirmed by
observing the GT parameters estimated from the weighted least squares residuals in the
case of t noise, where these GT parameters are far from reflecting the true distribution.
However, it is also observed that many of the GT parameters take the values of their
respective bounds. Another run of joint DRPE with the weighted least squares as
preliminary estimator is then performed, but the GT parameter bounds for the maximum
likelihood optimization to estimate GT parameters are now relaxed. It is found that the
final MSE is now not as efficient as the previous run with stricter bounds; in fact, the
final estimates of variables are biased. It can therefore be concluded that the GT
73
parameter bounds play an important role in estimating the GT parameters. The bounds
should be set such that for all possible parameter settings, the resulting GT distribution is
sufficiently robust.
Finally, it is also noted that not many iterations are necessary before the MSE converges.
In this study the maximum iteration is set at ten, but in all runs, the MSE converges in
less than ten iterations. A drastic improvement usually occurs at the second or third
iteration, after which the MSE is effectively stable.
Final MSE Using Different Preliminary Estimators
0.12
0.1
0.08 GT
MSE
0.06 Median
WLS
0.04
0.02
0
e1
e3
e5
t1
t2
t3
al
rm
rm
m
ac
ac
ac
or
fo
fo
pl
pl
pl
N
ni
ni
La
La
La
Figure 5.11. Final MSE Comparison for GT-based DRPE method Using GT, Median and
WLS as preliminary estimators
74
Normal Error
GT Median WLS
0.012
0.01
0.008
MSE
0.006
0.004
0.002
0
0 1 2 3 4 5 6 7 8 9 10
Iteration #
Laplace Error
GT Median WLS
0.12
0.1
0.08
MSE
0.06
0.04
0.02
0
0 1 2 3 4 5 6 7 8 9 10
Iteration #
Uniform Error
GT Median WLS
0.3
0.25
0.2
MSE
0.15
0.1
0.05
0
0 1 2 3 4 5 6 7 8 9 10
Iteration #
t Error
GT Median WLS
1
MSE
0.5
0
0 1 2 3 4 5 6 7 8 9 10
Iteration #
Figure 5.12. MSE throughout iterations
75
5.6. Real Data Application
A data set is collected from a trial run of the general-purpose pilot plant. The three DRPE
methods – GT, contaminated Normal, weighted least squares – are then applied to
reconcile the data and estimate the overall heat transfer coefficients of reactors 1 and 2, as
is in the simulation case. The profiles of the data are shown in the (scaled) histograms in
Figure 5.13. The estimates of the variables and parameters are summarized in Table 5.4.
There is no basis to objectively assess the accuracy of these estimates because unlike in
the simulation case, the actual variable and parameter values are unknown. However,
looking at the plots of the GT, contaminated Normal, and Normal distributions that are
superimposed on the data histogram in Figure 5.13, the flexibility of the GT distribution
is apparently higher than that of the other two distributions, while the flexibility of the
contaminated Normal distribution is higher than that of the Normal distribution. This is
apparent from the fitting of the data by the three distributions, where GT seems to give
the closest fit and the Normal distribution the least close fit.
76
Feed flow to Reactor 1, F2 Effluent flow from Reactor 1, F5 Feed Flow 2, F7
GT
GT GT
CNorm
CNorm CNorm
Normal
Normal Normal
Data Data
Data
0.68 0.685 0.69 0.695 0.7 0.68 0.685 0.69 0.695 0.7 1.26 1.265 1.27 1.275 1.28 1.285
Output flow of Mixer, F9 Effluent Flow from Reactor 2, F12 Feed 2 Temperature, T7
GT
CNorm GT GT
Normal CNorm CNorm
Data Normal Normal
Data Data
1.958 1.96 1.962 1.964 1.966 1.968 33.2 33.3 33.4 33.5 33.6 33.7 36.85 36.9 36.95 37 37.05
Mixer Output Temperature, T9 Cooling water flow to Reactor 1, Fc1
Inlet cooling water tem perature, Tcin
GT GT
GT
CNorm CNorm
CNo rm
Norm al Norm al
No rm a l
Data Data Da ta
35.7 35.72 35.74 35.76 35.78 35.8 0.8 0.85 0.9 2 1 .9 2 1.9 5 22 22 .0 5 2 2 .1
Cooling water out temperature, Tcin1 Reactor 1 temperature, Trx1 Reactor 1 Feed Temperature, T2
GT GT GT
CNorm CNorm CNorm
Normal Normal Normal
Data Data Data
27.5 27.6 27.7 27.8 27. 33.2 33.4 33.6 33.8 40.2 40.3 40.4 40.5 40.6 40.7
Effluent flow from Reactor 2, F12 Reactor 2 Temperature, Trx2

Reactor 2 Effluent Temperature, T12
GT GT
GT
CNorm CNorm
CNorm
Norm al Norm al Normal
Data Data Data
1.95 1.955 1.96 1.965 1.97 33.3 33.32 33.34 33.36 33.38 33.4 33.2 33.3 33.4 33.5 33.6
Cooling Water 2 Temperature, Tc2 Cooling Water 2 Flow, Fc2
GT
GT
CNorm CNorm
Normal Normal
Data Data
31.3 31.32 31.34 31.36 31.38 31.4 0.47 0.475 0.48 0.485 0.49 0.49
Figure 5.13. Scaled Histogram of Data and Density Plots of GT, Contaminated Normal
and Normal Distributions
77
Table 5.4. Reconciled Data and Estimated Parameter Values Using Different DRPE
Methods
Contaminated Weighted
GT
Normal Least Squares
Variables
Feed flow to reactor 1, F2 0.6908 0.6906 0.6901
Effluent flow from reactor 1, F5 0.6908 0.6906 0.6901
Feed 2 flow, F7 1.2728 1.2731 1.2734
Mixer output flow, F9 1.9635 1.9637 1.9636
Reactor 1 effluent temperature, 33.5002 33.4714 33.4869
T5
Feed 2 temperature, T7 36.8941 36.9167 36.9281
Mixer output temperature, T9 35.7002 35.7050 35.7186
Reactor 1 cooling water flow, 0.8760 0.8776 0.8636
Fc1
Inlet cooling water temperature, 22.0000 22.0004 22.0050
Tcin
Reactor 1 outlet cooling water 27.5987 27.6095 27.6231
flow, Tc1
Reactor 1 temperature, Trx1 33.4914 33.4319 33.4320
Reactor 1 feed temperature T2 40.6000 40.5989 40.5173
Reactor 2 effluent flow, F12 1.9635 1.9637 1.9636
Reactor 2 effluent temperature, 33.3999 33.4014 33.4001
T12
Reactor 2 temperature, Trx2 33.4026 33.3781 33.3790
Reactor 2 outlet cooling water 31.3000 31.3001 31.3016
temperature, Tc2
Reactor 2 cooling water flow, 0.4857 0.4864 0.4897
Fc2
Parameters
Heat transfer coefficient for 0.8323 0.8454 0.8353
Reactor 1 cooling coil, UcAc1
Heat transfer coefficient for 2.1481 2.1770 2.1915
Reactor 2 cooling coil, UcAc2
78
5.7. Conclusion
The partially adaptive methods, i.e. the estimators based on the contaminated Normal and
GT distributions are more efficient than the non-partially adaptive method, i.e. the
weighted least squares estimator, for many error distributions other than the Normal
distribution. The GT-based method, however, performs better than the contaminated
Normal in almost all cases, as the GT is a more general distribution. The robustness of
both GT and contaminated Normal methods is also demonstrated, while the weighted
least squares method is conspicuous by its inability to handle outliers. It is also shown
that even for small data sizes, the partially adaptive GT-based method still consistently
outperforms the other two methods despite its decrease in efficiency. Finally, it is shown
that the robustness of the GT makes it possible to accommodate less than efficient and
less than robust preliminary estimators, through iterations of the GT-based estimation.
79
CHAPTER 6: CONCLUSION
6.1. Findings
Estimation criteria consist of robustness and efficiency. Parametric estimation where the
estimators are constructed based on a certain assumption regarding the error is
asymptotically efficient when the assumption regarding this error distribution is true, but
it is only under this limited condition that these estimators are efficient. By partially
adapting the parameters of these estimators to the data, however, the efficiency can be
improved. The Generalized T (GT) distribution is a good candidate for this partially
adaptive scheme because it can be made robust to outliers and it is very flexible to cover
a wide range of important distributions. The GT distribution is dictated by the distribution
parameter {p,q,σ}; by estimating these parameters from the data, partially adaptive GT-
based estimator can be constructed. The robustness is achieved by constraining the values
that can be taken by {p,q,σ} to ensure that the resulting GT has tails that are thick
enough.
Data is generated from the virtual version of a general-purpose chemical engineering
plant as a case study for the joint data reconciliation – parameter estimation (DRPE)
using the robust and partially adaptive GT-based estimator. The performance of the GT-
based estimator is the most efficient with respect to the other methods compared, i.e. the
weighted least squares and the contaminated Normal methods. This shows that the
adaptiveness to data is advantageous, and the GT distribution exploit this advantage very
80
well compared to the less flexible contaminated Normal estimator, as the GT is a more
general distribution.
The robustness of GT-based strategy is demonstrated through its performance when the
data is contaminated with outliers, where the GT-based estimator is able to suppress the
outliers while making only small trade-off in efficiency compared to that of the weighted
least squares and the contaminated Normal methods.
Constraining the GT parameters {p,q,σ} to be robust proves to be important and
beneficial. Firstly, although the constrained values exclude special cases such as the
Laplace, uniform, t and Normal distributions, the loss in efficiency is quite negligible as
it still performs best for error with these distributions. Secondly, because the constrained
{p,q,σ} result in robust GT distribution, any preliminary estimation method can be used
to the same final efficiency by iterating the joint DRPE with GT estimator, regardless of
the robustness and efficiency of these preliminary estimators. This is a very desirable
property because at the preliminary stage, virtually no information about the data is
known yet.
In conclusion, the robust and partially adaptive GT-based estimator is a viable choice in
performing data reconciliation and parameter estimation. It is also noted that although the
efficacy of the GT-based DRPE strategy is demonstrated for a chemical process case
study in this thesis, the strategy is essentially an estimator, which consequently has
generic use in the general estimation problems. Therefore, it is applicable not only when
81
physical model of the process or system is available, but also with other empirical and
approximation models. The usual precaution that the model inaccuracy is negligible
compared to the measurement inaccuracies holds, of course; but this also holds for other
estimators and is not a restriction specific to the GT-based estimator.
6.2. Future Works
In this work the GT-based DRPE strategy has been applied successfully to various data
profiles, both simulation and real data. These various applications have covered quite
general situations, which make the GT-based strategy a viable choice to deal with a wide
range of situations. However, recalling that the GT is a symmetric distribution, its
limitation may be that in the case when the error is actually not distributed symmetrically,
the adapted GT may not be efficient. It is possible to further generalize the GT
distribution to include non-symmetric distributions.
Another useful extension would be to explore the application of the GT-based strategy in
the dynamic operation of the system. This poses challenges in the efficient optimization
of the differential algebraic model, which is not encountered in the steady-state case.
82
REFERENCES
Albuquerque, J. S., Biegler, L.T. (1996). Data Reconciliation and Gross-Error Detection
for Dynamic Systems. AIChE J., Vol. 42, No. 10, pp. 2841-2856.
Arora, N., Biegler, L.T. (2001). Redescending Estimators for Data Reconciliation and
Parameter Estimation. Comp. Chem. Eng., Vol. 25, pp. 1585-1599.
Bickel, P.J. (1982). On adaptive estimation, Annals of Statistics 10, 647-671.
Butler, R.J., McDonald, J.B., Nelson, R.D., White, S.B. (1990). Robust and Partially
Adaptive Estimation of Regression Models. Rev. Econ. Stat., Vol 72, No. 2, pp.321-
327.
Crowe, C.M. (1989). Observability and redundancy of process data for steady state
reconciliation. Chemical Engineering Science. 44, 2909-2917.
Crowe, C.M. (1986). Reconciliation of process flow rates by matrix projection. Part II:
The non-linear case. AIChE.J. 32, 616-623.
Gill, P., Murray, W.A., Saunders, M.A., and Wright, M.H. (1986). User Guide for
NPSOL (Version 4.0); a FORTRAN program for Nonlinear Programming, Technical
Report SOL 86-2. Stanford University, Department of Operation Research, Stanford,
CA.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A. (1986). Robust Statistics:
The Approach Based on Influence Function. Wiley, New York.
Huber, P.J. (1981). Robust Statistics. Wiley, New York.
Huber, P.J., (1964). Robust Estimation of a location parameter. Annals of Mathematical

Statistics, 64, 73-101.
Joris, P., and Kalitventzeff, B. (1987). Process measurements analysis and validation.
Proc. CEF ’87: Use Comput. Chem. Eng., Italy, pp.41-46.
Kim, I.W., Liebman, M. J., and Edgar, T.F. (1990). Robust error-in-variable estimation
using non-linear programming techniques. AIChE J., Vol 36, 985-993.
McDonald, J.B., Newey, W.K. (1988). Partially Adaptive Estimation of Regression

Models via the Generalized T Distribution. Econometric Theory, Vol. 4, pp.428-457.
Narasimhan , S.; Mah, R.S.H. Generalized Likelihood Ratio Method for Gross Errors
Identification. AIChE J. 1987, 33, 1514.
83
Potscher, B.M. and Prucha, I.R. (1986). A Class of Partially Adaptive One-Step M-
Estimators for the Non-linear Regression Model with Dependent Observations.
Journal of Econometrics, 32, 219-251.
Reilly, P.M., and Patino-Leal, H. (1981). A Bayesian study of the error-in-variable

models. Technometrics, Vol 23, 221-231.
Romagnoli, J. and Sanchez, M. (2000). Data Processing and Reconciliation for Chemical
Process Operations. Academic Press.
Rogmanoli, J., and Stephanopoulos, G. (1980). On the rectification of measurement

errors for complex chemical plants. Chemical Engineering Science, 35, 1067-1081.
Sanchez, M., Bandoni, A., and Romagnoli, J. (1992). PLADAT – A package for process
variable classification and plant data reconciliation. Computers and Chemical
Engineering, S16, 499-506.
Serth, R.; Heenan, W. Gross Error and Data Reconciliation in Steam Measuring Systems.
AIChE J. 1986. 32, 733.
Stein, C. (1956). Efficient nonparametric testing and estimation. In: Proceedings of the
third Berkeley symposium on mathematical statistics and probability, Vol. I
(University of California Press, Berkeley, CA).
Swartz, C.L.E. (1989). Data reconciliation for generalized flowsheet applications. 197th
National Meeting, American Chemical Society. Dallas, TX.
Tjoa, I., and Biegler, L. (1991). Simultaneous strategies for data reconciliation and gross
error detection of nonlinear systems. Computer and Chemical Engineering, 15, 679.
Valko, P., Vadja, S. (1987), An extended Marquardt-type Procedure for Fitting Error-in-
Variable Models. Comp. Chem. Eng., Vol. 11, pp. 37-43.
Wang, D., Romagnoli, J.A. (2003). A Framework for Robust Data Reconciliation Based
on a Generalized Objective Function. Ind.Eng.Chem.Res.,Vol.42, No.13, pp. 3075-
3084.
84
AUTHOR’S PUBLICATIONS
• Joe, Y.Y., Wang, D., Romagnoli, J. and Tay, A. (2004). Robust and Efficient
Joint Data Reconciliation and Parameter Estimation Using a Generalized
Objective Function. Proceedings of the 7th Dynamics and Control of Process
Systems (DYCOPS7). Boston, USA. Accepted.
• Joe, Y.Y., Wang, D., Tay, A., Ho, W.K., Ching, C.B., and Romagnoli, J. (2004).
A Robust Strategy for Joint Data Reconciliation and Parameter Estimation.
Proceedings of the 14th European Symposium on Computer-Aided Process
Engineering. Elsevier, Lisbon, Portugal.
• Joe, Y.Y., Xu, H., Dong, Z.Y., Ng, H.H., and Tay, A. (2004). Searching Oligo
Sets of Human Chromosome 12 using Evolutionary Strategies, International
Journal of Systems Sciences, Accepted.
• Joe, Y.Y., Xu, H., Dong, Z.Y., Ng, H.H., and Tay, A. (2003). Searching Oligo
Sets of Human Chromosome 12 using Evolutionary Strategies, Proceedings of
the 2003 Congress on Evolutionary Computation, IEEE Press, Canberra,
Australia.
85
APPENDIX A
Description of the Matlab/ Simulink Model of the General-Purpose Pilot Plant
GENERAL DESCRIPTION
The plant is a pilot-scale setting containing two CSTRs, a mixer, a feedtank and a number
of heat exchangers. Material feed from the feed tank is heated before being fed to the first
reactor and the mixer. The effluent from first reactor is then mixed with the material feed
in the mixer, and then fed to the second reactor. The effluent from the second reactor is,
in turn, fed back to the feed tank and the cycle continues.
RUNNING THE MODEL

The plant model is in twocstr3.mdl (excluding pumps).
UNITS
NOTE: Operating values (initial conditions, parameters) used in the current model right
now are still the ones taken from the references, not the values from the pilot plant.
1. REACTOR
Tc Cooling water
OUT
Fin, Tin, Cain
V, T, Ca
Ps
F, T, Ca
Product OUT
Reaction inside the reactor: A → B
1.1. Model I/O and Parameters
Matlab S-function: s_reactor1.m
86
Fin,Tin,Cain
F,T,Ca
Tcin
V
Fsin
Tc
Fcin
Ts
F
REACTOR1
1.1.1. Input:
Fin : volume flowrate of input stream (vol/time)
Tin : temperature of input stream
Cain : concentration of component A in the input stream (mole/vol)
Fcin : volume flowrate of incoming cooling stream
Tcin : temperature of incoming cooling stream
Fsin : volume flowrate of incoming heating stream
F : volume flowrate of output stream (as controlled by output flow controller)
1.1.2. Output:
F : volume flowrate of output stream
T : temperature of output stream = (assumed) temperature inside reactor
vessel
Ca : concentration of A in the output stream = (assumed) inside reactor vessel
V : volume of content of reaction vessel
Tc: Temperature of outgoing cooling stream
Ts: Temperature of steam jacket
Fs: Flowrate of input steam to steam jacket
1.1.3. Parameters
[Luyben, 1989]:
NOTE:
• The parameters are currently coded at the S-function; so to simulate two
reactors with different parameters, two of this S-function are needed, each
with different set of parameters.
Symbol Description Nominal Values Used
in Simulation
rho, ρ Mass density of stream 50 lbm/ft3
Cp, C p Specific heat of stream 0.75 Btu/lbm R
alpha, α Temperature constant for Arrhenius 7.08E10 /h =
expression 1.9667e+007 /s
E, E Activation energy for reaction 30,000 Btu/lb.mol
R, R Molar gas constant 1.99 Btu/lb.mol R
87
lambda, λ Reaction energy -30,000 Btu/lb.mol
Uc, U c heat transfer coefficient of reactor 150 btu/h ft2 R =
cooling coil 0.0417 btu/s ft2 R
Ac, Ac area of reactor cooling coil 250 ft2
rhoc, ρ c mass density of cooling stream 62.5 lbm/ft3
Cpc, C pc specific heat of cooling stream 1 Btu/lbm R
Vc, Vc volume of reactor cooling coil 3.85 ft3
hos, hos heat transfer coefficient of reactor 1000 btu/h ft2 R =
heating jacket 0.2778 btu/s ft2 R
Aos, Aos area of reactor heating jacket 56.5 ft2
Ms, M s molar mass of heating stream 18 lbm/lb.mol
Avp, Avp Constant for steam jacket pressure -8744.4 R
Bvp, Bvp Constant for steam jacket pressure 15.70
Hs_hc, (Vapor_enthalpy- 939 Btu/lbm
H s − hc condensation_enthalpy)
1.2. Model Equations

[Luyben, 1989]
1.2.1. Mass Balance
dV
= Fin − F
dt
1.2.2. Component Balance

d (VC a ) E
= Fin C ain − FC a − α exp(− )VC a
dt RT
Cb = ( ρ − M a C a ) / M b
1.2.3. Energy Balance

a. Reaction Vessel
dVT λα E
= FinTin − FT − exp(− )VC a − Qc − Q j
dt ρC p RT
Qc = U c Ac (T − Tc )
Q j = − hos Aos (Ts − T )
b. Cooling Coil
dTc Fc Qc
= (Tcin − Tc ) +
dt Vc ρ c C pcVc
Qc = U c Ac (T − Tc )
88
c. Steam Jacket
dρ Total Continuity
V j s = ws − wc
dt
MPj Perfect Gas Law
ρs =
RTs
Avp Simple vapor-pressure equation
Pj = exp( + Bvp )
Ts
ws = C vs X s Pjin − Pj
Q j = −hos Aos (Ts − T )
Qj Energy equation, ignoring internal energy change & assuming
wc = −
H s − hc steady-state, i.e. ws = wc
2. HEAT EXCHANGER
Matlab S-function: s_heatex_btu.m
In1 Out1
In2 Out2
HEAT_EX1
In1: [Fin1, Tin1, Cain1] T

Out1: [F1, T1, Ca1] T
Out2: [F2, T2, Ca2] T

2.1.1. Input:
Fin1/2: Volume flowrate of input stream 1/2
Tin1/2: Temperature of input stream 1/2
Cain1/2: Concentration of component A in input stream 1/2
2.1.2. Output:
F1/2: Volume flowrate of output stream 1/2
T1/2: Temperature of output stream 1/2
Ca1/2: Concentration of component A in output stream 1/2
2.1.3. Parameters:
[Luyben, 1989; Malleswararao, 1992; Holman, 1972]
in Simulation
89
ρ1 Mass density of stream 1 50 lbm/ft3
ρ2 Mass density of stream 2 50 lbm/ft3
V1 Volume of stream 1 in the heat 3.85 ft3
exchanger
V2 Volume of stream 2 in the heat 3.85 ft3
exchanger
Cp1 Specific heat of stream 1 0.75 btu/lbm R
Cp 2 Specific heat of stream 2 0.75 btu/lbm R
U Heat transfer coefficient 1000 btu/h R
A Area of heat exchange 20 ft2

2.2.1. Mass Balance
F1 = Fin1
F2 = Fin 2

C a1 = C ain1
C a 2 = C ain 2

[Holman, 1972; ASHRAE, 1993]
dT1 1
= [F ρ (T −T ) − Qt /C p1]
dt ρ1V1 1 1 in1 1
dT2
= 1 [ F ρ (T −T ) + Qt /C p2]
dt ρ2V2 2 2 in2 2
Qt =εQmax
where:
⎧ 1− exp(− N (1− C ))
⎪ ;C ≠1
ε = ⎪⎨1− C exp(− N (1− C ))
⎪ N ;C =1
⎪
⎩ N +1
N =UAh /Cmin
C = Cmin /Cmax
Cmin = min{F1ρ1Cp1,F2 ρ2Cp2}
Cmax = max{F1ρ1Cp1,F2 ρ2Cp2}
Qmax = Cmin (Th −Tc )
Th = max{Tin1,Tin2 }
Tc = min{Tin1,Tin2}
90
3. MIXER / FEED TANK
Matlab S-function: s_mixer_btu_cl.m
F1, T1, Ca1

F, T, Ca
F2, T2, Ca2
V
F
MIXER

3.1.1. Input:
Cain1/2: Concentration of component A in input stream ½
F : Volume flowrate of output stream (as controlled by output flow
controller)
3.1.2. Output
F: Volume flowrate of output stream
T: Temperature of output stream
Ca: Concentration of component A in output stream
V: Volume of tank
3.1.3. Parameter
None

3.2.1. Mass Balance
dV
= Fin1 + Fin 2 − F
dt

dVCa
= F1Ca1 + F2 Ca2 − FCa
dt
91
dVT = F T + F T − FT
dt 1 in1 2 in2
4. SPLITTER
Matlab S-function: not implemented in s-function
Out1
In
Out2
splitter 1
In1: [Fin, Tin, Cain, Cbin] T

Out1: [F1, T1, Ca1, Cb1] T
Out2: [F2, T2, Ca2, Cb2] T
Parameters: [eta] T

4.1.1. Input:
Fin: Volume flowrate of input stream
Tin: Temperature of input stream
Cain: Concentration of component A in input stream
Cbin: Concentration of component B in input stream
4.1.2. Output
F1/2: Volume flowrate of output stream 1/2
T1/2: Temperature of output stream 1/2
Ca1/2: Concentration of component A in output stream ½
4.1.3. Parameters
in Simulation
eta, η Split fraction ratio (F1/Fin) 0.4

4.2.1. Mass Balance
F1 = ηFin
F2 = Fin − F1

Ca1= Ca2 = Cain
92
T1 = T2 = Tin
5. MIXING JUNCTION
Matlab S-function: threeway.m
In1
Out
In2
junction

Out1: [F, T, Ca] T
5.1. Model I/O

5.1.1. Input:
Cain1/2: Concentration of component A in input stream 1/2
5.1.2. Output
F: Volume flowrate of output stream
T: Temperature of output stream
Ca: Concentration of component A in output stream

5.2.1. Mass Balance
F = Fin1 + Fin 2

F1Ca1 + F2 Ca2 − FCa = 0

F1Tin1 + F2Tin 2 − FT = 0
REFERENCES
ASHRAE (1993). Toolkit for HVAC System Energy Calculations, 1993.
De Nevers, Noel (1991). Fluid Mechanics for Chemical Engineers. McGraw-Hill.
Holman, J. P. (1972). Heat Transfer. McGraw-Hill.
93
Luyben, William, L (1989). Process modeling, simulation, and control for chemical
engineers. McGraw-Hill.
Malleswararao, Y. S. N, Chidambaram, M. (1992). Non-linear controllers for a heat-

exchanger. In J. Proc. Cont, Vol 2, No 1, p17-21.
94
APPENDIX B
Steady-State Equations of the General Purpose Pilot Plant
Reactor 1
Mass Balance: F2 − F5 = 0
U c1 Ac1
Energy Balance: F2T2 − F5T5 − (T − T ) = 0
ρC p rx1 c1
U A
Cooling Coil Energy Balance: Fc1 (Tc ,in − Tc1 ) + c1 c1 (Trx1 − Tc1 ) = 0
ρ c C pc
Mixer
Mass Balance: F5 + F7 − F9 = 0
Energy Balance: F5T5 + F7T7 − F9T9 = 0
Reactor 2
U c 2 Ac 2
Energy Balance: F9T9 − F12T12 − (T − T ) = 0
ρC p rx 2 c 2
U A
Cooling Coil Energy Balance: Fc 2 (Tc ,in − Tc 2 ) + c 2 c 2 (Trx 2 − Tc 2 ) = 0
ρ c C pc
Feed Tank
Energy Balance: F12T12 − F13T13 = 0
Heat Exchanger 2
Energy Balance:
F ρ (T14 − T7 ) − Qt / C p = 0
7
F ρ (T6 − T15 ) + Qt / C =0
6 hx phx
where Qt is a function of T7, T14, T6 and T15.
Heat Exchanger 1
Energy Balance:
F2 ρ (T16 − T2 ) − Qt / C p = 0
F1 ρ (T1 − T17 ) + Qt / C =0
hx phx
95
where Qt is a function of T2, T16, T1 and T17.
Heat Exchanger 3
Energy Balance:
U c 3 Ac 3
Fc 3 (Tc ,in 3 − Tc 3 ) + (T − T ) = 0
ρ c C pc 13a c 3
U c 3 Ac 3
F13 (T13 − T13 a ) − (T − T ) = 0
ρ c C pc 13a c 3
96

Thesis - Data Reconc and Params Estimation

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Thesis - Data Reconc and Params Estimation

Uploaded by

Copyright:

Available Formats

FRAMEWORK FOR JOINT DATA RECONCILIATION AND

JOE YEN YEN

FOR THE DEGREE OF MASTER OF ENGINEERING

NATIONAL UNIVERSITY OF SINGAPORE

I would like to express my gratitude to my supervisors Dr Arthur Tay and A/Professor

and for having confidence in me in carrying out this research.

the Department of Chemical Engineering in University of Sydney. The guidance of

and for making me feel part of the group.

been a like-minded confidante from whom I obtain inspiration in carrying out my

Objective knowledge about a process is essential for process monitoring, optimization,

identification and other general management planning. Since measurement of process

demanding. An alternative is to strike a balance between the simplicity of the parametric

approach and the flexibility of the non-parametric approach, i.e. by adopting a

and t-distributions) that belong to the GT distribution family. To achieve estimation

efficiency of the estimator.

Table 5.1. MSE of Measurements..................................................................................... 57

environmental and safety regulations requires the performance of a process to be

continuously improved through process modifications (Romagnoli and Sanchez, 2000).

Distributed Control System (DCS) is capable of high-frequency sampling, resulting in

vast amount of data to be interpreted, be it for the purpose of process monitoring,

optimization or other general management planning. Since measurement data always

accurate information about the process.

equations. The conventional data reconciliation approach is the least squares

the Normal (Gaussian) distribution.

reconciliation) or the measurements (which must be done post-reconciliation). The

restriction in chemical processes where most relationships are nonlinear.

parametric or non-parametric approach. The parametric approach either represents the

largely deviating observations. The non-parametric group consists of estimators that do

through non-parametric density estimation instead. The resulting estimator will be

encountered in practice (Butler et al, 1990).

A strategy is proposed to improve the efficiency of the parametric estimation by allowing

reconciliation by Wang and Romagnoli (2003) in a comparison case study.

based partially adaptive estimators are comprehensively studied through various

As an application case study, the virtual version of a real lab-scale general-purpose

chemical plant is developed in Matlab/ Simulink. Besides simulation studies, steady-state

decomposition based on formal transformation methods and symbolic manipulation is

1.3. Thesis Organization

on to describe the proposed GT-based joint data reconciliation – parameter estimation

chapter, an introduction to data reconciliation is provided along with its important

aspects, including the incorporation of robustness and variable classification. Relevant to

improve estimation efficiency is presented. A section is also devoted to joint data

procedures to simultaneously obtain consistent variable and parameter estimates.

2.2. Data Reconciliation (DR)

occurrences can be attributed to incorrect calibration or malfunction of instruments,

process leaks, and other unnatural causes.

and indubitable process relationships that should be satisfied regardless of the

and Sanchez, 2000).

constrained optimization problem:

(reconciled variables) = arg min (optimality criteria)

(mass and energy balances; variable bounds)

squares method will be formulated in the following.

y, an (mx1) vector of measurements;

x, an (mx1) vector of corresponding true values of the variables with measurements y;

u, a (px1) vector of unmeasured variables;

and u, respectively, and xU and uU the upper bounds on x and u, respectively.

Three features are observed from the above formulation:

objective function of the data reconciliation optimization problem is in fact

f (ε ) = (2π det(Ψ )) −1 exp(−ε T Ψ −1ε ) , (2.2)

where ε = y − x ; the maximum of f (ε ) is obtained by minimizing ε T Ψ −1ε , i.e. the

weighted least square of adjustments. As will be discussed in Section 2.4 on

optimization can be solved analytically, i.e. closed-form solution can be obtained.

orthogonal factorization for bilinear systems (Crowe, 1986), successive linearization

(Swartz, 1989), and nonlinear numerical optimization methods such as sequential