
Influence of Statistical Information Criteria to

One-Step-Ahead Prediction Error


Mohd Hezri Fazalul Rahiman, Mohd Nasir Taib and Yusof Md Salleh

Faculty of Electrical Engineering


Universiti Teknologi MARA
40450 Shah Alam, Malaysia.
Email: hezrif@ieee.org

Abstract - This paper investigates the influence of popular statistical information criteria on the one-step-ahead prediction (1-SAP) error of an ARX black-box model. The criteria investigated are the Akaike Information Criterion (AIC), the Akaike Final Prediction Error (FPE) and Rissanen’s Minimum Description Length (MDL). The investigation is based on Pseudo-Random Binary Sequence (PRBS) data collected from an electrically heated steam distillation essential oil extraction system. The data is the steam temperature measured within the distillation column beneath the material bed. Using the MATLAB System Identification Toolbox, an ARX model is estimated and validated. Prior to model validation, all the information criteria are examined and the criterion that suggests the most flexible model is selected for future work. The linear regression is minimized using the Levenberg-Marquardt algorithm. Model performance is evaluated with both graphical and statistical approaches such as R², adjusted-R², residual distribution, mean and variance. The results show that the model selected by the MDL criterion is more parsimonious and flexible than the others.

I. INTRODUCTION

System identification is a specialized area of control system engineering that focuses on developing the mathematical model of a system. The mathematical model relates the system output to any given input in such a way that it can even predict the future output of the system. Such a model is very important in control system engineering today, since recent advances focus on developing intelligent control systems. An intelligent system is capable of controlling and optimizing a system that is very difficult to control with traditional control techniques, especially chemical processes such as distillation.

It has been a trend in modern process control to integrate the process model with the control system in order to anticipate the process output before the process actually takes place [1]. The merging of the process model and the control system was realized by the adaptive control system, which integrates the system identification technique into the control algorithm [2]. System identification is a technique for developing a mathematical model that can represent the process dynamics.

Understanding the behavior of a chemical process is important, especially when the optimum condition of the process is required. This understanding is often tied to the mathematical model that represents the process dynamics. To date, numerous techniques have been devised to develop the mathematical model of a given process. They can be divided into two main categories: the first-principles model and the empirical model. The main difference between these models is the modeling approach.

In most cases, the processes are too complicated to be represented by a first-principles mathematical model. A huge amount of time is required to develop the model, which is undesirable, especially for practitioners. This is where the empirical model comes in handy. Its development is based mainly on data collected from the process. Without knowledge of the underlying process, the model can be developed to a level of accuracy that depends on the requirements of the user. This modeling approach is therefore also known as the black-box approach.

System identification is procedural, and the proper steps should be followed in order to obtain a satisfactory model. Several established textbooks have proposed steps that can be summarized as in Fig. 1 [3-5].

Fig. 1 shows the flow of system identification, which consists of four main steps: the experiment, followed by model structure selection, model estimation and model validation. If the validation result is accepted, the identification is finished. Otherwise, the loop is repeated.

In the second step, model structure selection, there are two aspects to consider: the type of the model and the size of the model. In the black-box model family, there are several assumed model types, which can be categorized as linear or nonlinear. The linear structure is the most popular since it is simple and can be estimated quickly. The focus of this paper is on the
linear Autoregressive with Exogenous Input (ARX) model type.

Fig. 1: The basic flow of system identification.

In the system identification field, the selection of model size is referred to as the selection of model order. It has received a lot of attention from practitioners and researchers, including statisticians. Since black-box modeling is similar to regression in statistics, the large body of statistical work on regression is relevant to the system identification community. Popular techniques for determining the model order include the Akaike Information Criterion (AIC), the Akaike Final Prediction Error (FPE), Rissanen’s Minimum Description Length (MDL), the Bayesian Information Criterion (BIC) and the Law of Iterated Logarithms Criterion (LILC). All of these techniques are statistically proven and can assist the user in selecting the model order numerically. However, a large number of published works still use heuristic and trial-and-error approaches to determine the model order [6].

Parameter estimation is a technique for estimating the parameter vector θ by solving the estimation problem numerically. Two approaches have been outlined by [7]: the correlation approach and the prediction error approach. In this paper, all models are estimated with the prediction error approach. The prediction error method (PEM) is a popular approach in model estimation. The objective of PEM is to minimize the sum of squared prediction errors (SSE), which is performed iteratively. Several minimization algorithms fall under the iterative category. The most popular is the Levenberg-Marquardt Algorithm (LMA), which has been widely used in system identification since it provides robust and fast convergence [5].

Model validation is the final step in system identification, the stage where the decision to accept or reject the model is made. In this stage, two general approaches can be used to investigate the estimated model: the graphical and the statistical approaches [4]. The graphical approach is the simplest and usually relies on common sense in interpreting the goodness of the model. The statistical approaches are necessary in order to select a satisfactory model and produce results that are statistically acceptable.

The most common model validation technique is examining the one-step-ahead prediction (1-SAP). The test result can be analyzed by both graphical and statistical approaches. Almost all published works in system identification analyze their model validation results both graphically and statistically, as suggested by [4].

From the 1-SAP, the difference between the predicted data and the observed data is known as the prediction error or residual. The residual is especially useful for analyzing the model statistically. Common statistical measures include the coefficient of determination (R²) and the adjusted-R². R² is the most common model fit measure used in system identification. Many published works evaluate their models by reporting R² as a percentage [8, 9]. It is also known as the model fit [10]. When two models with different degrees of flexibility are compared, the R² measure is insufficient since the result of the more flexible model will definitely be better. Therefore, the adjusted-R² was introduced to compensate for model complexity [11-13].

II. ARX MODEL STRUCTURE AND PREDICTION ERROR METHOD

The following figure describes the general model structure of a dynamic ARX system.

Fig. 2: General model structure of a dynamic ARX system.

Its structure can be written as

$$ y(t) = \frac{B(q)}{A(q)} q^{-nk} u(t) + \frac{1}{A(q)} e(t) \tag{1} $$

and

$$ y(t) = -a_1 y(t-1) - a_2 y(t-2) - \dots - a_{na} y(t-na) + b_0 u(t-nk) + b_1 u(t-1-nk) + \dots + b_{nb} u(t-nb-nk) + e(t) \tag{2} $$

or in regression form,
$$ y(t) = \varphi^T(t)\,\theta + e(t) \tag{3} $$

where

$$ \varphi(t) = [\,y(t-1) \dots y(t-na),\; u(t-nk) \dots u(t-nb-nk)\,]^T, \qquad \theta = [\,-a_1 \dots -a_{na},\; b_0 \dots b_{nb}\,]^T \tag{4} $$

The one-step-ahead prediction (1-SAP) of the ARX model structure is given by

$$ \hat{y}(t|\theta) = q^{-nk} B(q)\,u(t) + [1 - A(q)]\,y(t) = \varphi^T(t)\,\theta \tag{5} $$

Parameter estimation is a technique for estimating the parameter vector θ by solving the estimation problem numerically. Two approaches have been outlined by [9]: the correlation approach and the prediction error approach. The correlation approach is realized by solving a certain function $f_N(\theta, Z^N) = 0$ using the least-squares method, while the prediction error approach is realized by minimizing a certain loss function or criterion $V_N(\theta, Z^N)$ with respect to θ. In this paper, all models are estimated with the prediction error approach.

To explain the numerical approach, a set of collected data $Z^N$ is assumed, which can be represented by

$$ Z^N = [\,y(1), u(1), y(2), u(2), \dots, y(N), u(N)\,] \tag{6} $$

where y and u are the system output and input respectively, and N is the number of data points collected from the system. This data is used to estimate a model, namely the best model among the candidates contained in the selected model structure, the one capable of predicting the observed data with minimum error. The difference between the observed data and the predicted data is known as the prediction error or residual, ε(t,θ). A good model is defined as a model with good prediction capability, that is, a model that produces a small prediction error when applied to the observed data [9]. The prediction error is given by

$$ \varepsilon(t, \theta) = y(t) - \hat{y}(t|\theta) \tag{7} $$

where y(t) is the observed output and ŷ(t|θ) is the predicted output. The size of the prediction error is often measured by a function known as the loss function. Its general representation can be written as

$$ V_N(\theta, Z^N) = \frac{1}{2N} \sum_{t=1}^{N} \varepsilon^2(t, \theta) \tag{8} $$

where the factor of ½ on the RHS of (8) is used to simplify differentiation when minimizing the residual [12].

The estimated parameter $\hat{\theta}_N$ is defined by the minimization of (8),

$$ \hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta} V_N(\theta, Z^N) \tag{9} $$

where this equation is known as the prediction error method (PEM). The main objective of the PEM is to minimize a sum over some norm of the prediction errors [10]. Numerous works in the identification field have implemented the PEM approach, but in different cases. These cases are obtained as special cases of (9), depending on the choice of minimization method [9].

III. MODEL ORDER SELECTION

The model order is one of the quantities to be determined before the parameter vector can be estimated. The model order can either be chosen arbitrarily by trial and error or be chosen based on certain numerical approaches. Choosing by trial and error is relatively easier than the latter option, but the chosen model order might not be optimal and cannot be numerically justified. The trial-and-error approach has been demonstrated in [9].

Model order selection based on numerical analysis is more convenient, even though it is slightly more complicated in theory. Instead of choosing the model orders one by one and evaluating each result, as in the earlier approach, the task can be left to a program that decides based on a computed criterion. Several criteria can be relied on, such as the Akaike Information Criterion (AIC), the Akaike Final Prediction Error (FPE) and Rissanen’s Minimum Description Length (MDL).

The loss function $V_N$ can be used as an indication of how well the model fits the data. The model itself can be either simple or complex. From the implementation point of view, a simple model has the advantage of fast calculation but has less flexibility in describing the system. On the other hand, a complex model has more degrees of freedom and can fit the system much better than a simple model, but it is complicated and not popular among control practitioners.

From the model performance point of view, the more complicated the model, the lower the loss function can be. However, this does not mean that a system has to be represented with the most sophisticated equation. Overfitting can make the model deficient even when it has the highest degree of freedom. This deficiency shows up as an increase in the loss function when the model order is increased further. The loss function has contributions from both the model bias and the model variance [9].

Even though overfitting may reduce the model bias, it also significantly increases the model variance. Therefore, the trade-off between model bias and model variance is always referred to as the trade-off between
flexibility and the parsimony principle. In general, however, it is good practice to be parsimonious in model parameterization.

Therefore, a penalty for model complexity was introduced. A criterion for the determination of model structure and parameter values can be written as

$$ W_N^0(\theta, M, Z^N) = V_N(\theta, Z^N) + U_N(M) \tag{10} $$

where $V_N(\theta, Z^N)$ is the prediction error criterion and $U_N(M)$ is a function that measures the complexity of the model structure, which has been related to the dimensionality of θ,

$$ U_N(M) = \frac{\dim \theta}{N} \tag{11} $$

To explain the AIC, FPE and MDL, let the loss function $V_N$ be defined as the normalized sum of squared errors, or NSSE. The NSSE can be computed from (8) and is denoted by

$$ V_{NSSE}(\theta, Z^N) = \frac{1}{2N} \sum_{t=1}^{N} \varepsilon^2(t, \theta) \tag{12} $$

Based on (12), the AIC, FPE and MDL are obtained as follows:

$$ V_{AIC} = \left(1 + \frac{2d}{N}\right) V_N(\theta, Z^N) \tag{13} $$

$$ V_{MDL} = \left(1 + \log(N)\,\frac{d}{N}\right) V_N(\theta, Z^N) \tag{14} $$

$$ V_{FPE} = \left(\frac{1 + d/N}{1 - d/N}\right) V_N(\theta, Z^N) \tag{15} $$

where d is the number of estimated parameters. Of the four criteria given by (12) to (15), the first is not affected by the model complexity, i.e. the value of d does not influence the criterion. The other three criteria are affected by the model complexity. Since (12) does not penalize the model complexity, it is used only for benchmarking purposes. The AIC, FPE and MDL criteria and their selected model orders can therefore be compared.

IV. RESULT AND DISCUSSION

Based on the input-output data, the model order of the structure is determined using four popular criteria, i.e. NSSE, AIC, FPE and MDL. Each resulting model order is then used to estimate the system, and the models are denoted ARX-NSSE, ARX-AIC, ARX-FPE and ARX-MDL respectively. The model order selected is the one that gives the minimum loss function for the respective criterion. Fig. 3 and Fig. 4 show the input-output signal and the structure selection based on NSSE, AIC, FPE and MDL.

[Fig. 3 plot: upper panel, measured output signal (°C) versus time (sec.); lower panel, PRBS input signal (On/Off) versus time (sec.)]

Fig. 3: The measured output signal and the PRBS input signal.

[Fig. 4 plot: loss function (log) versus no. of parameters, with curves for NSSE, AIC, FPE and MDL]

Fig. 4: Structure selection for ARX model based on NSSE, AIC, FPE and MDL.

Fig. 4 shows the loss functions plotted against the number of parameters. The number of parameters is the sum of na and nb, e.g. 5 parameters for na = 2 and nb = 3, or any other combination that sums to 5. The same number of parameters may correspond to several combinations of na and nb, so only the minimum among them is selected and displayed. From the figure, it can be observed that the NSSE loss function decreases as the number of parameters increases towards the maximum number of parameters. Conversely, the MDL loss function decreases up to a certain number of parameters and beyond that it starts to increase. This is mainly due to the way the criterion penalizes model complexity. Meanwhile, the AIC and FPE criteria do not penalize the complexity as strongly as MDL. In fact, they show a pattern almost similar to that of the NSSE, which does not penalize the complexity at all. Both criteria show identical results. Searching for the minimum along each criterion curve, the model orders obtained are as tabulated in Table 1.
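The penalized criteria (13) to (15) and the search for the minimum along each criterion curve can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the MATLAB System Identification Toolbox routine used in this work: the function names (`nsse`, `unpenalized`, `aic`, `mdl`, `fpe`, `select_order`) and the loss values in the usage lines are hypothetical, and log(N) is assumed to be the natural logarithm.

```python
import math


def nsse(residuals):
    """Loss (12): sum of squared residuals divided by 2N."""
    n = len(residuals)
    return sum(e * e for e in residuals) / (2.0 * n)


def unpenalized(v, d, n):
    """NSSE benchmark: the loss itself, with no complexity penalty."""
    return v


def aic(v, d, n):
    """Criterion (13): (1 + 2d/N) * V_N."""
    return (1.0 + 2.0 * d / n) * v


def mdl(v, d, n):
    """Criterion (14): (1 + log(N) * d/N) * V_N, natural log assumed."""
    return (1.0 + math.log(n) * d / n) * v


def fpe(v, d, n):
    """Criterion (15): ((1 + d/N) / (1 - d/N)) * V_N."""
    return (1.0 + d / n) / (1.0 - d / n) * v


def select_order(losses, n, criterion):
    """Return the parameter count d whose criterion value is smallest.

    losses maps each candidate d to its loss V_N on N data points.
    """
    return min(losses, key=lambda d: criterion(losses[d], d, n))


# Hypothetical loss values for four candidate parameter counts (not paper data).
example_losses = {2: 1.0, 4: 0.5, 8: 0.49, 16: 0.485}
for name, crit in [("NSSE", unpenalized), ("AIC", aic), ("FPE", fpe), ("MDL", mdl)]:
    print(name, select_order(example_losses, 1000, crit))
```

For small d/N, the FPE factor (1 + d/N)/(1 - d/N) is approximately 1 + 2d/N, so FPE and AIC rank candidate orders almost identically, which is consistent with the identical AIC and FPE selections reported here. The log(N) factor in MDL grows with the data length, which is why MDL penalizes additional parameters more heavily.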
Referring to Table 1, the NSSE criterion selected the largest number of parameters and MDL the smallest. Since the AIC and FPE criteria selected the same number of parameters and the same model orders, only the AIC criterion is discussed, and it reflects the FPE criterion as well.

Henceforth, the ARX structures based on the above criteria are evaluated and denoted ARX-NSSE, ARX-AIC, ARX-FPE and ARX-MDL respectively.

Table 1: Selected model orders

Criterion   Model orders (na, nb, nk)   No. of parameters
NSSE        20, 20, 1                   40
AIC         19, 19, 1                   38
FPE         19, 19, 1                   38
MDL         6, 14, 3                    20

The estimated models are validated based on both visual inspection and statistical evaluation. The following figure shows the 1-SAP result of the ARX-MDL model.

[Fig. 5 plot, titled "measured, 1-SAP & residual. %R2=98.9875 %R2a=98.9798": three panels showing meas. (°C), 1-SAP (°C) and resid. versus time (samples)]

Fig. 5: Measured, 1-SAP and residual of ARX-MDL model

Fig. 5 shows the measured output, the ARX-MDL model’s 1-SAP and their residual. From the residual plot, the 1-SAP is in good agreement with the measured output. The R² and adjusted-R² tests give 98.9875% and 98.9798% respectively. The overall results for all models are tabulated in the following table.

Table 2: The overall 1-SAP result of ARX models for validation data

Models      % R²       % adj-R²
ARX-NSSE    99.0003    98.9845
ARX-AIC     98.9990    98.9840
ARX-MDL     98.9875    98.9798

Legend: Shaded value indicates the highest result (ARX-NSSE is not included)

Referring to Table 2, ARX-AIC produced the highest R² and adjusted-R² results, which are close to those of the non-penalized selected model, ARX-NSSE. The following figure shows the histogram of the residual from the ARX-MDL model validation.

[Fig. 6 plot, titled "data distribution. mean=0.000654 var=0.000646": histogram of residuals over the range -0.2 to 0.2]

Fig. 6: Residual histogram of ARX-MDL model

Referring to Fig. 6, the residual is normally distributed and centered near zero. The residual mean and variance are 6.5415e-004 and 6.4552e-004 respectively. The other models also produced similar residual distributions, which are normally distributed and centered near zero. The following table summarizes the mean and variance for all residuals.

Table 3: Mean and variance of ARX models for validation data (all values are divided by 1e-004)

Models      Mean      Variance
ARX-NSSE    4.9932    6.2944
ARX-AIC     8.4102    6.3068
ARX-MDL     6.5415    6.4552

Legend: Shaded value indicates the highest result (ARX-NSSE is not included)

Referring to Table 3 and excluding the ARX-NSSE benchmark, ARX-MDL produced the residual with the lowest mean and ARX-AIC the residual with the lowest variance. Comparatively, the differences between ARX-NSSE, ARX-AIC and ARX-MDL are not significant.

V. CONCLUSION

In this paper, the influence of statistical information criteria on the 1-SAP has been investigated. Four criteria, i.e. NSSE, AIC, FPE and MDL, were tested. The NSSE is a non-complexity-penalizing criterion and was evaluated for benchmarking purposes only. The NSSE selected a model with 40 parameters, which is significantly higher than the 20 parameters selected by MDL. The AIC and FPE also selected a high model order, very close to that of the NSSE. Based on the 1-SAP results on the steam distillation data, the MDL-selected model (ARX-MDL) is parsimonious and comparable to the other selected models, ARX-NSSE, ARX-AIC and ARX-FPE. Although statistical results such as R², adjusted-R² and residual variance did not show MDL to be the best criterion, it is comparable and even produced the lowest residual mean among the complexity-penalizing criteria. Furthermore, from the graphical point of view, the 1-SAP of the ARX-MDL is in good agreement with the measured signal and no outliers were detected in the residual plot. It can therefore be concluded that the MDL criterion selected a sufficient model order and that the other criteria, which selected higher orders, do not improve the 1-SAP to any significant degree.
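The validation statistics reported in Tables 2 and 3 (R², adjusted-R², residual mean and variance) can be outlined with the following Python sketch. It is a hedged illustration rather than the computation actually used for the tables: the adjusted-R² expression 1 - (1 - R²)(N - 1)/(N - d - 1) is the common textbook definition and is assumed here because the formula is not stated in the paper, the sample variance with an N - 1 denominator is likewise an assumption, and the function names and the toy data are hypothetical.

```python
def residuals(measured, predicted):
    """Prediction error (7): observed output minus 1-SAP output."""
    return [y - y_hat for y, y_hat in zip(measured, predicted)]


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SSE/SST."""
    mean_y = sum(measured) / len(measured)
    sse = sum(e * e for e in residuals(measured, predicted))
    sst = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, d):
    """Assumed textbook form: penalizes R^2 for d estimated parameters."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - d - 1)


def residual_mean_and_variance(measured, predicted):
    """Mean and sample variance (N - 1 denominator, an assumption)."""
    res = residuals(measured, predicted)
    n = len(res)
    mean = sum(res) / n
    variance = sum((e - mean) ** 2 for e in res) / (n - 1)
    return mean, variance


# Toy data for illustration only (not the steam distillation measurements).
toy_measured = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
toy_r2 = r_squared(toy_measured, toy_predicted)
print(100.0 * toy_r2, 100.0 * adjusted_r_squared(toy_r2, len(toy_measured), 2))
print(residual_mean_and_variance(toy_measured, toy_predicted))
```

Because the adjustment factor (N - 1)/(N - d - 1) grows with d, two models with nearly equal R² are separated in favor of the one with fewer parameters, which is the compensation for model complexity that motivates the adjusted-R².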
ACKNOWLEDGMENT

This work was conducted on the data gathered at the Faculty of Electrical Engineering, UiTM Shah Alam with the support of JPbSM UiTM and IRDC UiTM. The authors would like to thank all staff involved.

REFERENCES

[1] D. E. Seborg, T. F. Edgar and D. A. Mellichamp, Process Dynamics and Control, 2nd ed. New Jersey: John Wiley & Sons, 2004.
[2] K. J. Astrom and B. Wittenmark, Adaptive Control, 2nd ed. Reading, Massachusetts: Addison-Wesley Publishing Company, Inc., 1995.
[3] L. Ljung, System Identification: Theory for the User, 2nd ed. New Jersey: Prentice Hall PTR, 1999.
[4] T. Soderstrom and P. Stoica, System identification. Hertfordshire: Prentice Hall International (UK) Ltd., 1989.
[5] M. Norgaard, O. Ravn, N. K. Poulsen and L. K. Hansen, Neural Networks for modelling and control of dynamic systems. London: Springer-Verlag London Ltd, 2000.
[6] J. Eborn, "Modelling and Simulation of Thermal Power Plants," Lund, Sweden: Lund Institute of Technology, 1998.
[7] L. Ljung, System Identification: Theory for the User. New Jersey: Prentice Hall, 1987.
[8] L. Ljung, "Estimation focus in system identification: prefiltering, noise models, and prediction," in Proc. IEEE Conference on Decision and Control, Phoenix, Arizona, 1999, pp. 2810-2815.
[9] G. Etcheverry, W. Suleiman and A. Monin, "Quadratic System Identification By Hereditary Approach," in Proc. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006, pp. III-129 - III-132.
[10] L. Ljung, System Identification Toolbox User's Guide, 7th ed. Natick, USA: The MathWorks, Inc., 2007.
[11] D. C. Montgomery, G. C. Runger and N. F. Hubele, Engineering Statistics, 3rd ed: John Wiley & Sons, Inc., 2004.
[12] D. C. Montgomery and G. C. Runger, Applied Statistics and Probability for Engineers, 3rd ed. New York: John Wiley & Sons, Inc., 2003.
[13] P. I. Good, Introduction to statistics through resampling methods and R/S-Plus. New Jersey: John Wiley & Sons, Inc., 2005.
