
University of Warwick institutional repository: http://go.warwick.ac.uk/wrap
A Thesis Submitted for the Degree of PhD at the University of Warwick
http://go.warwick.ac.uk/wrap/3955
This thesis is made available online and is protected by original copyright. Please refer to the repository record for this item for information to help you to cite it. Our policy information is available from the repository home page.
DISCOUNT BAYESIAN MODELS AND FORECASTING

JAMAL RASUL MOHAMMAD AMEEN

PH.D.

DEPARTMENT OF STATISTICS
UNIVERSITY OF WARWICK
COVENTRY CV4 7AL
MAY 1984
TABLE OF CONTENTS

1- CHAPTER ONE : INTRODUCTION
   1.1 Status
   1.2 Outline of the Thesis

2- CHAPTER TWO : DISCOUNT WEIGHTED ESTIMATION
   2.1 Introduction
   2.2 Exponential Weighted Regression
       2.2.1 The model
       2.2.2 EWR and time series
       2.2.3 Some comments on EWR
   2.3 The simultaneous adaptive forecasting
   2.4 Discount weighted estimation
       2.4.1 The model
       2.4.2 DWE for time series
   2.5 Applications
       2.5.1 A simple linear growth model
       2.5.2 A practical example: The U.S. Air Passenger data set
   2.6 Summary

3- CHAPTER THREE : DYNAMIC LINEAR MODELS
   3.1 Introduction
   3.2 The DLM's
   3.3 Relation between DLM's and DWE's
   3.4 Some limitations and drawbacks
   3.5 Summary

4- CHAPTER FOUR : NORMAL DISCOUNT BAYESIAN MODELS
   4.1 Introduction
   4.2 Normal Weighted Bayesian Models
   4.3 Normal Discount Bayesian Models
       4.3.1 The model
       4.3.2 Forecasting and updating with NDBM's
       4.3.3 Coherency
       4.3.4 Sequential analysis of designed experiments
   4.4 Other important special cases
       4.4.1 The modified NDBM
       4.4.2 Extended NDBM's
   4.5 Summary

5- CHAPTER FIVE : ON-LINE VARIANCE LEARNING
   5.1 Introduction
   5.2 The Bayesian approach
   5.3 Non Bayesian methods : A short review
   5.4 The power law
   5.5 Summary

6- CHAPTER SIX : LIMITING RESULTS
   6.1 Introduction
   6.2 Similar models and reparameterisation
   6.3 A common canonical representation
   6.4 A general limiting theorem
   6.5 Relations with ARIMA models
   6.6 Summary

7- CHAPTER SEVEN : MULTIPROCESS MODELS WITH CUSUMS
   7.1 Introduction
   7.2 Historical background and developments
   7.3 The backward CUSUM
   7.4 Normal weighted Bayesian multiprocess models
   7.5 Multiprocess models with CUSUM's
   7.6 Summary

8- CHAPTER EIGHT : APPLICATIONS
   8.1 Introduction
   8.2 A simulated series
       8.2.1 Simulation of the data
       8.2.2 Intervention
       8.2.3 Multiprocess models - The artificial data
   8.3 The prescription series
       8.3.1 The data
       8.3.2 NDB multiprocess models : Known observation variance
       8.3.3 The CUSUM multiprocessor : Known observation variance
   8.4 The road death series
       8.4.1 The data
       8.4.2 The NDB multiprocess model with CUSUM's
   8.5 Summary

9- CHAPTER NINE : DISCUSSION AND FURTHER RESEARCH

10- APPENDIX

11- REFERENCES
ACKNOWLEDGMENTS
I would like to acknowledge my great indebtedness and convey an expression of many thanks to Professor P. J. Harrison for his assistance, guidance, and encouragement throughout the preparation of this work. Thanks also to the members of staff and my fellow students at the Department of Statistics, University of Warwick for many valuable discussions and the Computer Unit for their helpful assistance and facilities.

Finally I would like to thank the University of Sulaimaniyah (Salahuddin at present) and the Ministry of High Education and Scientific Research-Iraq for financial support.
To those
I love so much
I owe so much
SUMMARY
This thesis is concerned with Bayesian forecasting and sequential estimation. The concept of multiple discounting is introduced in order to achieve parametric and conceptual parsimony. In addition, this overcomes many of the drawbacks of the Normal Dynamic Linear Model (DLM) specification which uses a system variance matrix. These drawbacks involve ambiguity and invariance to the scale of independent variables. A class of Normal Discount Bayesian Models (NDBM) is introduced to overcome these difficulties. Facilities for parameter learning and multiprocess modelling are provided.

Unlike the DLM's, many limiting results are easily obtained for NDBM's. A general class of Normal Weighted Bayesian Models (NWBM) is introduced. This includes the class of DLM's as a special case. Other important subclasses of Extended and Modified NWBM's are also introduced. These are particularly useful in modelling discontinuities and for systems which operate according to the principle of Management by Exception. A number of illustrative applications are given.
CHAPTER ONE
INTRODUCTION
1.1. STATUS:

The study of processes that are subject to sequential developments has occupied scientists for a long time and is currently one of the most active topics in statistics. Indeed, in the majority of real life problems, information often arrives sequentially according to some index, time, and it is desired to detect its plausible pattern and hidden characteristics in order to reduce noise and obtain more reliable estimates, and to facilitate control and future predictions. The areas of economics, engineering and quality control are full of such examples. See Whittle (1969), Astrom (1970), Young (1974).

In the past, (non) Bayesian passive procedures have been used to analyse time series processes. The most popular procedure seems to be through model construction. Models can be broadly classified into two different categories. One of these is called Social Models. Social models provide structures which govern the way the environment behaves. Social or political organisations are members of this class. The other class may be called Scientific Models. These aim to build structures which fit specific environmental characteristics as closely as possible. An important subclass, which is the concern of this thesis, concerns environments that contain elements of chance. The aim is to build models and measure their adequacy in order to obtain a deeper understanding of the causal mechanism governing the environment. This subclass of Scientific Models is called Statistical Models, with mathematics and statistics as its principal tools. Throughout the thesis, models are meant to be Statistical Models unless otherwise specified.

In the classical sense, a time series is a sequential series of observations on a phenomenon which evolves with time. Wold (1954) suggested that a time series process can be decomposed into deterministic components like trend and seasonality with a random component caused by measurement errors. Before the appearance of computers, among the common short term forecasting procedures, the so called Moving Average criterion was used to fit polynomial functions through least squares. This is reviewed in Kendall, Stuart and Ord (1983). See also Anderson (1977) for further references. With the development of computers, during the late 50's the most widely used models in forecasting were the Exponential Weighted Moving Averages (EWMA) and Holts growth and seasonal model, which developed into the ICI forecasting method, DOUBTS, later embodied in the computer package of MULDO, and Brown's Exponential Weighted Regression (EWR), Brown (1963). These models are reviewed in Chapter 2 since they stimulated much of the research described in this thesis.

Another well known and widely used class of models is the Autoregressive Integrated Moving Average (ARIMA) models of Box and Jenkins (1970).
Given a series of observations $\{y_t\}$ and uncorrelated random residuals $\{a_t\}$, having a fixed distribution, usually assumed Normal, with zero mean and a constant variance, an ARIMA(p,d,q) is defined in the notation of Box and Jenkins by:

$$(1 + \phi_1 B + \dots + \phi_p B^p)(1 - B)^d y_t = (1 + \theta_1 B + \dots + \theta_q B^q)\, a_t$$

where $B$ is the backward shift operator, $B y_t = y_{t-1}$, and $\phi_1, \dots, \phi_p$; $\theta_1, \dots, \theta_q$; $p$, $q$ and $d$ are constants whose values belong to a known domain, and are to be estimated from the available data (parameters in a non Bayesian sense).
Despite the existence of a vast amount of literature, these models depend on a large number of unknown constants that are often difficult to interpret since they do not have natural descriptive meanings. Further, using the recommended mean square error criterion for estimation, a considerable amount of past data is required. Moreover, the resulting models are not robust. They demand stationarity or derived stationarity and make intervention, in the form of subjective information, difficult. For example, discontinuities and sudden changes can all ruin the estimates.
State Space representations and the works of Kalman and Bucy (1961) and Kalman (1963) have gained considerable grounds regarding fast performance and reduced computer storage problems. However, a natural recipe would have a fully Bayesian representation in which the uncertainty about all the unknown quantities is expressed through probability distributions. The Bayesian Dynamic Linear Models of Harrison and Stevens (1971, 1976) provide such a foundation. This is reviewed in Chapter 3 and a number of limitations and drawbacks are pointed out.

In Bayesian terminology, a time series process is defined to be a parameterised process $\{Y_t \mid \theta_t\}$ possessing a complete joint probability distribution for all $t$. Initially, there is available prior information (that is incorporated in the process analysed) about the parameter vector $\theta_t$. This definition is adopted from Chapter 3 and onwards. Vectors are represented by bold face small letters while capital bold face letters are used for matrices except the random vector $Y_t$.

1.2. OUTLINE OF THE THESIS:

In Chapter 2 the one discount factor Exponential Weighted Regression (EWR) method of Brown (1963) and the Simultaneous Adaptive Forecasting of Harrison (1967) are reviewed with some critical comments. The EWR method is then exploited using the discount principle to introduce the general Discount Weighted Estimation (DWE) technique. This includes the EWR method, allows different discount factors to be assigned to different model components and provides a preparatory ground for introducing the discount principle into Bayesian Modelling. The method is then applied to the U.S. Air Passengers series and the results are compared with those of DOUBTS and Autoregressive Integrated Moving Average (ARIMA) models.

The Dynamic Linear Models (DLM's) of Harrison and Stevens (1976) are reviewed in Chapter 3. Given some initial prior assumptions, for each DWE model, there is a DLM with an identical forecast function. Some limitations and drawbacks of the DLM's are also pointed out.
The discount principle is introduced into Bayesian modelling in Chapter 4 through Normal Weighted Bayesian Models (NWBM's). This includes the class of DLM's as a special case. Other important and parsimonious subclasses like Normal Discount Bayesian Models (NDBM's), Modified NDBM's and Extended NDBM's are also introduced. The possibility of including time series models with correlated observations, and some brief comments on the coherency of these models and their relation with the previous models, are given.

A short outline of the existing on-line estimation procedures for the observation variance is given in Chapter 5, and practical procedures are introduced for variances that have some known pattern or move slowly with time.

Chapter 6 is devoted to reparameterisations of any NDBM, and limiting results. Given the eigenstructure, transformations to similar canonical forms are available. A direct procedure is provided for calculating the limiting adaptive factors with no reference to the state precision or covariance matrices. In practice, a limiting state distribution is often quickly reached. This makes such limiting results useful and saves unnecessary computations such as the adaptive vector. Given the adaptive factors, limiting state precision and variance matrices can be calculated independent of each other. Results for some canonical formulations are also given. Limiting predictors of NWBM's are compared with those of ARIMA models. This leads to generalisations of some previous results.

In Chapter 7, the principle of Management by Exception and its use in forecasting is sketched. In Bayesian forecasting, the use of multiprocess models had largely replaced the backward Cumulative Sum (CUSUM) of the one step ahead forecasting errors for detecting changes and departures of the process level from specific targets. However, CUSUM's are reintroduced to forecasting systems and operate with multiprocess models which are based on the discount principle. These provide both economical and efficient models called Multiprocess NDB Models with CUSUM's.

A number of applications having different characteristics are considered in Chapter 8.

Finally a general discussion is given in Chapter 9. Attention is also paid to further work in progress, and to possible directions for future research.
CHAPTER TWO
DISCOUNT WEIGHTED ESTIMATION
2.1. INTRODUCTION:

Operational simplicity and parsimony are among the desirable properties in model constructions. The word 'parsimony' is used here in the sense of Roberts and Harrison (1984). The order of parsimony is the number of unknown constants involved in the model construction. Brown (1963) developed the Exponentially Weighted Regression (EWR), minimising the 'discounted' sum of squares of the one step ahead forecasting errors. As the method depends on one discount factor, it has parsimony of order 1. It will be evident in the coming sections, as is the case in many forecasting methods, that the information content of past observations about the future state of the process decays with its age, and this is accomplished using discount factors. The discount concept is a key issue of the thesis and will be exploited in this and the later chapters.

In this chapter, Exponential Weighted Regression is reviewed in Section 2.2 with the emphasis being on time series construction. The DOUBTS method is reviewed in 2.3, and in 2.4 the Discount Weighted Estimation method is introduced, Ameen and Harrison (1983 a). This generalisation of EWR uses discount matrices and provides simple recurrence updating formulas. In Section 2.5 a simple linear growth seasonal DWE model is constructed and a practical application is given using the U.S. Air passenger data series. The results are compared with those of DOUBTS and Box-Jenkins.
2.2. EXPONENTIAL WEIGHTED REGRESSION

2.2.1. The Model
One general locally linear representation of a time series process at any time $t$, with future outcomes $Y_{t+k}$, is:

$$Y_{t+k} = f_{t+k}\,\theta_{t,k} + \epsilon_{t+k}; \qquad \epsilon_{t+k} \sim [0, V] \qquad (2.1)$$

where the components of $f_{t+k} = [f_{t+k}(1), f_{t+k}(2), \dots, f_{t+k}(n)]$ are independent variables or known functions of time, $\theta'_{t,k} = [\theta_{t,k}(1), \theta_{t,k}(2), \dots, \theta_{t,k}(n)]$ are unknown, with the subscript $t,k$ indicating that their estimates are based on the data available up to and including time $t$, and $\epsilon_{t+k}$ is a random error term ($\epsilon_{t+k} \sim [0, V]$ is short hand for the mean of $\epsilon_{t+k}$ being 0 and the variance $V$). Usually, $\theta_{t,k}$ and $V$ are called the parameters of the model and, in a Bayesian sense, they have associated prior distributions. However, in EWR models, these are assumed as constants for the past data $D_t = \{(y_t, f_t), \dots, (y_1, f_1)\}$.

Given a discount factor $0 < \beta < 1$, $\theta_{t,0} = \theta$ is estimated by $m_t$ as the value of $\theta$ that minimises the discounted sum of squares:

$$S_t(\theta) = \sum_{i=0}^{t-1} \beta^i (y_{t-i} - f_{t-i}\theta)^2 \qquad (2.2)$$

Differentiating (2.2) with respect to $\theta$ at $\theta = m_t$, and equating the result to 0,

$$\sum_{i=0}^{t-1} \beta^i f'_{t-i}\,(y_{t-i} - f_{t-i} m_t) = 0 \qquad (2.3)$$

Now, define

$$Q_t = \sum_{i=0}^{t-1} \beta^i f'_{t-i} f_{t-i} = \beta Q_{t-1} + f'_t f_t \qquad (2.4)$$

$$h_t = \sum_{i=0}^{t-1} \beta^i f'_{t-i}\, y_{t-i} = \beta h_{t-1} + f'_t y_t \qquad (2.5)$$

Assuming that $Q_t^{-1}$ is the generalised inverse of $Q_t$, it can be seen from (2.3) that

$$m_t = Q_t^{-1} h_t.$$

This, together with (2.4) and (2.5), gives the recurrence relationship on $m_t$,

$$m_t = m_{t-1} + a_t e_t,$$

where $a_t = Q_t^{-1} f'_t$ and $e_t = y_t - f_t m_{t-1}$. The $k$ steps ahead point forecast is given by $f_{t+k} m_t$.
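As a minimal sketch of the recursion of Section 2.2.1, assuming a single regressor (n = 1) so that every quantity is a scalar, the following Python fragment runs the update of $m_t$ and compares it with the direct minimiser of the discounted sum of squares (2.2). The function names and the toy data are illustrative assumptions and are not taken from the thesis.

```python
def ewr_recursive(xs, ys, beta):
    """EWR estimates m_1..m_t for a scalar regressor x_t (the case n = 1)."""
    Q = 0.0  # Q_0 = 0: no information before the first observation
    m = 0.0  # m_0 has no influence when Q_0 = 0
    estimates = []
    for x, y in zip(xs, ys):
        Q = beta * Q + x * x   # (2.4): discounted information
        a = x / Q              # adaptive coefficient a_t = Q_t^{-1} f_t'
        e = y - x * m          # one step ahead forecast error e_t
        m = m + a * e          # m_t = m_{t-1} + a_t e_t
        estimates.append(m)
    return estimates


def ewr_direct(xs, ys, beta):
    """Minimiser of the discounted sum of squares (2.2), computed directly."""
    t = len(xs)
    num = sum(beta ** i * xs[t - 1 - i] * ys[t - 1 - i] for i in range(t))
    den = sum(beta ** i * xs[t - 1 - i] ** 2 for i in range(t))
    return num / den


# Illustrative data only (hypothetical, not from the thesis).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
print(ewr_recursive(xs, ys, beta=0.9)[-1], ewr_direct(xs, ys, beta=0.9))
```

The two printed values should agree up to rounding, since the recursion is an algebraic rearrangement of the direct solution when the first regressor value is non-zero.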
2.2.2. EWR and Time Series

In time series processes, the form of the forecast function can often be specified up to a reasonable degree of approximation. General polynomial predictors can be constructed through specifications of the design vector. A simple and efficient way, as presented by Brown (1963), is to define $f_{t+k} = f C^k$, where $C$ is a non singular matrix with dimension $n$. Therefore, using the notations and the criterion of Section 2.2.1, with $f_t = f$ being independent of time, the alternative forms of (2.4) and (2.5) are:

$$Q_t = \beta\, C'^{-1} Q_{t-1} C^{-1} + f'f \qquad (2.6)$$

and

$$h_t = \beta\, C'^{-1} h_{t-1} + f' y_t \qquad (2.7)$$

The current estimate of $\theta$ at time $t$ is then given by

$$m_t = C m_{t-1} + a_t e_t$$

where $e_t = y_t - f C m_{t-1}$ and $a_t = Q_t^{-1} f'$.
2.2.3. Some Comments on EWR

In order to get some insight into the terms and equations obtained in Sections 2.2.1 and 2.2.2, consider the minimisation of (2.2) again. Note that the same estimates of $\theta$ can be obtained by maximising

$$L = \exp\{-S_t/(2V)\}$$

for $\theta$. Given that $\epsilon_{t+k}$ is a Normal random variable, $L$ can be called the 'likelihood function' of $\theta$ at time $t$. $Q_t$ in (2.4) and (2.6) is proportional to the so called Fisher's Information matrix about $m_t$ (minus the second derivative of $L$ with respect to $\theta$ at $\theta = m_t$), or the 'precision' matrix in a Bayesian sense. The constant of proportionality is $V$. Moreover, in (2.6) the information content is decomposed into $f'f$, from the most recent observation, and $\beta\, C'^{-1} Q_{t-1} C^{-1}$, the information contributed from the past data, discounted by $\beta$. This, together with the convergence of (2.2), restricts the values of $\beta$ to the range $0 < \beta < 1$. Thus $\beta$, in the role of the 'discount factor', describes the rate at which the information about model parameters changes with time. Moreover, given $\lambda_1, \lambda_2, \dots, \lambda_n$ as distinct eigenvalues of $C$, the convergence of (2.6) requires that $0 < \beta/|\lambda_i|^2 < 1$. This can be seen on rewriting (2.6) as

$$Q_t = \beta^t\, C'^{-t} Q_0 C^{-t} + \sum_{i=0}^{t-1} \beta^i\, C'^{-i} f' f\, C^{-i}$$

Combining the restrictions on $\beta$, we have $0 < \beta < \min\{1, |\lambda_1|^2, |\lambda_2|^2, \dots, |\lambda_n|^2\}$.
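The restriction on the discount factor can be illustrated with a minimal scalar sketch, assuming $n = 1$, $f = 1$ and a transition "matrix" equal to a single eigenvalue $\lambda$, so that (2.6) reduces to $Q_t = (\beta/\lambda^2) Q_{t-1} + 1$. The function names and the value $\lambda = 0.9$ are hypothetical choices made only for this illustration.

```python
def discount_upper_bound(eigenvalues):
    """Upper bound min{1, |lambda_1|^2, ..., |lambda_n|^2} for the discount factor."""
    return min([1.0] + [abs(lam) ** 2 for lam in eigenvalues])


def information_path(beta, lam, steps, q0=0.0):
    """Iterate the scalar form of (2.6), Q_t = (beta / lam^2) Q_{t-1} + 1, with f = 1."""
    q = q0
    path = []
    for _ in range(steps):
        q = (beta / lam ** 2) * q + 1.0
        path.append(q)
    return path


lam = 0.9                                   # hypothetical eigenvalue
print(discount_upper_bound([lam]))          # bound on beta
print(information_path(0.7, lam, 200)[-1])  # beta below the bound: settles
print(information_path(0.9, lam, 200)[-1])  # beta above the bound: keeps growing
```

In the convergent case the iterates approach $1/(1 - \beta/\lambda^2)$, the sum of the geometric series in the rewritten form of (2.6).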
The convergence of the adaptive vector $a_t$ follows from the convergence of $Q_t$.

2.3. THE SIMULTANEOUS ADAPTIVE FORECASTING:

Consider a time series process that can be decomposed into three different components of trend, seasonality, and random variation. Suppose that the seasonal component changes relatively very slowly, so that the greater percentage of the predictive variation is attributable to trend changes and random variation (the data analysed at the end of this chapter is of this type). EWR assumes that the loss of information with age occurs at the same rate for both the trend and the seasonal components, whereas we know that the information on the seasonal component is more durable and hence more appropriately discounted using a much higher discount factor than that which is appropriate for the trend component.

This led Harrison (1967) to propose an alternative method to EWR which considered a simple multiplicative linear growth and seasonal model of parsimony 2. That work led to the development of the Simultaneous Adaptive Forecasting procedure, or DOUBTS, which is the basis of the I.C.I. short term forecasting computer package MULDO, Harrison and Scott (1965). Whittle (1965) examined the method. The following is a short review of DOUBTS with some comments.
The k-steps predictor $F_t(k)$ is

$$F_t(k) = (m_t + k b_t)\, S_t(k)$$

where

$$m_t = m_{t-1} + b_{t-1} + (1 - \beta_1^2)\, e_t$$

$$b_t = b_{t-1} + (1 - \beta_1)^2\, e_t$$

$$e_t = y_t - F_{t-1}(1)$$

and $\beta_1$ is the trend discount factor. The seasonal component for $k$ periods ahead is given by

$$S_t(k) = 1 + \sum_{i=1}^{n} \{a_i(t)\cos(H_i x_k) + b_i(t)\sin(H_i x_k)\},$$

where there are $n$ significant harmonics, with $H_i$ taking the appropriate integer values in the range 1 to $T/2$, $x_k = 2\pi k/T$, and $a_i(t)$ and $b_i(t)$ are the harmonic coefficients at time $t$, where $T$ is the length of the seasonal period. Given

$$a_t = [a_1, b_1, \dots, a_n, b_n]'_t, \qquad C = \mathrm{diag}\{C_1, C_2, \dots, C_n\}, \qquad C_k = \begin{pmatrix} \cos(x_k) & \sin(x_k) \\ -\sin(x_k) & \cos(x_k) \end{pmatrix},$$

then

$$a_t = C a_{t-1} + \mathbf{a}\, e'_t$$

where $e'_t$ is the seasonal error, formed from the ratio $y_t/m_t$, and $\mathbf{a}$ is Brown's adaptive constant vector, whose elements are functions of the seasonal discount factor $\beta_2$. More details can be found in Harrison and Scott (1965).

Although it is not intended here to proceed with the generalisation of this method, by the end of this chapter it will be evident that higher degree polynomials with more economical and parsimonious but still efficient seasonal components can be accommodated. However, like its predecessors, the method is limited and suffers from a lack of both theoretical and logical justification. It is purely a point estimator. Unlike Holt's seasonal forecasting method, the seasonal effects are included in the trend updating equations while the trend contribution is removed in updating the seasonal components.

Other means of constructing adaptive vectors for sequential estimation purposes are used through stochastic approximation, Gelb (1974). These provide estimates that are not necessarily optimal in any statistical sense. They possess some desirable, well defined convergence properties irrespective of the parameter uncertainties. See also Maybeck (1982).
2.4. DISCOUNT WEIGHTED ESTIMATION

In this section, the idea of 'discounting', as discussed in Section 2.3, is generalised to 'multiple discounting'. This provides a new class of models called Discount Weighted Estimation (DWE), using different discount factors for different model components.
2.4.1. The Model

Let a time series process be represented by (2.1), with $\epsilon_{t+k}$ ($k > 0$) independent of $D_t$, and let $\theta_{t,0} = \theta$ be estimated by $m_t$ at time $t$, given the data $D_t = \{(y_t, f_t), D_{t-1}\}$.

DEFINITION

A DWE model is given by

$$E[Y_{t+k} \mid D_t, f_{t+k}] = f_{t+k}\, m_t,$$

where:

$$m_t = m_{t-1} + a_t e_t \qquad (2.8)$$

$$a_t = Q_t^{-1} f'_t \qquad (2.9)$$

$$e_t = y_t - f_t m_{t-1} \qquad (2.10)$$

$$Q_t = B^{1/2} Q_{t-1} B^{1/2} + f'_t f_t \qquad (2.11)$$

$$B = \mathrm{diag}\{\beta_1, \beta_2, \dots, \beta_n\}; \qquad 0 < \beta_i < 1; \quad i = 1, 2, \dots, n \qquad (2.12)$$

The EWR model is retained when $B = \beta I$, where $I$ is an identity matrix of order $n$. Notice that only $Q_t^{-1}$, and not $Q_t$, needs to be calculated to obtain $m_t$. Although the inversion method for obtaining $Q_t^{-1}$ has been around for a long time, Henderson et al (1959) and later Lindley and Smith (1972), even now it does not seem to be generally appreciated by practitioners, even in the EWR case. A more attractive recursion, which avoids matrix inversions and their associated problems, is to replace (2.9) - (2.11) by:

$$Q_t^{-1} = (I - a_t f_t)\, R_t \qquad (2.13)$$

$$R_t = B^{-1/2} Q_{t-1}^{-1} B^{-1/2} \qquad (2.14)$$

$$a_t = R_t f'_t\, (1 + f_t R_t f'_t)^{-1} \qquad (2.15)$$
It can be seen from (2.11) that any initial value for $Q_0$, and hence $m_0$, will be dominated after a small number (around $n$, the dimension of $\theta$) of iterations. In cases of ignorance the default initial settings, say $Q_0^{-1} = \alpha I$ and $m_0 = 0$, where $\alpha$ is a large number, $10^5$, are usually adequate for operation. However in most cases, there is at least a vague impression of the size of the elements of $\theta$, which will give a better value of $m_0$ so that $f_1 m_0$ is close to $y_1$. From 2.2.3, we know that $Q_0 = V C_0^{-1}$, where $C_0^{-1}$ represents Fisher's Information matrix about $\theta$ at time $t = 0$. Then $Q_0$ can be set by assuming an upper limit for $V$, choosing a liberal marginal value $c_i$ for the variance of $\theta_i$ and setting $Q_0^{-1} = \mathrm{diag}\{c_1, c_2, \dots, c_n\}/V$. These ideas are illustrated by an example in 2.5.
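The inversion-free recursion (2.13) - (2.15), together with the prior setting $Q_0^{-1} = \mathrm{diag}\{c_i\}/V$, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the two regressors $[1, t]$, the discount factors and the artificial data are all hypothetical and do not come from the thesis.

```python
from math import sqrt


def dwe_prior(variances, V):
    """Initial Q_0^{-1} = diag{c_1, ..., c_n} / V as a full matrix (list of rows)."""
    n = len(variances)
    return [[variances[i] / V if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def dwe_step(m, P, f, y, betas):
    """One DWE update; P holds Q_{t-1}^{-1} on entry and Q_t^{-1} on return."""
    n = len(m)
    # (2.14): R_t = B^{-1/2} Q_{t-1}^{-1} B^{-1/2}, with B diagonal
    R = [[P[i][j] / sqrt(betas[i] * betas[j]) for j in range(n)]
         for i in range(n)]
    Rf = [sum(R[i][k] * f[k] for k in range(n)) for i in range(n)]   # R_t f_t'
    denom = 1.0 + sum(f[i] * Rf[i] for i in range(n))                # 1 + f_t R_t f_t'
    a = [v / denom for v in Rf]                                      # (2.15)
    e = y - sum(f[i] * m[i] for i in range(n))                       # (2.10)
    m_new = [m[i] + a[i] * e for i in range(n)]                      # (2.8)
    fR = [sum(f[k] * R[k][j] for k in range(n)) for j in range(n)]   # f_t R_t
    P_new = [[R[i][j] - a[i] * fR[j] for j in range(n)]
             for i in range(n)]                                      # (2.13)
    return m_new, P_new, a


# Hypothetical example: level and slope regressors with separate discount factors.
betas = [0.9, 0.95]
m = [0.0, 0.0]
P = dwe_prior([100.0, 100.0], V=1.0)
for t in range(1, 11):
    m, P, a = dwe_step(m, P, [1.0, float(t)], 2.0 + 0.5 * t, betas)
print(m)
```

Only matrix-vector products and one scalar division are needed per step, which is the practical point of replacing (2.9) - (2.11) by (2.13) - (2.15).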
2.4.2. DWE for Time Series

The principle of superposition states that any linear combination of linear models is a linear model. Model builders often use this in reverse, decomposing a linear model and extending the principle to statistical models using the fact that a normal random vector may be decomposed linearly into a set of components of normal random vectors. The major point is that the component models can be built separately and then combined to obtain a complete model. Hence in practice, given a time series for which $f_{t+k} = f C^k$, $k > 0$, for all $t$, $C$ is often decomposed into $r$ components and written $C = \mathrm{diag}\{C_1, C_2, \dots, C_r\}$, where $C$ is non singular. The case of special interest will be covered by assuming that the $n_i \times n_i$ square matrix $C_i$ has a single associated discount factor $\beta_i$.

DEFINITION

The method of DWE, for time series, is given by the forecast function

$$E[Y_{t+k} \mid D_t] = f_{t+k}\, m_t = f C^k m_t$$

where $m_t$ is calculated recursively by

$$m_t = C m_{t-1} + a_t e_t$$

$$e_t = y_t - f C m_{t-1}$$

$$a_t = R_t f'\, (f R_t f' + 1)^{-1}$$

$$Q_t^{-1} = (I - a_t f)\, R_t$$

$$R_t = B^{-1/2}\, C\, Q_{t-1}^{-1}\, C'\, B^{-1/2} \qquad (2.16)$$

where $B = \mathrm{diag}\{\beta_1 I_1, \beta_2 I_2, \dots, \beta_r I_r\}$, $0 < \beta_i < 1$, and $I_i$ is an identity matrix of order $n_i$.
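THEOREM 2.1.

For the DWE method defined above, if $\lambda_{i1}, \lambda_{i2}, \dots, \lambda_{i n_i}$; $i = 1, 2, \dots, r$, are non zero eigenvalues of $C_i$ and $Q_0$ is bounded, then $\lim_{t \to \infty} Q_t = Q$ exists for all

$$0 < \beta_i < \min\{1, |\lambda_{i1}|^2, |\lambda_{i2}|^2, \dots, |\lambda_{i n_i}|^2\}.$$

PROOF

Using (2.16), we have

$$Q_t = B^{t/2}\, C'^{-t} Q_0 C^{-t}\, B^{t/2} + \sum_{k=0}^{t-1} B^{k/2}\, C'^{-k} f' f\, C^{-k}\, B^{k/2} \qquad (2.17)$$

since $B^{1/2}$ and $C^{-1}$ commute. The convergence of (2.17) gives that

$$0 < \beta_i / |\lambda_{il}|^2 < 1; \qquad i = 1, 2, \dots, r; \quad l = 1, 2, \dots, n_i.$$

The result is obtained by combining this with the conditions $0 < \beta_i < 1$.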
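The convergence asserted by Theorem 2.1 can be watched numerically. The sketch below iterates (2.16) for a linear growth transition matrix with a single discount factor $\beta = 0.8$ and a vague starting value; these settings and the function names are assumptions chosen only for illustration. For this case a direct calculation from the limiting sum in (2.17) gives the limiting adaptive vector $(1 - \beta^2, (1 - \beta)^2)$, which the iteration should reproduce.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dwe_ts_step(P, f, C, betas):
    """One pass of (2.16): returns the adaptive vector a_t and Q_t^{-1}."""
    n = len(f)
    CPC = matmul(matmul(C, P), transpose(C))           # C Q_{t-1}^{-1} C'
    R = [[CPC[i][j] / (betas[i] * betas[j]) ** 0.5 for j in range(n)]
         for i in range(n)]                            # R_t
    Rf = [sum(R[i][k] * f[k] for k in range(n)) for i in range(n)]
    denom = 1.0 + sum(f[i] * Rf[i] for i in range(n))
    a = [v / denom for v in Rf]                        # a_t
    fR = [sum(f[k] * R[k][j] for k in range(n)) for j in range(n)]
    P_new = [[R[i][j] - a[i] * fR[j] for j in range(n)] for i in range(n)]
    return a, P_new


def limiting_adaptive_vector(beta, steps=300):
    """Iterate (2.16) for f = [1, 0] and a linear growth transition matrix."""
    f = [1.0, 0.0]
    C = [[1.0, 1.0], [0.0, 1.0]]          # linear growth transition matrix
    P = [[1000.0, 0.0], [0.0, 1000.0]]    # vague starting value for Q_0^{-1}
    a = [0.0, 0.0]
    for _ in range(steps):
        a, P = dwe_ts_step(P, f, C, [beta, beta])
    return a


print(limiting_adaptive_vector(0.8))
```

Both eigenvalues of the linear growth matrix equal 1, so any $0 < \beta < 1$ satisfies the condition of the theorem.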
Some modellers have suggested moving beyond these assumptions in order to increase model adaptivity, Muth (1981). Clearly such models produce highly unreliable and statistically unsound forecasts. A proper way of introducing temporal adaptivity is dealt with later through discounting the prior information.

Under the above assumptions, the recursive formulas converge considerably fast to a limiting form. Apart from computational benefits, these provide, as will be seen later, limiting justifications of many commonly used forecasting structures in the literature. It also uncovers, in spirit, the partial success achieved by some classical models like the ARIMA models.

2.5. APPLICATIONS:

2.5.1. A simple linear growth seasonal model

Using the principle of superposition for normal random variables, suppose that a time series model can be constructed using a linear combination of a linear growth, seasonal and random components. The linear growth model may be described by the pair

$$\{f_1, C_1\} = \left\{ [1, 0];\ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \right\}.$$

This is evident since $f_1 C_1^k = [1, k]$. Then if $m_t$ and $b_t$ are the present estimates of the level and growth rate, the forecast function of this component is $f_1 C_1^k (m_t, b_t)' = m_t + k b_t$, which is the familiar Holt-Winters linear growth function.
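Any additive seasonal pattern $S(1), \dots, S(T)$, for which $n$ is the integer part of $(T+1)/2$ and such that

$$\sum_{j=1}^{T} S(j) = 0,$$

can be expressed in terms of harmonic functions as

$$S(k) = \sum_{i=1}^{n} \left( a_i \cos(k\omega i) + b_i \sin(k\omega i) \right) \qquad (2.18)$$

where $\omega = 2\pi/T$. Equation (2.18) can be represented equivalently as

$$S(k) = \sum_{i=1}^{n} [1, 0] \begin{pmatrix} \cos i\omega & \sin i\omega \\ -\sin i\omega & \cos i\omega \end{pmatrix}^{k} \begin{pmatrix} a_i \\ b_i \end{pmatrix}.$$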
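The equivalence between the harmonic sum (2.18) and its matrix form, and the identity $f_1 C_1^k = [1, k]$ for the linear growth pair, can be checked with a short sketch. It assumes the sign convention implied by the matrix form, namely $a_i \cos(k\omega i) + b_i \sin(k\omega i)$; the coefficient values and function names are hypothetical.

```python
from math import cos, sin, pi


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matrix_power(A, k):
    """A multiplied by itself k times (k >= 0), starting from the identity."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result


def rotation(angle):
    """Harmonic block [[cos, sin], [-sin, cos]] used in the matrix form."""
    return [[cos(angle), sin(angle)], [-sin(angle), cos(angle)]]


def seasonal_trig(k, coeffs, T):
    """S(k) from the harmonic sum (2.18); coeffs is a list of (a_i, b_i)."""
    w = 2.0 * pi / T
    return sum(a * cos(k * w * i) + b * sin(k * w * i)
               for i, (a, b) in enumerate(coeffs, start=1))


def seasonal_matrix(k, coeffs, T):
    """S(k) from the matrix form: sum over i of [1, 0] C_i^k [a_i, b_i]'."""
    w = 2.0 * pi / T
    total = 0.0
    for i, (a, b) in enumerate(coeffs, start=1):
        Ck = matrix_power(rotation(i * w), k)
        total += Ck[0][0] * a + Ck[0][1] * b
    return total


def linear_growth_forecast(level, growth, k):
    """f_1 C_1^k (m_t, b_t)' for f_1 = [1, 0] and C_1 = [[1, 1], [0, 1]]."""
    Ck = matrix_power([[1.0, 1.0], [0.0, 1.0]], k)
    return Ck[0][0] * level + Ck[0][1] * growth


coeffs = [(1.0, -0.5), (0.3, 0.2)]   # hypothetical harmonic coefficients
print(seasonal_trig(5, coeffs, 12), seasonal_matrix(5, coeffs, 12))
print(linear_growth_forecast(10.0, 2.0, 3))
```

The matrix form is what allows seasonal blocks to be stacked with the trend block in a single block diagonal transition matrix.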
An alternative seasonal component model which gives an identical performance to that previously discussed is $\{f_2, C_2\}$, where

$$f_2 = [f_{2,1}, f_{2,2}, \dots, f_{2,n}], \qquad C_2 = \mathrm{diag}\{C_{2,1}, C_{2,2}, \dots, C_{2,n}\},$$

$$f_{2,k} = [1, 0] \quad \text{and} \quad C_{2,k} = \begin{pmatrix} \cos(k\omega) & \sin(k\omega) \\ -\sin(k\omega) & \cos(k\omega) \end{pmatrix}, \qquad k = 1, 2, \dots, n.$$

The practical benefit of this harmonic representation occurs when there exists an economical seasonal representation in terms of a limited number of harmonics. For example, Box and Jenkins (1970) examined the mean monthly temperature for Central England in 1964 and demonstrated that over 96% of the variation can be described by a single first harmonic, the rest of the variation being well described as random. In this case the seasonal pattern is captured by two unknowns rather than eleven as in the full monthly seasonal description.

In applying DWE it is generally advisable to associate a discount factor $\beta_1$ with the linear growth component but have a higher discount factor, $\beta_2$, for the seasonal description. This is due to the fact that often the seasonal pattern is more stable than the trend. The full linear growth seasonal model is then

$$\{[f_1, f_2];\ \mathrm{diag}\{C_1, C_2\};\ \mathrm{diag}\{\beta_1 I_2, \beta_2 I_{2n}\}\}$$

where $I_k$, $k = 2, 2n$, is the identity matrix of dimension $k$.

2.5.2. A Practical Example: The U.S. Air Passenger Data Series

For comparison with other methods the ten years monthly data from 1951 to 1960 is analyzed. The data is given in Brown (1963) and Box and Jenkins (1970). The series is a favourite with analysts since it has strong trend and seasonal components with very little randomness. However, it is not a strong test of a forecasting method. Harrison (1965) showed that the EWR method proposed by Brown cannot achieve a Mean Absolute Deviation (MAD) of less than 3% since it insists upon using a single inadequate discount factor in a case in which the trend and seasonal components require significantly different discount factors. He stated that if, on this data, a method cannot achieve a MAD of less than 3%, then that method can be regarded as suspect. Harrison analyzed the data using the DOUBTS method described in 2.3.

In this section the DWE model $\{f, C, B\}$ is applied to the logarithms of the data
using:

$$f = (1, 0, 1, 0, \dots, 1, 0), \qquad C = \mathrm{diag}\{C_1, C_2, \dots, C_6\}$$

where

$$C_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

for the trend and

$$C_{k+1} = \begin{pmatrix} \cos(k\omega) & \sin(k\omega) \\ -\sin(k\omega) & \cos(k\omega) \end{pmatrix}$$

for $k = 1, 2, \dots, 5$ as representing the $k$th harmonic description of the seasonal pattern with $\omega = \pi/6$. The point forecasts were obtained as the exponential of the DWE results.

A pair of discount factors, $\beta_1$ for the trend (relating to $C_1$) and $\beta_2$ for the seasonal block, was used. Initially the prior specification was
.100 001 . .1,0 0000.021
(, no9QO1=(
0}
which corresponds to a specification of no seasonal pattern, a level lying within a 95% interval [80 ; 280] and a monthly growth of between -4% and 4% per month. Hence this is a very weak prior although it does not assume complete ignorance. Fig. 1 presents the one-step-ahead point predictions with the observations. For comparison, the one step ahead forecast errors over the last six years were obtained and a DWE performance of MAD 2.3% achieved.

Another well known analysis of the data is given in the book of Box and Jenkins. Writing $z_t$ as the logarithm of the $t$th observation and $a_t$ as the corresponding one step ahead errors, their predictions are obtained using the difference equation:

$$z_t = z_{t-1} + z_{t-12} - z_{t-13} + a_t - \theta a_{t-1} - \Theta a_{t-12} + \theta\Theta a_{t-13}$$

where the mean square error is minimised when $\theta = .4$ and $\Theta = .61$. This method is also of parametric parsimony 2, and the following table indicates the comparability of the performance with that of DOUBTS and DWE with the same discount pair (0.84, 0.93) as described in Harrison (1965), and with the discount pair (0.76, 0.91) which reduces the MAD of the errors.
The Mean Absolute Forecast Errors For The Years 1955-1960

Year    DOUBTS       DWE          DWE          B&J
        (.84,.93)    (.84,.93)    (.76,.91)
1955    9.4          7.7          7.0          8.0
1956    4.3          5.4          5.4          4.5
1957    5.5          5.5          5.6          6.1
1958    15.2         14.7         13.7         14.0
1959    11.8         11.5         9.8          8.7
1960    11.0         11.5         12.1         14.2
MEAN    9.4          9.4          8.9          9.3

Clearly, in this example, no significant difference is observed between the above results.
However, DWE has the properties of being more general and parsimonious; intervention can easily be accommodated in the phase of sudden changes, and these depend on a small number of easily assessed discount factors. The following table illustrates the sensitivity of the models, in terms of MAD, for different selections of discount pairs.

$\beta_2$ \ $\beta_1$    0.6      0.7      0.8      0.9      1.0
0.8                      9.71     9.45     10.41    16.56    32.58
0.9                      9.39     9.08     9.12     11.91    22.59
1.0                      15.36    15.3     14.15    13.77    16.44
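The sensitivity table can be scanned mechanically. The sketch below is only a reading aid: it assumes that the rows of the table are indexed by $\beta_2$ (seasonal) and the columns by $\beta_1$ (trend), which is an interpretation of the table header, and the helper names are hypothetical.

```python
# MAD values keyed as mad[beta2][beta1], copied from the sensitivity table.
mad = {
    0.8: {0.6: 9.71, 0.7: 9.45, 0.8: 10.41, 0.9: 16.56, 1.0: 32.58},
    0.9: {0.6: 9.39, 0.7: 9.08, 0.8: 9.12, 0.9: 11.91, 1.0: 22.59},
    1.0: {0.6: 15.36, 0.7: 15.3, 0.8: 14.15, 0.9: 13.77, 1.0: 16.44},
}


def best_pair(table):
    """Return (MAD, beta1, beta2) for the smallest tabulated MAD."""
    return min((value, beta1, beta2)
               for beta2, row in table.items()
               for beta1, value in row.items())


def spread(table, beta2):
    """Range of MAD across the trend discount factors for a fixed beta2."""
    row = list(table[beta2].values())
    return max(row) - min(row)


print(best_pair(mad))
print({beta2: round(spread(mad, beta2), 2) for beta2 in mad})
```

On this reading, the smallest tabulated MAD lies near the discount pair (0.76, 0.91) quoted earlier in the section.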
[Fig. 1: one-step-ahead point predictions plotted with the observations; vertical axis: Passengers (*1000)]
2.6. SUMMARY:

The methods of EWR and DOUBTS are reviewed and some general comments, drawbacks and limitations are pointed out. The estimation procedure of DWE is introduced as a fruitful extension of EWR to provide and prepare solid grounds for introducing the discount concept into the Bayesian Forecasting. The method is applied to the U.S. air passenger data set and the results are encouraging when compared with previous existing ones.
CHAPTER THREE
DYNAMIC LINEAR MODELS
3.1. INTRODUCTION:

One of the main contributions to Bayesian Statistics, both in theory and applications, is Bayesian forecasting. This provides a natural way of combining experts' intelligence with the information provided by the data. The DLM's of Harrison and Stevens (1971, 1976) provide such means. The method also gives limiting justifications for many well known classical forecasting procedures, Brown (1963), Holt (1957), Winters (1960), and Box and Jenkins (1970). State space representations can be found in Priestley (1980). In particular, an extensive amount of engineering literature is available on applications of the Kalman Filter, Kalman (1963).

Bayesian Statistics has widened the understanding of random phenomena, introduced the facilities of on-line variance learning, intervention and multiprocess modelling, and has relaxed the assumption of stationarity. The method provides forecast distributions rather than point estimates.

In this chapter, DLM's are reviewed in 3.2 and relationships with DWE methods are discussed in 3.3. The DLM recursive formulas are attractive for the ease of elaboration in the parameter estimation and reduce considerably the computer storage problem. However, the method is not free from drawbacks. The specification of the system matrix W has caused problems in practice. This, together with other difficulties, is discussed in 3.4. A short summary of the contents of the chapter is given in 3.5.
3.2. THE DLM's
The class of DLM's as defined by Harrison and Stevens (1976) constitutes quadruples $\{F, G, V, W\}_t$ with proper dimensionality. A particular parametrised process $\{Y_t | \theta_t\}$ can be modelled using this class of models if the following linear relations hold:

i) $Y_t = F_t\theta_t + v_t, \quad v_t \sim N[0; V_t]$   (3.1)

ii) $\theta_t = G_t\theta_{t-1} + w_t, \quad w_t \sim N[0; W_t]$   (3.2)

The first of these equations is called the observation equation, which combines the observable vector $Y_t$ with an unobservable state parameter vector $\theta_t$ and an additive error structure which is assumed to be Normally distributed with mean 0 and variance $V_t$. The second equation describes the evolution of the state vector with time. Unless otherwise stated, the random vectors $v_t$ and $w_t$ are assumed to be uncorrelated with known variance matrices $V_t$ and $W_t$ respectively. Given an initial prior $(\theta_0 | D_0) \sim N[m_0; C_0]$, using Bayes theorem with $D_t = \{Y_t, D_{t-1}\}$, it follows that

$(Y_t | D_{t-1}) \sim N[\hat{y}_t; \hat{Y}_t]$   (3.3)

$(\theta_t | D_t) \sim N[m_t; C_t]$   (3.4)

where:

$\hat{y}_t = F_t G_t m_{t-1}; \quad \hat{Y}_t = F_t R_t F_t' + V_t$   (3.5)

$m_t = G_t m_{t-1} + A_t e_t$   (3.6)

$C_t = (I - A_t F_t) R_t$   (3.7)

$R_t = G_t C_{t-1} G_t' + W_t$   (3.8)

$A_t = R_t F_t' \hat{Y}_t^{-1} = C_t F_t' V_t^{-1}$   (3.9)

$e_t = Y_t - \hat{y}_t$   (3.10)

When $\{F, G, V, W\}$ are all known and are not dependent on time, the DLM is then called a constant DLM.
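The recurrence relations (3.5)-(3.10) translate almost line by line into code. The following is a minimal sketch in Python with NumPy, not part of the original text; the function name `dlm_step` and the numerical values of the steady model used to exercise it are illustrative assumptions.

```python
import numpy as np

def dlm_step(m_prev, C_prev, y, F, G, V, W):
    """One forecast/update cycle of a normal DLM (hypothetical helper)."""
    R = G @ C_prev @ G.T + W             # prior variance of theta_t
    y_hat = F @ G @ m_prev               # one step ahead forecast mean
    Y_hat = F @ R @ F.T + V              # one step ahead forecast variance
    A = R @ F.T @ np.linalg.inv(Y_hat)   # adaptive matrix
    e = y - y_hat                        # one step ahead forecast error
    m = G @ m_prev + A @ e               # posterior mean
    C = (np.eye(len(m_prev)) - A @ F) @ R  # posterior variance
    return m, C, y_hat, Y_hat, A

# Illustrative steady model; all numbers are made up for the example.
F = np.array([[1.0]])
G = np.array([[1.0]])
V = np.array([[1.0]])
W = np.array([[0.5]])
m1, C1, y_hat1, Y_hat1, A1 = dlm_step(
    np.array([0.0]), np.array([[1.0]]), np.array([2.0]), F, G, V, W
)
```

With these numbers the prior variance is 1.5, the forecast variance 2.5 and the adaptive coefficient 0.6, and the two expressions for $A_t$ in (3.9) agree, since $C_1 F' V^{-1} = 0.6$.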
3.3. RELATION BETWEEN DLM's AND DWE's:
The updating recurrence relations (3.5)-(3.10) suggest a connection between estimation using DWE and the estimation using DLM's. In order to establish this relationship, we first give the following

DEFINITION For any DWE $\{F, G, B\}_t$ with initial setting $(m_0; Q_0)$, where $Q_0$ is nonnegative definite, a corresponding DLM is given by $\{F, G, V, W\}_t$ where

$W_t = (H_t Q_{t-1}^{-1} H_t' - G_t Q_{t-1}^{-1} G_t') V_t$

is nonnegative definite and $H_t = B_t^{-1/2} G_t$. In case of $Q_{t-1}$ being singular, $Q_{t-1}^{-1}$ represents a generalised inverse of $Q_{t-1}$.

THEOREM 3.1.
For the DWE $\{F, G, B\}_t$, the corresponding DLM produces a forecast function identical to that obtained by DWE. Further, $(\theta_t | D_t) \sim N[m_t; C_t]$, where $m_t$ is the DWE estimate and $C_t = Q_t^{-1} V_t$.

PROOF: From the initial settings, the theorem is true for t=1. Using induction, suppose it is true for t-1. From the DLM results, we have:

i) $R_t = G_t Q_{t-1}^{-1} G_t' V_t + W_t = H_t Q_{t-1}^{-1} H_t' V_t$

and since

$C_t^{-1} = R_t^{-1} + F_t' F_t V_t^{-1} = [H_t'^{-1} Q_{t-1} H_t^{-1} + F_t' F_t] / V_t = Q_t / V_t$

we have $C_t = Q_t^{-1} V_t$.

ii) $E[\theta_t | D_t] = m_t = G_t m_{t-1} + a_t e_t$, the DWE estimate, since $a_t = C_t F_t' / V_t = Q_t^{-1} F_t'$, as for the DWE.

iii) The forecast function is

$E[Y_{t+k} | D_t] = F_{t+k} \prod_{i=1}^{k} G_{t+i}\, m_t$

as for the DWE.

COROLLARY 3.1.1.
For t>0, the DWE $\{F, G, I\}_t$ gives a forecast function identical to that of the DLM $\{F, G, V, 0\}_t$.

PROOF Obvious since from the definition $W_t = 0$ for all t.

In DLM terms the above setting for $W_t$ is unusual in its dependence upon $C_{t-1}$, the uncertainty of the observer concerning $\theta_{t-1}$ given $D_{t-1}$. The concept that the observer's view of the future development of the process depends upon his current information is also adopted in Entropy Forecasting, Souza (1978), and in Smith (1979).
3.4. SOME LIMITATIONS AND DRAWBACKS
Time series processes can often be described reasonably using parametric statistical models. In this case, efficient model intervention can be performed in various stages of the analysis. Within the Bayesian framework, DLM's are often used for this purpose. However, the latter require experience in the representation of innovations using Normal probability distributions. The specification of the associated system variance matrices has proved a major obstacle. Practical problems arise because of the non uniqueness of V and W and because the lack of familiarity with such matrices causes application difficulties and leads practitioners to other methods. Even experienced people find that they have little natural quantitative feel for the elements of these matrices. Their ambiguity arises since there exists an uncountable number of time shift reparametrisations which have identical forecast distributions.
For example:
The constant Normal DLM

$Y_t = \theta_t + v_t$

$\theta_t = \lambda\theta_{t-1} + w_t$

with

$\begin{pmatrix} v_t \\ w_t \end{pmatrix} \sim N[0; U], \qquad U = \begin{pmatrix} V - aS & aS \\ aS & W - a(1-\lambda^2)S \end{pmatrix}$

can be represented as

$Y_t - \lambda Y_{t-1} = v_t - \lambda v_{t-1} + w_t.$

This is a stationary process provided that $|\lambda| < 1$, and without loss of generality, for an infinite series, it can be written equivalently as

$Y_t = v_t + \sum_{i=0}^{\infty} \lambda^i w_{t-i}$

so that

$Var(Y_t) = V + W/(1-\lambda^2)$

and

$Cov(Y_t, Y_{t+k}) = \lambda^k W/(1-\lambda^2).$

Provided that U is a covariance matrix, the joint distribution of $Y_t, Y_{t+1}, Y_{t+2}, \ldots, Y_{t+k}$ does not depend on a, i.e. for infinitely many values of the variances of the v's and w's, the same forecast distribution is obtained. This generalises easily to higher dimensional DLM's.

Attempts are made to estimate V, W and G using sample autocovariances, see Lee (1980); however, the ambiguity in these systems is always evident unless further constraints are added. The system error variance is also not invariant to the scale on which the independent variables are measured. To overcome these difficulties, Ameen and Harrison (1983 b) have replaced this system variance matrix by a discount matrix. The procedure is both simple to elaborate and easy to understand, and this is the concern of the coming chapters.
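The non uniqueness in the example of this section can be checked numerically. The sketch below is not from the original text: it computes the autocovariances of $Y_t$ directly from the variances and covariance of $(v_t, w_t)$, under the assumption that only contemporaneous $v_t$ and $w_t$ are correlated, and confirms that the value of a drops out. The function name and the numerical values are illustrative.

```python
import math

def autocov(V, W, lam, a, S, k):
    """Cov(Y_t, Y_{t+k}) for Y_t = theta_t + v_t, theta_t = lam*theta_{t-1} + w_t,
    with Var(v_t) = V - a*S, Cov(v_t, w_t) = a*S and
    Var(w_t) = W - a*(1 - lam**2)*S (hypothetical helper)."""
    var_v = V - a * S
    cov_vw = a * S
    var_w = W - a * (1 - lam**2) * S
    var_theta = var_w / (1 - lam**2)   # stationary variance of theta_t
    if k == 0:
        # theta_t contains w_t with coefficient one, so Cov(theta_t, v_t) = cov_vw
        return var_theta + var_v + 2 * cov_vw
    # theta_{t+k} = lam^k * theta_t + later w's, which are uncorrelated with Y_t
    return lam**k * (var_theta + cov_vw)

V, W, lam, S = 1.0, 0.5, 0.8, 1.0
# Values of a for which U remains a valid covariance matrix in this example.
results = {a: [autocov(V, W, lam, a, S, k) for k in range(4)] for a in (0.0, 0.1, 0.2)}
```

Every entry of `results` holds the same list, namely $V + W/(1-\lambda^2)$ followed by $\lambda^k W/(1-\lambda^2)$, so the data alone cannot distinguish between the different values of a.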
3.5. SUMMARY
The class of DLM's is represented in 3.2 and its relation with DWE estimates is given in 3.3. It is shown that, given a DWE model, there exists a DLM having the same forecast function. Limitations and drawbacks of DLM's are discussed in 3.4.
CHAPTER FOUR
NORMAL DISCOUNT BAYESIAN MODELS
4.1. INTRODUCTION:
Two desirable properties of applied mathematical models are ease of application and conceptual parsimony. Hence the attraction of discount factors in methods of sequential estimation. Chapter 2 was concerned with the method of DWE which generalises the estimation method of Exponentially Weighted Regression promoted by Brown (1963). In the simplest situation a single discount factor $\beta$ describes the rate at which information is lost with time, so that if the current information is now worth M units then its worth with respect to a period k steps ahead is $\beta^k M$ units. However, if a system has numerous characteristics then the discount factor associated with particular components may be required to take different values. The DWE method provides a means of doing this but it is strictly a point estimation method.

The Bayesian approach to statistics is a logical and profound method. In forecasting it provides information as probability distributions (support (likelihood) functions follow consequently). These are essential to decision makers. The major objective of this work is to provide Bayesian forecasting methods founded upon the discount concept. This concept has been applied in the ICI forecasting method, Harrison (1965) and Harrison and Scott (1965), and the ICI Multivariate Hierarchical forecasting package MULDO, Harrison, Leonard and Gazard (1977). The former, as described in 2.3, is a particular method which does not easily generalise. The latter and other applications have been based upon Constant Dynamic Linear Models (DLM's) which have limiting forecast functions equivalent to those derived using EWR, Whittle (1965), Godolphin and Harrison (1975), Harrison and Akram (1983). The use of such models has involved practitioners specifying a system variance matrix W which has elements that are proportional to functions of a discount factor. They are thus indirect applications of discounting.

This chapter is concerned with a class of Normal Discount Bayesian Models (NDBM's) which eliminates the system variance matrix W. Instead a discount matrix is introduced which associates possibly different discount factors with different model components. Such a discount factor converts the component's posterior precision $P_{t-1}$ at time t-1 to a prior precision $P_t = \beta P_{t-1}$ for time t. The term precision is used in its Bayesian sense but may also be thought of as a Fisherian measure of information. The use of the discount matrix overcomes the major disadvantages of the system variance W, since ambiguity is removed, the discount matrix is invariant (up to a linear transformation) to the scale on which both the independent and dependent variables are measured, and the methods are easily applied. Because of conceptual simplicity and ease of operation it is anticipated that the NDBM approach will find many applications in dynamic regression, forecasting, time series analysis, the detection of changes in process behaviour, quality control, and in general statistical modelling where the observations are performed sequentially or ordered according to some index.

In this chapter Normal Weighted Bayesian Models (NWBM) are introduced in 4.2 and their relations with DLM's are pointed out. Particular emphasis is given to a subclass of models called the NDBM's, and the possibility of retaining the model coherency is discussed in 4.3. Other practically important subclasses of models like the Modified NDBM's and the Extended NDBM's are discussed in 4.4. These extend the capability of the models in dealing with cases of sudden changes in the process behaviour and correlated observations. Some comments on the NDBM's are given in 4.5 and finally a short summary of the chapter is given in 4.6.
4.2. NORMAL WEIGHTED BAYESIAN MODELS:
Consider a parameterised process $\{Y_t | \theta_t\}$, where $Y_t$ is an observable vector and $\theta_t$ is the unobservable vector of state parameters containing certain process characteristics of interest. For example each component of $Y_t$ may represent the sales of a product at time t with the corresponding component of $\theta_t$ representing its level of demand at that time. Each of the $Y_t$ and $\theta_t$ are random vectors and so have probability distributions. Although the discount principle discussed in Chapter 2 provides an operationally simple and efficient method of estimation, no distributional assumption is made for $\theta_t$. This drawback can be overcome simply by introducing an initial joint probability distribution for $Y_0$ and $\theta_0$ and using Bayes theorem for updating the distribution of the parameters as new data arrives. This requires a model which describes the way that the parameters evolve with time and the amount of precision lost at each transition stage. The relevant model assumptions are stated in (3.1) and (3.2) for the DLM. In order to introduce the discount principle into the Bayesian approach, the class of Normal Weighted Bayesian Models (NWBM) is defined.

DEFINITION For a parameterised process $\{Y_t | \theta_t\}$, t>0, a NWBM is given by a quadruple $\{F, G, V, H\}_t$ where the observation probability distribution is

$(Y_t | \theta_t) \sim N[F_t\theta_t; V_t]$   (4.1)

and, given the posterior parameter distribution at time t-1,

$(\theta_{t-1} | D_{t-1}) \sim N[m_{t-1}; C_{t-1}]$   (4.2)

the prior parameter distribution at time t is

$(\theta_t | D_{t-1}) \sim N[G_t m_{t-1}; R_t]; \quad R_t = H_t C_{t-1} H_t'$   (4.3)

Note that $R_t$ is a variance matrix provided that $C_{t-1}$ is. $C_{t-1}$ and $V_t$ are variance matrices by definition.

THEOREM 4.1.
For any NWBM, the one step ahead forecasting distribution and the updating parameter distributions are given by:

$(Y_t | D_{t-1}) \sim N[\hat{y}_t; \hat{Y}_t]$   (4.4)

$(\theta_t | D_t) \sim N[m_t; C_t]$   (4.5)

where $D_t = \{Y_t, D_{t-1}\}$ and

$\hat{y}_t = F_t G_t m_{t-1}; \quad \hat{Y}_t = F_t R_t F_t' + V_t$   (4.6)

$m_t = G_t m_{t-1} + A_t e_t; \quad C_t = (I - A_t F_t) R_t; \quad e_t = Y_t - \hat{y}_t$   (4.7)

$A_t = R_t F_t' \hat{Y}_t^{-1} = C_t F_t' V_t^{-1}$   (4.8)

PROOF
The proof is standard in normal Bayesian theory. However, the results can be obtained from the identity

$f(Y_t | \theta_t) f(\theta_t | D_{t-1}) = f(Y_t | D_{t-1}) f(\theta_t | Y_t, D_{t-1})$

where the f(.)'s are density functions of the appropriate random variables. Similarly, by rearranging the quadratic terms

$(Y_t - F_t\theta_t)' V_t^{-1} (Y_t - F_t\theta_t) + (\theta_t - G_t m_{t-1})' R_t^{-1} (\theta_t - G_t m_{t-1})$

as

$(Y_t - \hat{y}_t)' \hat{Y}_t^{-1} (Y_t - \hat{y}_t) + (\theta_t - m_t)' C_t^{-1} (\theta_t - m_t)$

where $\hat{y}_t$, $\hat{Y}_t$, $m_t$ and $C_t$ are as defined in the theorem.

NWBM's form an extensive class of models containing linear and non-linear models for which the prior and posterior distributions are normal. If

$W_t = H_t C_{t-1} H_t' - G_t C_{t-1} G_t'$

is nonnegative definite, the conditional distribution $(\theta_t | \theta_{t-1}) \sim N[G_t\theta_{t-1}; W_t]$ may be introduced to provide coherent forecast lead time distributions. Thus, under this condition any NWBM is a normal DLM. On the other hand, it is evident that any normal DLM is a NWBM, setting

$H_t = (G_t C_{t-1} G_t' + W_t)^{1/2} C_{t-1}^{-1/2}.$
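The correspondence between a normal DLM and a NWBM can be illustrated numerically. The sketch below is not from the original text and assumes that the square roots are taken as symmetric matrix square roots; the helper names and the numbers are illustrative. It builds $H_t$ from $G_t$, $C_{t-1}$ and $W_t$ and confirms that $H_t C_{t-1} H_t'$ reproduces the DLM prior variance.

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def nwbm_H(G, C_prev, W):
    """H_t such that H C H' = G C G' + W (hypothetical helper)."""
    R = G @ C_prev @ G.T + W
    H = sym_sqrt(R) @ np.linalg.inv(sym_sqrt(C_prev))
    return H, R

# Illustrative linear growth transition; all numbers are made up.
G = np.array([[1.0, 1.0], [0.0, 1.0]])
C_prev = np.array([[2.0, 0.5], [0.5, 1.0]])
W = np.diag([0.1, 0.05])
H, R = nwbm_H(G, C_prev, W)
W_back = H @ C_prev @ H.T - G @ C_prev @ G.T   # recovers the DLM system variance
```

Other choices of $H_t$ give the same $R_t$ (any $H_t U$ with $U C_{t-1} U' = C_{t-1}$ would do), so the matrix built here is one representative rather than a unique solution.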
these different model components. Before introducing practically more efficient and parsimonious NDBM settings, it is interesting to point out some relations with other well known models.

THEOREM 4.2.
Given a NDBM $\{F, G, V, B\}_t$ with non singular $G_t$ for all t, and initial setting $(m_0; C_0)$,

i- If $B_t = \beta I$ and $V_t = V$, the NDBM forecast function is identical to that of EWR $\{F, G, \beta\}_t$ with initial setting $(m_0; Q_0 = C_0^{-1} V)$.

ii- The NDBM forecast function is identical to that of DWE $\{F, G, B\}_t$ with initial setting as in (i) with $V_t = V$.

iii- If $B_t = I$, the joint posterior parameter distributions are identical to those of the DLM $\{F, G, V, 0\}_t$ with the same initial settings.

PROOF
The proof is by induction. From the assumptions, since $F_t$ and $G_t$ are taken common in all the cases, the theorem is true for t=0. Now, assuming that the theorem is true at time t-1, we show that it is true for time t.

i- Given from the NDBM results, for time t, we have

$m_t = G_t m_{t-1} + a_t e_t$

$C_t^{-1} = R_t^{-1} + F_t' F_t V^{-1}$

$R_t^{-1} = \beta\, G_t'^{-1} C_{t-1}^{-1} G_t^{-1}$

This gives

$C_t^{-1} V = \beta\, G_t'^{-1} Q_{t-1} G_t^{-1} + F_t' F_t = Q_t$

for EWR. Hence,

$a_t = C_t F_t' V^{-1} = Q_t^{-1} F_t'$

as for the EWR.
ii- The proof is similar to part one, with $R_t^{-1} = B_t^{1/2} G_t'^{-1} C_{t-1}^{-1} G_t^{-1} B_t'^{1/2}$.

iii- The two models are identical, since, for the NDBM with $B_t = I$,

$R_t = G_t C_{t-1} G_t' = G_t C_{t-1} G_t' + 0$

as for the DLM with $W_t = 0$.

Furthermore it can be seen from the Theorem in Section 3.3 that the limiting forecast distribution of a constant NDBM $\{F, G, V, B\}$ is identical to that of a constant DLM $\{F, G, V, W\}$ with

$W = (HCH' - GCG')V$

being nonnegative definite. However, as with the DWE specifications for the time series models defined in Section 2.3.2, particular emphasis is given to canonically structured NDBM's for which $G = diag\{G_1, G_2, \ldots, G_r\}$, $G_i$ of full rank $n_i$, and $B_t = diag\{\beta_1 I_1, \beta_2 I_2, \ldots, \beta_r I_r\}_t$, where $0 < \beta_i \le 1$ and $I_i$ is the identity matrix of dimension $n_i$.
4.3.2. Forecasting and Updating with NDBM's

Given the posterior parameter distribution at time t-1 as in (4.2), the one step ahead forecast distribution and the updating posterior parameter distributions at time t are given by (4.4) and (4.5), with

$R_t = B_t^{-1/2} G_t C_{t-1} G_t' B_t'^{-1/2}.$

The k-steps ahead forecast function $F_t(k)$, k>0, is given by

$F_t(k) = E[Y_{t+k} | D_t] = F_{t+k} \prod_{i=1}^{k} G_{t+i}\, m_t$
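As an illustration of this subsection, the following Python sketch performs one NDBM cycle and evaluates the forecast function for a constant model. It is not part of the original text: it assumes a diagonal discount matrix, so that $B^{-1/2}$ is obtained elementwise, and the function names and numerical values are hypothetical.

```python
import numpy as np

def ndbm_step(m_prev, C_prev, y, F, G, V, betas):
    """One NDBM cycle; betas holds the diagonal of the discount matrix B."""
    B_inv_sqrt = np.diag(1.0 / np.sqrt(betas))
    R = B_inv_sqrt @ G @ C_prev @ G.T @ B_inv_sqrt   # discounted prior variance
    y_hat = F @ G @ m_prev                           # forecast mean
    Y_hat = F @ R @ F.T + V                          # forecast variance
    A = R @ F.T @ np.linalg.inv(Y_hat)               # adaptive matrix
    m = G @ m_prev + A @ (y - y_hat)                 # posterior mean
    C = (np.eye(len(m_prev)) - A @ F) @ R            # posterior variance
    return m, C

def forecast_function(m, F, G, k):
    """F_t(k) = F G^k m_t for constant F and G."""
    return F @ np.linalg.matrix_power(G, k) @ m

# Illustrative linear growth model; all numbers are made up.
F = np.array([[1.0, 0.0]])
G = np.array([[1.0, 1.0], [0.0, 1.0]])
V = np.array([[1.0]])
betas = np.array([0.9, 0.95])
m0, C0 = np.array([10.0, 1.0]), np.eye(2)
m1, C1 = ndbm_step(m0, C0, np.array([12.0]), F, G, V, betas)
f3 = forecast_function(m0, F, G, 3)
```

No system variance W appears anywhere in the cycle; the only quantities specified beyond F, G and V are the discount factors.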
4.3.3. Coherency

For the NWBM and NDBM's, coherent joint forecast distributions may be derived using a corresponding DLM

$Y_{t+k} = F_{t+k}\theta_{t+k} + v_{t+k}, \quad v_{t+k} \sim N[0; V_{t+k}]$

$\theta_{t+k} = G_{t+k}\theta_{t+k-1} + w_{t,k}, \quad w_{t,k} \sim N[0; W_{t,k}]$

Defining $R_{t,1} = H_{t+1} C_t H_{t+1}'$ and $W_{t,1} = R_{t,1} - G_{t+1} C_t G_{t+1}'$, $W_{t,k} = \{w_{ij}\}$ and $R_{t,k}$ are derived from the recursive relationships

$W_{t,k+1} = R_{t,k+1} - G_{t+k+1}(I - A_{t,k}F_{t+k})R_{t,k}G_{t+k+1}'$

$R_{t,k+1} = H_{t+k+1}(I - A_{t,k}F_{t+k})R_{t,k}H_{t+k+1}'$   (4.9)

$A_{t,k} = R_{t,k}F_{t+k}'(V_{t+k} + F_{t+k}R_{t,k}F_{t+k}')^{-1}$

$H_{t+k} = B_{t+k}^{-1/2} G_{t+k}$

For univariate series, $V_{t+k} + F_{t+k}R_{t,k}F_{t+k}'$ is a scalar quantity. Hence, the calculation of $W_{t,k}$ does not require matrix inversions.

THEOREM 4.3.
Given no missing observations, the NDBM $\{F, G, V, B\}$ is coherent.

PROOF
Let $(\theta_t | D_t) \sim N[m_t; C_t]$. Define

$R = GC_tG' + W_{t+1} = B^{-1/2}GC_tG'B'^{-1/2}$

$Q = GRG' + W_{t+2} = B^{-1/2}GRG'B'^{-1/2}$

$\hat{Y} = FRF' + V, \quad Z = FQF' + V.$

Note that the calculation of $W_{t+2}$ above is on the basis that $Y_{t+1}$ is missing. In the above relations, some subscripts are removed for convenience. Using the formal DLM relations,

$\begin{pmatrix} Y_{t+2} \\ Y_{t+1} \\ \theta_t \\ \theta_{t+1} \\ \theta_{t+2} \end{pmatrix} \Bigg| D_t \sim N\left[ \begin{pmatrix} FG^2m_t \\ FGm_t \\ m_t \\ Gm_t \\ G^2m_t \end{pmatrix}; \begin{pmatrix} Z & FGRF' & FG^2C_t & FGR & FQ \\ & \hat{Y} & FGC_t & FR & FRG' \\ & & C_t & C_tG' & C_tG'^2 \\ & & & R & RG' \\ & & & & Q \end{pmatrix} \right]$

It can be seen that the posterior distribution of $(\theta_{t+1} | D_{t+1})$ is

$N[Gm_t + RF'\hat{Y}^{-1}(Y_{t+1} - FGm_t); C_{t+1}], \quad C_{t+1} = R - RF'\hat{Y}^{-1}FR.$

But $(\theta_{t+2} | Y_{t+1}, D_t)$ has variance $Q - GRF'\hat{Y}^{-1}FRG'$. Now

$Q - GRF'\hat{Y}^{-1}FRG' \ne B^{-1/2}GC_{t+1}G'B'^{-1/2}$

showing that by the discount principle this would be incoherent. However, this inequality will not occur for the $W_{t,k}$'s defined in (4.9), and the above procedure can be extended to establish the equivalence of the NDBM and the DLM for $Y_{t+k}$.

The above theorem ensures the testability of the NWBM's on the same lines with the DLM. However, given a particular initial starting prior $[m_0; C_0]$ together with F, G, V and B, successive predictive distributions can be used to generate data sets following NWBM's. The noted difference is that the DLM uses the set of equations (3.1) and (3.2) assuming that $\theta_0$ is known while the NWBM starts with $[m_0; C_0]$ and the transition uncertainty is acknowledged by the discount matrix B.
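The inequality used in the proof of Theorem 4.3 can be seen in the simplest scalar case. The sketch below is not from the original text; it takes F = G = 1 with a single discount factor and made-up numbers, and compares the variance of $\theta_{t+2}$ implied by a system variance computed as if $Y_{t+1}$ were missing with the variance obtained by discounting the actual posterior at time t+1.

```python
# Scalar illustration; every numerical value is an assumption for the example.
beta, F, G, V = 0.8, 1.0, 1.0, 1.0
C_t = 1.0

R = G * C_t * G / beta             # prior variance for theta_{t+1}
Q = G * R * G / beta               # prior for theta_{t+2} if Y_{t+1} were missing
Y_hat = F * R * F + V              # forecast variance of Y_{t+1}
C_next = R - R * F * F * R / Y_hat # posterior variance of theta_{t+1}

var_dlm = Q - (G * R * F) ** 2 / Y_hat   # Var(theta_{t+2} | Y_{t+1}, D_t) under that DLM
var_discount = G * C_next * G / beta     # discounting the posterior at time t+1

# System variance for the step t+1 -> t+2 that makes the DLM agree with discounting.
W_coherent = var_discount - G * C_next * G
```

Here `var_dlm` is about 0.868 while `var_discount` is about 0.694, so a system variance fixed before $Y_{t+1}$ is observed does not reproduce the discount principle, whereas `W_coherent`, computed from the posterior variance at time t+1, does.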
4.3.4. Sequential analysis of designed experiments:

Statistical experiments are often performed sequentially and are subject to slow movements as well as sharp changes, perhaps due to some uncontrollable sources of variation. In such cases, static (non sequential) models are hardly justified and may well lead to false conclusions. DLM's have been adopted by Harrison for a sequential analysis of the quality characteristic of Nylon Polymer in the production. However, problems already discussed regarding the W covariance matrix often arise. The NDBM's overcome these problems. For example, a $2^2$ completely randomised sequential experiment can be represented by

$Y_{ij,t} = \theta_{1,t} + \theta_{ij,t} + \epsilon_{ij,t}, \quad i, j = 1, 2$

where $\theta_{1,t}$ represents the block effect and $\theta_{ij,t}$ represents the collective treatments effect, with $\sum_i \sum_j \theta_{ij,t} = 0$ at any stage of time t.

Now, in order to partition the variation among the treatments, an orthogonal contrast needs to be constructed by partitioning $\theta_{ij,t}$ into $\theta_{2,t}$, $\theta_{3,t}$ and $\theta_{4,t}$, where $\theta_{1,t}$ represents the block effect, $\theta_{2,t}$ and $\theta_{3,t}$ are the effects of treatments 1 and 2 respectively, and $\theta_{4,t}$ is the interaction effect. Usually this is performed as follows:

i- the effect in presence of both treatments 1 and 2 = sum of the treatments and their interaction + random error.

$Y_{11,t} = \theta_{1,t} + \theta_{2,t} + \theta_{3,t} + \theta_{4,t} + \epsilon_{1,t}$

ii- main effect of treatment 1 = sum of the terms with treatment 1 - sum of the terms without treatment 1 + random error.

$Y_{12,t} = \theta_{1,t} + \theta_{2,t} - \theta_{3,t} - \theta_{4,t} + \epsilon_{2,t}$

iii- similarly, for the main effect of treatment 2, we have

$Y_{21,t} = \theta_{1,t} - \theta_{2,t} + \theta_{3,t} - \theta_{4,t} + \epsilon_{3,t}$

The orthogonality condition suggests that

$Y_{22,t} = \theta_{1,t} - \theta_{2,t} - \theta_{3,t} + \theta_{4,t} + \epsilon_{4,t}$

Collecting the above information with $Y_t' = [Y_{11}, Y_{12}, Y_{21}, Y_{22}]_t$, an appropriate NDBM may be $\{F, I, V, B\}$ where

$F = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$   (4.10)
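The orthogonality of the contrasts in the design matrix (4.10) is easy to confirm numerically. The following check is not part of the original text; it assumes the standard sign pattern of a $2^2$ factorial design with the block effect in the first column, and the effect values are made up.

```python
import numpy as np

# Rows: Y11, Y12, Y21, Y22; columns: block, treatment 1, treatment 2, interaction.
F = np.array([
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1,  1, -1],
    [1, -1, -1,  1],
], dtype=float)

gram = F.T @ F                         # orthogonal columns give 4 * identity

theta = np.array([10.0, 2.0, -1.0, 0.5])   # illustrative effects
y = F @ theta                          # noise-free responses
theta_back = F.T @ y / 4.0             # effects recovered from the contrasts
```

Because the columns are mutually orthogonal, each effect is recovered from the responses by its own contrast, independently of the others.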
G is taken as the identity matrix to indicate a steady parameter evolution with time.

4.4. OTHER IMPORTANT SPECIAL CASES:
4.4.1. The Modified NDBM

In modelling discontinuities and changes in different characteristics of processes using intervention or multiprocess models it is often advisable to operate a system that protects the information on unchanged components against unwanted interaction effects arising from those components that have been disrupted. This is possible, and the occurrence of discontinuities in the data need not require complete respecification of the parameters as has often been suggested by statisticians such as Jaynes (1983).

DEFINITION Let $(\theta_{t-1} | D_{t-1}) \sim N[m_{t-1}; C_{t-1}]$, $G = diag\{G_1, G_2, \ldots, G_r\}$, $B = diag\{B_1, B_2, \ldots, B_r\}$, and let the partitioned structures of $C_{t-1}$ and $R_t$ be $\{C_{ij}\}_{t-1}$ and $\{R_{ij}\}_t$, for i, j = 1, 2, ..., r. A modified NDBM is a NDBM $\{F, G, V, B\}_t$ such that

$R_{ii} = B_i^{-1/2} G_i C_{ii} G_i' B_i'^{-1/2}$   (4.11)

$R_{ij} = G_i C_{ij} G_j'$, for $i \ne j$   (4.12)

The occurrence of sudden structural changes in the state of sequential statistical processes is common. In time series processes, it may be possible to classify the types of changes into changes in level, growth and/or seasonal components. Such changes can be modelled by increasing the uncertainty corresponding only to the changed components, so that the uncertainty of other components is not affected. In DLM's, this is performed by increasing the uncertainty of the state error vector, $w_t$, for the relevant blocks only. For NDBM's, the future uncertainty is controlled by the discount matrix. It can be observed from the definition of NDBM that the future uncertainty introduced to a particular block will be transmitted to other blocks through the correlation between them. The modified NDBM is introduced to prevent this. Moreover, a major disturbance on a particular block may be signalled with relevant intervention using a discount factor $\beta^N$, where N is chosen to age the effect of past history relevant to that component by N periods. This can be performed even within blocks. Although this temporarily loses invariance under linear transformations, it enables uncertainty to be introduced into a desired component of that block.

These ideas are used in the examples that are given in Chapter 8, and Migon and Harrison (1983) have applied modified NDBM's in their models.

The above definition can be modified to include more general transition matrices. That is, given the partitioned form of G as $\{G_{ij}\}$ and that of $GC_{t-1}G' = E$ as $\{E_{ij}\}$, (4.11) and (4.12) are replaced by

$R_{ii} = B_i^{-1/2} E_{ii} B_i'^{-1/2}$

and

$R_{ij} = E_{ij}$, for $i \ne j$.
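The effect of the modification can be sketched in code. The function below is an illustrative Python sketch, not part of the original text: it assumes that each block carries a single scalar discount factor ($B_i = \beta_i I$), uses the general form in terms of $E = GC_{t-1}G'$, and all names and numbers are hypothetical.

```python
import numpy as np

def modified_ndbm_prior(C_prev, G, block_sizes, betas):
    """Prior variance R_t of a modified NDBM: diagonal blocks of E = G C G'
    are discounted, off-diagonal blocks are left as they are."""
    E = G @ C_prev @ G.T
    R = E.copy()
    start = 0
    for size, beta in zip(block_sizes, betas):
        block = slice(start, start + size)
        R[block, block] = E[block, block] / beta   # B_i^{-1/2} E_ii B_i^{-1/2}
        start += size
    return R

# Two scalar blocks; an intervention discounts the first block heavily.
C_prev = np.array([[1.0, 0.5], [0.5, 2.0]])
G = np.eye(2)
R_mod = modified_ndbm_prior(C_prev, G, block_sizes=[1, 1], betas=[0.5, 1.0])

# Ordinary NDBM prior for comparison: the cross covariance is inflated as well.
B_inv_sqrt = np.diag(1.0 / np.sqrt([0.5, 1.0]))
R_plain = B_inv_sqrt @ G @ C_prev @ G.T @ B_inv_sqrt
```

In `R_mod` the variance of the first component doubles while the covariance with the second component stays at 0.5; in `R_plain` the covariance grows to about 0.707, which is the transmission of uncertainty between blocks that the modified NDBM is designed to prevent.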
4.4.2. Extended NDBM's

In many applications an NDBM provides an adequate model but other applications may require a more general NWBM. This is particularly the case when G is singular and when high frequencies and some type of stochastic transfer responses are to be modelled. The extended NDBM is defined by the quintuple $\{F, G, V, B, W\}_t$ where, given $(\theta_{t-1} | D_{t-1}) \sim N[m_{t-1}; C_{t-1}]$, this defines

$(Y_t | \theta_t) \sim N[F_t\theta_t; V_t]$

$(\theta_t | D_{t-1}) \sim N[G_t m_{t-1}; R_t]$

where

$R_t = B_t^{-1/2} G_t C_{t-1} G_t' B_t'^{-1/2} + W_t$

Often in regression and design of experiments some of the variables are fairly stable with a constant variance and it may not be advisable to subject their precision to an exponential decay. This may be the case with the example in 4.3.4, if the block effects are either static or exchangeable or independent or subject to a very slow movement. With the design matrix F defined by (4.10), this may be modelled using an extended NDBM with
F1=(FOJ
C=
0 (0,0,0,1)
0I,
4
1=b and W={W; 1} with W1.1=cr2its only non zero element.
Similarly in modelling correlated observation errors such as those generated by a (1-e1B)(1-e2B)St process vi = stationary second order autoregressive
1bt {(1,0.01; 0 10 b2 Fi+2
type models of ,
00C be preferred. may estimated
I;
0;
diag{{1,1. B}, W}
The only non zero element of W is 'W'22=V
and this can be easily models. For a particular errors see
(1971) for Zellner See line autoregressive also on .
EWR Generalised EWR to which considers stationary extension of Harrison and Akram (1983).
observation
4.5. SUMMARY:
The discount concept is introduced into Bayesian modelling and forecasting via a general class of NWBM models. Some special important and parsimonious subclasses, the NDBM's, Modified NDBM's and the Extended NDBM's, are introduced. Particular attention is given to the NDBM's. Neat updating formulas for the location vector and scale matrices are derived together with their forecast distributions.
CHAPTER FIVE
ON-LINE VARIANCE LEARNING
5.1. INTRODUCTION:
One of the consequences of the conceptual differences between the Bayesian representation of a time series as a Markovian process and its non Bayesian formulation is that the former allows for a genuine dynamic structure for the variance $V_t$ of the observation error $v_t$, which is often assumed to be a known constant in the formulations of Kalman Filtering and ARIMA techniques. On-line estimation of $V_t$ is important for a successful practical application of Bayesian forecasting, but it is crucial in multiprocess modelling since it governs the likelihoods of the different models. Experience has shown that practitioners have little intuitive feeling for the size of this variance. It is often confused with the one step ahead forecast variance $\hat{Y}$. However, for single constant NDBM cases, in the period of stability, the relationship between V and the limiting value of $\hat{Y}_t$ is given in part (ii) of Theorem 6.4 in terms of the $\beta_i$ and $\lambda_i$, where $\beta_i$ is the discount factor associated with the ith parameter $\theta_i$ with associated eigenvalue $\lambda_i$. If required the extra uncertainty associated with V may be acknowledged and used to derive marginal forecast distributions. This is easily done since $(\theta, V)$ is jointly estimated in a neat Bayesian manner.

A number of approaches based on the idea of De Groot (1970) have been adopted for estimating the observation variance V. Smith (1977) has discussed the problem for univariate steady state models. The case of heavy tailed error distributions is given by West (1982). The Bayesian procedure introduced in 5.2 can also be seen in Ameen and Harrison (1983 c). Other non Bayesian elaborations introduced by Harrison and Stevens (1975), Harrison and Pearce (1972) and Cantarelis and Johnston (1983) are briefly reviewed and a generalisation of the latter is given in 5.3. A new procedure called the power law is given in 5.4, Ameen and Harrison (1983 b).

5.2. THE BAYESIAN APPROACH:
It is assumed that the variance $V_t = \phi_t^{-1}$ where $\phi_t$ is unknown.

The observation distribution is

$(Y_t | \theta_t, \phi_t) \sim N[F_t\theta_t; \phi_t^{-1}]$   (5.1)

The posterior state distributions are

$(\theta_{t-1} | D_{t-1}, \phi_{t-1}) \sim N[m_{t-1}; C_{t-1}\phi_{t-1}^{-1}]$   (5.2)

$(\phi_{t-1} | D_{t-1}) \sim \Gamma[\alpha_{t-1}/2; \eta_{t-1}/2]$   (5.3)

with this Gamma pdf having a kernel

$\exp\{((\eta_{t-1}/2) - 1)\log\phi_{t-1} - (\alpha_{t-1}/2)\phi_{t-1}\}$   (5.4)

Defining the prior pdf's as

$(\theta_t | D_{t-1}, \phi_t) \sim N[G_t m_{t-1}; R_t\phi_t^{-1}]$   (5.5)

$(\phi_t | D_{t-1}) \sim \Gamma[\psi(\alpha_{t-1})/2; \lambda(\eta_{t-1})/2]$   (5.6)

where $R_t = H_t C_{t-1} H_t'$, and $\psi$ and $\lambda$ represent the information required in the posterior to prior transition and are feasible functions (such that (5.6) is well defined) of the posterior parameters $\alpha_{t-1}$ and $\eta_{t-1}$ respectively.

The functions $\psi$ and $\lambda$ can play an important role in both theory and applications. A special choice is introduced later in Section 5.4. Other forms are dealt with in Ameen (1983 b). These are specific functions either defined through posterior entropies or chosen to accommodate some external information like advertising awareness. However, it follows from (5.1)-(5.6) and using Bayes theorem that the recurrence relationships for $m_t$, $C_t$, $\hat{y}_t$, $\hat{Y}_t$ and $A_t$ are exactly as (4.6)-(4.8) with the setting V=1, and

$(Y_t | D_{t-1}, \phi_t) \sim N[\hat{y}_t; \hat{Y}_t\phi_t^{-1}]$

$(\theta_t | D_t, \phi_t) \sim N[m_t; C_t\phi_t^{-1}]$

$(\phi_t | D_t) \sim \Gamma[\alpha_t/2; \eta_t/2]$

where $\alpha_t = \psi(\alpha_{t-1}) + e_t'\hat{Y}_t^{-1}e_t$ and $\eta_t = \lambda(\eta_{t-1}) + 1$. As usual $e_t = Y_t - \hat{y}_t$. The joint distribution $(Y_{t+1}, \phi_{t+1} | D_t)$ is readily obtained and $(Y_{t+1} | D_t)$ is derived by integrating out $\phi_{t+1}$. In the univariate case, the forecast error $Y_{t+1} - \hat{y}_{t+1}$, divided by its estimated scale, has the student t-distribution with $\eta_t$ degrees of freedom. This method is operationally elegant and is properly Bayesian. It is not easy to retain the elegance when generalising to many cases where the correlation structure of $V_t$ is unknown or where $V_t$ is not a constant. Consequently practitioners may prefer the robust variance estimation method discussed in 5.4.
5.3. NON BAYESIAN METHODS: A SHORT REVIEW

In addition to the method mentioned in 5.2, a number of approaches have been adopted for estimating the observation variance V. Harrison and Pearce (1972) used a six point curve fitting around each data point. Denoting the value of the curve at that point by $\hat{y}_t$ and assuming that $(y_t - \hat{y}_t) \sim N(0; V_t)$, with $V_t$ proportional to a power of $\hat{y}_t$, they chose the maximum likelihood estimate of $V_t$. Another method proposed by Harrison and Stevens (1975) assumes that $V_t$ is a function of $L_t$ and $S_t$ involving constants P, Q and C, where $L_t$, $S_t$ are the level and seasonality components and P, Q are known proportionality constants, while the constant C (say) is obtained from the median of a pre specified set of N ordered constants with corresponding probabilities which are updated using the on line data information. These methods are not theoretically profound and do not generalise easily. Another on-line estimation method based on the limiting properties of the steady state DLM's is suggested by Cantarelis and Johnston (1983). This is described as follows

$\hat{V}_t = \frac{t-1}{t}\hat{V}_{t-1} + \frac{1}{t}(1-a)e_t^2$

where $a = \lim a_t$, $a_t$ being the adaptive coefficient.

A direct generalisation of this method can be given as

$\hat{V}_t = \frac{t-1}{t}\hat{V}_{t-1} + \frac{1}{t}(1 - F_t a_t)e_t^2$
or more generally
1'
5.4. THE POWER LAW:

A more general but simple, efficient and robust procedure can be described using the relationship

$V_t = (I - F_t A_t)\hat{Y}_t$

For a univariate time series, define $d_t^2 = (1 - F_t a_t)e_t^2$. In parallel with the Bayesian approach described in Section 5.2, the estimate of $V_t$ may be given by $\hat{V}_t = X_t/\eta_t$ where

$X_t = X_{t-1} + d_t^2$   (5.7)

$\eta_t = \eta_{t-1} + 1$   (5.8)

Initially $(X_0, \eta_0)$ may be chosen such that $\hat{V}_0 = X_0/\eta_0$ is a point estimate for $V_0$ and $\eta_0$ is the accuracy expressed in terms of degrees of freedom or of equivalent observations. In the analysis of 5.2 it is seen that $\hat{V}_t = 1/E[\phi_t | D_t]$. Hence, if required, forecasts can be produced as in 5.2 using a student t-distribution with $\eta_t$ degrees of freedom. In practice it may be wise to protect the estimate from outliers and major disturbances. Using mixture distributions, outlier-prone distributions, O'Hagan (1979), can be introduced. However one simple effective practical method is to define

$d_t^2 = (1 - F_t a_t)\min(e_t^2, K\hat{Y}_t)$

where in general the constant K belongs to the interval [4,6]. In those cases in which it is suspected that $V_t$ varies slowly over time a discount factor $\beta$ may be introduced by replacing (5.7) and (5.8) by

$X_t = \beta X_{t-1} + d_t^2$ and $\eta_t = \beta\eta_{t-1} + 1$

This procedure is easily applied and experience with both pure time series and regression type models is encouraging. However, because of the skew distribution of $d_t^2$ it is wise to choose $0.95 < \beta < 1$. Further, if the initial prior of the parameter vector $\theta$ is vague then it is recommended that variance learning commences at time n+1, where n is the dimension of the state vector.

In stock control with positive observations, an empirical variance law $V = a\,y^{2b}$ with b=0.75 is often used, Stevens (1974). An estimate $\hat{a}_t$ of a is then derived as $\hat{a}_t = Z_t/\eta_t$, where

$Z_t = Z_{t-1} + d_t^2/\hat{y}_t^{1.5}, \quad \eta_t = \eta_{t-1} + 1$

Future estimates of V are $\hat{V}_{t+k} = \hat{a}_t\{E(Y_{t+k} | D_t)\}^{1.5}$.
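The power law recursions of this section can be sketched in a few lines of code. The function below is an illustrative Python sketch for the univariate case, not part of the original text; the argument `Fa` stands for the scalar $F_t a_t$, and the function name, the default values and the numbers used in the example are assumptions.

```python
def power_law_update(X_prev, eta_prev, e, Y_hat, Fa, beta=1.0, K=None):
    """One step of the power law variance estimate.

    e      one step ahead forecast error
    Y_hat  one step ahead forecast variance
    Fa     scalar F_t a_t, so that (1 - Fa) * Y_hat corresponds to V
    beta   discount factor for slowly varying variance (1.0 means none)
    K      optional outlier protection constant, e.g. in [4, 6]
    """
    e2 = e * e
    if K is not None:
        e2 = min(e2, K * Y_hat)       # cap the squared error for outliers
    d2 = (1.0 - Fa) * e2
    X = beta * X_prev + d2
    eta = beta * eta_prev + 1.0
    return X, eta, X / eta            # last value is the estimate of V

# Ordinary observation, no discounting and no outlier protection.
X1, eta1, V1 = power_law_update(1.0, 1.0, e=2.0, Y_hat=2.5, Fa=0.6)

# A large error with protection K = 4 and discount factor 0.95.
X2, eta2, V2 = power_law_update(1.0, 1.0, e=10.0, Y_hat=2.5, Fa=0.6, beta=0.95, K=4.0)
```

In the second call the squared error 100 is capped at $K\hat{Y}_t = 10$, so the outlier moves the estimate far less than it would without protection.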
A more general procedure for accommodating stochastic scale parameters is as follows. See Ameen ( 1983 c). Let Oj be a scale parameter with posterior probability density function (pdf) at time t-1 given by (5.3) and prior pdf for time t, be given by (5.6). Moreover, let s& , me-1 -1) ID, (0t-, and f (0
I0, (YY ) be the f k the random variables with pdf's of modes and ,/
De-1) respectively. e
Define the link between YY, 0, and c, as follows

iYeOe) x 4tf, 1-6 (Ytlxt)f (Yilot) (5.9)
f (Qek1 De-t) ,
x 4i fl-m (ht IDi-i)/(oe
1Dt-i)
(5.10)
46 Combining ( 5.6) with' ( 5.9 (jrc, for is (0c, Dc_1) pdf Oc

1i )2 -ib (n
( ), 5.10 the approximate kernal of the posterior and
e
n)2-. b 4
_)2
1-b
! (itIse)/(AcDc-i)]
(1(re1Oc)/(oejDe-i)
. -.
). 2
'i/(fczc)/(keI
Di-t)I/(meI IDt_, )
De)] '/
1-. b
'(mcIDc)f
(01IDe)
Ain
),z
exp{-[ .
/(fase)/(k
(a 1)t2ln .. /( rac ID) t
1.6t/2}/
I-, b'(rnc
ti jDc) , Dc )/ IO,
where Mc is the posterior pdf
I Dt. Oc mode of
In comparison
with the approximate
posterior
s_
/(Oc', e`Dc)..
_'d ''
, 1' 2
'
r'}'
(OIDe)/
'(m
IDe)
we have
Ize)f (De-i)/I ID&)} (ye (ke (m& a -W(ae-t)+21n{I The formulas (5.9) and (5.10) are exact for normal random vectors and are in contrast with those of (5.1) and (5.5). However, the formulation above goes well beyond the exponential family of distributions and has the key for introducing a constructive dynamic evolution of location and scale parameters in generalised dynamic models.
5.5. SUMMARY:
A proper Bayesian on-line estimation procedure for the observation variance is described in 5.2. A short review of the existing non Bayesian techniques is given in 5.3. The power law is described in 5.4. Outlines for a general model, for which stochastic scale parameters can be accommodated, is given.
CHAPTER SIX
LIMITING RESULTS
6.1. INTRODUCTION:
There has been a continued interest in deriving limiting values for the parameter variance $C_t$ and the adaptive vector $a_t$ associated with observable constant DLM's, but the difficulty in solving Riccati equations has restricted progress. However, for constant NDBM's $\{f,G,V,B\}$ these values can be obtained directly. Hence the results also apply to the set of constant DLM's $\{f,G,V,W\}$ which have limiting forecast distributions equivalent to those of constant NDBM's. These results are relevant to practice since convergence is often fast and, in order to achieve conceptual simplicity and parametric parsimony, previous efforts have been devoted to determining constant DLM's which have limiting forecast functions equivalent to those obtained by the application of EWR, Harrison and Akram (1983) and Roberts and Harrison (1984).
Similar models and the method of transforming from one similar model to another are defined. Limiting results for the state covariance matrix $C_t$ and the adaptive vector $a_t$ are stated first for models similar to a model with a diagonal transition matrix $G$, and then for general constant NDBM's. The limiting relationship between the observations and the one step ahead prediction errors is obtained for NDBM's. This leads to the establishment of a relationship between the ARIMA models and the constant NDBM's.
6.2. SIMILAR MODELS AND REPARAMETRISATION:
One of the desirable objectives of theoretical developments is to obtain unified results that may be used in different fields of applications. By looking at the most meaningful economical representations, this eases the understanding of practitioners. In NDBM's this leads to canonical representations of categorised models. The properties of other more complicated models can be studied through their similarity with the canonical models. Harrison and Akram (1983) have discussed reparametrisations within the class of DLM's.
DEFINITION
A constant NWBM $\{F,G,V,H\}$ is called observable if

\[ T=\begin{pmatrix}F\\ FG\\ FG^{2}\\ \vdots\\ FG^{n-1}\end{pmatrix} \]

is of full rank.
The observability condition for NWBM's is to ensure the estimability of the state parameter vectors at any time stage from a finite number of observations.
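The rank condition in the definition above can be checked numerically. The following is a minimal sketch, assuming NumPy and a univariate observation row vector; the function names and the linear growth example are illustrative choices, not part of the thesis.

```python
import numpy as np

def observability_matrix(F, G):
    """Stack F, FG, ..., FG^(n-1) row-wise, where n is the state dimension."""
    F = np.atleast_2d(np.asarray(F, dtype=float))
    G = np.asarray(G, dtype=float)
    rows, block = [], F
    for _ in range(G.shape[0]):
        rows.append(block)
        block = block @ G
    return np.vstack(rows)

def is_observable(F, G):
    """True when the stacked matrix T has full column rank."""
    G = np.asarray(G, dtype=float)
    T = observability_matrix(F, G)
    return bool(np.linalg.matrix_rank(T) == G.shape[0])

# Hypothetical example: linear growth model (level and slope).
G = [[1.0, 1.0], [0.0, 1.0]]
level_observed = is_observable([1.0, 0.0], G)   # level is observed
slope_observed = is_observable([0.0, 1.0], G)   # only the slope is observed
print(level_observed, slope_observed)
```

Observing only the slope leaves the level unidentified, so the second design fails the rank test.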
DEFINITION
Two NWBM's $M_i=\{F_i,G_i,V,H_i\}$, $i=1,2$, are said to be similar if there exists a non singular transformation $L$ such that

\[ \{F_1L^{-1},\ LG_1L^{-1},\ V,\ LH_1L^{-1}\}=\{F_2,\ G_2,\ V,\ H_2\} \]

The importance of finding similar models arises in practice since real life problems are rather complicated in their `primary' physically meaningful formulation. Apart from the computational benefits, this provides statistical relationships among the primary variables and those of the canonical form like growth, level and seasonality components.
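THEOREM 6.1.
If $M_1$ and $M_2$ are two observable similar NWBM's then $L=T_2^{-1}T_1$, where

\[ T_i=\begin{pmatrix}F_i\\ F_iG_i\\ \vdots\\ F_iG_i^{\,n-1}\end{pmatrix}, \quad i=1,2 \]

PROOF
Since $M_1$ and $M_2$ are similar, it follows that $F_2=F_1L^{-1}$ and $G_2^{k}=LG_1^{k}L^{-1}$, $k=1,2,\dots,n-1$. This gives

\[ \begin{pmatrix}F_2\\ F_2G_2\\ \vdots\\ F_2G_2^{\,n-1}\end{pmatrix}=\begin{pmatrix}F_1\\ F_1G_1\\ \vdots\\ F_1G_1^{\,n-1}\end{pmatrix}L^{-1} \]

i.e. $T_2=T_1L^{-1}$.
From observability it follows that $T_1$ and $T_2$ are invertible. This gives $L=T_2^{-1}T_1$.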
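Theorem 6.1 can be illustrated with a small numerical check. This is only a sketch: the particular matrix `L_true` is a hypothetical choice, and the helper name `obs_matrix` is not from the thesis.

```python
import numpy as np

def obs_matrix(F, G):
    """Rows F, FG, ..., FG^(n-1) of the observability matrix."""
    n = G.shape[0]
    return np.vstack([F @ np.linalg.matrix_power(G, k) for k in range(n)])

# Model 1: linear growth model in (level, slope) form.
F1 = np.array([[1.0, 0.0]])
G1 = np.array([[1.0, 1.0], [0.0, 1.0]])

# Build a similar model 2 from an arbitrary non singular L (illustrative).
L_true = np.array([[1.0, 1.0], [0.0, 2.0]])
L_inv = np.linalg.inv(L_true)
F2 = F1 @ L_inv
G2 = L_true @ G1 @ L_inv

# Recover the transformation from the observability matrices: L = T2^{-1} T1.
T1, T2 = obs_matrix(F1, G1), obs_matrix(F2, G2)
L_rec = np.linalg.solve(T2, T1)
print(np.round(L_rec, 10))
```

Using `solve` rather than forming the inverse of `T2` explicitly is a numerical convenience only; the quantity computed is the same.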
The above result introduces the similar reparametrisations. That is, if $\theta$ is the state vector for the first model, then the reparametrisation $\psi=L\theta$ produces the model $M_2$. As the forecast functions are characterised by the specifications of $F$ and $G$, and in particular the eigenvalues of $G$ play an important role in that specification, canonical forms are specifically useful in demonstrating these ideas.
THEOREM 6.2.
Let $\lambda_1,\lambda_2,\dots,\lambda_n$ be the eigenvalues of $G$, and $0<\beta<1$. If $\beta/\lambda_i^2<1$; $i=1,2,\dots,n$ and the constant NDBM $\{f,G,V,\beta I\}$ is observable, then

\[ \lim_{t\to\infty}\{C,R,Y,a\}_t=\{C,R,Y,a\} \]

uniquely exists, with $C$, $R$ being non singular.
PROOF
From the NDBM results, with $Q_t=C_t^{-1}V$,

\[ Q_t=\beta G'^{-1}Q_{t-1}G^{-1}+f'f=\beta^{t}G'^{-t}Q_0G^{-t}+\sum_{i=0}^{t-1}\beta^{i}G'^{-i}f'fG^{-i} \qquad (6.1) \]

Hence, using the assumptions $0<\beta/\lambda_i^2<1$, $\lim_{t\to\infty}Q_t=Q$ exists.
To show that $Q$ is positive definite, consider (6.1) as $t\to\infty$. The first term converges to zero, and

\[ Q=\sum_{i=0}^{\infty}\beta^{i}G'^{-i}f'fG^{-i}\ \ge\ \beta^{n-1}\sum_{i=0}^{n-1}G'^{-i}f'fG^{-i}=\beta^{n-1}T'T \]

where $T'=[f',\ G'^{-1}f',\ \dots,\ G'^{-(n-1)}f']$. From observability, $T$ is non singular.
To show that $Q$ is unique, assume that there exists $S$ such that

\[ S=\beta G'^{-1}SG^{-1}+f'f \]

Therefore,

\[ S-Q=\beta G'^{-1}(S-Q)G^{-1} \qquad (6.2) \]

Successive applications of (6.2) give $S-Q=\beta^{k}G'^{-k}(S-Q)G^{-k}$, and as $k\to\infty$ we have $S-Q=0$. Since $a_t=R_tf'Y_t^{-1}=C_tf'V^{-1}$, $Q_t=C_t^{-1}V$ and $R_t=\beta^{-1}GC_{t-1}G'$, the limiting forms for $C_t$, $R_t$, $Y_t$ and $a_t$ all exist and are unique. Moreover,

\[ Q=\beta G'^{-1}QG^{-1}+f'f, \quad C=Q^{-1}V, \quad R=\beta^{-1}GCG' \qquad (6.3) \]

\[ Y=fRf'+V, \quad a=Rf'Y^{-1}=Cf'V^{-1} \qquad (6.4) \]
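The limit in Theorem 6.2 can be explored by simply iterating the updating recursions. The sketch below assumes the single-discount forms $R_t=GC_{t-1}G'/\beta$, $Y_t=fR_tf'+V$, $a_t=R_tf'/Y_t$ and $C_t=R_t-a_ta_t'Y_t$; the linear growth model, $\beta=0.9$ and the starting value are illustrative choices. For this model the limiting adaptive vector works out to $[1-\beta^2,\ (1-\beta)^2]$, which the code compares against.

```python
import numpy as np

def ndbm_limit(f, G, V, beta, C0, steps=500):
    """Iterate the NDBM variance recursions and return the final (C, R, Y, a)."""
    f = np.atleast_2d(np.asarray(f, dtype=float))
    C = np.array(C0, dtype=float)
    for _ in range(steps):
        R = G @ C @ G.T / beta            # discounted prior covariance
        Y = (f @ R @ f.T).item() + V      # one step ahead forecast variance
        a = R @ f.T / Y                   # adaptive vector (column)
        C = R - (a @ a.T) * Y             # posterior covariance
    return C, R, Y, a.ravel()

beta, V = 0.9, 1.0
f = np.array([[1.0, 0.0]])
G = np.array([[1.0, 1.0], [0.0, 1.0]])
C, R, Y, a = ndbm_limit(f, G, V, beta, np.eye(2))

# Check the limiting precision equation Q = beta G'^{-1} Q G^{-1} + f'f.
Q = V * np.linalg.inv(C)
G_inv = np.linalg.inv(G)
residual = Q - (beta * G_inv.T @ Q @ G_inv + f.T @ f)
print(a, np.abs(residual).max())
```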
THEOREM 6.3.
Let $G=\mathrm{diag}\{G_1,G_2,\dots,G_r\}$ and $B=\mathrm{diag}\{\beta_1I_1,\beta_2I_2,\dots,\beta_rI_r\}$ with $\sum_{i=1}^{r}n_i=n$ and

\[ 0<\beta_i<\min\{1,\ \lambda_{i1}^2,\ \lambda_{i2}^2,\ \dots,\ \lambda_{in_i}^2\} \]

where $\lambda_{i1},\lambda_{i2},\dots,\lambda_{in_i}$ are the eigenvalues of $G_i$ with dimension $n_i$.
If the constant NDBM $\{f,G,V,B\}$ is observable, then

\[ \lim_{t\to\infty}\{C,R,Y,a\}_t=\{C,R,Y,a\} \]

uniquely exists. Moreover, $C$ and $R$ are non singular.
PROOF
The proof is similar to that of Theorem 6.2, knowing that the observability of the model gives the observability in each model component block.
In order to have some insight into the sensitivity of these models, consider a DLM

\[ Y_t=\theta_t+v_t, \quad v_t\sim N[0;V] \]
\[ \theta_t=\theta_{t-1}+w_t, \quad w_t\sim N[0;W] \]

Given the prior $(\theta_0\mid D_0)\sim N[m_0;C_0]$, it can be seen that the posterior state variance $C_t$ and the adaptive coefficient $A_t$ both converge to $C$ and $A$ respectively, and $C=AV=((W^2+4WV)^{1/2}-W)/2$. Now, consider a NDBM with the same prior settings and take the discount factor as $\beta=1-((1+4V/W)^{1/2}-1)W/(2V)$. This guarantees that the NDBM and the DLM both have the same limiting distribution. However, given any common posterior variance $C_{t-1}$ at time $t-1$, the adaptive coefficient for the DLM is

\[ A_t(W)=1/(1+V/(C_{t-1}+W)) \]

while the alternative form under the NDBM is

\[ A_t(\beta)=1/(1+\beta V/C_{t-1}) \]
NDBM-to the faster DLM the for than if t, CC the Therefore, >C converges all faster limit NDBM the for to t, then if Cj the limit. ` -However, <C converges all than the DLM. This generalises to higher -dimensions.
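The two adaptive coefficient recursions can be run side by side to see how each approaches the common limit. This is a minimal sketch for the first-order model only, with illustrative values $V=100$, $W=5$ (which give $C=20$ and a matching discount factor $\beta=0.8$); the starting variances are arbitrary.

```python
import math

V, W = 100.0, 5.0
C_lim = (math.sqrt(W**2 + 4 * W * V) - W) / 2            # limiting posterior variance
beta = 1 - (math.sqrt(1 + 4 * V / W) - 1) * W / (2 * V)  # matching discount factor

def dlm_step(C_prev):
    """DLM update: prior variance is C_prev + W."""
    A = 1 / (1 + V / (C_prev + W))
    return A, A * V

def ndbm_step(C_prev):
    """NDBM update: prior variance is C_prev / beta."""
    A = 1 / (1 + beta * V / C_prev)
    return A, A * V

def run(step, C0, n=200):
    """Return the path of posterior variances starting from C0."""
    path, C = [], C0
    for _ in range(n):
        _, C = step(C)
        path.append(C)
    return path

for C0 in (400.0, 1.0):  # one start above the limit, one below
    d, nd = run(dlm_step, C0), run(ndbm_step, C0)
    print(C0, [round(x, 3) for x in d[:4]], [round(x, 3) for x in nd[:4]])
```

Printing the first few values of each path shows the transient behaviour of the two models from a common starting variance.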
6.3. A COMMON CANONICAL REPRESENTATION
One of the most common and yet simple canonical forms for observable models is that in which the system matrix $G$ has distinct eigenvalues $\lambda_1,\lambda_2,\dots,\lambda_n$, that is $G=\mathrm{diag}\{\lambda_1,\lambda_2,\dots,\lambda_n\}$ and $f=[1,1,\dots,1]$. For such an observable NDBM the following theorem holds.
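THEOREM 6.4.
Let $\{f,G,V,B\}$ be a constant NDBM, where $f=[1,1,\dots,1]$, $G=\mathrm{diag}\{\lambda_1,\lambda_2,\dots,\lambda_n\}$ and $B=\mathrm{diag}\{\beta_1,\beta_2,\dots,\beta_n\}$ with $0<|u_i|<1$, $u_i=\beta_i^{1/2}/\lambda_i$, $i=1,2,\dots,n$ all distinct. Then

i) $\lim_{t\to\infty}a_t=a=[a_1,a_2,\dots,a_n]'$, with
\[ a_i=(1-u_i^2)\prod_{j\neq i}\frac{1-u_iu_j}{1-u_i/u_j} \]

ii) $\lim_{t\to\infty}Y_t=Y=V\Big/\prod_{i=1}^{n}u_i^2$

iii) $\lim_{t\to\infty}C_t=C=\{c_{ij}\}$, with
\[ c_{ij}=\frac{u_iu_j\,a_ia_j\,Y}{1-u_iu_j} \]

iv) $\lim_{t\to\infty}W_t=W=B^{-1/2}GCG'B^{-1/2}-GCG'=\{w_{ij}\}$, with
\[ w_{ij}=\frac{c_{ij}\,(1-(\beta_i\beta_j)^{1/2})}{u_iu_j} \]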
PROOF
From Theorem 6.2 or 6.3, $\lim_{t\to\infty}\{C,R,Y,a\}_t=\{C,R,Y,a\}$ all exist, are unique, and $C$, $R$ are non singular. Moreover,

\[ C=\{c_{ij}\}=(I-af)R \qquad (6.5) \]
\[ Q=B^{1/2}G'^{-1}QG^{-1}B^{1/2}+f'f, \quad Q=C^{-1}V=\{q_{ij}\} \qquad (6.6) \]
\[ a=a(n)=Q^{-1}f' \qquad (6.7) \]

where $a'(n)=[a_1(n),a_2(n),\dots,a_n(n)]$. From (6.6),

\[ q_{ij}=\frac{1}{1-u_iu_j} \qquad (6.8) \]

Now, multiplying (6.7) by $Q$ gives

\[ Qa(n)=f' \qquad (6.9) \]

For $n=1$, (6.8) and (6.9) give

\[ a_1(1)=1-u_1^2 \qquad (6.10) \]

For $n\ge 2$, from (6.9), we have

\[ \sum_{j=1}^{n}q_{ij}a_j(n)=1; \quad i=1,2,\dots,n \qquad (6.11) \]

Therefore,

\[ a_n(n)=\Big(1-\sum_{j=1}^{n-1}q_{nj}a_j(n)\Big)\Big/q_{nn} \]

Substituting for $a_n(n)$ in the first $n-1$ equations of (6.11),

\[ \sum_{j=1}^{n-1}\Big(q_{ij}-\frac{q_{in}q_{nj}}{q_{nn}}\Big)a_j(n)=1-\frac{q_{in}}{q_{nn}}; \quad i=1,2,\dots,n-1 \]

This gives

\[ \sum_{j=1}^{n-1}q_{ij}\,\frac{1-u_j/u_n}{1-u_ju_n}\,a_j(n)=1 \qquad (6.12) \]

Since $a(n)$ is unique and (6.11) is true for all $n$,

\[ a_j(n)=\frac{1-u_ju_n}{1-u_j/u_n}\,a_j(n-1); \quad j=1,2,\dots,n-1 \]

This together with (6.10) proves (i).

ii) From (6.4), $V=Y-fRf'=(1-fa)Y=\big(1-\sum_{i=1}^{n}a_i\big)Y$, but from (i) it follows that

\[ 1-\sum_{i=1}^{n}a_i=\prod_{i=1}^{n}u_i^2 \]

iii) From (6.5) we have $c_{ij}=(c_{ij}/u_iu_j)-a_ia_jY$, so that $c_{ij}=u_iu_ja_ia_jY/(1-u_iu_j)$, and the result follows from (i) and (ii).

iv) Easily derived from the definition of $W$.
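The product form for the limiting adaptive vector can be checked against a direct solution of the linear system $Qa=f'$ with $q_{ij}=1/(1-u_iu_j)$. The following sketch assumes NumPy; the eigenvalues and discount factors are hypothetical values chosen so that the $u_i=\beta_i^{1/2}/\lambda_i$ are distinct with $|u_i|<1$.

```python
import numpy as np

def limiting_adaptive_vector(u):
    """a_i = (1 - u_i^2) * prod_{j != i} (1 - u_i u_j) / (1 - u_i / u_j)."""
    u = np.asarray(u, dtype=float)
    a = np.empty(len(u))
    for i in range(len(u)):
        others = np.delete(u, i)
        a[i] = (1 - u[i] ** 2) * np.prod((1 - u[i] * others) / (1 - u[i] / others))
    return a

lam = np.array([1.0, 0.9, -0.8])    # illustrative distinct eigenvalues of G
beta = np.array([0.7, 0.6, 0.5])    # illustrative discount factors
u = np.sqrt(beta) / lam

a_formula = limiting_adaptive_vector(u)

# Direct route: limiting precision matrix and Q a = (1, ..., 1)'.
Q = 1.0 / (1.0 - np.outer(u, u))
a_direct = np.linalg.solve(Q, np.ones(len(u)))

V = 1.0
Y = V / np.prod(u ** 2)             # limiting forecast variance from part (ii)
print(a_formula, a_direct, Y)
```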
COROLLARY 6.4.1.
If $\beta_k=\beta$ for all $k$ then (i) reduces to the EWR result of Dobbie (1963).
The theorem is of practical interest mainly for periodic models with distinct complex eigenvalues lying inside or on the unit circle. For a real observation series a similar model with real $G$ would be adopted and the corresponding limiting values are easily derived. For example, if

\[ G=\begin{pmatrix}\cos\omega & -\sin\omega\\ \sin\omega & \cos\omega\end{pmatrix} \]

an alternative NDBM can be considered with

\[ G=\begin{pmatrix}e^{i\omega} & 0\\ 0 & e^{-i\omega}\end{pmatrix} \]

A more general procedure for finding the adaptive coefficients can be deduced using the following:
THEOREM 6.5.
Let $\{F,G,V,H\}$ be a constant NWBM with $\lim_{t\to\infty}C_t=C$ non singular, and $A=CF'V^{-1}$. Then the two transformations $(I-AF)H$ and $H'^{-1}$ have identical characteristic polynomials. If $H=\beta^{-1/2}G$, then $(I-AF)G$ and $\beta G'^{-1}$ have identical characteristic polynomials.
PROOF:
Since $\lim_{t\to\infty}C_t=C$ is non singular, from the NWBM properties, we have

\[ C=(I-AF)R, \qquad R=HCH' \]

This gives

\[ CH'^{-1}C^{-1}=(I-AF)H \qquad (6.13) \]

The result follows from (6.13).
The above results can be used to calculate the limiting adaptive coefficients for any observable NDBM if its state covariance matrix converges to a non singular limit. In particular, this applies for $\{f,G,V,\beta I\}$ NDBM's such that $0<\beta/\lambda_i^2<1$, where $\lambda_i$ are the eigenvalues of $G$, as in the following NDBM's that are commonly used in practice, apart from the one given in Theorem 6.4.
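COROLLARY 6.5.1.
If $f=[1,0,0,\dots,0]$ and $G=J(\lambda)$, where $J(\lambda)$ has a Jordan form, and $0<\beta<\lambda^2$, then

\[ \lambda a_i+a_{i+1}=\binom{n}{i}(\lambda-\rho)^{i}; \quad i=1,2,\dots,n \]

where $\rho=\beta/\lambda$, $a'=[a_1,a_2,\dots,a_n]$ and $a_{n+1}=0$.
PROOF
It is easily seen that the above NDBM is observable. Since $0<\beta<\lambda^2$, using Theorem 6.2, $\lim_{t\to\infty}\{C,a\}_t=\{C,a\}$ both uniquely exist. Moreover $C$ is non singular.
From Theorem 6.5, $\det(\beta G'^{-1}-xI)=\det((I-af)G-xI)$ for all $x$.
This gives

\[ \det\begin{pmatrix}\alpha-r_1 & 1 & 0 & \cdots & 0\\ -r_2 & \alpha & 1 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ -r_{n-1} & 0 & \cdots & \alpha & 1\\ -r_n & 0 & \cdots & 0 & \alpha\end{pmatrix}=(\rho-x)^{n} \]

where $\alpha=\lambda-x$ and $r_i=\lambda a_i+a_{i+1}$, $a_{n+1}=0$, $i=1,2,\dots,n$. Hence, with $r_0=1$,

\[ \sum_{i=0}^{n}(-1)^{i}r_i\alpha^{n-i}=(\alpha-(\lambda-\rho))^{n} \qquad (6.14) \]

The result follows from the comparison of the coefficients of each power of $\alpha$ in this equation.
COROLLARY 6.5.2.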
matrix with entries 1 and 0<p
IF f =[1,0,0,.,. 0] and a is an upper triangular
<1,
then
i=1,2,.,.,. i-: n.
ai=
PROOF Following the steps as in the proof of Corollary 6.5.1, the alternative form of (6.14) in the variable x is
(1-z)*
-a1(1-z)*-'+a3S(1-z)-2t...
+(-1)"a.
z*-1"((3-z)'
(6.15)
Writting
this in powers of (1 -x)
and collecting the terms in the coefficient of
(1-z)', from both sides of (6.15), gives

itk
k k-0
-1
)actk=
()l-l)
_n
i=1,2....
a.
The values of ai can be found successively from the above equation.
COROLLARY
In Corollary
6.5.3.
6.5.2, if j is replaced by /=1.1.1......
=1. ......
1. then
ai_ i
F'
PROOF
Similar calculations In a. )+ (-1)kzk(1-z)w-kQk-(-z)e give the alternative form of (6.15) as
The result follows from comparison with the terms of

(13-zY= 3(1-z)-z(1-)l*-
-lk k,
k-0
ik
COROLLARY
6.5.4.
010, Jwith 0<R<1, then a=1+13' and a; =0
If I= [1,0,0......0] C= , ; otherwise. PROOF
Similar calculations shows that the alternative form of (6.15) is

z"+a2Z"-1+... +a,, z+a, -1-zA+0 0
The result follows.
Now, given that $\lambda_1,\lambda_2,\dots,\lambda_n$ (not necessarily distinct) are the eigenvalues of $G$ for a constant NDBM, the restrictions on $B$ ensure the existence of $\lim_{t\to\infty}C_t=C$ as a proper covariance matrix. Denoting the eigenvalues of $CR^{-1}G$, or equivalently $(I-af)G$, by $\rho_i$; $i=1,2,\dots,n$, we have the following.
THEOREM 6.6.
Given any constant observable NWBM $\{f,G,V,H\}$ for which $\lim_{t\to\infty}C_t=C$ is positive definite,

\[ \lim_{t\to\infty}\Big\{\prod_{i=1}^{n}(1-\lambda_iB)\,y_t-\prod_{i=1}^{n}(1-\rho_iB)\,e_t\Big\}=0 \]

where $B$ is the backward shift operator and $e_t$ is the one step ahead forecast error.
PROOF.
Since $\lim_{t\to\infty}C_t=C$ is positive definite, $\lim_{t\to\infty}\{R,a\}_t=\{R,a\}$ exists and $R$ is non singular.
Let $\rho_i$, $i=1,2,\dots,n$ be the eigenvalues of $\mathcal{H}=CR^{-1}G=(I-af)G$.
A direct application of the Bayes theorem in updating (4.1) using (4.3), with univariate observations, gives as $t\to\infty$,

\[ m_t=Gm_{t-1}+ae_t \qquad (6.16) \]

or $m_t=CR^{-1}Gm_{t-1}+Cf'V^{-1}y_t$, that is

\[ m_t=\mathcal{H}m_{t-1}+ay_t \qquad (6.17) \]

From (6.16) and the identity $e_{t+1}=y_{t+1}-fGm_t$ we have

\[ e_{t+1}=y_{t+1}-fG(I-BG)^{-1}ae_t \]

Hence,

\[ (1+BfG(I-BG)^{-1}a)\,e_{t+1}=y_{t+1} \qquad (6.18) \]

The same identity with (6.17) gives $e_{t+1}=y_{t+1}-fG(I-B\mathcal{H})^{-1}ay_t$, or

\[ e_{t+1}=(1-BfG(I-B\mathcal{H})^{-1}a)\,y_{t+1} \qquad (6.19) \]

Note that $\det(I-BG)$ and $\det(I-B\mathcal{H})$ are $\prod_{i=1}^{n}(1-\lambda_iB)$ and $\prod_{i=1}^{n}(1-\rho_iB)$ respectively. (6.18) and (6.19) can be rewritten as

\[ P_1(B)\,e_{t+1}=\prod_{i=1}^{n}(1-\lambda_iB)\,y_{t+1} \qquad (6.20) \]
\[ \prod_{i=1}^{n}(1-\rho_iB)\,e_{t+1}=P_2(B)\,y_{t+1} \qquad (6.21) \]

where $P_1(B)$ and $P_2(B)$ are polynomials of degree $n$ in $B$.
From (6.20) and (6.21),

\[ \prod_{i=1}^{n}(1-\lambda_iB)\prod_{i=1}^{n}(1-\rho_iB)=P_1(B)P_2(B) \]

The result follows using the factorisation theorem.
This result includes the EWR result of McKenzie (1976) and hence the same result obtained by Godolphin and Harrison (1975) through special DLM formulations. These are obtained for scalar discount matrices, that is $B=\beta I$. The Normality assumption can be relaxed since (2.16) and (2.17) can also be obtained using minimum variance linear unbiased estimation. Hence the results may be extended beyond the Normal models.
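The limiting relationship of Theorem 6.6 can be checked by running the updating recursions on an arbitrary series. The sketch below assumes a linear growth NDBM with a single discount factor $\beta$, for which $\lambda_1=\lambda_2=1$ and $\rho_1=\rho_2=\beta$, so that the relation reads $(1-B)^2y_t-(1-\beta B)^2e_t\to 0$. The data, seed and starting values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, V = 0.9, 1.0
f = np.array([[1.0, 0.0]])
G = np.array([[1.0, 1.0], [0.0, 1.0]])

n_obs = 400
y = np.cumsum(rng.normal(size=n_obs))   # arbitrary observed series
e = np.empty(n_obs)                     # one step ahead forecast errors

m = np.zeros((2, 1))                    # posterior mean
C = np.eye(2)                           # posterior covariance
for t in range(n_obs):
    R = G @ C @ G.T / beta
    Yvar = (f @ R @ f.T).item() + V
    a = R @ f.T / Yvar                  # adaptive vector at time t
    e[t] = y[t] - (f @ G @ m).item()    # error against the forecast f G m_{t-1}
    m = G @ m + a * e[t]
    C = R - (a @ a.T) * Yvar

# (1 - B)^2 y_t compared with (1 - beta B)^2 e_t.
lhs = y[2:] - 2 * y[1:-1] + y[:-2]
rhs = e[2:] - 2 * beta * e[1:-1] + beta**2 * e[:-2]
gap = np.abs(lhs - rhs)
print(gap[:3], gap[-3:])
```

The gap is visible in the first few observations, while the adaptive vector is still changing, and becomes negligible once the recursions have settled.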
8.4. A GENERAL
Any constant observable NWBM
LIMITING
constant M={f,
THEOREM :
NWBM {/, G, V, B} , is similar to the canonical
G, V, B}
where
writting
B=diag{(31, 2,.,.,. %} ,
j=
0) and H= [h,, J such that
h. =-h,.:. and 0<,
1=X1/(R1)"=u1
if
i=1,2,...
n.
h,
j=0
otherwise,
', ;
<
j =! a,, with
a,, =ui
for jzi
and
a;j =0 otherwise.
It follows from Theorem
6.3,
that
limC= =C=
Q-i V exists where the precision
Liaponov be to the give rearranged recursion can
equation
H'Q_QH-1+g,
I'I
of Q=(9,, } :.
This allows an easy sequential term by term evaluation

2 Ut 4t1= 2i 412= 9t1
ui -1)
(uiu2-1)
q1,: qt; =ulu;
-i
_1 (u; ul-1)
for
i>2
and
qi-1, ui k 5i, k
Uk
where
Sr, k `qjj
k=9.,
k'
and
i-i
It follows that
i)
C=Q-'V
AA
ii)
s=Q-lj'
a1=1-
flu; : -i
and a. =(-1),
+i(1-al)fl(uu, ;. 1
-I)
Va
Y=
(1-41)
fl =V u?
i
-I
iv)
W=HCH'-CCC'
6.5. RELATIONS WITH ARIMA MODELS:
Let $Y_t$ be a random time series generated according to an ARIMA model

\[ \prod_{i=1}^{n}(1-\lambda_iB)\,Y_t=\prod_{i=1}^{n}(1-\rho_iB)\,a_t \]
1< I <1 0 <Ix; <Ip; where -1 ,0 ,i=1.2

E and (12 a, a( .k=0 for all k>0.
is E Eat2 0. n and that at such = a, = ......

The appropriate Box-Jenkins (1970) predictor replaces $a_t$ by the one step ahead prediction error $e'_t$, and it is well known that $\lim_{t\to\infty}(e'_t-a_t)=0$. Applying the appropriate Dynamic Model $\{f,G,V,B\}$ to the realised series,

\[ \lim_{t\to\infty}\Big\{\prod_{i=1}^{n}(1-\lambda_iB)\,y_t-\prod_{i=1}^{n}(1-\rho_iB)\,e_t\Big\}=0 \]

Hence $\lim_{t\to\infty}|e_t-e'_t|=0$ and, with probability one, the limiting Box-Jenkins forecast function is equivalent to that of the Dynamic Model. For an unbalanced ARIMA process

\[ \prod_{i=1}^{p}(1-\lambda_iB)\,y_t=\prod_{i=1}^{q}(1-\rho_iB)\,a_t \]

let $n=\max\{p,q\}$. Then given any $\epsilon>0$, if $n=p$ (or $n=q$), by taking $p-q$ of the $\rho_i$'s (or $q-p$ of the $\lambda_i$'s) approximately close to zero, $\lim_{t\to\infty}|e_t-e'_t|<\epsilon$ is assumed.
Thus, in the sense of limiting forecast functions, all ARIMA processes can be modelled by constant NWBM's. In fact, if the limiting posterior state variance is taken as the original prior variance, then the forecast functions can be identical to those of ARIMA models all the way through the sequential analysis. However, as stated earlier, the NWBM provides the parameter information in a sensible way. This simplifies explaining and controlling the process and the model's behaviour.
6.6. SUMMARY:
This chapter is concerned with the derivation of some interesting limiting results regarding the posterior covariance matrices, the adaptive coefficients and the forecasting variance, using some well known and simple parsimonious canonical representations. In particular, the transfer within similar models using a simple transformation procedure is discussed in 6.2. Limiting results regarding the adaptive vector, precision and covariance matrices are given in 6.3 and 6.4. The link with Box-Jenkins ARIMA models in terms of forecast functions is discussed in 6.5.
CHAPTER SEVEN
MULTIPROCESS MODELS WITH CUSUMS
7.1. INTRODUCTION:
Many analyses of statistical data sets are based on the assumptions that the input data is free from exceptions, properly collected and well behaved. However, in practice, it is hard to believe that all these smoothness properties can be guaranteed. Often the data contains missing values, outliers and sudden structural changes in the process behaviour. It is then believed that the occurrence of any of these events in sequential procedures causes model breakdown and damages the available prior information, as pointed out by Jaynes (1983). These events call for model revision and amendments. In forecasting, the principle of ' Management by Exception ' is widely applied. This constitutes routine mathematical methods producing various forecasts required by decision makers. These forecasts are acted upon unless exceptional circumstances occur, due either to the anticipation of a major change arising from the use of reliable market information (see Harrison and Scott (1965), Harrison (1967)) or to the occurrence of some unforeseen change in the pattern of demand which causes unusual forecasting errors and consequently a model breakdown. A flowchart of the principle is given in Fig. 7.2.
[Fig. 2: A Management by Exception Forecasting System. Flowchart elements: Regular Data; Routine Mathematical Forecasting Method; Error Control Scheme; Exception Signals; Intervention by Exception; Market Information; Market Department (to vet forecasts and issue information to the routine forecasting system); Forecasts; User Departments (e.g. stock control, production planning and purchasing systems; market planning, budgeting and control).]
In this chapter, efficient statistical models are introduced to deal automatically with exceptions, Ameen and Harrison (1983 c). Section 2 reviews the historical background and developments. The backward Cumulative Sum (CUSUM) statistic is reviewed in Section 3. The Multiprocess model approach of Harrison and Stevens is reviewed in the light of discounting in Section 4. In Section 5, the ideas from the backward CUSUM and the multiprocess models of Harrison and Stevens, together with the Modified NDBM's, are combined to provide both economical and efficient multiprocess models called Multiprocess Models with CUSUM's. These eliminate many unnecessary computations involved in the existing multiprocess models and protect prior information on structurally unchanged components when changes occur in other components, Ameen (1983 a).
7.2. HISTORICAL BACKGROUND AND DEVELOPMENTS:
Woodward and Goldsmith (1964) have employed Backward CUSUM tests to detect unanticipated changes in demand. The procedure is given by Page (1954), Barnard (1959), Ewan and Kemp (1960) and Ewan (1963). Harrison and Davies (1964) used CUSUM's for controlling routine forecasts of product demand and provided simple recursion formulas to reduce data storage problems. These are reviewed in Section 3. More details on the CUSUM statistic can be found in Van Dobben De Bruyn (1968) and Bissell (1969). (For general sequential tests see Wald (1947)).
Previously, having detected a change, ad hoc intervention procedures were applied. The first routine computer forecasting systems for stock control and production planning employed Exponential Weighted Moving Averages (EWMA) and the linear growth (Holt's) model, with or without seasonal components. All the methods used limiting predictors which assume a reasonably long history of well behaved data. The occurrence of a major change means that, in some respects, the current data does not reflect a well behaved process and that there is greater uncertainty than usual about the future. Hence the next data points will be very informative in removing much of this increased uncertainty and should be given more weight than they would be allocated by the limiting predictor.

Ft(k)=mi where ,
and
e, =y, -'n,
-t
a, =0.2
This may be written as

mt=0.8mt_t-0.2yt
for demand is the period t. observed where y, Suppose that in the limiting that mi = 100 with department case, the variance variance V(c1) = Var ( f1IDt_1 )= of a CUSUM 125 and signal.
an associated
of 20. As a result by stating
Marketing
intervene to may wish
that their best estimate of the
level is not now 100 but 150 and that their variance associated with this estimate is not 25 but 300. In the past there was no formal based upon the assumptions way of dealing with this. derived of stationarity Classical time
series methods Typically
are inappropriate.
introduce done to was what was 0.2. One procedure put
a change in the adaptive coefficient a which
here is originally
r* 9-i/10 a'+'0.2
if i=1,2,.. 6 if i>6
This approach is not very satisfactory nor does it generalise well in dealing with other kinds of change. The DLM's of Harrison and Stevens (1971, 1976) and the NWBM's introduced in Chapter 4 provide a formal way of combining subjective judgements and data. In the above example the adopted DLM is

\[ Y_t=\theta_t+v_t; \quad v_t\sim N[0;100] \]
\[ \theta_t=\theta_{t-1}+w_t; \quad w_t\sim N[\bar{w}_t;W_t] \]

$\theta_t$ represents the underlying market level at time $t$, and usually $\bar{w}_t=0$ and $W_t=5$. In the limit $(\theta_t\mid D_t)\sim N[m_t;20]$. The limiting recurrence relationship is $m_t=m_{t-1}+0.2e_t$ and the limiting one step ahead forecast distribution is $(Y_{t+1}\mid D_t)\sim N[m_t;125]$, hence the EWMA provides the limiting point forecasts.
In the example $(\theta_t\mid D_t)\sim N[100;20]$, and the market information is communicated as $w_{t+1}\sim N[50;280]$. Now the one step ahead forecast distribution is not $(Y_{t+1}\mid D_t)\sim N[100;125]$ but $(Y_{t+1}\mid D_t)\sim N[150;400]$. Immediately the recursive equation departs from its limit and becomes $m_{t+1}=m_t+50+0.75e_{t+1}$. Provided future interventions do not occur, the adaptive coefficient $a_{t+i}$ returns fairly quickly to its limiting value of 0.2. Note that the same results can be achieved using an NDBM with discount factor $\beta=0.8$, where at the intervention time its value is reduced to $\beta=0.066$ together with adjusting the state prior mean from 100 to 150.
Other related works are those of Kalman (1963), Smith (1979) and Harrison and Akram (1983).
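The arithmetic of the intervention example can be reproduced with a few lines of code. This is a sketch of the first-order DLM update only, using the numbers quoted in the text ($V=100$, $W=5$, posterior $N[100;20]$, intervention $N[50;280]$); the function name `steady_update` and the observed value 160 are hypothetical.

```python
V, W = 100.0, 5.0

def steady_update(m, C, y, w_mean=0.0, w_var=W):
    """One update of the first-order DLM; returns (m, C, A, forecast mean, forecast variance)."""
    prior_mean = m + w_mean
    R = C + w_var                 # prior variance of the level
    Y_var = R + V                 # one step ahead forecast variance
    A = R / Y_var                 # adaptive coefficient
    e = y - prior_mean            # one step ahead forecast error
    return prior_mean + A * e, A * V, A, prior_mean, Y_var

# Routine operation at the limit: A = 0.2 and forecast variance 125.
_, _, A_routine, f_routine, Y_routine = steady_update(100.0, 20.0, y=100.0)

# Intervention: market information w ~ N[50; 280] for the next period.
m, C, A_int, f_int, Y_int = steady_update(100.0, 20.0, y=160.0, w_mean=50.0, w_var=280.0)

# Subsequent routine updates: the adaptive coefficient drifts back to 0.2.
coeffs = []
for _ in range(30):
    m, C, A, _, _ = steady_update(m, C, y=m)
    coeffs.append(A)

# Equivalent discount factor at the intervention time: 20 / beta = 300.
beta_intervention = 20.0 / 300.0
print(A_routine, Y_routine, A_int, Y_int, round(coeffs[-1], 4), round(beta_intervention, 3))
```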
Bayesian forecasting provides a means of dealing with specified types of major changes. These forms of change are modelled so that the forecasting system deals with them in a prescribed way. The initial implementation of the resulting multiprocess models is described in Harrison and Stevens (1971, 1975, 1976). These are reviewed in Section 4. In addition to the limitations and drawbacks of single DLM's mentioned in Chapter 3, they involve unnecessary computations. Smith and West (1983) have applied these models to monitoring kidney transplants. Restricting to steady state models, these models are generalised to non Normal processes by Souza (1981). Limited success is achieved in reducing computation by Gathercole and Smith (1983), by removing redundant models according to some pre-specified rules. For another attempt see Makov (1983). In general practice the development and existence of these methods replaced the control chart techniques.
7.3. THE BACKWARD CUSUM:
Control charts provide simple and effective tools for detecting changes and departures from specific target values and are particularly valuable in quality control. Page (1954) used Cumulative Sum charts in detecting changes in process level. In fact they can be used for detecting the amount and direction of these changes. Further developments on this topic and the use of Average Run Length to increase the sensitivity of the signals can be found in Barnard (1959), Woodward and Goldsmith (1964) and Ewan and Kemp (1960).
Given $y_t$ as the observed process value and $T$ as a target value, the CUSUM statistic $S_t$ is defined for each time $t$ as

\[ S_t=\sum_{i=1}^{t}e_i \quad\text{where}\quad e_t=y_t-T \]

Since $(e_t\mid D_{t-1})\sim N[0;\hat{Y}_t]$, $S_t$ is Normal with zero mean. Then, choosing two positive constants $L_0$ and $a$, visual inspection can be carried out with the graph of $(t,S_t)$ and using a V-shaped hole cut out from a piece of cardboard and placed on the graph, with the vertex of the V-mask pointed horizontally at a distance $(L_0/a)+1$ from the leading point, the edges of the V-mask being apart with angle $2\phi$ where $\tan\phi=a$. No change is signaled as long as the CUSUM curve remains inside the V-mask. Harrison and Davies (1964) developed the method for monitoring forecasts of product demand. The target value is the one step ahead point forecast, so that the $e_t$ series is that of the one step ahead forecast errors. In order to reduce computer storage, a simple and economical algorithm was conventionally employed. Define

\[ a_{t+1}=\min\{L_0,a_t\}+a-e_{t+1}/\hat{Y}_{t+1}^{1/2} \qquad (7.1) \]

and

\[ d_{t+1}=\min\{L_0,d_t\}+a+e_{t+1}/\hat{Y}_{t+1}^{1/2} \qquad (7.2) \]

where $\hat{Y}_{t+1}$ is the one step ahead forecast variance. A change is signaled if and only if $\min\{a_t,d_t\}<0$. Initially $a_0=d_0=L_0$. In choosing $L_0$ and $a$, the following facts may be used as guide lines.
THEOREM 7.1.
Given that the V-mask has not signaled a change at time $t$, for time $t+1$,
i) A change will be signaled if

\[ |e_{t+1}/\hat{Y}_{t+1}^{1/2}|>L_0+a \qquad (7.3) \]

ii) A change will not be signaled if

\[ |e_{t+1}/\hat{Y}_{t+1}^{1/2}|<a \qquad (7.4) \]

PROOF
i) From (7.1),

\[ a_{t+1}=\min\{L_0+a-e_{t+1}/\hat{Y}_{t+1}^{1/2};\ a_t+a-e_{t+1}/\hat{Y}_{t+1}^{1/2}\}\le L_0+a-e_{t+1}/\hat{Y}_{t+1}^{1/2} \]

and similarly from (7.2), $d_{t+1}\le L_0+a+e_{t+1}/\hat{Y}_{t+1}^{1/2}$. Hence, given (7.3), $\min\{a_{t+1},d_{t+1}\}<0$.
ii) Since no change has been signaled at time $t$, $a_t\ge 0$ and $d_t\ge 0$. Substituting (7.4) in each of (7.1) and (7.2), it follows that $a_{t+1}>0$ and $d_{t+1}>0$.
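The two-sided recursion behind the backward CUSUM can be sketched in a few lines. The code below assumes that the errors passed in have already been standardised by the square root of the one step ahead forecast variance, and that both statistics are reset to $L_0$ after a signal; the reset rule, the function name and the values $L_0=3$, $a=0.5$ are assumptions made for illustration.

```python
def backward_cusum(std_errors, L0, a):
    """Return a list of booleans, True where the scheme signals a change.

    std_errors: one step ahead forecast errors divided by their standard deviation.
    L0, a: the two positive constants of the V-mask.
    """
    upper = lower = L0            # the two monitoring statistics, started at L0
    signals = []
    for x in std_errors:
        upper = min(L0, upper) + a - x    # falls when errors are persistently positive
        lower = min(L0, lower) + a + x    # falls when errors are persistently negative
        signal = min(upper, lower) < 0
        signals.append(signal)
        if signal:
            upper = lower = L0            # assumed reset after a signal
    return signals

L0, a = 3.0, 0.5
quiet = backward_cusum([0.4, -0.3, 0.2, -0.45, 0.1] * 20, L0, a)   # all |x| < a
jump = backward_cusum([0.0, 0.0, 3.6], L0, a)                      # one |x| > L0 + a
drift = backward_cusum([1.0] * 10, L0, a)                          # sustained bias
print(any(quiet), jump, drift.index(True))
```

Small errors never trigger a signal, a single very large error triggers one immediately, and a sustained moderate bias is picked up after a few periods.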
7.4. NORMAL WEIGHTED BAYESIAN MULTIPROCESS MODELS:
The multiprocess DLM models as proposed by Harrison and Stevens (1971, 1975, 1976) are reviewed in the light of the NWBM's introduced in Chapter 4.

The s; et'' {(MI', tl),:,... (MIN), r(N))} for i=1,2,. is a multiprocess C(`), VIÌ, II'1 NWBM with N model
com Sonents such that: p
'.. N " A`)={F(`),
represents a NWBM
where
N
P('); LOis the posterior probability of model i at time t, the model transition probability vector such that
P, ')=1
i, ,
l'1=(aitl,.... 4j
is
'Tkjl'1=PAMIMI'il
j,
that is the
time k the t-1 that t time that at operational model given operates model at probability in 's known 's M') Initially the that the M{'l. practice are assume p(') and was , although let At ') be N t-1 time 's there v posterior the on-line. conditional are estimated , distributions forecasts are : (YIM'
t-1, ' M'
(0,
_)1, Dr_1)--N[mt_1('); Ct_1(')] . The N2 conditional one step ahead _1IM,
D_1)-N[j,
(ii)
71 -
where
`(ii)_F(i)C(i)wat_1(`) and it, ('i)_F(i)RI(i)F, (il, y (i) , with
(i) R(1i) = 8t Ci-(I) ig9
li) ,
The one step ahead forecast distribution

Y
is expressed as a mixture
.V
)
of N2 Tormals
Ye De-t)'_''
PC
-i
(i) "V f. (ij); f, W) 7*
Also. t,
N posterior given
models at time t-1 , ,V2 prior models are produced for time the .V2 posterior models
for which given the data at time t and using Bayes theorem,
for time t are : (i)+Mi-i+)+De)"N(rsli)C("")1 IM, (Ot ; id =1,2,... N
where
el
Ae(+)=81
(ii)_
(li)d-t )
('1)
11-1,
likelihood the : and given
(')1, D, ) L(MM')IM,
yg(a)I I
))-ie(1)} exp {-'fice(): e(!
the associated N2 posterior probabilities Pe() xD
are :
I)n1i)Pt-1
In practice, in order to keep the computations manageable, the same collapsing procedure as used by Harrison and Stevens is used to complete the cycle. This is defined by calculating
-72N
Ptti) i-i
N
p9(+i)
Cti)_
N P+i)iCà)+(me")__ . =t
As in partitioned
Harrison
and
Stevens(1976) in which ' ,,. equal to
models,
the
NWB
multiprocess
models are
into Class I models, but
there is no transition zero,
between models, that is in which such
r` has all - elements transitions
Class II models, and The former tests.
exists and models operate
interactively.
class is used on-line for Class II models are used models. As explained
model discrimination, for modelling in Chapter multiprocess applications. as outliers cases slope Brown(1983) generally
model estimation
and hypothesis
some prescribed
types of disturbances DLM's
and alternative
4, all the normal normal DLM
can be formulated can be
as NWBM's. as
In this sense, the NWBM such
applications
counted
multiprocess
The former
has worked
well in analysing
processes with disturbances components. However,
in level and seasonality and sharp changes changes have not been modelled
in many
as successfully This is largely variation
as would
be desired.
also commented
on this problem.
because such changes are
small compared
to both the random identified (1964)
level. in to process and changes Smith difficulty and in
Hence slope changes are sometimes Cook(1980), distinguishing Harrison and Davies
as a series of level changes. also commented analysis. on the
level and slope changes in CUSUM
A further
criticism
of these
These involve is that they problems are multiprocess models unnecessary computations. Class by introducing of a new class of multiprocess models using a combination overcome I and Class II models with the Backward CUSUM statistic as a control device for shifting model operations-from Class I to Class II models. Class I is retained when one of the
probability limit.
11 threshold Class some prespecified attains members of
-73-
7.5. MULTIPROCESS MODELS WITH CUSUMS:
In most multiprocess Class II applications, there is a preferred model $M^{(1)}$, called the ' mother ' model by Gathercole and Smith (1983). The analogy with quality control is that $M^{(1)}$ describes the data as long as it is behaving in an expected way. The other models $M^{(i)}$; $i\ge 2$, generally model some significant type of departure from the norm. In particular, outliers or mavericks and significant changes in the trend are often modelled.
In the new approach the mother model $M^{(1)}$ is represented by a NWBM $\{f,G,V,H\}_t$ which produces forecasts. This model is used unless a departure from normal is signaled by the CUSUM scheme, which operates on the one-step-ahead forecast errors. Then, starting with the latest observation which helped to trigger the signal, the other models are applied. All the models then operate in a multiprocess Class II way, with a high probability of transition to the ' mother ' model $M^{(1)}$, until such a time that the posterior probability $P_t^{(1)}$ of model $M^{(1)}$ exceeds a given value. When this happens, the model $M^{(1)}$ begins to operate alone and the CUSUM scheme is reset. When one model is operating, although forecasts are all based upon model $M^{(1)}$, the competing models are being prepared in readiness for the multiprocess phase.
For example, consider $G=\mathrm{diag}\{G_1,G_2\}$ in which $G_1$ represents a trend component and $G_2$ a seasonal component. Let model $M^{(2)}$ model outliers, model $M^{(3)}$ model major changes in trend and model $M^{(4)}$ model major changes in seasonality. Let $\theta_1$ and $\theta_2$ be the trend and seasonal parameter vectors respectively. For $M^{(i)}$, $i=1,2,3,4$, let the posterior state distributions at time $t$ be given as
W oz ee ms(s) CW C. s(: W ) 0M
The priors for time t+1 are then formed from the posteriors as follows: r02 lG1MIW8 ('))--N[ (D eMe
e+i
1(x) 83 (x)
+ (0) U) () G zs R. R2 wa
-74-
where $k'=
I,. " " -, Cl, Ck(1) C'k/P( k1)
k =1,2 ;and
2 if k=1 l=4 1 if k=2
and i=2 and i=4
otherwise
R3$)
= C1 C'311)C'2"(01
02
0<01
)<011)
=0i3)i4)
<1
<z <<4', -R -Rz <1

The general principle is to derive the marginal prior for the parameter block characterising the change from the posterior of the corresponding model, but to take all other information on other components from the mother model. This aims at keeping the information as stable as possible in order to prepare good estimates for the changes. Otherwise the covariance structure between the model components might produce violent fluctuations in the estimates of the presumed stable components.
In addition, a set of preparatory probabilities is calculated using Bayes theorem from the model likelihoods. When the CUSUM signals a change, all these preparatory values are used as starting values for the NWB multiprocess phase. However, generally all information other than that marginal characterising the change continues to be taken from the mother model. In order to exercise control over the response of models to exceptional events, a guard procedure on the observation variance for alternative
models is used. This is performed for the and 0, V(3)
V'3) and (') ; i>1 i> 1, are equal.
that so , P& example,
the one-step-ahead forecast variances with prespecified i1) , ll , The outlier (2)
Le: given B(i),
01), R(1) and 12) can be calculated. (2) and '. This gives j(3)= Yi2)
variance
i= Yip)j&1 be v3 that then can set so
-75'). Y4')= ]
Since defining r, =f
(
, C1 C;;,
(4) ) -1
(') can be chosen so that
In particular,
&_1C',
I ';,
(1) (4)
i=1,2 and r12= f1 CI C1,,, _1C'2J'2,

) -'s r12+ I)_7 -9'(i (1) P2 (1) - )12-'-(P2 Z1)
we have
1r2+ )
r2+2(1
This is a second degree equation and can be solved to ((3(4))-'.
This operates when, during the single model phase, the CUSUM first signals a change. If required, the variance V may be estimated on-line using either of the methods described in Chapter 5. During the multiprocess phase the forecast distribution is often multimodal, being a mixture of Normals. Point forecasts are then derived using a conjugate Normal loss function introduced by Lindley (1976). Further discussion of the use of such loss functions can be found in Smith, Harrison and Zeeman (1981) and Harrison and Smith (1980).
7.8. SUMMARY :
The principle of Management by Exception of forecasting systems is discussed and a historical background is given in Section 2. The backward CUSUM statistic is reviewed in Section 3 and in Section 4, the NWB multiprocess models are introduced in the light of Harrison and Stevens multiprocess models. Finally all the above ideas are combined in Section 5 to give multiprocess models with CUSUM's and this is explained in some detail for a linear growth seasonal model.
CHAPTER
EIGHT
APPLICATIONS
8.1. INTRODUCTION
This chapter is devoted to applications of the developed theory in a variety of situations. The NDBM's $\{F, G, V, B\}$ are constructed using the principle of superposition, since any multivariate Normal random vector can be decomposed into a linear combination of component multivariate Normal random vectors. This suggests that $G = \mathrm{diag}\{G_1, G_2, \ldots, G_r\}$ where each block is associated with a meaningful model component. Accordingly, $F = (F_1, F_2, \ldots, F_r)$ and $B = \mathrm{diag}\{\beta_1 I_1, \beta_2 I_2, \ldots, \beta_r I_r\}$, each $I_i$ being an identity matrix of the proper dimensionality. This includes the case where $G_i = \lambda_i$, the $\lambda_i$'s being distinct eigenvalues of G. But for real observation processes, where complex eigenvalues are concerned, it is usual to consider conjugate pairs in the same block. That is, for a pair of conjugate complex eigenvalues $(\lambda e^{i\omega}, \lambda e^{-i\omega})$ of multiplicity one, the adopted form of $G_i$ is

$$G_i = \lambda \begin{pmatrix} \cos\omega & \sin\omega \\ -\sin\omega & \cos\omega \end{pmatrix}.$$

This could represent a damped sine wave of period $2\pi/\omega$ and would have a single associated discount factor $0 < \beta < 1$.

The discount factors used throughout are not chosen according to any optimisation criterion. There might be room for further research here. Trigg and Leach (1967) have attempted to redefine Brown's discount factor as specific functions of the sign and absolute value of the one step ahead forecast errors.

For an EWR $\{F=1, G=1, \beta\}$ with $0 < \beta < 1$, it is easily seen that in the limit the adaptive coefficient is $1-\beta$. This gives

$$m_t = \beta^t m_0 + (1-\beta) \sum_{k=0}^{t-1} \beta^k y_{t-k}$$

with $(1-\beta)\beta^k$ as the weight corresponding to a data point that is k periods old. Comparing the average age of the data in an N period moving average,

$$(1/N) \sum_{i=0}^{N-1} (N-1-i) = (N-1)/2,$$

to the average age

$$(1-\beta) \sum_{k=0}^{\infty} k \beta^k = \beta/(1-\beta)$$

based on the above EWR model, Montgomery and Johnson (1976) have obtained the relationship $\beta = (N-1)/(N+1)$. Using this relation, Agnew (1982) suggested that $0.33 \le \beta \le 0.78$. Clearly, such low values of $\beta$ give highly adaptive models with large lead-time forecast variances, which would be totally unsuitable for such purposes as stock control and production planning. A rather more encouraging suggestion is that of Harrison and Johnston (1983), given by $\beta = (3N-1)/(3N+1)$, where N represents age to half effect. That is, N is the time for the weight of a particular data point to halve in value. This leads to higher $\beta$ values. Apart from the discontinuity periods, the discount factors here are chosen close to 1 such that the more stable the component, the closer its discount factor is to 1.
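The two rules relating a discount factor to a window length N can be compared numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, the half-effect rule is taken as (3N-1)/(3N+1) on the assumption that this is the intended reading of the formula above, and the exact half-life value 0.5^(1/N) is added only as a yardstick.

```python
def beta_from_moving_average(n):
    """Discount factor matching the average data age of an N period moving average."""
    return (n - 1) / (n + 1)


def beta_from_half_effect_rule(n):
    """Assumed reading of the 'age to half effect' rule: (3N - 1) / (3N + 1)."""
    return (3 * n - 1) / (3 * n + 1)


def beta_from_exact_half_life(n):
    """Yardstick: the beta for which beta**N equals exactly one half."""
    return 0.5 ** (1.0 / n)


def ewr_average_age(beta, terms=10_000):
    """Average age (1 - beta) * sum_k k * beta**k, truncated after `terms` terms."""
    return (1 - beta) * sum(k * beta ** k for k in range(terms))


if __name__ == "__main__":
    for n in (2, 6, 12):
        print(n,
              round(beta_from_moving_average(n), 3),
              round(beta_from_half_effect_rule(n), 3),
              round(beta_from_exact_half_life(n), 3))
```

For any N greater than 1 the half-effect rule gives the larger discount factor, which agrees with the remark that it leads to higher values.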
Experience shows model robustness against this choice. In modelling discontinuity periods, the Modified NDBM's are used in order to protect model components from unwanted information interactions, and the guard procedure on the observation variance described in 7.5 is employed.

For a straightforward application of a single NDBM to a data series which exhibits no major changes and outliers, see the US air passengers data set which is analysed in Chapter 2. Other selected series considered here are:

i) A simulated seasonal series with trend, level and seasonal changes, outliers and missing observations. This is to examine the performance in the phase of these discontinuities and major changes, knowing the true underlying model. The data is analysed using both intervention and multiprocess NDBM's.

ii) In order to test the performance of the multiprocess NDBM's via the CUSUM multiprocessor, a medical data set concerning prescription charges is chosen. The data was previously analysed using Harrison-Stevens multiprocess models.

iii) For a typical data set with an unknown and variable observation variance, the Road Death Series is chosen and the CUSUM multiprocessor is applied with an on-line estimation of the observation variance.

All the data sets are provided in the appendix.
8.2. SIMULATED SERIES
In order to examine model performance in phase of major changes and impulses with a minimum risk of misspecifications, artificial data is generated. The data is analysed using both Intervention and an automatic method. For an automatic way of dealing with these changes, a multiprocess model is used. Automation in analysing data sets may not be a desirable property to aim for from a Bayesian statistical point of view. However, multiprocess models have a wide range of applications in areas other than prediction of future outcomes and are especially valuable in the detection and classification of different types of changes in process components.
8.2.1. Simulation
The artificial data is simulated by the superposition of three component series. These are an independent random noise $v_t$, a linear trend component $\theta_{1,t} = (\theta_1, \theta_2)_t'$ and a cyclic component $\theta_{2,t} = (\theta_3, \theta_4)_t'$ represented by a single harmonic of period 12. The simulation is carried out using the model:

$$y_t = F\theta_t + v_t, \qquad \theta_t = G\theta_{t-1} + w_t,$$

where $G = \mathrm{diag}\{G_1, G_2\}$,

$$G_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad G_2 = \begin{pmatrix} \cos\omega & \sin\omega \\ -\sin\omega & \cos\omega \end{pmatrix} \quad \text{with } \omega = \pi/6,$$

$v_t \sim N[0; 400]$, $w_t = [w_1, w_2, w_3, w_4]_t' \sim N[0; \mathrm{diag}\{13.175, 0.017, 1.14, 1.14\}]$ and $\theta_0 = [75, 9, 85, 11]'$.

Accordingly, a series of 120 monthly observations is generated and the following major impulses and events are imposed:
i) 200 is subtracted from the intermediate observation at t=32, in order to simulate an outlier;
ii) immediately after t=36 the 'deseasonalised' process level is reduced by 270, to give a jump;
iii) following t=60, the linear growth is reversed in sign from roughly 8 units per period to -8, giving a slope change;
iv) following t=85 the linear growth is again reversed in sign and simultaneously the seasonal amplitude is increased by 50;
v) data points 113, 114 and 115 are eliminated to give a period of three missing observations.
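The data generation described above can be sketched in a few lines. This is an illustrative sketch only, not the author's program: the observation vector is assumed to be F = (1, 0, 1, 0), the function name and seed are hypothetical, the random numbers will not reproduce the series in the appendix, and the change in seasonal amplitude of event (iv) is left out because the text does not say how it was imposed.

```python
import math
import random

OMEGA = 2 * math.pi / 12                    # single harmonic of period 12
W_VARIANCES = (13.175, 0.017, 1.14, 1.14)   # state noise variances
V = 400.0                                   # observation variance
MISSING = {113, 114, 115}                   # event (v)


def simulate_series(n=120, seed=1):
    """Linear growth plus one harmonic, with events (i), (ii), (iii), (v) imposed."""
    rng = random.Random(seed)
    cos_w, sin_w = math.cos(OMEGA), math.sin(OMEGA)
    level, growth, s1, s2 = 75.0, 9.0, 85.0, 11.0   # theta_0
    series = []
    for t in range(1, n + 1):
        w = [rng.gauss(0.0, math.sqrt(var)) for var in W_VARIANCES]
        # theta_t = G theta_{t-1} + w_t (right-hand sides use the old values)
        level, growth = level + growth + w[0], growth + w[1]
        s1, s2 = cos_w * s1 + sin_w * s2 + w[2], -sin_w * s1 + cos_w * s2 + w[3]
        if t == 37:                 # (ii) jump immediately after t=36
            level -= 270.0
        if t in (61, 86):           # (iii), (iv) growth reversed in sign
            growth = -growth
        # assumed F = (1, 0, 1, 0): level plus first harmonic coordinate
        y = level + s1 + rng.gauss(0.0, math.sqrt(V))
        if t == 32:                 # (i) outlier
            y -= 200.0
        series.append(None if t in MISSING else y)
    return series
```

Missing observations are returned as `None`, so a downstream updating routine can skip them explicitly.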
8.2.2. Intervention :
Intervention involves changing a routine or existing probability model, often by introducing subjective information. In classical time series, interventions are specified through transfer functions (Box and Tiao (1975)). In Bayesian Dynamic Models, intervention is achieved through transfer probability distributions which not only introduce an expected effect but also introduce an extra uncertainty associated with the change. The object of structuring a model is to enable changes to be made to particular model components in such a way that leaves other components largely unaffected. In the following example, a useful way in which additional uncertainty can be specified through the discount factors is illustrated. As explained earlier, the discount factors replace the role of the state random noise $w_t$.

For the data simulated in 8.2.1, the NDBM $\{F, G, V, B\}$ is applied with F, G and V as defined there and $B = \mathrm{diag}\{\beta_1, \beta_2, \beta_3, \beta_4\}$. Apart from times of intervention, the values $\beta_1 = \beta_2 = 0.9$ and $\beta_3 = \beta_4 = 0.95$ are used. No attempts to improve model performance are made by looking for 'optimal' discount factors. The values chosen are rounded figures thought to be appropriate, bearing in mind that it is usually preferable to err on the side of underestimating discount factors, Harrison (1967). Initially, the same starting values are adopted but with a vague covariance matrix 2000 I. When the major changes (i) to (v) are about to occur, it is assumed that the type of forthcoming event is known but that there is no available information on the size or even the sign of the coming change.

Since it is known that $y_{32}$ is an outlier, the updating procedure treats it as a missing observation. This is to protect model components from the misspecified information that the outlier observation may provide. Foreknowledge of the jump at t=37 is signalled by $(\beta_1, \beta_2) = (0.940, 1)$ and for the trend growth change at t=61 by $(\beta_1, \beta_2) = (1, 0.98)$. In each instance the updating distributions are obtained using a modified NDBM since, in practice, it is desirable to protect information on model components from the effect of sharp changes and imprecise descriptions in other components. At these intervention times a Modified NDBM is applied as described in 4.4.1. The simultaneous sudden change in trend and seasonality at t=86 is signalled by $B_{86} = \mathrm{diag}\{1, 0.9, 0.9 I\}$. The three missing observations are dealt with simply by taking the posterior distributions of $\theta_t$ for t = 113, 114 and 115 as the prior parameter distribution $(\theta_t | D_{112})$.

Fig. 3 shows the observations and the corresponding one step ahead expectations in order to demonstrate the power of the intervention method. Since V = 400, without major disturbances in the routine data generation, the limiting Mean Absolute Deviation (MAD) of the one step ahead forecast errors would be about $18.7 = 0.8\,Y^{1/2}$, where $Y = 400/(\beta_1 \beta_2 \beta_3 \beta_4)$ (Theorem 6.4). The overall performance in terms of the MAD is found to be 20.76 after omitting the outlier error $e_{32}$, the jump error $e_{37}$ and the three missing errors $e_{113}$, $e_{114}$ and $e_{115}$. The performance in terms of the MAD for each year is given below.
YEAR    1     2     3     4     5     6     7     8     9     10
MAD     39.5  20.3  20.5  20.3  15.0  17.2  17.4  20.1  21.5  15.8
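The limiting MAD of about 18.7 quoted before the table can be checked with a short calculation. The sketch below assumes that the limiting one step ahead forecast variance is the observation variance divided by the product of the four discount factors, which is the reading of Theorem 6.4 that reproduces the quoted figure; the function name is hypothetical. The factor 0.8 in the text is taken here as the rounded value of sqrt(2/pi), the ratio of the mean absolute deviation to the standard deviation of a zero-mean Normal variable.

```python
import math


def limiting_mad(obs_variance, discount_factors):
    """Mean absolute one step ahead error under the assumed limiting variance."""
    # assumed limiting forecast variance: Y = V / (beta_1 * ... * beta_n)
    forecast_variance = obs_variance / math.prod(discount_factors)
    # E|e| = sqrt(2/pi) * sd for a zero-mean Normal error e
    return math.sqrt(2.0 / math.pi) * math.sqrt(forecast_variance)


if __name__ == "__main__":
    print(round(limiting_mad(400.0, [0.9, 0.9, 0.95, 0.95]), 2))
```

The yearly MAD values in the table can be compared against this limit to see in which years the disturbances dominate.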
[Fig. 3. Simulated series with intervention: observations and the corresponding one step ahead expectations.]
8.2.3. Multiprocess Models - The Artificial Data :
For an automatic way of dealing with the major changes in the series, it is assumed that the series is monthly with an additive linear growth and one harmonic seasonal component. Given this structure, the possible changes in the series which could be considered are trend change, outlier, seasonal changes and/or combinations of them ($2^3$ possibilities) alongside the mother model. However, the combined changes may reasonably be modelled by successive operations of the main changes, if any. Clearly, the computer storage and running time increase exponentially with the number of models considered. This suggests that a smaller number of alternative models should be considered.

For a rough comparison with the intervention results, an NDB multiprocess model is constructed with four models comprising: a basic or mother model with trend discount factor $\beta_1 = 0.9$, seasonal discount factor $\beta_2 = 0.95$ and observation variance V = 400, all as specified in the basic intervention model; a trend change model with discount factors $\beta_1 = 0.02$ and $\beta_2 = 0.21$; an outlier model; and a seasonal change model. The seasonal discount factor and the variance for the outlier model are found from the observation variance controlling rule for the alternative models defined in 7.5. Transition probabilities from models at time t to each of the mother, trend, outlier and seasonal change models at time t+1 are taken as 0.8, 0.095, 0.1 and 0.005 respectively. Given the same initial settings as in the intervention case, the data is then analysed using NDB multiprocess models without and with the CUSUM controller. In the latter case, $L_0$ and a are taken as 2.0 and 0.5 respectively, with a threshold probability of 0.8 for switching back to single model operation.

As is to be expected, unlike the intervention model, multiprocess models need more time to recognise changes and to make proper model adjustments. The high forecasting errors observed for the years 3, 4, 6 and 8 may partly be due to that and partly due to the selection of the alternative models. Level and growth changes are combined in the trend change model, while seasonal and growth changes are modelled together. This is to keep the process of alternative model selection vague and simple. A summary of the performance of the multiprocess models without the CUSUM is presented in Fig. 4, and Fig. 5 presents that with a CUSUM statistic. The overall MAD's after removing the errors $e_{32}$, $e_{37}$, $e_{38}$, $e_{39}$, $e_{113}$, $e_{114}$ and $e_{115}$ were found to be 25.03 and 22.74 respectively. The performance in terms of MAD for each year in both cases is given in the following table for comparison.
Multiprocess Models - Simulated Data (with and without CUSUM's)

YEAR               1     2     3     4     5     6     7     8     9     10
without CUSUM's    32.8  21.3  34.7  40.4  15.6  22.5  16.5  26.3  20.9  19.3
with CUSUM's       26.9  17.9  23.9  29.1  13.9  28.0  14.9  32.9  22.4  17.5
It is easily observed from the above table that the multiprocess model with CUSUM's is to be preferred in terms of both performance and computer storage and running time. Moreover, it is interesting to note that the order of preference among the available models is in accordance with the amount of information to be used for the model construction. The intervention model would be the best choice when all the information about the structural changes and their occurrence times is known. The multiprocess models with CUSUM's would be the second best choice, when a preferred model is known. Finally, if no information about a preferred model and the types of change and their occurrence times is known, the multiprocess model without the CUSUM statistic would be the candidate.
[Fig. 4. Simulated series: summary of the performance of the multiprocess models without the CUSUM.]
[Fig. 5. Simulated series: summary of the performance of the multiprocess models with a CUSUM statistic.]
8.3. THE PRESCRIPTION SERIES
8.3.1. The Data :
This is a monthly medical data set giving the number of prescriptions for five years starting from March 1966. The figures taken are normalised according to the number of effective working days in the month. This is to compare the result with the analysis of Harrison and Stevens who previously used multiprocess models.

The data is strongly seasonal, for which a constant observation variance is reasonably assumed. It is observed that increased prescription charges in June 1968 caused a major change in the level and an influenza epidemic in December 1970 'caused' an outlier. However, for the purpose of demonstrating multiprocess modelling, it is assumed that these events are not known. Consequently they are not anticipated but are dealt with automatically, together with other unobserved changes. The data is analysed using Modified NDB multiprocess models without and with the CUSUM statistic using the same initial prior information
given as is t (OOIDO)=( s ID0)-N(
22 0; 000
10 00 00.25
0 251,1
signifying a weak prior with no growth and no seasonal pattern. In both cases only two types of major disturbances are considered, namely, sharp trend changes and outliers.
8.3.2. NDBM Multiprocess Models - Known Observation Variance :
Here the routine model has a linear growth and full seasonal components with discount factors $\beta_1 = 0.95$ and $\beta_2 = 0.975$. The discount factors for the corresponding trend change model are $(\beta_1, \beta_2) = (0.02, 0.975)$ and the model transition probabilities are constant throughout so that

$$\pi_2 = P\{M_{t+1}^{(2)} | M_t^{(i)}\} = 0.05, \qquad \pi_3 = P\{M_{t+1}^{(3)} | M_t^{(i)}\} = 0.025, \qquad i = 1,2,3,$$

where the $M^{(i)}$'s are: Mother, Outlier and Trend change models respectively. The observation variance is estimated as 0.36. The model operation is as given in 7.4. Given the posterior model information at time t-1 as $\{(m_{t-1}^{(i)}; C_{t-1}^{(i)}); P_{t-1}^{(i)}\}$, the probability that at time t-1 model i operated and that at time t model j operates is $\pi_j P_{t-1}^{(i)}$, and the corresponding prior parametric distribution is

$$(\theta_t | M_t^{(j)}, M_{t-1}^{(i)}, D_{t-1}) \sim N[G m_{t-1}^{(i)}; R_t^{(ij)}],$$

where the $R_t^{(ij)}$'s are calculated according to the Modified NDBM rules. In order to control the response of the models to the outliers and jumps, the outlier variance is chosen so that $Y^{(2)} = Y^{(3)}$. Point predictions are obtained using the conjugate Normal Loss function introduced by Lindley (1976) and plotted along with the observations in Fig. 6 with the percentage one step ahead forecast errors.

The first 12 observations are used for trend and seasonal pattern recognition. The model has recognised a minor unobserved shift at month 24. The increase of prescription charges at month 30 has caused a negative error of -15% and is followed by an error of -5% as the uncertainty between outlier and sharp trend changes is resolved. The influenza epidemic at month 48 is properly identified as an outlier. The performance in terms of MAD, for the last four years, is tabulated in Section 8.3.3. This is for a direct comparison with the results obtained using multiprocess models with CUSUM's.
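The way the model probabilities combine with the transition probabilities can be illustrated with a small calculation. This is a hedged sketch of a standard Class II multiprocess probability update, not the thesis's own code: the function names are hypothetical, the Mother transition probability 0.925 is assumed as the complement of 0.05 and 0.025, and the forecast means and variances in the usage example are made-up numbers.

```python
import math


def normal_pdf(y, mean, variance):
    """Density of N[mean; variance] at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / variance) / math.sqrt(2 * math.pi * variance)


def posterior_model_probabilities(prev_probs, trans_probs, forecasts, y):
    """Posterior P(model j operates at time t | y).

    prev_probs[i]   : P(model i operated at t-1)
    trans_probs[j]  : pi_j, probability that model j operates at t
    forecasts[i][j] : (mean, variance) of the one step ahead forecast
                      when model i operated at t-1 and model j at t
    """
    joint = {}
    for i, p_i in enumerate(prev_probs):
        for j, pi_j in enumerate(trans_probs):
            mean, variance = forecasts[i][j]
            # prior probability pi_j * P_i, weighted by the forecast likelihood
            joint[(i, j)] = pi_j * p_i * normal_pdf(y, mean, variance)
    total = sum(joint.values())
    n_prev = len(prev_probs)
    return [sum(joint[(i, j)] for i in range(n_prev)) / total
            for j in range(len(trans_probs))]


if __name__ == "__main__":
    # illustrative numbers only: mother, outlier and trend change models
    forecasts = [[(20.0, 0.5), (20.0, 50.0), (20.0, 50.0)]] * 3
    print(posterior_model_probabilities([1.0, 0.0, 0.0],
                                        [0.925, 0.05, 0.025], forecasts, 28.6))
```

When the outlier and trend change models have equal forecast variances, as the guard rule above arranges, a single extreme observation cannot separate them and their posterior probabilities stay in the ratio of the transition probabilities; the following observation resolves the uncertainty.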
[Fig. 6. Prescription series, multiprocess models without CUSUM's: number of prescriptions (000's) with point predictions, and percentage one step ahead forecast errors.]
8.3.3. The CUSUM Multiprocessor - Known Observation Variance :
The same model specifications described in 8.3.2 are resumed here with the Backward CUSUM statistic with initial values $L_0 = 2.0$ and a = 0.3. Predictions are based on the Mother model performance until the CUSUM monitor signals a change, at which time all the three models start operating interactively until a threshold probability of 0.98 is regained for the Mother model. During the Mother model performance the other models run in parallel, as described in 7.5, as preparatory arrangements for coming changes. The model performance is summarised in Fig. 7 together with the upper and lower Backward CUSUM monitors. It can be seen that all the changes are properly identified and that the performance is slightly better than that of 8.3.2, while the process time is reduced nearly by 2/3.

In order to compare the performance with that of the multiprocess models without the CUSUM statistic, the MAD for the last four years is tabulated below for the two models.
YEAR               1     2     3     4
without CUSUM's    0.63  0.52  1.79  0.47
with CUSUM's       0.57  0.46  0.7   0.34
[Fig. 7. Prescription series, multiprocess models with CUSUM's: number of prescriptions (000's) with predictions, and the upper and lower Backward CUSUM monitors.]
8.4. ROAD DEATH SERIES
8.4.1. The Data :
This is a series of 38 observations representing quarterly road deaths in U.K. for the years 1960-1969. It can be seen from Fig. 8 that the main observed discontinuities in the series are the outlying observation in the first quarter of 1963, due to a cold icy winter preventing traffic using many roads, and the trend change in 1967 due to the introduction of the breathalyser. Generally, a high variation of the observation error variable can be observed. This suggests that an on-line estimation of the observation variance is more appropriate than a fixed and global estimate.

Using longer data sets, a relation between the number of road deaths and industrial activity is evident. The death rate rises during a boom period, when more traffic is on the roads, and falls during slump periods. This effect could be accommodated in a model and would lead to more reliable predictions. However, the analysis here is for demonstration purposes and no attempt is made to relate road deaths to industrial production.
8.4.2. The NDB Multiprocess Models with CUSUM's :
In this analysis, Modified NDB multiprocess models are used with CUSUM's. The alternative models assumed are: trend change, outlier and seasonal changes. The main model discount factors are $\beta_1 = 0.8$ and $\beta_2 = 0.9$. These figures are lower than the ones used in the previous examples and reflect the precision in the quarterly data. The trend change discount factors are $(\beta_1, \beta_2) = (0.84, \beta_2)$. As before, the discount factors for the seasonal change model and the observation variance for the outlier model are found using the variance control rule for the alternative models defined in 7.5. The observation variance of the main model is estimated on-line assuming it to be proportional to the expected number of deaths, i.e.

$$V_t = a\,E\{Y_t | D_{t-1}\}.$$

a is estimated on-line using the power law defined in 5.4 with the initial values $\lambda_0 = 10$ and $\eta_0 = 20$. The model transition probabilities are 0.7899, 0.1, 0.11 and 0.0001, with the same order as in 8.2.3. The initial CUSUM parameters are $L_0 = 2.0$ and a = 0.5, where the threshold return probability is 0.8. For this model:

$$F = [1, 0, 1, 0, 1] \quad \text{and} \quad G = \mathrm{diag}\{G_1, G_2\},$$

where

$$G_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad G_2 = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$

A weak prior distribution was set initially as

JJO
-1
( 2 IDO) a
N:
0 90 -50 30
; diag{1000,100,100013}1
Finally, a summary of the results and of the performance is presented in Fig. 8, showing that all the prescribed changes are dealt with successfully. The summary on the CUSUM statistic shows the direction of changes at the model breakdown points and the periods of different model operation.
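The block structure of the quarterly model can be checked mechanically. The sketch below assumes the design vector and transition blocks are F = [1, 0, 1, 0, 1], G1 = [[1, 1], [0, 1]] and G2 = [[0, 1, 0], [-1, 0, 0], [0, 0, -1]] (a linear growth block plus a full seasonal block for period 4); this is a reading of the damaged formulas in this section, and the helper names are hypothetical.

```python
def block_diag(a, b):
    """Block diagonal matrix diag{a, b} for square list-of-lists matrices."""
    n, m = len(a), len(b)
    top = [row + [0] * m for row in a]
    bottom = [[0] * n + row for row in b]
    return top + bottom


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matpow(a, power):
    """a raised to a non-negative integer power."""
    size = len(a)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = matmul(result, a)
    return result


F = [1, 0, 1, 0, 1]                           # assumed design vector
G1 = [[1, 1], [0, 1]]                         # linear growth block
G2 = [[0, 1, 0], [-1, 0, 0], [0, 0, -1]]      # assumed quarterly seasonal block
G = block_diag(G1, G2)

if __name__ == "__main__":
    # the seasonal block repeats after four quarters
    print(matpow(G2, 4))
```

Because the fourth power of the seasonal block is the identity, the seasonal part of the forecast function repeats every four quarters, as a quarterly pattern should.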
[Fig. 8. Road death series: deaths per quarter with predictions, and the CUSUM statistic.]
8.5. SUMMARY :
In this chapter a number of applications of the Normal Bayesian Models based on the principle of parsimony through discounting and Management by Exception are presented. The first application is made by examining an artificially generated data set using intervention at the points of discontinuity. This is presented in 8.2.2. In 8.2.3 the same data is analysed using multiprocess models both with and without CUSUM's. In Section 3, the models are applied to a real data set concerning prescription charges where a constant observation variance is assumed. For an on-line estimation of the observation variance, quarterly data is chosen in Section 4 concerning the number of road deaths in U.K. Appropriate figures are presented in each case to summarise the model performance.
CHAPTER
NINE
DISCUSSION AND FURTHER RESEARCH
The Bayesian framework in statistics is most promising and logical among existing statistical techniques. In modelling and predicting future outcomes, it requires supervision and interaction of the modellers to accommodate, on-line, any environmental or external effects that are not anticipated. The pioneering work of Harrison and Stevens (1976) has provided applied statisticians with a base for analysing series arriving sequentially with time, away from static and limiting predictor models. The DLM's have seen a number of successful applications by Harrison and Stevens (1975), Smith (1983) and Smith and West (1983). However, as pointed out in Chapter 3 of this study, both the observation and state covariance matrices are ambiguous, not scale invariant and not parsimonious in the sense of Roberts and Harrison (1984). These problems have caused practitioners considerable difficulties in the estimation problem and diverted them to the use of other less constructive models.

The principal aim of this study is to replace the state error variance by a small number of discount factors. This gives models which enjoy the principle of parsimony within the Bayesian framework. Discount factors can more easily be set; they are invariant under linear transformations of scale and not ambiguous, and models based on the discount principle are generally parsimonious and robust. The study is also aimed at demonstrating and publicising the potential of Bayesian modelling in practical applications, in particular of processes with certain types of discontinuity.

After providing the basic principle of discounting within the classical point estimation framework and testing its efficiency through comparison with DOUBTS and ARIMA models applications, the drawbacks and limitations of the Bayesian DLM's are pointed out. The discount principle is then carried out to construct NWBM's replacing the DLM's. This class of models has many interesting subclasses, and many existing and well known classical models are retained in the sense that they have the same limiting forecast functions as special constant NDBM's. However, the Bayesian facilities are more extensive.

Two methods are given for on-line estimation of the observation variance. This is essential especially for processes with high stochastic variations and those that exhibit sudden changes and outliers. In these cases, multiprocess models are advisable. The controlling rule for the observation variances is advisable since the variance governs the model likelihood ratios. The use of the CUSUM statistic provides an overall improvement in the efficiency, computer storage and running time problems. The NDBM's allow a simple and easy way of communication and intervention in phases of major disturbances.

Almost all types of major disturbances that are common in time series processes are present in the artificially generated data set. Even less disturbed series of that kind are often avoided by statisticians, see Chatfield (1978). The discount principle has simplified intervention with different components, as the example in 8.2.2 demonstrates. All types of change are detected successfully. However, in the analysis of real data sets advance information on major disturbances is often missing, in which case it is useful to adopt the multiprocess models. Clearly, as is to be expected, the resulting analysis from the multiprocess models shown in Fig. 4 is less successful than that from the intervention models. Apart from the missing observations, no information on the disturbances is fed into the multiprocess models. More efficient results are obtained using the multiprocess models with CUSUM's, where a little more information is provided by assuming the existence of a particular model representing the process.

The models are also applied to real data sets. These are used to demonstrate the efficiency of the models in dealing with certain types of discontinuity occurring when the data is fairly stable and also when high observation noise is present. This causes a delay in the task of recognising changes. The results from all applications dealt with are promising. In particular, when multiprocess models are called for, the CUSUM statistic is recommended for efficiency and economy, and the Modified NDBM's protect component influences and misspecifications when major disturbances are present. More applications can be found in Ameen and Harrison (1983 a, b, c). In all cases the underlying model parameters have been given physical meanings and simple transformations are provided to transfer information from or to other models of interest. It can be argued that the amount of further developments and exploitation in these models is proportional to the amount of effort spent in developing the existing and less profound models.
The following lists a number of suggestions for further research:

i- The models deal with processes defined only on the entire real line with the Normality assumptions, so that successive estimates are obtained using Kalman Filter recurrence relations. However, in many real life problems, processes are well defined on bounded sample spaces and do not cover the real line sensibly, in which case these models may provide estimates outside their feasible region. This point seems to be the most promising and demands exploitation. Smith (1979) and Souza and Harrison (1979) have extended the DLM's to include non Normal Steady State models. These ideas are combined with the discount principle, Ameen (1983 b), to provide Generalised Bayesian Entropy models.

ii- The forecast functions are specified using the design and transition matrices. It is important to develop methods that provide more automation in model identification, and a proper Bayesian on-line parameter learning procedure will improve the performance. Some considerable success has been achieved by Migon and Harrison (1983) considering non linearity and non Normality of the processes.

iii- The choice of discount factors is left to the modellers and work needs to be done in developing methods for on-line estimation. The generalised EWR and the models obtained in chapter 6 have restricted ARIMA limiting parameters, as pointed out by Godolphin and Stone (1980) for the DLM's, in which they suggest the use of singular transition matrices. Also, with lower discount factors, the uncertainty of lead time forecast distributions explodes rapidly, providing less reliable long term predictions.

iv- Generalising the models to include more correlation structures will provide a wider range of applications.

v- The limiting results obtained are mostly based on specified canonical representations and more general results are possible.

vi- In a general context, more applications of the theory in different fields of interest are needed, especially when the process is subject to a dynamic development, as almost always is the case. The NDBM's replace the popular classical regression models and provide an overall improvement in the analysis. Some applications on this topic are given by Harrison and Johnston (1983).
APPENDIX

U.S. AIR PASSENGERS DATA

YEAR  JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
1951  145  150  178  163  172  178  199  199  184  162  146  166
1952  171  180  193  181  183  218  230  242  209  191  172  194
1953  196  196  236  235  229  243  264  272  237  211  180  201
1954  204  188  235  227  234  264  302  293  259  229  203  229
1955  242  233  267  269  270  315  364  347  312  274  237  278
1956  284  277  317  313  318  374  413  405  355  306  271  306
1957  315  301  356  348  355  422  465  467  404  347  305  336
1958  340  318  362  348  363  435  491  505  404  359  310  337
1959  360  342  406  396  420  472  548  559  463  407  362  405
1960  417  391  419  461  472  535  622  606  508  461  390  432
SIMULATED DATA

YEAR  JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
   1  189  108   93   77   42   52   67   75  155  236  320  270
   2  270  261  192  201  166  150  193  244  255  300  343  382
   3  392  318  335  283  276  253  282   86  387  388  501  482
   4  221  179  185  114  122   80  143  148  205  292  307  331
   5  318  317  269  234  198  185  239  237  314  343  409  408
   6  411  404  347  267  238  216  187  193  242  269  272  306
   7  273  267  202  176  146   84  108  132  157  187  206  207
   8  233  192  163  106   59   69  108  130  201  268  319  375
   9  377  340  246  202  185  175  222  254  338  402  478  467
  10  449  444  377  338    -    -    -  373  433  483  567  617

( - : eliminated observations at t = 113, 114 and 115 )
PRESCRIPTION DATA
YEAR
JAN FEB MAR

APR
1966
23.1
21.4
1967
23.9
1968
25.9
1969
23.1
24.3
22.3 23.6
22.3
1970
23.3 23.3
21.8
24.4 25.2
23.6
22.2 23.8
22.4
YEAR   MAY   JUN   JUL   AUG   SEP   OCT   NOV   DEC
1966  21.1  20.8  19.8  18.8  20.2  21.9  22.8  23.1
1967  22.7  22.4  20.8  19.6  -1.4  22.7  23.8  26.6
1968  23.5  20.5  19.0  18.1  19.9  21.3  21.7  23.4
1969  21.3  21.3  19.8  18.7  20.8  21.5  21.0  28.6
1970  22.6  21.7  20.5  19.4  21.4  22.3  22.4  23.7
ROAD DEATH DATA

YEAR    1    2    3    4
1960  486  514  614  710
1961  516  546  587  653
1962  501  499  587  650
1963  400  547  619  742
1964  570  582  664  790
1965  592  648  660  751
1966  578  604  658  822
1967  610  542  659  629
1968  518  499  603  650
1969  518  541
REFERENCES
[1] AGNEW, R. A. (1982). Econometric forecasting via discounted least squares. Naval Research Logistics Quart., Vol. 29, No. 3, 291-302.
[2] AMEEN, J. R. M. (1983 a). Contribution to discussion of the paper by E. T. Jaynes. Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[3] AMEEN, J. R. M. (1983 b). Generalised Bayesian entropy models and forecasting. Warwick Res. Rep. 37.
[4] AMEEN, J. R. M. (1983 c). Contribution to discussion of the paper by M. West. Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.

[5] AMEEN, J. R. M. and HARRISON, P. J. (1983 a). Discount weighted estimation. J. of Forecasting (to appear).
[6] AMEEN, J. R. M. and HARRISON, P. J. (1983 b). Normal discount Bayesian models. Invited paper for the Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[7] AMEEN, J. R. M. and HARRISON, P. J. (1983 c). Discount Bayesian multiprocess modelling with CUSUM's. Proceedings of International Time Series Conference, Nottingham, (O. D. Anderson ed.), 1983. North Holland.
[8] ANDERSON, O. D. (1977). A commentary on "A survey of Time Series". International Statist. Rev., 45, 273-297.
[9] ASTROM, K. J. (1970). Introduction to Stochastic Control Theory. Academic Press, Inc., New York.

[10] BARNARD, G. A. (1959). Control charts and stochastic processes. J. R. Statist. Soc. B, 21, 239-270.
[11] BISSELL, A. F. (1969). Cusum techniques for quality control (with discussion). J. R. Statist. Soc. C, 18, 1-30.
[12] BOX, G. E. P. and JENKINS, G. M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden Day.
[13] BOX, G. E. P. and TIAO, G. C. (1975). Intervention analysis with applications to economic and environmental problems. J. Amer. Statist. Ass., 70, 70-79.
[14] BROWN, R. G. (1963). Smoothing, Forecasting and Control. San Francisco: Holden Day.
[15] BROWN, R. G. (1983). The balance of effort in forecasting. J. of Forecasting, Vol. 1, No. 1, 49-53.
[16] CANTARELIS, N. and JOHNSTON, F. R. (1983). On-line variance estimation for the steady state Bayesian forecasting model. J. Time Series An., Vol. 3, No. 4, 225-234.
[17] CHATFIELD, C. (1978). The Holt-Winters forecasting procedure. App. Statist., 27, No. 3, 264-279.
[18] DE GROOT, M. H. (1970). Optimal Statistical Decisions. New York, McGraw-Hill.
[19] DOBBIE, J. M. (1963). Forecasting periodic trends by exponential smoothing. Opns. Res., 11, 908-918.
[20] EWAN, W. D. (1963). When and how to use Cusum charts. Technometrics, 5, 1-22.
[21] EWAN, W. D. and KEMP, K. W. (1960). Sampling inspection of continuous processes with no autocorrelation between successive results. Biometrika, 47, 363-371.
[22] GATHERCOLE, R. B. and SMITH, J. Q. (1983). A dynamic forecasting model for a general class of discontinuous time series. Proceedings of International Time Series Conference, Nottingham, (O. D. Anderson, ed.). North Holland.
[23] GELB, A. (ed.). (1974). Applied Optimal Estimation. MIT Press, Cambridge.
[24] GODOLPHIN, E. J. and HARRISON, P. J. (1975). Equivalence theorems for polynomial projecting predictors. J. R. Statist. Soc. B, 37, 205-215.

[25] GODOLPHIN, E. J. and STONE, J. M. (1980). On the structural representation for polynomial projecting models based on the Kalman filter. J. R. Statist. Soc. B, 42, 35-46.
[26] HARRISON, P. J. (1965). Short-term sales forecasting. J. R. Statist. Soc. C (Appl. Statist.), 15, 102-139.
[27] HARRISON, P. J. (1967). Exponential smoothing and short-term sales forecasting. Man. Sci., 13, 821-842.

[28] HARRISON, P. J. and AKRAM, M. (1983). Generalised exponentially weighted regression and parsimonious dynamic linear models. International Conference held at Valencia, (O. D. Anderson ed.), North Holland.
[29] HARRISON, P. J. and DAVIES, O. L. (1964). The use of cumulative sum (Cusum) techniques for the control of routine forecasts of product demand. Oper. Res. (J. O. R. S. A.), 12, 325-333.
[30] HARRISON, P. J. and JOHNSTON, F. R. (1983). A regression method with non stationary parameters. Warwick Res. Rep. 35. (Submitted to J. of O. R.)
[31] HARRISON, P. J., LEONARD, T. and GAZARD, T. N. (1977). An application of multivariate hierarchical forecasting. Paper to R. Statist. Soc. Res. and Ind. Appl. Section Conference, Manchester. Also Warwick Res. Rep. 15.
[32] HARRISON, P. J. and PEARCE, S. F. (1972). The use of trend curves as an aid to market forecasting. Ind. Mark. Manage., 2, 149-170.
[33] HARRISON, P. J. and SCOTT, F. A. (1965). A development system for use in short term sales forecasting investigations. Paper to Ann. Conf. O. R. Soc. and Special O. R. Soc. Meeting. Also Warwick Res. Rep. No. 26.

[34] HARRISON, P. J. and SMITH, J. Q. (1980). Discontinuity, decision and conflict. Proc. of the first International Meeting on Bayesian Statistics, Valencia, Spain (ed. Bernardo et al.), May 1979.
[35] HARRISON, P. J. and STEVENS, C. F. (1971). A Bayesian approach to short-term forecasting. Oper. Res. Quart., 22, 341-362.
[36] HARRISON, P. J. and STEVENS, C. F. (1975). Bayesian forecasting in action: Case studies. Warwick Res. Rep. 14.
[37] HARRISON, P. J. and STEVENS, C. F. (1976). Bayesian forecasting (with discussion). J. R. Statist. Soc. B, 38, 205-247.
[38] HENDERSON, C. R., KEMPTHORNE, O., SEARLE, S. R. and KROSIGK, C. M. (1959). The estimation of environmental and genetic trends subject to culling. Biometrika, 15, 192-218.
[39] HOLT, C. C. (1957). Forecasting seasonals and trends by exponentially weighted moving averages. Carnegie Inst. Technol. Res. Memo. No. 32. (NONR 760(01)).
[40] JAYNES, E. T. (1983). Highly informative priors: the effect of multiplicity of inference. Invited paper for the Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[41] KALMAN, R. E. (1963). New methods in Wiener filtering theory. In Proceedings of the first Symposium on Engineering Application of Random Function Theory and Probability (eds. J. L. BOGDANOFF and F. KOZIN). New York: Wiley.
[42] KALMAN, R. E. and BUCY, R. S. (1961). New results in linear filtering theory. J. Basic Eng., 83, 95-108.
[43] KENDALL, M., STUART, A. and ORD, J. K. (1983). The Advanced Theory of Statistics. Vol. 3, 4-th ed., Charles Griffin & Company Limited.
[44] LEE, T. T. (1980). A direct approach to identify the noise covariances of the Kalman filtering. IEEE Trans. Automatic Control, Vol. 1, AC-25, 841-842.
[45] LINDLEY, D. V. (1976). A class of utility functions. Ann. Statist., 4, 1-10.
[46] LINDLEY, D. V. and SMITH, A. F. M. (1972). Bayes estimates for linear models. J. R. Statist. Soc. B, 34, 1-41.
[47] MAKOV, UDI E. (1983). Approximate Bayesian procedures for dynamic linear models in the presence of jumps. The Statistician, 32, 207-213.
[48] MAYBECK, P. S. (1982). Stochastic Models, Estimation and Control. Vol. 2, Academic Press, New York.

[49] Mc KENZIE, E. (1976). An analysis of general exponential smoothing. Oper. Res., 24, 131-140.
[50] MIGON, H. S. and HARRISON, P. J. (1983). An application of non-linear Bayesian forecasting to television advertising. Contributed paper to the Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[51] MONTGOMERY, D. C. and JOHNSON, L. A. (1976). Forecasting and Time Series Analysis. McGraw-Hill, New York.
[52] MUTH, J. E. (1981). Forecasting with polynomials by a generalised recursive updating method. Paper to First Inter. Symp. on Forecasting, Quebec City, Canada.
[53] O'HAGAN, A. (1979). On outlier rejection phenomena in Bayes inference. J. R. Statist. Soc. B, 41, 358-367.
[54] PAGE, E. S. (1954). Continuous inspection schemes. Biometrika, 41, 100-115.
[55] PRIESTLEY, M. B. (1980). State-dependent models: A general approach to non-linear time series analysis. J. Time Series Analysis, 1, 47-71.
[56] ROBERTS, S. A. and HARRISON, P. J. (1984). Parsimonious modelling and forecasting of seasonal time series. Eur. J. Oper. Res., 16, 365-377.
[57] SMITH, A. F. M. and COOK, D. G. (1980). Straight lines with a change point: A Bayesian analysis of some renal transplant data. Applied Statist., 29, 180-189.
[58] SMITH, A. F. M. and WEST, M. (1983). Monitoring renal transplants: An application of the multiprocess Kalman filter. Biometrics, 39, No. 4, 867-878.
[59] SMITH, J. Q. (1977). Problems in Bayesian Statistics Relating to Discontinuous Phenomenon, Catastrophe Theory and Forecasting. Ph. D. Thesis, Univ. of Warwick.
[60] SMITH, J. Q. (1979). A generalisation of the Bayesian steady forecasting model. J. R. Statist. Soc. B, 41, 378-387.
[61] SMITH, J. Q. (1983). Forecasting accident claims for an assurance company. The Statistician, 32, 109-115.
[62] SMITH, J. Q., HARRISON, P. J. and ZEEMAN, E. C. (1981). The analysis of some discontinuous decision processes. Eur. J. Oper. Res., 7, 30-43.
[63] SOUZA, R. C. (1978). A Bayesian Entropy Approach to Forecasting. Ph. D. Thesis, Univ. of Warwick.
[64] SOUZA, R. C. (1981). A Bayesian entropy approach to forecasting: The multistate model. In Time Series Analysis (Houston, Tex.), North Holland, 535-542.
[65] SOUZA, R. C. and HARRISON, P. J. (1979). Steady state system forecasting: A Bayesian entropy approach. Warwick Res. Rep. 33.
[66] STEVENS, C. F. (1974). On the variability of demand for families of items. Oper. Res. Quart., 25, 411-420.
[67] TRIGG, D. W. and LEACH, G. A. (1967). Exponential smoothing with adaptive response rate. Oper. Res. Quart., 18, 53-64.
[68] VAN DOBBEN DE BRUYN, C. S. (1968). Cumulative Sum Tests: Theory and Practice. London: Griffin.
[69] WALD, A. (1947). Sequential Analysis. John Wiley & Sons, New York.
[70] WEST, M. (1982). Aspects of recursive Bayesian Estimation. Ph. D. Thesis, Univ. of Nottingham.
[71] WHITTLE, P. (1965). Recursive relations for predictors of non-stationary processes. J. R. Statist. Soc. B, 27, 523-532.
[72] WHITTLE, P. (1969). A view of stochastic control theory. J. R. Statist. Soc. A, 132, 320-334.
[73] WINTERS, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Man. Sci., 6, 324.
[74] WOODWARD, R. H. and GOLDSMITH, P. L. (1964). Cumulative Sum Techniques. ICI Monograph No. 3, Oliver & Boyd, Edinburgh.
[75] WOLD, H. (1954). A Study in the Analysis of Stationary Time Series. Almqvist & Wiksell, Stockholm (first edition 1938).
[76] YOUNG, P. C. (1974). Recursive approaches to time series analysis. Bull. Inst. Maths. and Applications, 10, 209-224.
[77] ZELLNER, A. (1971). An Introduction to Bayesian Inference in Econometrics. Wiley.

