
Bayesian Analysis of Autoregressive Time Series with Change Points
Maria Maddalena Barbieri
Università di Roma “La Sapienza”

Caterina Conigliani
Università Roma Tre

Abstract

The paper deals with the identification of a stationary autoregressive model for a
time series and the simultaneous detection of a change in its mean. We adopt the
Bayesian approach with weak prior information about the parameters of the models
under comparison and an exact form of the likelihood function. When necessary, we
resort to the fractional Bayes factor to choose between models, and to importance
sampling to solve computational issues.

1 Introduction
The class of autoregressive models is a rather general set of models widely used to represent
stationary time series. However, time series often present change points in their dynamic
structure, which may have a serious impact on the analysis and lead to misleading conclusions.
A change point, which is generally the effect of an external event on the phenomenon
of interest, may be represented by a change in the structure of the model or simply by a
change in the value of some of the parameters.

Address for correspondence: Maria Maddalena Barbieri, Dipartimento di Studi Geoeconomici, Statistici,
Storici per l’Analisi Regionale, Università di Roma “La Sapienza”, via del Castro Laurenziano, 9 - 00161
Roma, e-mail: marilena@pow2.sta.uniroma1.it
Research supported by CNR and MURST (Italy)

Various procedures have been proposed in the literature to detect different kinds of change
points in a time series. See, for example, Tsay (1988), Maddala and Kim (1996), Atkinson,
Koopman and Shephard (1997), Le, Martin and Raftery (1996) and the references therein.
In particular, the detection of a change in the expected value of a process, whose behaviour
before and after the change follows a stationary autoregressive moving average model, is
considered in Tsay (1988) and Chen and Tiao (1990). To find the time at which a possible
change occurred, they propose to calculate the likelihood ratio test statistics $\lambda_t$
over the observation interval and then to compare their maximum, $\lambda_{\max}$, with a specified
critical value $C$. The suggested value of the constant $C$ is based on simulation results which
provide empirical percentiles of the maximum of $\lambda_t$. The time point of the change is the one
corresponding to $\lambda_{\max}$. A Bayesian approach to the problem has been adopted by Albert and
Chib (1993) and McCulloch and Tsay (1993, 1994). They specify proper prior distributions
for the parameters of the models and use Markov chain Monte Carlo methods to carry
out the computations. In all those papers the orders of the autoregressive moving average
polynomials are considered fixed. The identification step is performed on the original series,
potentially affected by a change in the mean, whose presence may result in a misidentification
of the model.
The problem considered in this paper is the identification of an autoregressive model and
the simultaneous detection of a possible level shift, i.e. a change in the expected value of
the process. We take into account the stationarity constraints on the autoregressive parameters,
and use an exact form of the likelihood function. Moreover, we adopt the Bayesian approach
and assume that the prior information about the parameters of the various models under
comparison is weak.
Calculation of a suitable Bayes factor is required for Bayesian model comparison. However,
it is well known that difficulties arise when using the Bayes factor if the prior information
is vague, and several alternative solutions have been introduced to address this
problem (Kass and Raftery, 1995). Here, when necessary, we adopt the fractional Bayes
factor approach (O'Hagan, 1995), already considered in Barbieri and Battaglia (1995) for
the detection of outliers in time series. Notice that the structure of the models that we are
considering does not allow analytical evaluation of the integrals required for Bayes
factors and fractional Bayes factors; however, numerical approximations can be obtained by
applying importance sampling.
The proposed procedure may also be used when the prior distributions of the parameters
are proper; in that case the comparison between the models may be performed using the
standard definition of the Bayes factor. Moreover, the methodology may easily be generalised
to other frameworks: for instance to detect changes in the variance, in the autoregressive
parameters, or in a deterministic trend, or the emergence of a unit root in the autoregressive
polynomial.
Note that the problem of detecting a change point in autoregressive time series under the
assumption of vague prior information for the parameters within each model has already been
considered in Booth and Smith (1982). Their solution uses the device introduced in Spiegelhalter
and Smith (1982), based on the idea of an imaginary training sample. They solve the
computational problems in finding the marginal densities of the data under each model by
relating the time series model to a regression problem and using suitable approximations.
The paper is organised as follows. In Section 2 we introduce the models and the nota-
tion, while the Bayes factor and the fractional Bayes factor are introduced in Section 3. A
procedure for the identification of an autoregressive model for a time series and the detection
of a level shift is formalized in Section 4, where the computational issues associated with it
are also discussed. An example with real data is presented in Section 5.

2 Models and notation


Suppose we observe the time series $\{y_1, \ldots, y_n\}$. As a possible generating stochastic process
$\{Y_t;\, t \in \mathbb{Z}\}$, we consider the autoregressive model of order $p$ (AR($p$))
$$\phi(B)(Y_t - \mu) = \epsilon_t \qquad (1)$$
where $\mathrm{E}(Y_t) = \mu$, $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$, $B$ is the usual backshift operator such that
$B^k Y_t = Y_{t-k}$, and $\{\epsilon_t\}$ is a Gaussian white noise process with variance $\sigma^2$.
As an alternative, consider the model with a change in the mean at time $m$:
$$\begin{cases}
\phi(B)(Y_t - \mu_1) = \epsilon_t & t = p+1, \ldots, m \\
\phi(B)\big(Y_t - \mu_1\delta_{t-m} - \mu_2(1 - \delta_{t-m})\big) = \epsilon_t & t = m+1, \ldots, m+p \\
\phi(B)(Y_t - \mu_2) = \epsilon_t & t = m+p+1, \ldots, n
\end{cases} \qquad (2)$$
where $\mu_1 = \mathrm{E}(Y_t)$ for $t \le m$, $\mu_2 = \mathrm{E}(Y_t)$ for $t > m$, and $\delta_t = 1$ if $t \le 0$, $\delta_t = 0$ otherwise.


Let $M_{0,p}$ be the autoregressive model (1) and $M_{m,p}$ be model (2). We assume that in both
cases the roots of the autoregressive polynomial lie outside the unit circle, i.e. the parameter
vector $\phi_{(p)} = (\phi_1, \ldots, \phi_p)$ lies in the stationarity region
$\Phi_p = \{\phi_{(p)} : \phi(z) = 0 \text{ implies } |z| > 1\}$.
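To make model (2) concrete, the following Python sketch simulates a series with a level shift. It is our own illustration rather than code from the paper; the function name, burn-in length and parameter values are arbitrary choices.

```python
import numpy as np

def simulate_level_shift_ar(phi, mu1, mu2, sigma, n, m, seed=0):
    """Simulate y_1, ..., y_n from model (2): a stationary AR(p) whose
    mean shifts from mu1 to mu2 after time m."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    # E(Y_t) is mu1 for t <= m and mu2 for t > m
    mean = np.where(np.arange(1, n + 1) <= m, mu1, mu2)
    burn = 200                      # burn-in, so z_t is close to stationary
    z = np.zeros(burn + n)          # z_t = Y_t - E(Y_t) follows the AR(p)
    eps = rng.normal(0.0, sigma, burn + n)
    for t in range(burn + n):
        z[t] = sum(phi[i] * z[t - 1 - i] for i in range(min(p, t))) + eps[t]
    return mean + z[burn:]

# illustrative values, not from the paper
y = simulate_level_shift_ar(phi=[0.5], mu1=0.0, mu2=2.0, sigma=1.0, n=100, m=40)
```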
The exact likelihood function for the observed time series, under $M_{0,p}$, can be written as
(Box, Jenkins and Reinsel, 1994, pp. 296-297):
$$p(y \mid M_{0,p}, \mu, \sigma^2, \phi_{(p)}) = (2\pi\sigma^2)^{-n/2}\, |V_p|^{1/2} \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sum_{t=1}^{p}\sum_{s=1}^{p} v_{ts} z_t z_s + \sum_{t=p+1}^{n} a_t^2 \right] \right\} \qquad (3)$$
where $y = (y_1, \ldots, y_n)$, $z_t = y_t - \mu$ $(t = 1, \ldots, n)$, $a_t = z_t - \sum_{i=1}^{p} \phi_i z_{t-i}$ $(t = p+1, \ldots, n)$,
and $\sigma^2 V_p^{-1}$ is the variance-covariance matrix of $p$ consecutive variables of the process
$\{Y_t\}$. The elements $v_{ij}$ of $V_p$ do not depend on $\sigma^2$, and for $1 \le i \le j \le p$ they are given
by (Galbraith and Galbraith, 1974):
$$v_{ij} = \sum_{h=0}^{i-1} \phi_h \phi_{h+j-i} - \sum_{h=p+1-j}^{p+i-j} \phi_h \phi_{h+j-i}, \qquad (4)$$
with the convention $\phi_0 = -1$; the other elements of the matrix may be determined by noting
that $V_p$ is symmetric with respect to both its principal diagonals, that is,
$v_{ji} = v_{ij} = v_{p+1-i,p+1-j} = v_{p+1-j,p+1-i}$.
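As a check on (4), here is a short sketch (ours, not the authors' code) that builds $V_p$ directly from the formula; the convention $\phi_0 = -1$ is made explicit, and the AR(1) case reproduces the familiar variance $\sigma^2/(1 - \phi_1^2)$.

```python
import numpy as np

def ar_vp(phi):
    """Elements v_ij of V_p from equation (4), assuming the convention
    phi_0 = -1; sigma^2 * inv(V_p) is the covariance matrix of p
    consecutive observations of the AR(p) process."""
    p = len(phi)
    f = np.concatenate(([-1.0], np.asarray(phi, float)))   # f[h] = phi_h
    V = np.zeros((p, p))
    for i in range(1, p + 1):
        for j in range(i, p + 1):
            v = (sum(f[h] * f[h + j - i] for h in range(i))
                 - sum(f[h] * f[h + j - i] for h in range(p + 1 - j, p + i - j + 1)))
            V[i - 1, j - 1] = V[j - 1, i - 1] = v
    return V

# AR(1): V_1 = [1 - phi_1^2], so sigma^2 / V_1 = sigma^2 / (1 - phi_1^2)
print(ar_vp([0.5]))   # [[0.75]]
```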
Consider now model $M_{m,p}$ with a level shift at time $m$ $(m > p)$; the exact likelihood
function for $\{y_t\}$ is given by:
$$p(y \mid M_{m,p}, \mu_1, \mu_2, \sigma^2, \phi_{(p)}) = (2\pi\sigma^2)^{-n/2}\, |V_p|^{1/2} \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sum_{t=1}^{p}\sum_{s=1}^{p} v_{ts} w_t w_s + \sum_{t=p+1}^{n} d_t^2 \right] \right\} \qquad (5)$$
where $w_t = y_t - \mu_1$ for $t = 1, \ldots, m$, $w_t = y_t - \mu_1\delta_{t-m} - \mu_2(1 - \delta_{t-m})$ for
$t = m+1, \ldots, m+p$, $w_t = y_t - \mu_2$ for $t = m+p+1, \ldots, n$, and $d_t = w_t - \sum_{i=1}^{p} \phi_i w_{t-i}$
$(t = p+1, \ldots, n)$.
Without loss of generality, we assume that the parameters of each model are a priori
independent, and we adopt the usual noninformative prior distributions for $\mu$, $\mu_1$, $\mu_2$ and $\sigma^2$:
$$\pi(\mu) \propto 1 \qquad (6)$$
$$\pi(\mu_1) \propto 1, \qquad \pi(\mu_2) \propto 1 \qquad (7)$$
$$\pi(\sigma^2) \propto \frac{1}{\sigma^2}. \qquad (8)$$
We also assume that the prior distribution for the autoregressive parameters is uniform on
the stationarity region $\Phi_p$, that is
$$\pi(\phi_{(p)}) = \frac{1}{\mathrm{volume}(\Phi_p)}\, I_{\Phi_p}(\phi_{(p)}), \qquad (9)$$
where $I_{\Phi_p}(\cdot)$ denotes the indicator function of the region $\Phi_p$.

3 Bayes factors and default Bayes factors


The problem of identifying the autoregressive model which may have generated the series
$\{y_1, \ldots, y_n\}$, and of detecting a possible level shift, can now be seen as that of choosing
a model in the class $\mathcal{M} = \{M_{k,p},\, p \in \mathcal{P},\, k \in \{0\} \cup \mathcal{N}\}$, where $\mathcal{P}$ denotes the set of possible
autoregressive orders and $\mathcal{N}$ the set of time points at which a change in the mean may occur.
The choice of the model that best fits the data is a crucial problem in the various
approaches to statistical inference; in the Bayesian setting the key tool for model comparison
is the Bayes factor (Kass and Raftery, 1995).
Suppose we consider two models in $\mathcal{M}$; to simplify the notation, we shall indicate them
by $M_i$ and $M_j$. Let $\theta_s \in \Theta_s$, $\pi_s(\theta_s)$ and $p(y \mid M_s, \theta_s)$ be the parameter vector, its prior
distribution and the density of the data $y$ under model $M_s$, respectively $(s = i, j)$. The
Bayes factor $B_{ij}$ for comparing $M_i$ with $M_j$ is defined by
$$B_{ij} = \frac{p(y \mid M_i)}{p(y \mid M_j)} = \frac{\int_{\Theta_i} p(y \mid M_i, \theta_i)\, \pi_i(\theta_i)\, d\theta_i}{\int_{\Theta_j} p(y \mid M_j, \theta_j)\, \pi_j(\theta_j)\, d\theta_j}, \qquad (10)$$
where $p(y \mid M_s)$ is the marginal density of $y$ under $M_s$.


The Bayes factor is usually very sensitive to the choice of the prior distribution on the
parameters. In particular, the usual definition (10) cannot be applied in general when con-
ventional non-informative prior distributions are adopted, since they are typically improper.
Several alternative definitions of Bayes factor have been introduced to overcome this
problem; among these, the intrinsic Bayes factor (Berger and Pericchi, 1996) and the frac-
tional Bayes factor (O’Hagan, 1995) perform well in a number of situations. In both methods
the idea is to use part of the observations, a training sample, to obtain a proper posterior
distribution for the parameters, and the remainder of the data to compute a Bayes factor;
this is usually referred to as a partial Bayes factor. Berger and Pericchi recommend the
use of proper minimal training samples and, by considering suitable averages of the partial
Bayes factors over the set of all such training samples, propose different versions of the
intrinsic Bayes factor (Berger and Pericchi, 1996, 1997, 1998).
A different solution, which does not require the evaluation of all the partial Bayes factors
corresponding to the possible training samples, was suggested by O'Hagan (1995), who
introduced the fractional Bayes factor
$$B_{ij}^b = \frac{p(y \mid M_i)/p^b(y \mid M_i)}{p(y \mid M_j)/p^b(y \mid M_j)}, \qquad (11)$$
where $p^b(y \mid M_s)$ $(s = i, j)$ is given by
$$p^b(y \mid M_s) = \int_{\Theta_s} \left[ p(y \mid M_s, \theta_s) \right]^b \pi(\theta_s)\, d\theta_s.$$
A possible choice of $b$, the fraction of the likelihood used for training, is the dimension of the
minimal training sample divided by $n$, the number of observations; here, following this guidance,
we get $b = 3/n$. Other suggestions, not explicitly considered in this paper, are given in
O'Hagan (1995, 1997) and in Conigliani and O'Hagan (2000).
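As a toy numerical illustration of (11) — ours, not from the paper — consider comparing $M_i$: $Y_t \sim N(\theta, 1)$ with $\pi(\theta) \propto 1$ against the fixed model $M_j$: $Y_t \sim N(0, 1)$; the grid limits, data and function name are arbitrary.

```python
import numpy as np
from scipy import stats

def pb_flat_normal(y, b):
    """p^b(y|M_i) = integral of [p(y|theta)]^b d(theta) under the flat
    prior pi(theta) proportional to 1, by a Riemann sum on a wide grid."""
    grid = np.linspace(-30.0, 30.0, 20001)
    logs = np.array([b * stats.norm.logpdf(y, th, 1.0).sum() for th in grid])
    return np.exp(logs).sum() * (grid[1] - grid[0])

y = np.array([1.2, 0.8, 1.5, 0.9])      # toy data
b = 1.0 / len(y)                        # minimal training sample: one point
ratio_i = pb_flat_normal(y, 1.0) / pb_flat_normal(y, b)
lj = stats.norm.logpdf(y, 0.0, 1.0).sum()
ratio_j = np.exp(lj) / np.exp(b * lj)   # M_j has no free parameter
print(ratio_i / ratio_j)                # B_ij^b of equation (11)
```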

One point about the comparison among the different models in $\mathcal{M}$ is worth noting. Even
though the Bayes factor (10) is in general not defined when we adopt improper prior distributions,
it can actually be used for the problem of choosing between two models $M_{m_1,p}$ and $M_{m_2,p}$
with the same autoregressive order $p$ and with a level shift at different times $m_1$ and $m_2$;
this is because such models are non-nested, have the same number of
parameters, and all their parameters have the same meaning. In all other cases, which involve
the comparison of nested models with improper priors on the parameters not in common,
(10) cannot be applied and we resort to the fractional Bayes factor approach. We will
not consider the intrinsic Bayes factor here, since in this setting it is highly non-robust; this
is due to the fact that there are only two proper minimal training samples, namely the two
subsets of three successive observations containing the possible change point.
In the next section we discuss the problems associated with the evaluation of (10)
and (11). We consider in detail the computation of $p^b(y \mid M_{k,p})$; the expression for
$p(y \mid M_{k,p})$ can be derived simply by setting $b = 1$.

4 A procedure to compare models


The set $\mathcal{M}$ of possible models that may have generated the series of interest is often rather
large. In fact, even though in real problems the autoregressive order is generally small, the set $\mathcal{N}$
of possible level shift times is usually large, since its dimension is related to the length of the
series. In order to minimise the number of comparisons, we suggest the following procedure,
which is also motivated by the fact that some comparisons may be done with the usual Bayes
factor (10), while for others we need the fractional Bayes factor (11).
First fix a value for the autoregressive order $p$, and assume that a level shift is present
at an unknown time $m_p$. We can then estimate $m_p$ by maximising with respect to $m$
$(m = p+1, \ldots, n)$ the expression
$$P^*(M_{m,p} \mid y) = \left[ \sum_{k=p+1}^{n} \frac{p(y \mid M_{k,p})}{p(y \mid M_{m,p})} \right]^{-1} = \frac{p(y \mid M_{m,p})}{\sum_{k=p+1}^{n} p(y \mid M_{k,p})},$$
which represents the posterior probability that the autoregressive model of order $p$ has a level
shift at $m$, given that there is a level shift in the series and that each time point has the same prior
probability of being the one corresponding to the change point. Note that this corresponds
to choosing the best model on the basis of the usual Bayes factors (10), since
$$B_{ij} = \frac{P^*(M_{i,p} \mid y)}{P^*(M_{j,p} \mid y)}.$$
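Numerically, it is convenient to normalise in log space; the following minimal sketch (ours, with hypothetical names) turns a vector of log marginal densities, one for each candidate $m = p+1, \ldots, n$, into the probabilities $P^*(M_{m,p} \mid y)$.

```python
import numpy as np

def posterior_change_time(log_marg):
    """P*(M_{m,p}|y) from log_marg[i] = log p(y|M_{m_i,p}), normalised
    in log space for numerical stability."""
    w = np.exp(log_marg - np.max(log_marg))   # rescale before exponentiating
    return w / w.sum()
```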

After determining the set $N_p = \{m_p,\, p \in \mathcal{P}\}$, we need to compare the models in the
class $\{M_{k,p},\, p \in \mathcal{P},\, k \in \{0\} \cup N_p\}$; as explained in the previous section, this can be done
by computing the fractional Bayes factor. More specifically, we suggest beginning with the
comparison, for every $p$, between the model without level change, $M_{0,p}$, and the model with
a level change at $m_p$, $M_{m_p,p}$; by doing so we find the preferred model for each possible
autoregressive order $p$. The final choice is among these models as $p$ varies in $\mathcal{P}$.

4.1 Expressions for marginal distributions


First consider the computation of $p^b(y \mid M_{0,p})$. Using (3), (6), (8) and (9), when $p = 0$ we
have
$$p^b(y \mid M_{0,0}) = \Gamma\!\left( \frac{bn-1}{2} \right) (n\, b^{bn})^{-1/2} (\pi R_{0,0})^{-(bn-1)/2}$$
where
$$R_{0,0} = \sum_{t=1}^{n} (y_t - \bar{y})^2 \qquad \text{and} \qquad \bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t;$$
if $p > 0$, we obtain
$$p^b(y \mid M_{0,p}) = \Gamma\!\left( \frac{bn-1}{2} \right) \frac{\pi^{-(bn-1)/2}\, b^{-bn/2}}{\mathrm{volume}(\Phi_p)} \int_{\Phi_p} |V_p|^{b/2}\, \tau^{-1/2}\, R_{0,p}^{-(bn-1)/2}\, d\phi_{(p)}$$
where
$$\tau = \sum_{t=1}^{p}\sum_{s=1}^{p} v_{ts} + (n-p)\left( 1 - \sum_{i=1}^{p} \phi_i \right)^2,$$
$$R_{0,p} = \sum_{t=1}^{p}\sum_{s=1}^{p} v_{ts} (y_t - \tilde{\mu}_{0,p})(y_s - \tilde{\mu}_{0,p}) + \sum_{t=p+1}^{n} \left[ (y_t - \tilde{\mu}_{0,p}) - \sum_{h=1}^{p} \phi_h (y_{t-h} - \tilde{\mu}_{0,p}) \right]^2$$
and
$$\tilde{\mu}_{0,p} = \tau^{-1} \left[ \sum_{t=1}^{p}\sum_{s=1}^{p} v_{ts}\, y_t + \left( 1 - \sum_{i=1}^{p} \phi_i \right) \sum_{t=p+1}^{n} \left( y_t - \sum_{i=1}^{p} \phi_i y_{t-i} \right) \right].$$

Consider now the evaluation of $p^b(y \mid M_{m,p})$; using (5), (7), (8) and (9), when $p = 0$ we
have
$$p^b(y \mid M_{m,0}) = \Gamma\!\left( \frac{bn-2}{2} \right) \left[ m(n-m)\, b^{bn} \right]^{-1/2} (\pi R_{m,0})^{-(bn-2)/2}$$
with
$$R_{m,0} = \sum_{t=1}^{m} \left( y_t - \tilde{\mu}_{m,0}^{(1)} \right)^2 + \sum_{t=m+1}^{n} \left( y_t - \tilde{\mu}_{m,0}^{(2)} \right)^2,$$
$$\tilde{\mu}_{m,0}^{(1)} = \frac{1}{m}\sum_{t=1}^{m} y_t \qquad \text{and} \qquad \tilde{\mu}_{m,0}^{(2)} = \frac{1}{n-m}\sum_{t=m+1}^{n} y_t.$$
When $p > 0$, integrating with respect to $\mu_1$, $\mu_2$ and $\sigma^2$ leads to
$$p^b(y \mid M_{m,p}) = \Gamma\!\left( \frac{bn-2}{2} \right) \frac{\pi^{-(bn-2)/2}\, b^{-bn/2}}{\mathrm{volume}(\Phi_p)} \int_{\Phi_p} |H|^{-1/2}\, |V_p|^{b/2}\, R_{m,p}^{-(bn-2)/2}\, d\phi_{(p)},$$
where $H$ is the $2 \times 2$ symmetric matrix with elements
$$h_{11} = \sum_{t=1}^{p}\sum_{s=1}^{p} v_{ts} + (m-p)\left( 1 - \sum_{i=1}^{p} \phi_i \right)^2 + \sum_{i=1}^{p} \left( \sum_{j=i}^{p} \phi_j \right)^2,$$
$$h_{12} = \sum_{i=1}^{p} \sum_{j=i}^{p} \sum_{l=0}^{i-1} \phi_j \phi_l,$$
$$h_{22} = (n-m-p)\left( 1 - \sum_{i=1}^{p} \phi_i \right)^2 + \sum_{i=1}^{p} \left( \sum_{j=0}^{i-1} \phi_j \right)^2,$$
and
$$\begin{aligned}
R_{m,p} = {} & \sum_{t=1}^{p}\sum_{s=1}^{p} v_{ts}\left( y_t - \tilde{\mu}_{m,p}^{(1)} \right)\left( y_s - \tilde{\mu}_{m,p}^{(1)} \right) \\
& + \sum_{t=p+1}^{m} \left[ \left( y_t - \tilde{\mu}_{m,p}^{(1)} \right) - \sum_{h=1}^{p} \phi_h \left( y_{t-h} - \tilde{\mu}_{m,p}^{(1)} \right) \right]^2 \\
& + \sum_{t=m+1}^{m+p} \left\{ \left( y_t - \tilde{\mu}_{m,p}^{(2)} \right) - \sum_{h=1}^{p} \phi_h \left[ y_{t-h} - \tilde{\mu}_{m,p}^{(1)}\delta_{t-h-m} - \tilde{\mu}_{m,p}^{(2)}(1 - \delta_{t-h-m}) \right] \right\}^2 \\
& + \sum_{t=m+p+1}^{n} \left[ \left( y_t - \tilde{\mu}_{m,p}^{(2)} \right) - \sum_{h=1}^{p} \phi_h \left( y_{t-h} - \tilde{\mu}_{m,p}^{(2)} \right) \right]^2,
\end{aligned}$$
with
$$\begin{pmatrix} \tilde{\mu}_{m,p}^{(1)} \\ \tilde{\mu}_{m,p}^{(2)} \end{pmatrix} = H^{-1}\xi,$$
where $\xi$ is the vector with elements
$$\xi_1 = \sum_{t=1}^{p}\sum_{s=1}^{p} y_t v_{ts} + \left( 1 - \sum_{i=1}^{p} \phi_i \right) \sum_{t=p+1}^{m} e_t - \sum_{i=1}^{p} e_{m+i} \sum_{j=i}^{p} \phi_j,$$
$$\xi_2 = \left( 1 - \sum_{i=1}^{p} \phi_i \right) \sum_{t=m+p+1}^{n} e_t - \sum_{i=1}^{p} e_{m+i} \sum_{j=0}^{i-1} \phi_j,$$
in which $e_t = y_t - \sum_{h=1}^{p} \phi_h y_{t-h}$ and, as in (4), $\phi_0 = -1$.
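Since the $p = 0$ quantities above are available in closed form, that part of the procedure can be sketched in a few lines. The following Python fragment is our own illustration (the function names are ours), working with log-gamma values for numerical stability; $b = 1$ gives the marginals entering $P^*$, while $b = 3/n$ gives the fractional marginals of Section 3.

```python
import numpy as np
from scipy.special import gammaln

def log_pb_00(y, b):
    """log p^b(y|M_{0,0}): white noise with constant mean (formula above)."""
    n = len(y)
    R = ((y - y.mean()) ** 2).sum()
    return (gammaln((b * n - 1) / 2)
            - 0.5 * (np.log(n) + b * n * np.log(b))
            - (b * n - 1) / 2 * np.log(np.pi * R))

def log_pb_m0(y, m, b):
    """log p^b(y|M_{m,0}): white noise with a mean shift after time m,
    for 1 <= m <= n-1 so that both segments are non-empty."""
    n = len(y)
    R = ((y[:m] - y[:m].mean()) ** 2).sum() + ((y[m:] - y[m:].mean()) ** 2).sum()
    return (gammaln((b * n - 2) / 2)
            - 0.5 * (np.log(m) + np.log(n - m) + b * n * np.log(b))
            - (b * n - 2) / 2 * np.log(np.pi * R))

# With b = 3/len(y), the log fractional Bayes factor of M_{0,0} vs M_{m,0} is
# (log_pb_00(y, 1) - log_pb_00(y, b)) - (log_pb_m0(y, m, 1) - log_pb_m0(y, m, b))
```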

4.2 Computational aspects


As we have seen in the previous section, when $p > 0$ it is not possible to solve analytically
the required integrals with respect to $\phi_{(p)}$. In order to evaluate $p^b(y \mid M_{0,p})$ and $p^b(y \mid M_{m,p})$
we therefore need to approximate these integrals by some kind of numerical
method, for instance importance sampling.
Application of importance sampling in this setting requires generating a sufficiently large
number $N$ of random vectors from a suitable importance function on the set $\Phi_p$. It is
well known, however, that while $\Phi_1$ is the interval $(-1, 1)$ and $\Phi_2$ is a triangle, when $p > 2$
the stationarity region $\Phi_p$ is much more complicated, and although its volume is known
(Piccolo, 1982), it is not possible to simulate $\phi_{(p)}$ on it directly. The problem may be solved
by reparametrising the autoregressive model in terms of partial autocorrelations (Barndorff-Nielsen
and Schou, 1973).
Formally, let $\{\rho_1, \ldots, \rho_p\}$ be the partial autocorrelation function of the process. Consider
the quantities $\zeta_k = (\zeta_{1,k}, \ldots, \zeta_{k,k})$ for $k = 1, \ldots, p$, such that:
$$\zeta_{1,1} = \rho_1, \qquad \zeta_{i,k} = \zeta_{i,k-1} - \rho_k \zeta_{k-i,k-1} \quad (i = 1, \ldots, k-1), \qquad \zeta_{k,k} = \rho_k \quad (k = 2, \ldots, p).$$
It follows that $\phi_{(p)} = \zeta_p$. Under this reparametrisation, the stationarity conditions on the
autoregressive parameters become the constraints $|\rho_k| < 1$, $k = 1, \ldots, p$, while the uniform
distribution of $\phi_{(p)}$ on $\Phi_p$ induces on each $\rho_k$ a $\mathrm{Beta}\left( [0.5(k+1)],\, [0.5k] + 1 \right)$ distribution,
with the $\rho_k$ independent (Jones, 1987). In the notation above, $[a]$ denotes the integer part of $a$.
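A sketch (ours) of the sampling step under our reading of this result, taking the Beta distribution to be on $(-1, 1)$: draw each $\rho_k$ from its prior, map to $\phi_{(p)}$ through the $\zeta$ recursion, and average the integrand over the draws. Since the draws are uniform on $\Phi_p$, their average of $g(\phi_{(p)})$ estimates $\mathrm{volume}(\Phi_p)^{-1}\int_{\Phi_p} g\, d\phi_{(p)}$ directly; the paper itself instead centres normal importance functions at Durbin's estimates (see Section 5).

```python
import numpy as np
from scipy import stats

def pacf_to_phi(rho):
    """Map partial autocorrelations (rho_1, ..., rho_p) to phi_(p) = zeta_p
    via the recursion above."""
    zeta = [rho[0]]
    for k in range(2, len(rho) + 1):
        r = rho[k - 1]
        zeta = [zeta[i] - r * zeta[k - 2 - i] for i in range(k - 1)] + [r]
    return np.array(zeta)

def sample_phi_uniform(p, N, rng):
    """N draws of phi_(p), uniform on the stationarity region Phi_p
    (Jones, 1987), via independent Beta draws rescaled to (-1, 1)."""
    rho = np.empty((N, p))
    for k in range(1, p + 1):
        a, b = np.floor(0.5 * (k + 1)), np.floor(0.5 * k) + 1
        rho[:, k - 1] = 2.0 * stats.beta.rvs(a, b, size=N, random_state=rng) - 1.0
    return np.apply_along_axis(pacf_to_phi, 1, rho)

rng = np.random.default_rng(1)
phis = sample_phi_uniform(p=3, N=5000, rng=rng)
# estimate = np.mean([g(phi) for phi in phis])   # Monte Carlo average of g
```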

5 Example
As an application of the procedure described in the previous section, consider the series of the
annual volume of the Nile river at Assuan for the years 1871 to 1970 (Cobb, 1978), shown in
Figure 1.
[figure omitted]
Figure 1: Series of the annual volume of the Nile river at Assuan for the years 1871 to 1970
(unit of measure: cubic metres $\times 10^8$).

When applying our method, we find that for $p = 0$ the only value of $P^*(M_{m,0} \mid y)$ significantly
greater than zero is $P^*(M_{28,0} \mid y)$, which is equal to 0.764. In the same way, when we fix
$p = 1$, the only values larger than 0.2 are $P^*(M_{26,1} \mid y) = 0.218$ and $P^*(M_{28,1} \mid y) = 0.428$.
Now letting $b = 3/n$, with $n = 100$ here, we obtain $B^b_{M_{0,0},M_{28,0}} = 0$, $B^b_{M_{0,1},M_{28,1}} = 0.00045$
and $B^b_{M_{28,1},M_{28,0}} = 0.02035$, so that the preferred model for this series is an AR(0) with a
change in the mean at time $t = 28$, corresponding to the year 1898. The same conclusion was
obtained by other authors; see, for instance, Freeman (1986), Phillips and Smith (1996), and
Pole, West and Harrison (1994).
The integrals required to evaluate the various Bayes factors when $p > 0$ were computed
by importance sampling, with $N = 5000$ and with normal importance functions centred at
Durbin's estimates of the partial autocorrelations at the appropriate lags.

References

Albert, J.H. and Chib, S. (1993) “Bayes inference via Gibbs sampling of autoregressive time
series subject to Markov mean and variance shifts”, J. Business and Econ. Statist., 11,
1-15.

Atkinson, A.C., Koopman, S.J. and Shephard, N. (1997) “Detecting shocks: outliers and
breaks in time series”, J. Econometrics, 80, 387-422.

Barbieri, M.M. and Battaglia, F. (1995) “Outlier detection in time series: a decision theoretic
approach”, Quaderno del Dip. di Statistica, Probabilità e Stat. Appl., Università
“La Sapienza”, Roma, n. 19.

Barndorff-Nielsen, O. and Schou, G. (1973) “On the parametrization of autoregressive models
by partial autocorrelation”, J. Multivar. Anal., 3, 408-419.

Berger, J.O. and Pericchi, L.R. (1996) “The intrinsic Bayes factor for model selection and
prediction”, J. Am. Statist. Assoc. 91, 109-122.

Berger, J.O. and Pericchi, L.R. (1997) “On criticisms and comparisons of default Bayes
factors for model selection and hypothesis testing”, in Proceedings of the Workshop on
Model Selection, Ed. W. Racugno, pp. 1-50, Bologna: Pitagora.

Berger, J.O. and Pericchi, L.R. (1998) “Accurate and stable Bayesian model selection: the
median intrinsic Bayes factor”, Sankhyā B, 60, 1-18.

Booth, N.B. and Smith, A.F.M. (1982) “A Bayesian approach to retrospective identification
of change-points”, J. Econometrics, 19, 7-22.

Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994) Time Series Analysis: Forecasting and
Control, 3rd ed., Holden Day, San Francisco.

Chen, C. and Tiao, G.C. (1990) “Random level-shift time series models, ARIMA approxi-
mations, and level-shift detection”, J. Bus. Econ. Statist., 8, 83-97.
Cobb, G.W. (1978) “The problem of the Nile: conditional solution to a change point prob-
lem”, Biometrika, 65, 243-251.
Conigliani, C. and O’Hagan, A. (2000) “Sensitivity of the fractional Bayes factor to prior
distributions”, Canadian Journal of Statistics, 28 (to appear).
Freeman, J.M. (1986) “An unknown change point and goodness of fit”, Statistician, 35,
335-344.
Galbraith, R.F. and Galbraith, J.I. (1974) “On the inverses of some patterned matrices
arising in the theory of stationary time series”, J. Appl. Prob., 11, 63-71.
Jones, M.C. (1987) “Randomly choosing parameters from the stationarity and invertibility
region of autoregressive-moving average models”, Appl. Statist., 36, 134-138.
Kass, R. and Raftery, A. (1995) “Bayes Factors”, J. Am. Statist. Assoc., 90, 773-795.
Le, N.D., Martin, R.D. and Raftery, A.E. (1996) “Modeling flat stretches, bursts, and
outliers in time series using mixture transition distribution models”, J. Amer. Statist.
Assoc., 91, 1504-1515.
Maddala, G.S. and Kim, I.-M. (1996) “Structural change and unit roots”, J. Statist. Plann.
and Inf., 49, 73-103.
McCulloch, R.E. and Tsay, R.S. (1993) “Bayesian inference and prediction for mean and
variance shifts in autoregressive time series”, J. Amer. Statist. Assoc., 88, 968-978.
McCulloch, R.E. and Tsay, R.S. (1994) “Statistical analysis of economic time series via
Markov switching models”, J. Time Ser. Anal., 15, 523-539.
O’Hagan, A. (1995) “Fractional Bayes factors for model comparison”, J. Roy. Statist. Soc.,
B 57, 99-138.
O’Hagan, A. (1997) “Properties of intrinsic and fractional Bayes factors”, Test, 6, 101-118.
Phillips, D.B. and Smith, A.F.M. (1996) “Bayesian model comparison via jump diffusions”,
in Markov Chain Monte Carlo in Practice, eds. W.R. Gilks, S. Richardson and D.J.
Spiegelhalter, Chapman and Hall, London, pp. 215-239.
Piccolo, D. (1982) “The size of the stationarity and invertibility region of an autoregressive-
moving average process”, J. Time Ser. Anal., 3, 245-247.
Pole, A., West, M. and Harrison, J. (1994) Applied Bayesian Forecasting and Time Series
Analysis, Chapman and Hall, London.

Spiegelhalter, D.J. and Smith, A.F.M. (1982) “Bayes factors for linear and log-linear models
with vague prior information”, J. Roy. Statist. Soc., B 44, 377-387.

Tsay, R.S. (1988) “Outliers, level shift, and variance changes in time series”, J. Forecasting,
7, 1-20.

