
Intervention analysis for low count time series with applications in public health

D. Moriña¹, J. M. Leya-Moral² and M. Feijoo-Cid²

¹ Barcelona Graduate School of Mathematics (BGSMath), Departament de Matemàtiques, Universitat Autònoma de Barcelona (UAB)
² Departament d’Infermeria, Universitat Autònoma de Barcelona (UAB)

Institut Català d’Oncologia


January 2019
Outline

1 Introduction

2 Model definition

3 Goodness of fit

4 Example: Simulation study

5 Example: Venereal lymphogranuloma and massive events in Barcelona

6 Example: HPV vaccination and genital warts in Catalonia

7 Conclusions
Introduction Model definition GoF Example 1 Example 2 Example 3 Conclusions

Introduction

• In many contexts, such as public health or sociology, it is common to design and conduct an intervention to change the behaviour of some phenomenon
• For continuous-valued time series, or series with large counts, intervention analysis can be used to assess such changes
• When the time point of the potential change is unknown, some authors have proposed change-point analysis
• For count data, however, there is no clear analogous methodology, although there have been recent efforts to extend change-point techniques to INAR models

• Most of these models focus on real-time monitoring for structural changes in the series, whereas our focus is on transient or permanent changes in the number of newly observed cases after an intervention, assessed through a retrospective analysis
• INAR(k) models are defined by
\[ X_t = p_1 \circ X_{t-1} + \dots + p_k \circ X_{t-k} + W_t, \]
where $p_1, \dots, p_k$ are fixed parameters with $0 < p_1, \dots, p_k < 1$, and $W_t$ is assumed to follow a Poisson distribution with fixed mean $\lambda$. In addition, $X_{t-1}$ and $W_t$ are assumed to be independent at any time $t$.

The $\circ$ operator, called binomial thinning, is defined as
\[ p_j \circ X_{t-j} = \sum_{i=1}^{X_{t-j}} Y_i, \]
where the $Y_i$ are independent and identically distributed Bernoulli random variables with probability of success $p_j$. That is, conditional on $X_{t-j} = x_{t-j}$, $p_j \circ X_{t-j}$ is binomially distributed with $x_{t-j}$ trials and success probability $p_j$.
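The thinning operator and the INAR(k) recursion can be illustrated with a short simulation. This is a minimal sketch using only the Python standard library; the function names (`thin`, `poisson`, `simulate_inar`), the parameter values, and the choice of Poisson draws as start-up values are assumptions made for illustration, not part of the original slides.

```python
import math
import random


def thin(p, x, rng):
    """Binomial thinning: count successes among x Bernoulli(p) trials."""
    return sum(1 for _ in range(x) if rng.random() < p)


def poisson(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate_inar(ps, lam, n, seed=0):
    """Simulate n observations of an INAR(k) process with k = len(ps)."""
    rng = random.Random(seed)
    k = len(ps)
    xs = [poisson(lam, rng) for _ in range(k)]  # arbitrary start-up values (assumption)
    for t in range(k, n):
        # Sum of the thinned lagged counts p_j o X_{t-j}, j = 1..k
        survivors = sum(thin(p, xs[t - j], rng) for j, p in enumerate(ps, start=1))
        xs.append(survivors + poisson(lam, rng))  # add the Poisson innovation W_t
    return xs


# Hypothetical INAR(2) example with p1 = 0.5, p2 = 0.2 and lambda = 1
series = simulate_inar(ps=[0.5, 0.2], lam=1.0, n=100, seed=42)
```

Each thinned term keeps every one of the previous cases independently with probability $p_j$, so the simulated values always remain non-negative integers.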

Model definition

For each intervention of interest, we define a dummy variable $I_t$, which takes the value 1 if $t$ is within an intervention period and 0 otherwise. The proposed model is a variation of the INAR(k) model,
\[ X_t = p_1 \circ X_{t-1} + \dots + p_k \circ X_{t-k} + W_t(\lambda'), \]
where the innovations $W_t(\lambda')$ are Poisson with mean $\lambda' = \lambda + \sigma \cdot I_t$.

Impact of the intervention
The impact of the intervention is measured by the parameter $\sigma$


Parameter estimation

• The parameters of the model, $\theta = (\lambda, \sigma, p_1, \dots, p_k)$, can be estimated by conditional maximum likelihood
• The likelihood function conditional on the first $k$ values $x_1, \dots, x_k$ can be written as
\[ L(X; \theta) = P(x_{k+1} \mid x_1, \dots, x_k; \theta) \cdots P(x_n \mid x_{n-1}, \dots, x_{n-k}; \theta) \]
• Each conditional probability corresponds to the distribution of a sum of $k$ independent binomial variables with parameters $(x_{i-1}, p_1), \dots, (x_{i-k}, p_k)$ plus an independent Poisson variable with mean $\lambda'$

• This probability function can be written as
\[
\begin{aligned}
P(x_i \mid x_{i-1}, \dots, x_{i-k}) = {} & \sum_{j_1=0}^{\min(x_{i-1},\, x_i)} \binom{x_{i-1}}{j_1} p_1^{j_1} (1-p_1)^{x_{i-1}-j_1} \\
& \cdot \sum_{j_2=0}^{\min(x_{i-2},\, x_i - j_1)} \binom{x_{i-2}}{j_2} p_2^{j_2} (1-p_2)^{x_{i-2}-j_2} \cdots \\
& \cdot \sum_{j_k=0}^{\min(x_{i-k},\, x_i - (j_1 + \dots + j_{k-1}))} \binom{x_{i-k}}{j_k} p_k^{j_k} (1-p_k)^{x_{i-k}-j_k} \\
& \cdot \frac{e^{-(\lambda + \sigma \cdot I_i)} (\lambda + \sigma \cdot I_i)^{x_i - (j_1 + \dots + j_k)}}{(x_i - (j_1 + \dots + j_k))!}
\end{aligned}
\]
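The nested sums above are a convolution of $k$ binomial distributions with a Poisson distribution, which suggests a direct way to evaluate the conditional likelihood numerically. The sketch below is one possible implementation in plain Python; the function names and the argument layout (`past` holding $x_{i-1}, \dots, x_{i-k}$ in that order) are assumptions for illustration, and the code is intended for low counts only.

```python
import math


def binom_pmf(j, n, p):
    """P(Bin(n, p) = j)."""
    if j < 0 or j > n:
        return 0.0
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)


def pois_pmf(m, lam):
    """P(Poisson(lam) = m); suitable for small m only."""
    if m < 0:
        return 0.0
    return math.exp(-lam) * lam**m / math.factorial(m)


def cond_pmf(x, past, ps, lam_prime):
    """P(X_i = x | past), with past = [x_{i-1}, ..., x_{i-k}] and ps = [p_1, ..., p_k]."""
    dist = [1.0]  # distribution of the sum of thinned terms, truncated at x
    for n_j, p_j in zip(past, ps):
        new = [0.0] * (x + 1)
        for s, prob in enumerate(dist):
            for j in range(min(n_j, x - s) + 1):
                new[s + j] += prob * binom_pmf(j, n_j, p_j)
        dist = new
    # The remainder x - s must come from the Poisson innovation
    return sum(prob * pois_pmf(x - s, lam_prime) for s, prob in enumerate(dist))


def cond_loglik(xs, indicator, ps, lam, sigma):
    """Conditional log-likelihood given the first k observations."""
    k = len(ps)
    total = 0.0
    for i in range(k, len(xs)):
        past = [xs[i - j] for j in range(1, k + 1)]
        total += math.log(cond_pmf(xs[i], past, ps, lam + sigma * indicator[i]))
    return total
```

Maximising `cond_loglik` over $(\lambda, \sigma, p_1, \dots, p_k)$ with any numerical optimiser would give the conditional maximum likelihood estimates; the optimiser itself is left out of this sketch.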

• The main interest is in testing the hypotheses
\[ H_0: \sigma = 0 \quad \text{versus} \quad H_1: \sigma \neq 0 \]
• The null hypothesis can be tested using the standard error of $\hat{\sigma}$, obtained from the inverse of the Hessian matrix

Model properties

• The number of cases at time $t$ can be estimated by the expectation conditional on the most recent known information, that is,
\[ E(X_t \mid X_{t-1} = x_{t-1}, X_{t-2} = x_{t-2}, \dots, X_{t-k} = x_{t-k}) = p_1 x_{t-1} + p_2 x_{t-2} + \dots + p_k x_{t-k} + \lambda'_t, \]
where $\lambda'_t = \lambda + \sigma \cdot I_t$
• This leads to the following estimator, or predicted value, of $X_t$:
\[ \hat{X}_t = \hat{p}_1 x_{t-1} + \hat{p}_2 x_{t-2} + \dots + \hat{p}_k x_{t-k} + \hat{\lambda}'_t \]

• This can be used to assess how well the model fits the observed data, by calculating the variance of $\hat{X}_t$:
\[
\begin{aligned}
V(\hat{X}_t) = {} & V(\hat{p}_1) x_{t-1}^2 + V(\hat{p}_2) x_{t-2}^2 + \dots + V(\hat{p}_k) x_{t-k}^2 + V(\hat{\lambda}'_t) \\
& + 2 \sum_{j=1}^{k-1} \sum_{i=j+1}^{k} x_{t-j} x_{t-i} \, \mathrm{Cov}(\hat{p}_j, \hat{p}_i) \\
& + 2 x_{t-1} \mathrm{Cov}(\hat{p}_1, \hat{\lambda}'_t) + 2 x_{t-2} \mathrm{Cov}(\hat{p}_2, \hat{\lambda}'_t) + \dots + 2 x_{t-k} \mathrm{Cov}(\hat{p}_k, \hat{\lambda}'_t)
\end{aligned}
\]
• To estimate $V(\hat{X}_t)$, the variances and covariances needed can be replaced by their corresponding estimates
• From here, approximate $100(1-\alpha)\%$ confidence intervals can be constructed in the standard way as
\[ \hat{X}_t \pm z_{\alpha/2} \sqrt{\hat{V}(\hat{X}_t)} \]
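The predicted value and its approximate confidence interval can be computed compactly by noting that the variance expression is the quadratic form $g^\top \Sigma g$, with $g = (x_{t-1}, \dots, x_{t-k}, 1)$ and $\Sigma$ the covariance matrix of $(\hat{p}_1, \dots, \hat{p}_k, \hat{\lambda}'_t)$. The sketch below assumes that such a covariance matrix is available; the function name, the argument layout, and the zero-covariance example values are hypothetical.

```python
import math


def predict_with_ci(past, p_hat, lam_prime_hat, cov, z=1.96):
    """Return the predicted value and an approximate confidence interval.

    past          : [x_{t-1}, ..., x_{t-k}]
    p_hat         : [p1_hat, ..., pk_hat]
    lam_prime_hat : estimate of lambda'_t
    cov           : (k+1) x (k+1) covariance matrix of (p1_hat, ..., pk_hat, lam_prime_hat)
    """
    grad = [float(x) for x in past] + [1.0]
    x_hat = sum(p * x for p, x in zip(p_hat, past)) + lam_prime_hat
    m = len(grad)
    # Quadratic form g' * cov * g, equivalent to the expanded variance formula
    var = sum(grad[a] * grad[b] * cov[a][b] for a in range(m) for b in range(m))
    half = z * math.sqrt(var)
    return x_hat, (x_hat - half, x_hat + half)


# Hypothetical INAR(1) example: variances 0.07**2 and 0.22**2, covariance assumed zero
cov = [[0.07**2, 0.0],
       [0.0, 0.22**2]]
x_hat, (lower, upper) = predict_with_ci(past=[4], p_hat=[0.50], lam_prime_hat=1.21, cov=cov)
```

In practice the off-diagonal covariances are rarely zero, so the full estimated covariance matrix from the inverse Hessian should be supplied.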

Goodness of fit

• The goodness of fit of the selected model can be assessed through a discretised version of the Cox-Snell residuals
• These residuals can be computed from the estimated conditional distribution
\[ \hat{P}(Y_n = y_n \mid Y_1, \dots, Y_{n-1}, Y_{n+1}, \dots, Y_T) = \frac{\hat{P}(Y_n = y_n, Y_1, \dots, Y_{n-1}, Y_{n+1}, \dots, Y_T)}{\hat{P}(Y_1, \dots, Y_{n-1}, Y_{n+1}, \dots, Y_T)}, \]
where $T$ is the length of the series

• The normal pseudo-residual segment $[z_n^-, z_n^+]$ can be obtained as
\[
\begin{aligned}
z_n^- &= \Phi^{-1}(\hat{P}(Y_n < y_n \mid Y_1, \dots, Y_{n-1}, Y_{n+1}, \dots, Y_T)) = \Phi^{-1}(u_n^-) \\
z_n^+ &= \Phi^{-1}(\hat{P}(Y_n \leq y_n \mid Y_1, \dots, Y_{n-1}, Y_{n+1}, \dots, Y_T)) = \Phi^{-1}(u_n^+),
\end{aligned}
\]
where $\Phi$ is the standard normal distribution function
• When $y_n = 0$, the probability $u_n^-$ is 0 and $z_n^-$ is not defined
• A possible solution is to use a standard normal quantile corresponding to a probability close to 0, such as $z_n^- = -4$

• The mid-pseudo-residuals, defined by
\[ z_n^m = \Phi^{-1}\left(\frac{u_n^- + u_n^+}{2}\right), \]
are commonly used in practice to check the validity of a fitted model, since they are expected to behave as white noise when the model is adequate
• Other approaches might be used to assess goodness of fit in this context, such as an adapted version of the Probability Integral Transform (PIT)
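For the INAR(1) case the mid-pseudo-residuals can be sketched in a few lines. Because an INAR(1) series is a first-order Markov chain, the conditional distribution of $Y_n$ given all other observations is proportional to $P(y \mid y_{n-1}) \cdot P(y_{n+1} \mid y)$. The code below relies on that fact; the function names, the truncation of the support at `y_max`, the clamping of probabilities away from 0 and 1, and the decision to skip the first observation are assumptions made for illustration, and $\lambda + \sigma I_t$ is assumed to be strictly positive.

```python
import math
from statistics import NormalDist


def inar1_pmf(x, prev, p, lam):
    """P(X_t = x | X_{t-1} = prev) for an INAR(1) model with Poisson(lam) innovations."""
    return sum(
        math.comb(prev, j) * p**j * (1 - p) ** (prev - j)
        * math.exp(-lam) * lam ** (x - j) / math.factorial(x - j)
        for j in range(min(prev, x) + 1)
    )


def mid_pseudo_residuals(ys, p, lam, sigma=0.0, indicator=None, y_max=None):
    """Mid-pseudo-residuals z_n^m for n = 2, ..., T (the first value is conditioned on)."""
    T = len(ys)
    indicator = indicator or [0] * T
    y_max = y_max or 3 * max(ys) + 20  # truncation of the support (assumption)
    normal = NormalDist()
    residuals = []
    for n in range(1, T):
        lam_n = lam + sigma * indicator[n]
        # Weight of each candidate value y: P(y | y_{n-1}) ...
        w = [inar1_pmf(y, ys[n - 1], p, lam_n) for y in range(y_max + 1)]
        if n + 1 < T:
            # ... times P(y_{n+1} | y) when a following observation exists
            lam_next = lam + sigma * indicator[n + 1]
            w = [wy * inar1_pmf(ys[n + 1], y, p, lam_next) for y, wy in enumerate(w)]
        total = sum(w)
        u_minus = sum(w[: ys[n]]) / total
        u_plus = u_minus + w[ys[n]] / total
        u_mid = min(max((u_minus + u_plus) / 2, 1e-12), 1 - 1e-12)
        residuals.append(normal.inv_cdf(u_mid))
    return residuals


# Hypothetical short series and parameter values
res = mid_pseudo_residuals([2, 3, 1, 0, 4, 2], p=0.5, lam=1.2)
```

The resulting values can then be inspected with ACF and PACF plots to check for white noise behaviour.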


Simulation study

• A first series was built from two INAR(1) processes, with parameters $p_1 = 0.7$ and $\lambda = 5$ for $t = 1, \dots, 400$ and $p_1 = 0.7$ and $\lambda = 1$ for $t = 401, \dots, 500$
• Similarly, a second series consisting of 500 observations was simulated from an INAR(1) process with parameters $p_1 = 0.7$ and $\lambda = 5$ throughout

• A change is expected to be detected for $t > 400$ in the first series, and the fitted model is
\[ X_t = p_1 \circ X_{t-1} + W_t(\lambda'), \]
where $\lambda' = \lambda$ for $t \leq 400$ and $\lambda' = \lambda + \sigma$ for $t > 400$
• The estimate of $\sigma$ is $-5.89$, with 95% confidence interval $(-6.82, -4.96)$, meaning that the intervention at time $t = 400$ had a significant impact on the series
• In the second case, as expected, an estimate of $\hat{\sigma} = 0.39$ $(-0.25, 1.03)$ is obtained, which can be interpreted as no effect of the hypothetical intervention at time $t = 400$
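A simulation design of this kind can be sketched as follows. Note the assumption: instead of joining two separately simulated processes, this sketch runs a single INAR(1) recursion whose innovation mean switches from 5 to 1 after $t = 400$, which may differ from how the series in the slides were generated. Function names and seeds are hypothetical.

```python
import math
import random


def simulate_inar1(n, p, lam_of_t, seed):
    """Simulate an INAR(1) series of length n; lam_of_t(t) gives the innovation mean at time t."""
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplication method, adequate for the small means used here
        limit, count, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            count += 1
            prod *= rng.random()
        return count

    # Start near the stationary mean lambda / (1 - p) (assumption)
    xs = [poisson(lam_of_t(1) / (1 - p))]
    for t in range(2, n + 1):
        survivors = sum(rng.random() < p for _ in range(xs[-1]))  # p o X_{t-1}
        xs.append(survivors + poisson(lam_of_t(t)))
    return xs


# Series with a change in the innovation mean after t = 400, and a control series
changed = simulate_inar1(500, 0.7, lambda t: 5.0 if t <= 400 else 1.0, seed=1)
control = simulate_inar1(500, 0.7, lambda t: 5.0, seed=2)

mean_before = sum(changed[:400]) / 400
mean_after = sum(changed[450:]) / 50
```

Since the stationary mean of an INAR(1) process is $\lambda / (1 - p_1)$, the level of the first series should drop from roughly $5/0.3$ to roughly $1/0.3$ once the effect of the change has settled.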


• Another approach would be to test for structural changes in the series, following for example the methodology described in Hudecová et al. (2015), Detection of changes in INAR models, Stochastic Models, Statistics and Their Applications, 122, 11–18
• For the first series, this methodology detects a change at $t = 396$
• No change is detected for the second series, as expected
• Our approach is retrospective and assumes the potential intervention time is known, whereas the real-time monitoring alternatives are able to estimate when the change occurred

Venereal lymphogranuloma and massive events in Barcelona

• Venereal lymphogranuloma (LGV) is an STI caused by the bacterium Chlamydia trachomatis
• Owing to the recent popularity of so-called circuit parties, especially among gay and bisexual men, the impact of these massive events on the number of cases of this and other STIs is a public health concern
• The analysed data correspond to the number of LGV cases registered in the Barcelona area from January 2007 to December 2014
• No trend or seasonal behaviour is observed

[Figure: Number of LGV cases over time (y-axis: number of LGV cases, 0 to 7.5; x-axis: date, ticks from 2007-01-01 to 2013-11-01)]

• According to the ACF and PACF of the process, a model of order 1 seems appropriate, so the proposed model is
\[ X_t = p_1 \circ X_{t-1} + W_t(\lambda'), \]
where $\lambda' = \lambda + \sigma \cdot I_t$
• In this case, the indicator variable takes the value 1 for all time periods that follow a circuit festival held in Barcelona and fall within the LGV incubation time (between 3 and 30 days)
• For instance, the first festival took place in early July 2008, so the indicator takes the value 1 in August 2008. In 2009 there were two circuit parties in Barcelona, the first in early June and the second again in July, so the indicator takes the value 1 in July and August 2009
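The construction of the indicator for a monthly series can be sketched as below. The rule used here (the indicator equals 1 in the month following each festival) is a simplification inferred from the two examples above and is an assumption; only the festivals named in the text are listed, and the function names are hypothetical.

```python
def month_index(year, month, start=(2007, 1)):
    """Zero-based position of (year, month) in a monthly series starting at `start`."""
    return (year - start[0]) * 12 + (month - start[1])


def intervention_indicator(festival_months, n_months=96, start=(2007, 1)):
    """Indicator I_t equal to 1 in the month after each festival, 0 otherwise."""
    indicator = [0] * n_months
    for year, month in festival_months:
        idx = month_index(year, month, start) + 1  # month following the festival
        if 0 <= idx < n_months:
            indicator[idx] = 1
    return indicator


# January 2007 to December 2014 gives 96 monthly observations
festivals = [(2008, 7), (2009, 6), (2009, 7)]  # festivals mentioned in the text only
indicator = intervention_indicator(festivals)
```

With these three festivals the indicator is 1 in August 2008, July 2009 and August 2009, matching the examples given above.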
• The estimates of the parameters are

Table: Maximum likelihood estimates

Parameter    λ      p1     σ
Estimate     1.21   0.50   1.51
St. Error    0.22   0.07   0.49

Impact of circuit parties
In particular, $\hat{\sigma} = 1.51$ $(0.55; 2.47)$, so a significant effect of circuit parties on the number of new LGV cases is detected
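The reported interval is consistent with a Wald-type interval, $\hat{\sigma} \pm z_{0.975} \cdot \mathrm{SE}(\hat{\sigma})$, built from the estimate and standard error in the table. The following sketch reproduces it with the Python standard library; the function name `wald_test` is hypothetical, and the use of a normal approximation is an assumption consistent with the hypothesis test described earlier.

```python
from statistics import NormalDist


def wald_test(estimate, std_error, level=0.95):
    """Wald statistic, two-sided p-value and confidence interval under a normal approximation."""
    normal = NormalDist()
    z = normal.inv_cdf(1 - (1 - level) / 2)
    stat = estimate / std_error
    p_value = 2 * (1 - normal.cdf(abs(stat)))
    return stat, p_value, (estimate - z * std_error, estimate + z * std_error)


# Values taken from the table of maximum likelihood estimates
stat, p_value, (lower, upper) = wald_test(1.51, 0.49)
```

Rounded to two decimals, `lower` and `upper` give 0.55 and 2.47, and the p-value falls below 0.05, in line with the significant effect reported.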
• The AIC of this model is 706.0132, while the AIC of the standard INAR(1) model is 716.7405, so the proposed model is preferred
• To check the goodness of fit of the model, the mid-pseudo-residuals approach can be used, and the results support its suitability
[Figure: ACF and PACF plots, lags up to 20 (x-axis: Lag)]

• Following the approach described in Hudecová et al. (2015), Detection of changes in INAR models, Stochastic Models, Statistics and Their Applications, 122, 11–18, no significant change in the parameters can be detected
• This highlights the difference between the two approaches: the focus here is on detecting changes in the innovations due to one or several interventions, after which the series possibly returns to its original pre-intervention state
• Thus, the proposed methodology is able to detect non-structural changes


HPV vaccination impact on genital warts cases

• Vaccination against HPV was included in the systematic vaccination calendar in Catalonia in 2008
• This study included all cases of genital warts (GW) diagnosed between January 2009 and December 2016 within the epidemiological surveillance system of the public health system in Catalonia
• The potential impact of the HPV vaccination programme on the reduction of GW incidence in women aged 15 to 19 was studied, taking 2009 to 2012 as the pre-vaccination period and 2013 to 2016 as the post-vaccination period
• As a sensitivity analysis, the robustness of the model was tested in women over 60 years old, in whom HPV vaccination would be expected to have no effect
• In the pre-vaccination period (2009-2012), an average of 13.67 (SD=4.01) cases per month was observed, compared with an average of 10.81 (SD=4.81) per month in the post-vaccination period (2013-2016)
• In this case, the intervention effect estimate is $\hat{\sigma} = -2.34$, with a 95% confidence interval of $(-3.77, -0.91)$, identifying a significant reduction in the number of GW cases in women attributable to the introduction of the vaccine
• The model shows no significant decrease in the number of GW cases in women over 60 ($\hat{\sigma} = 0.53$ with a confidence interval of $(-0.21, 1.27)$)
• The performance of the model was compared with that of a standard INAR model by means of the Akaike Information Criterion (AIC), which also favours the model accounting for the effect of HPV vaccination (4659.29 vs 4667.97)
• This model is also capable of providing forecasts
• For instance, the methodology can be used to estimate the year of massive control of the disease
• Taking into account the effect of HPV vaccination, the model forecasts that GW will be massively controlled in the female population by 2041 (2036-2045)

• Under a non-vaccination scenario, the model indicates that massive control of the disease does not occur

Conclusions

• Measuring the impact of planned actions or unexpected events on a time series is an issue that arises in many fields
• For continuous-valued time series, intervention analysis has been widely used in the literature
• We propose a flexible model based on INAR discrete time series models, which suits a wide range of situations focused on evaluating the impact of a planned intervention, an unexpected event, or a combination of several actions or circumstances
• The proposed methodology is capable of providing forecasts, allowing for mid-term estimation of the intervention impact on the counts and improving on the performance of the usual INAR models
With support from MDM-2014-0445 and Fundación Santander
Universidades.
