Improving Evidence for Policy Making in Infectious Disease
Control: A Role for Big Data?
Jacco Wallinga¹,² & Hans Bogaards¹
¹ Centre for Infectious Disease Control
National Institute for Public Health and the Environment (RIVM)
&
² Leiden University Medical Center
The Netherlands
http://www.rivm.nl/en/Topics/M/Modelling_infectious_diseases

What is big data about?
» not about the data
» not about big
» about making data actionable
» novel algorithms that scale well
» huge computational power
» large data sets available
Lazer et al. Science

Laws of data analysis still apply
» data cleaning takes most of the time
» there is always a model lurking in the background
» association does not imply causality, unless you randomized treatment
» validity of proxy measures should be tested

Google Flu Trends
» uses online search terms as a proxy for influenza-like illness (ILI)
» provides real-time estimates
» predictions are highly accurate at time of publication
[Figure 3 (Ginsberg et al.): ILI percentages estimated by our model (black) and provided by the CDC (red) in the mid-Atlantic region, showing data available at four [...] percentage in the mid-Atlantic region; similarly, on 3 March our model indicated that the peak ILI percentage had been reached during week 8, with sharp declines in weeks 9 and 10. Both results were later confirmed by CDC ILI data. Panels: data available as of February 2008, 3 March 2008, 31 March 2008 and 12 May 2008. Axes: week, ILI percentage.]
Ginsberg et al. Nature, 2009

Google Flu Trends
» search terms are volatile proxy measures for influenza-like illness
» real-time estimates are outperformed by extrapolated CDC surveillance estimates
» predictions have been off since publication
Lazer et al. Science, 2015.

Prediction of health care demand due to infectious diseases
» prediction is possible
  » see CDC influenza prediction contest
» prediction is very difficult
  » see Ebola epidemic predictions
» prediction requires different kinds of data
» the quality of prediction is determined by the quality of these data
  » not by the amount

Interpreting surveillance data
» potential benefits from ‘big data’ approaches to small high-quality data from routine infectious disease surveillance
» these data are under-utilized
» interactions between different diseases
» effects of exogenous drivers
» providing real-time estimates of incidence (‘nowcasting’)
» even the most recent, incomplete data can be used in real time
» ‘nowcasting’ is technically possible, but hardly used

valuable high-quality data are lost due to the neglect of paper archives and changes of software programmes
» historical data on case reports
» historical data on interventions
» historical data on vaccination coverage
» digitisation of existing archived paper records would allow for safe storage
» and make valuable high-quality data actionable with the ‘big data’ approach
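
As a rough illustration of the ‘nowcasting’ point made under "Interpreting surveillance data", the sketch below scales up recent, incomplete case counts by the fraction of cases expected to have been reported so far. It is a minimal example under strong assumptions, not a method taken from the slides: the function name `nowcast`, its parameters, and all numbers are hypothetical, and the reporting-delay distribution is assumed to be known in advance.

```python
from typing import List, Sequence


def nowcast(reported_counts: Sequence[float],
            cumulative_reporting_prob: Sequence[float]) -> List[float]:
    """Scale up incomplete recent counts by the fraction expected to be reported so far.

    reported_counts: cases reported so far per onset day, oldest first;
        the last entry is today (delay 0).
    cumulative_reporting_prob: assumed probability that a case is reported
        within d days of onset, indexed by d; values must lie in (0, 1].
    """
    n = len(reported_counts)
    estimates = []
    for i, count in enumerate(reported_counts):
        delay = n - 1 - i  # days elapsed since onset for this entry
        if delay < len(cumulative_reporting_prob):
            fraction = cumulative_reporting_prob[delay]
        else:
            fraction = 1.0  # assume reporting is complete beyond the known delays
        if not 0.0 < fraction <= 1.0:
            raise ValueError("reporting probabilities must be in (0, 1]")
        estimates.append(count / fraction)
    return estimates


# Illustrative numbers only (not real surveillance data).
reported = [100, 90, 60, 20]      # oldest day first, today last
delay_cdf = [0.2, 0.5, 0.9, 1.0]  # assumed fraction reported within 0..3 days
print([round(x) for x in nowcast(reported, delay_cdf)])  # prints [100, 100, 120, 100]
```

In this toy example the raw counts suggest a sharp decline, while the delay-adjusted estimates do not. A practical nowcast would also need to estimate the delay distribution from the surveillance data itself and report the uncertainty around each estimate.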