Improving Evidence for Policy Making in Infectious Disease Control: A Role for Big Data?

Jacco Wallinga (1, 2) & Hans Bogaards (1)
(1) Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM)
(2) Leiden University Medical Center
The Netherlands
http://www.rivm.nl/en/Topics/M/Modelling_infectious_diseases

What is big data about?
» not about the data
» not about big
» about making data actionable
» novel algorithms that scale well
» huge computational power
» large data sets available
Lazer et al., Science

Laws of data analysis still apply
» data cleaning takes most of the time
» there is always a model lurking in the background
» association does not imply causality, unless you randomized treatment
» validity of proxy measures should be tested

Google Flu Trends
» uses online use of search terms as proxy for influenza-like illness
» provides real-time estimates
» predictions are highly accurate at time of publication
[Figure: ILI percentage by week, in four panels showing the data available at four successive dates, including 3 March 2008 and 31 March 2008]
Figure 3 | ILI percentages estimated by our model (black) and provided by the CDC (red) in the mid-Atlantic region, showing data available at four [...] percentage in the mid-Atlantic region; similarly, on 3 March our model indicated that the peak ILI percentage had been reached during week 8, with sharp declines in weeks 9 and 10. Both results were later confirmed by CDC ILI data.
Ginsberg et al., Nature, 2009

Google Flu Trends
» search terms are volatile proxy measures for influenza-like illness
» real-time estimates are outperformed by extrapolated CDC surveillance estimates
» predictions have been off since publication
Lazer et al., Science, 2015
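The advice that the validity of proxy measures should be tested can be made concrete by scoring a proxy against a very simple benchmark. The following is a minimal sketch, not the method used by Lazer et al.: it assumes the benchmark is just the surveillance value from a few weeks earlier carried forward, and all numbers, function names and the two-week lag are hypothetical and chosen for illustration only.

```python
def mean_absolute_error(estimates, observed):
    """Average absolute difference between paired estimates and observations."""
    if len(estimates) != len(observed) or not observed:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(e - o) for e, o in zip(estimates, observed)) / len(observed)


def compare_proxy_to_lagged_baseline(proxy, surveillance, lag_weeks=2):
    """Return (proxy error, baseline error) over the weeks that can be scored.

    proxy[w]        : real-time proxy estimate for week w (hypothetical input)
    surveillance[w] : final surveillance value for week w, known only later
    lag_weeks       : assumed reporting lag of the surveillance system
    """
    if len(proxy) != len(surveillance):
        raise ValueError("series must cover the same weeks")
    if not 1 <= lag_weeks < len(surveillance):
        raise ValueError("lag must be at least 1 and shorter than the series")
    weeks = range(lag_weeks, len(surveillance))
    observed = [surveillance[w] for w in weeks]
    proxy_now = [proxy[w] for w in weeks]
    # Baseline: the last surveillance value available in real time, carried forward.
    carried_forward = [surveillance[w - lag_weeks] for w in weeks]
    return (mean_absolute_error(proxy_now, observed),
            mean_absolute_error(carried_forward, observed))


# Synthetic ILI percentages, invented for this example.
surveillance = [1.0, 1.5, 2.0, 3.0, 2.5, 2.0]
proxy = [1.0, 2.0, 4.0, 6.0, 3.5, 2.0]
proxy_error, baseline_error = compare_proxy_to_lagged_baseline(proxy, surveillance)
print("proxy MAE:", proxy_error, "lagged baseline MAE:", baseline_error)
```

A proxy only adds value in real time if its error is clearly smaller than that of such a baseline, and the comparison needs to be repeated as new seasons arrive because search behaviour changes.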
Prediction of health care demand due to infectious diseases
» prediction is possible
    » see CDC influenza prediction contest
» prediction is very difficult
    » see Ebola epidemic predictions
» prediction requires different kinds of data
» the quality of prediction is determined by the quality of these data
    » not by the amount

Interpreting surveillance data
» potential benefits from ‘big data’ approaches to small high-quality data from routine infectious disease surveillance
    » these data are under-utilized
    » interactions between different diseases
    » effects of exogenous drivers
» providing real-time estimates of incidence (‘nowcasting’)
    » even the most recent, incomplete data can be used in real time
    » ‘nowcasting’ is technically possible
    » but hardly used

Valuable high-quality data are lost due to the neglect of paper archives and change of software programmes
» historical data on case reports
» historical data on interventions
» historical data on vaccination coverages
Digitisation of existing archived paper records would allow for safe storage and make valuable high-quality data actionable with the ‘big data’ approach
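The ‘nowcasting’ idea mentioned under interpreting surveillance data, using the most recent and still incomplete counts in real time, can be sketched as a correction for reporting delay. This is a minimal illustration under strong assumptions, not a description of the method used at RIVM: it assumes the fraction of cases reported within a given number of weeks is already known and stable, and the function name and all numbers are hypothetical.

```python
def nowcast_incidence(reported_so_far, completeness_by_delay):
    """Scale up incomplete recent counts to estimate the eventual totals.

    reported_so_far[i]       : cases with onset in week i reported to date,
                               oldest week first, current week last
    completeness_by_delay[d] : assumed fraction of all cases that have been
                               reported d weeks after onset (d = 0 is the
                               current week); longer delays count as complete
    """
    n_weeks = len(reported_so_far)
    estimates = []
    for i, count in enumerate(reported_so_far):
        delay = n_weeks - 1 - i  # weeks elapsed since onset week i
        if delay < len(completeness_by_delay):
            fraction = completeness_by_delay[delay]
        else:
            fraction = 1.0  # old enough to be treated as fully reported
        if not 0.0 < fraction <= 1.0:
            raise ValueError("completeness must lie in (0, 1]")
        estimates.append(count / fraction)
    return estimates


# Synthetic example: raw counts appear to fall only because recent weeks
# are still incomplete.
reported = [100, 90, 40, 10]
completeness = [0.25, 0.5, 0.75]  # assumed: 25% reported at once, 50% after 1 week, 75% after 2
print(nowcast_incidence(reported, completeness))
```

In practice the delay distribution has to be estimated from past reports and carries its own uncertainty, so a real nowcast would report an interval rather than a single number per week.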
