COVIDcast: An Ecosystem for COVID-19 Tracking and Forecasting
© 2021, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Access the slides from this presentation
Interactive HTML version: cmu-delphi.github.io/covidcast/talks/ml-summit/talk.html
Delphi: Then and now
• Delphi formed in 2012 to “develop the theory and practice of epidemic forecasting”
• Participated in annual CDC flu forecasting challenges since 2013, earned top place in several
• Awarded CDC’s National Center of Excellence for flu forecasting in 2019
• March 2020: pivoted to focus on supporting the US COVID-19 response, launched COVIDcast
• We’ve been working on the “full pipeline” but are focused now more than ever on data
COVIDcast indicators
COVIDcast ecosystem
[Figure: COVIDcast ecosystem diagram; legible partner logos include Google and Quidel]
Outline for this talk
I can’t cover everything! I will focus on our API and give some basic data demos
(demos are reproducible, with all code included) and then reflect on a few
lessons learned
Outline
• Part 1: API and data demos
• Part 2: Lessons learned
Part 1: API and data demos
COVIDcast API
The COVIDcast API is based on HTTP GET queries and returns data in
JSON or CSV format
API: cmu-delphi.github.io/delphi-epidata/api/covidcast.html
Dashboard: https://delphi.cmu.edu/covidcast/
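An HTTP GET query of this kind can be sketched with the Python standard library alone. This is a minimal sketch, not the official client: the base URL and the parameter names (`data_source`, `signal`, `time_type`, `geo_type`, `time_values`, `geo_value`, `format`) are assumptions that should be confirmed against the API documentation linked above, and `build_query_url` is a hypothetical helper. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed base URL; confirm against the API documentation before use.
BASE_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_query_url(data_source, signal, geo_type, geo_value,
                    start_day, end_day, fmt="json"):
    """Build a GET URL for one signal over an inclusive day range (YYYYMMDD)."""
    params = {
        "data_source": data_source,
        "signal": signal,
        "time_type": "day",
        "geo_type": geo_type,
        "time_values": f"{start_day}-{end_day}",  # assumed range syntax
        "geo_value": geo_value,
        "format": fmt,  # assumed switch between JSON and CSV output
    }
    return BASE_URL + "?" + urlencode(params)


url = build_query_url("usa-facts", "deaths_7dav_incidence_num",
                      "state", "pa", "20200301", "20210428")
print(url)
```

Passing `fmt="csv"` would request CSV instead of JSON, under the same assumptions.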
R and Python packages
We also provide R and Python packages for API access
Highlights
• Easy API querying
    Same specification structure (many default parameters)
    Full support for data revisions (as of, issues, lag)
• Basic signal processing
    Correlations sliced by location or by time
    Data wrangling: preparing signals for analysis
• Plotting functionality
R: cmu-delphi.github.io/covidcast/covidcastR/
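To illustrate what "correlations sliced by location or by time" can mean, here is a minimal pure-Python sketch on made-up numbers. It is not the packages' implementation: `pearson` and `correlate` are hypothetical helpers, and the two toy signals are invented for illustration. Slicing by location gives one correlation per location (computed across time); slicing by time gives one correlation per day (computed across locations).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def correlate(a, b, by):
    """Correlate two signals keyed by (geo_value, time_value).

    by="geo_value": one correlation per location, across time.
    by="time_value": one correlation per day, across locations.
    """
    index = 0 if by == "geo_value" else 1
    groups = {}
    for key in a.keys() & b.keys():
        groups.setdefault(key[index], []).append((a[key], b[key]))
    return {g: pearson([p[0] for p in pairs], [p[1] for p in pairs])
            for g, pairs in groups.items()}


# Made-up values for two locations over three days.
signal_a = {("pa", 1): 1, ("pa", 2): 2, ("pa", 3): 3,
            ("ny", 1): 2, ("ny", 2): 4, ("ny", 3): 1}
signal_b = {("pa", 1): 2, ("pa", 2): 4, ("pa", 3): 6,
            ("ny", 1): 3, ("ny", 2): 1, ("ny", 3): 5}

by_location = correlate(signal_a, signal_b, by="geo_value")
by_time = correlate(signal_a, signal_b, by="time_value")
```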
List of indicators
Example: Deaths
How many COVID-19 deaths have been reported per day, in my state, since March 1?
library(covidcast)

# 7-day average of daily reported deaths in Pennsylvania, from USAFacts
deaths <- covidcast_signal(
  data_source = "usa-facts",
  signal = "deaths_7dav_incidence_num",
  start_day = "2020-03-01",
  end_day = "2021-04-28",
  geo_type = "state",
  geo_values = "pa")
Example: Hospitalizations
What percentage of daily hospital admissions are due to COVID-19 in PA, NY, TX?
Example: Total cases
What does the current COVID-19 cumulative case rate look like, nationwide?
Example: Doctor visits
How do some cities compare in terms of doctor’s visits due to COVID-like illness?
Example: Symptoms
How do my county and my friend’s county compare in terms of COVID symptoms?
Example: Mask use
How do some states compare in terms of self-reported mask usage?
Example: Vaccines
How about vaccine uptake (self-reported), and willingness to take vaccine (if not yet vaccinated)?
Optional parameters
By default the API returns the most recent data for each time_value
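The packages advertise support for data revisions through "as of", "issues", and "lag". The sketch below shows one reading of what each selects, on a made-up revision history. The function names are hypothetical, the numbers are invented, and the exact semantics are an assumption to check against the API documentation: "as of" is read as "the latest version known on a given day", "issues" as "all versions published in a date range", and "lag" as "versions published a fixed number of days after the date they describe".

```python
from datetime import date

# Made-up revision history for one location: (time_value, issue, value).
# "issue" is the date on which that version of the value was published.
ROWS = [
    (date(2020, 6, 1), date(2020, 6, 2), 10.0),
    (date(2020, 6, 1), date(2020, 6, 5), 14.0),
    (date(2020, 6, 1), date(2020, 6, 20), 15.0),
    (date(2020, 6, 2), date(2020, 6, 3), 11.0),
    (date(2020, 6, 2), date(2020, 6, 6), 13.0),
]


def latest(rows):
    """Default view: the most recently issued value for each time_value."""
    best = {}
    for time_value, issue, value in rows:
        if time_value not in best or issue > best[time_value][0]:
            best[time_value] = (issue, value)
    return {t: v for t, (_, v) in best.items()}


def as_of(rows, day):
    """What was known on `day`: drop later issues, then take the latest."""
    return latest([r for r in rows if r[1] <= day])


def issued_between(rows, first, last):
    """Every version issued within [first, last], without deduplication."""
    return [r for r in rows if first <= r[1] <= last]


def with_lag(rows, lag_days):
    """Versions issued exactly `lag_days` after their time_value."""
    return [r for r in rows if (r[1] - r[0]).days == lag_days]


current = latest(ROWS)
known_june_5 = as_of(ROWS, date(2020, 6, 5))
first_reports = with_lag(ROWS, 1)
```

In this toy history the value for June 1 is 14.0 as of June 5 but 15.0 in the latest view, which is the kind of gap a forecaster has to account for when backtesting.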
Data revisions
Why would we need this? Because data sources are subject to revisions
• Case and death counts frequently corrected/adjusted by authorities
• Medical claims data can take weeks to be submitted and/or processed
• Testing/lab data can be backlogged for a variety of reasons
Example: Backfill in doctor visits
Ongoing pandemic survey
Through a recruitment partnership with Facebook, we survey about 50,000 people daily in the US (and over 20 million since the survey began in April 2020)
Topics include
• COVID symptoms
• COVID testing
• Mental health
• Social contacts and behavior
• Demographics
This is the largest non-census research survey ever conducted
Raw survey response data is available to researchers under a data use agreement:
https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/
A parallel international effort by the University of Maryland reaches 100+ countries in 55 languages
Part 2: Lessons learned
Lessons and reflections
Lessons learned from the past year related to statistical modeling and machine
learning, broken down by three areas
A. Forecasting
B. Nowcasting
C. Risk-taking
Forecasting in a pandemic is hard
The COVID-19 Forecast Hub collects short-term forecasts of incident COVID-19 cases,
hospitalizations, and deaths; these are made by 50+ groups of “citizen scientists” and power the
CDC’s official communications on COVID-19 forecasting
This is not an easy problem
• Signal-to-noise is generally quite low (all we have is a small set of correlated examples)
• Nonstationarity is generally quite pronounced (the disease and the way we’re measuring it)
• Data problems are a constant struggle (delays, revisions, anomalies, changes in definitions)
All of this – plus an additional model-level nonstationarity – carries over to the task of building an
ensemble model!
Forecasting in a pandemic is hard
Only a small handful of models consistently outperform the baseline (essentially a flat-line forecaster)
For example, from Cramer et al. (2021)
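To make the baseline concrete: a flat-line forecaster predicts that every future value equals the last observed one. The sketch below is illustrative only and is not taken from Cramer et al. or the Forecast Hub code; the helper names are hypothetical, the weekly counts are made up, and it covers point forecasts only.

```python
def flat_line_forecast(history, horizon):
    """Repeat the last observed value for each of the next `horizon` steps."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def mean_absolute_error(forecast, actual):
    """Average absolute gap between forecast and observed values."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


# Made-up weekly counts, for illustration only.
weekly_deaths = [120, 135, 150, 160]
observed_next = [170, 165, 180, 175]

baseline = flat_line_forecast(weekly_deaths, horizon=4)
baseline_error = mean_absolute_error(baseline, observed_next)
```

A candidate model is then judged by whether its error on the same targets is consistently lower than `baseline_error`.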
Forecasting in a pandemic is hard
LESSONS/REFLECTIONS
Nowcasting is an emerging candidate for MVP
Nowcasting: estimating the value of a signal that will only be fully observed at a
later date; current data is partial/noisy but progressively improves as time passes
Example: suppose we want to use medical insurance claims to estimate how many
people have some disease on some day (in some location)
• Claims are routinely submitted and processed late
• Holidays can be particularly troublesome
• Finalized values will only be available months later
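The claims example can be sketched as a simple backfill correction: learn from past days what share of the finalized count is typically visible at each reporting lag, then scale up today's partial count. This is a deliberately naive illustration under stated assumptions, not the method used for any COVIDcast signal; the helper names are hypothetical and the counts are made up.

```python
def completeness_by_lag(history):
    """Average share of the finalized count that is visible at each lag.

    history: list of (partial_counts_by_lag, final_count) for past days.
    """
    n_lags = len(history[0][0])
    fractions = []
    for lag in range(n_lags):
        shares = [partials[lag] / final for partials, final in history]
        fractions.append(sum(shares) / len(shares))
    return fractions


def nowcast(partial_count, lag, fractions):
    """Scale a partial count up by the typical completeness at this lag."""
    return partial_count / fractions[lag]


# Made-up history: counts seen 0, 1, and 2 days after the date, plus final.
past_days = [
    ([20, 50, 80], 100),
    ([30, 75, 120], 150),
]

fractions = completeness_by_lag(past_days)
estimate = nowcast(partial_count=40, lag=1, fractions=fractions)
```

A correction like this assumes reporting delays are stable over time, which is exactly what holidays and processing backlogs violate.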
Nowcasting is an emerging candidate for MVP
Meanwhile, for COVID-19 it’s even more complicated
• We’d like to know how many infected individuals are currently present in
a population
• We’d settle for how many symptomatic infections are currently present
• We’d even settle for counting current symptomatic infections that eventually
get tested
• Instead, we get how many confirmed symptomatic infections happened to be
reported that day
Even when settling for the third case, we would be nowcasting a latent variable
(never observed)
Nowcasting is an emerging candidate for MVP
LESSONS/REFLECTIONS
While time scales may change, nowcasting is not going away as a central problem
in public health
Taking risks (when you can afford to do so)
• The beginning of the pandemic created a clear pull for computational scientists:
fetch case and death data from JHU CSSE’s GitHub, learn about SIR modeling,
inject stochasticity, and start making forecasts
• We decided early on to swim against the stream; it’s not that this work wasn’t
important, but rather, we felt we could create greater value by working on the
data itself (to hopefully benefit many others)
• We wouldn’t/couldn’t have taken the risk if there weren’t so many strong
computational scientists who jumped in to work on forecasting
Taking risks (when you can afford to do so)
It can be hard to quantify the value of good data – we will be trying to do this for years to come
(not just us or our data; this is an important undertaking for the whole scientific community)
That said, we are starting to see (in retrospect) some encouraging results in problems where you
can quantify value, like forecasting and nowcasting
Thank you!