COVIDcast An Ecosystem For COVID-19 Tracking and Forecasting

SML406
COVIDcast: An ecosystem for COVID-19 tracking and forecasting
Ryan Tibshirani
Statistics & Machine Learning, Carnegie Mellon University
Amazon Scholar, Amazon Web Services
© 2021, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Access the slides from this presentation
Interactive HTML version: cmu-delphi.github.io/covidcast/talks/ml-summit/talk.html
PDF: Download from the Resources tab for this session in the event platform
Delphi: Then and now
• Delphi formed in 2012 to “develop the theory and practice of epidemic forecasting”
• Participated in annual CDC flu forecasting challenges since 2013, earned top place in several
• Awarded CDC’s National Center of Excellence for flu forecasting in 2019
• March 2020: pivoted to focus on supporting the US COVID-19 response, launched COVIDcast
• We’ve been working on the “full pipeline” but are focused now more than ever on data
COVIDcast indicators
COVIDcast ecosystem
[Figure: logos include Google and Quidel]
Outline for this talk
I can’t cover everything! I will focus on our API and give some basic data demos
(demos are reproducible, with all code included) and then reflect on a few
lessons learned
Outline
• Part 1: API and data demos
• Part 2: Lessons learned
Part 1: API and data demos
COVIDcast API
The COVIDcast API is based on HTTP GET queries and returns data in
JSON or CSV format
API: cmu-delphi.github.io/delphi-epidata/api/covidcast.html
Dashboard: https://delphi.cmu.edu/covidcast/
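To make the HTTP GET idea concrete, here is a minimal Python sketch that assembles a query URL without sending it. The base endpoint and parameter names are assumptions taken from a reading of the API documentation linked above and should be checked there. `build_covidcast_url` is a hypothetical helper, not part of any Delphi package.

```python
from urllib.parse import urlencode

# Assumed base endpoint; confirm against the API documentation.
BASE_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_covidcast_url(data_source, signal, geo_type, geo_value,
                        start_day, end_day, time_type="day"):
    """Build (but do not send) a GET query URL for one signal.

    Days are given as YYYYMMDD strings; the range is written as start-end.
    The parameter names below are assumptions, not a verified spec.
    """
    params = {
        "data_source": data_source,
        "signal": signal,
        "time_type": time_type,
        "geo_type": geo_type,
        "time_values": f"{start_day}-{end_day}",
        "geo_value": geo_value,
    }
    return BASE_URL + "?" + urlencode(params)


url = build_covidcast_url("usa-facts", "deaths_7dav_incidence_num",
                          "state", "pa", "20200301", "20210428")
print(url)
```

The response format (JSON or CSV) would be chosen by a further query parameter or header. That detail is left out here because this sketch does not verify it.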
R and Python packages
We also provide R and Python packages for API access
Highlights
• Easy API querying
  - Same specification structure (many default parameters)
  - Full support for data revisions (as of, issues, lag)
• Basic signal processing
  - Correlations sliced by location or by time
  - Data wrangling: preparing signals for analysis
• Plotting functionality
  - Choropleth and bubble maps, time series plots
R: cmu-delphi.github.io/covidcast/covidcastR/
Python: cmu-delphi.github.io/covidcast/covidcast-py/html

Have an idea? File an issue or contribute:
https://github.com/cmu-delphi/covidcast/
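The phrase "correlations sliced by location or by time" can be illustrated with a small Python sketch on synthetic numbers. It does not use the packages above; `pearson`, `cor_by_geo`, and `cor_by_time` are hypothetical helpers and the data are made up. Slicing by location gives one correlation per location (computed across days); slicing by time gives one correlation per day (computed across locations).

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic signals, indexed as signal[location][day]. Values are invented.
sig_a = {"pa": [1.0, 2.0, 3.0], "ny": [2.0, 4.0, 6.0], "tx": [3.0, 3.5, 5.0]}
sig_b = {"pa": [2.0, 4.0, 6.0], "ny": [1.0, 2.0, 3.0], "tx": [6.0, 5.0, 4.0]}


def cor_by_geo(a, b):
    """One correlation per location, computed across time."""
    return {geo: pearson(a[geo], b[geo]) for geo in a}


def cor_by_time(a, b):
    """One correlation per day, computed across locations."""
    geos = sorted(a)
    n_days = len(a[geos[0]])
    return [pearson([a[g][t] for g in geos], [b[g][t] for g in geos])
            for t in range(n_days)]


by_geo = cor_by_geo(sig_a, sig_b)
by_time = cor_by_time(sig_a, sig_b)
print(by_geo)
print(by_time)
```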
List of indicators
Example: Deaths
How many COVID-19 deaths have been reported per day, in my state, since March 1?
library(covidcast)

# Fetch the 7-day average of daily reported deaths for Pennsylvania
deaths = covidcast_signal(
  data_source = "usa-facts",
  signal = "deaths_7dav_incidence_num",
  start_day = "2020-03-01",
  end_day = "2021-04-28",
  geo_type = "state",
  geo_values = "pa")

# Plot the result as a time series
plot(deaths, plot_type = "line",
     title = "New COVID-19 deaths in PA (7-day average)")
Example: Hospitalizations
What percentage of daily hospital admissions are due to COVID-19 in PA, NY, TX?
Example: Total cases
What does the current COVID-19 cumulative case rate look like, nationwide?
Example: Doctor visits
How do some cities compare in terms of doctor’s visits due to COVID-like illness?
Example: Symptoms
How do my county and my friend’s county compare in terms of COVID symptoms?
Example: Mask use
How do some states compare in terms of self-reported mask usage?
Example: Vaccines
How about vaccine uptake (self-reported), and willingness to take vaccine (if not yet vaccinated)?
Optional parameters
By default the API returns the most recent data for each time_value
We also provide access to all previous versions of the data using optional parameters
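The three revision-related options named earlier (as of, issues, lag) can be mimicked on a toy revision log. The Python sketch below is only an illustration of the semantics as understood here: each row stores a value for a `time_value` together with the `issue` date on which it was published. The log contents and the helper names `latest`, `as_of`, and `with_lag` are hypothetical.

```python
from datetime import date

# Synthetic revision log: (time_value, issue, value). Each row is the value
# reported for `time_value` as published on date `issue`.
LOG = [
    (date(2020, 6, 1), date(2020, 6, 2), 10.0),
    (date(2020, 6, 1), date(2020, 6, 5), 14.0),
    (date(2020, 6, 1), date(2020, 6, 20), 15.0),
    (date(2020, 6, 2), date(2020, 6, 3), 8.0),
    (date(2020, 6, 2), date(2020, 6, 6), 11.0),
]


def latest(log):
    """Default view: the most recent issue for each time_value."""
    best = {}
    for t, issue, value in log:
        if t not in best or issue > best[t][0]:
            best[t] = (issue, value)
    return {t: value for t, (_, value) in best.items()}


def as_of(log, day):
    """What the data looked like on `day`: ignore issues published later."""
    return latest([row for row in log if row[1] <= day])


def with_lag(log, lag_days):
    """Values published exactly `lag_days` after their time_value."""
    return {t: v for t, issue, v in log if (issue - t).days == lag_days}


print(latest(LOG))
print(as_of(LOG, date(2020, 6, 5)))
print(with_lag(LOG, 1))
```

Training a forecaster on `as_of` views rather than on `latest` keeps the model from seeing corrections that were not yet available on the forecast date.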
Data revisions
Why would we need this? Because data sources are subject to revisions
• Case and death counts frequently corrected/adjusted by authorities
• Medical claims data can take weeks to be submitted and/or processed
• Testing/lab data can be backlogged for a variety of reasons
This presents a challenge to modelers: e.g., we have to learn how to forecast
on the data we have at the time, not updates that would arrive later; to
accommodate, we log revisions even when the original data source does not!
Example: Backfill in doctor visits
Ongoing pandemic survey
Through a recruitment partnership with Facebook, we survey about 50,000 people
daily (and over 20 million since the pandemic began in April) in the US
Topics include
• COVID symptoms
• COVID testing
• Mental health
• Social contacts and behavior
• Demographics
This is the largest non-census research survey ever conducted
Raw survey response data is available to researchers under a data use agreement:
https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/
A parallel international effort by the University of Maryland reaches 100+ countries
in 55 languages
Survey dashboard: https://delphi.cmu.edu/covidcast/survey-results/
Part 2: Lessons learned
Lessons and reflections
Lessons learned from the past year related to statistical modeling and machine
learning, broken down by three areas
A. Forecasting
B. Nowcasting
C. Risk-taking
Forecasting in a pandemic is hard
The COVID-19 Forecast Hub collects short-term forecasts of incident COVID-19 cases,
hospitalizations, and deaths; these are made by 50+ groups of “citizen scientists” and power the
CDC’s official communications on COVID-19 forecasting
This is not an easy problem
• Signal-to-noise is generally quite low (all we have is a small set of correlated examples)
• Nonstationarity is generally quite pronounced (the disease and the way we’re measuring it)
• Data problems are a constant struggle (delays, revisions, anomalies, changes in definitions)
All of this – plus an additional model-level nonstationarity – carries over to the task of building an
ensemble model!
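For readers unfamiliar with ensembling, one simple way to combine forecasts is to take the median across models at each horizon. The Python sketch below uses invented numbers and is not a description of how any particular ensemble is actually built; `median_ensemble` is a hypothetical helper.

```python
from statistics import median

# Invented point forecasts for horizons 1, 2, 3 from three hypothetical models.
forecasts = {
    "model_a": [100.0, 110.0, 120.0],
    "model_b": [90.0, 95.0, 400.0],   # an outlying forecast at horizon 3
    "model_c": [105.0, 100.0, 98.0],
}


def median_ensemble(forecasts):
    """Combine models by taking the median forecast at each horizon."""
    per_horizon = zip(*forecasts.values())
    return [median(values) for values in per_horizon]


ensemble = median_ensemble(forecasts)
print(ensemble)
```

The median is used here because it is robust to a single model going badly wrong, which fits the nonstationarity concern above.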
Forecasting in a pandemic is hard
Only a small handful of models
consistently outperform the baseline
(essentially flat-line forecaster)
For example, from Cramer et al. (2021)
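To show what "outperforming a flat-line baseline" means in practice, here is a minimal Python sketch with point forecasts and mean absolute error. This is a simplification under stated assumptions: the numbers are invented, the candidate model is hypothetical, and real evaluations of probabilistic forecasts use other scores.

```python
def flat_line_forecast(history, horizon):
    """Baseline: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def mean_absolute_error(predicted, actual):
    """Average absolute difference between forecasts and outcomes."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Invented weekly counts: observed history, then what actually happened.
history = [120.0, 130.0, 125.0, 140.0]
actual = [150.0, 145.0, 160.0]

baseline = flat_line_forecast(history, len(actual))
candidate = [138.0, 150.0, 175.0]  # hypothetical model output

baseline_mae = mean_absolute_error(baseline, actual)
candidate_mae = mean_absolute_error(candidate, actual)

# Relative error below 1 means the candidate beat the baseline on this sample.
relative_mae = candidate_mae / baseline_mae
print(baseline_mae, candidate_mae, relative_mae)
```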
Forecasting in a pandemic is hard
LESSONS/REFLECTIONS
• Simple, robust models (mechanistic or statistical) tend to perform the best
• We (as a community) missed every surge, didn’t clearly pass the “eyeball test”
• Continual, direct engagement with the end users of forecasts is critical
Nowcasting is an emerging candidate for MVP
Nowcasting: estimating the value of a signal that will only be fully observed at a
later date; current data is partial/noisy but progressively improves as time passes
Example: suppose we want to use medical insurance claims to estimate how many
people have some disease on some day (in some location)
• Claims are routinely submitted and processed late
• Holidays can be particularly troublesome
• Finalized values will only be available months later
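A very simple way to nowcast from late-arriving claims is to scale each partial count by the fraction of finalized claims that has historically arrived by the same lag. The Python sketch below illustrates that idea on invented numbers. It is one basic approach, not a description of Delphi's method, and `estimate_completeness` and `nowcast` are hypothetical helpers.

```python
def estimate_completeness(history):
    """Estimate, for each lag, the share of the final count seen by then.

    `history` is a list of (counts_by_lag, final_count) pairs for past days
    whose finalized values are already known.
    """
    seen, final_total = {}, 0.0
    for counts_by_lag, final_count in history:
        final_total += final_count
        for lag, count in counts_by_lag.items():
            seen[lag] = seen.get(lag, 0.0) + count
    return {lag: total / final_total for lag, total in seen.items()}


def nowcast(partial_count, lag_days, completeness):
    """Scale a partial count up by the historical completeness at this lag.

    Lags missing from the table are treated as complete (an assumption).
    """
    fraction = completeness.get(lag_days, 1.0)
    return partial_count / fraction


# Invented past days: counts observed at lag 0 and lag 1, plus final counts.
history = [
    ({0: 28.0, 1: 50.0}, 100.0),
    ({0: 32.0, 1: 60.0}, 100.0),
]
fractions = estimate_completeness(history)

# A day observed at lag 0 with 30 claims so far.
estimate = nowcast(30.0, 0, fractions)
print(fractions, estimate)
```

A fixed ratio like this ignores holidays and changes in reporting behavior, which is why those cases are called out as troublesome.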
Meanwhile, for COVID-19 it’s even more complicated
• We’d like to know how many infected individuals are currently present in
a population
• We’d settle for how many symptomatic infections are currently present
• We’d even settle for counting current symptomatic infections that eventually
get tested
• Instead, we get how many confirmed symptomatic infections happened to be
reported that day
Even when settling for the third case, we would be nowcasting a latent variable
(never observed)
LESSONS/REFLECTIONS
• We need more people to work on nowcasting; it is extremely relevant
• Compared to forecasting, it is still technically quite underdeveloped
• Good answers could redefine ground truth (i.e., forecasting targets)
While time scales may change, nowcasting is not going away as a central problem
in public health
Taking risks (when you can afford to do so)
• The beginning of the pandemic created a clear pull for computational scientists:
fetch case and death data from JHU CSSE’s GitHub, learn about SIR modeling,
inject stochasticity, and start making forecasts
• We decided early on to swim against the stream; it’s not that this work wasn’t
important, but rather, we felt we could create greater value by working on the
data itself (to hopefully benefit many others)
• We wouldn’t/couldn’t have taken the risk if there weren’t so many strong
computational scientists who jumped in to work on forecasting
Taking risks (when you can afford to do so)
It can be hard to quantify the value of good data – we will be trying to do this for years to come
(not just us or our data; this is an important undertaking for the whole scientific community)
That said, we are starting to see (in retrospect) some encouraging results in problems where you
can quantify value, like forecasting and nowcasting
Thank you!
