
SML406

COVIDcast: An ecosystem for COVID-19 tracking and forecasting
Ryan Tibshirani
Statistics & Machine Learning, Carnegie Mellon University
Amazon Scholar, Amazon Web Services

© 2021, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Access the slides from this presentation
• Interactive HTML version: cmu-delphi.github.io/covidcast/talks/ml-summit/talk.html
• PDF: download from the Resources tab for this session in the event platform

Delphi: Then and now
• Delphi formed in 2012 to “develop the theory and practice of epidemic forecasting”
• Participated in annual CDC flu forecasting challenges since 2013, earned top place in several
• Awarded CDC’s National Center of Excellence for flu forecasting in 2019
• March 2020: pivoted to focus on supporting the US COVID-19 response, launched COVIDcast
• We’ve been working on the “full pipeline” but are focused now more than ever on data

COVIDcast indicators

COVIDcast ecosystem

(Figure: ecosystem diagram; partner logos include Google and Quidel)

Outline for this talk
I can’t cover everything! I will focus on our API and give some basic data demos
(demos are reproducible, with all code included) and then reflect on a few
lessons learned

Outline
• Part 1: API and data demos
• Part 2: Lessons learned

Part 1: API and data demos

COVIDcast API

The COVIDcast API is based on HTTP GET queries and returns data in
JSON or CSV format

API: cmu-delphi.github.io/delphi-epidata/api/covidcast.html
Dashboard: https://delphi.cmu.edu/covidcast/
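To make the query structure concrete, here is a minimal Python sketch that builds an HTTP GET URL for the deaths example shown later. The base URL and the parameter spellings (data_source, signal, time_type, geo_type, geo_value, time_values, format) are assumptions drawn from the API documentation linked above and may change, so check that page before relying on them. The sketch only constructs the URL; fetching it requires network access.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint; confirm it against the API documentation before use.
BASE_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def covidcast_url(data_source, signal, geo_type, geo_value,
                  start_day, end_day, time_type="day", fmt=None):
    """Build a GET URL for one signal over a range of days (YYYYMMDD strings)."""
    params = {
        "data_source": data_source,
        "signal": signal,
        "time_type": time_type,
        "geo_type": geo_type,
        "geo_value": geo_value,
        "time_values": f"{start_day}-{end_day}",  # a range is written start-end
    }
    if fmt is not None:
        params["format"] = fmt  # e.g. "csv"; assumed to default to JSON when omitted
    return BASE_URL + "?" + urlencode(params)


url = covidcast_url("usa-facts", "deaths_7dav_incidence_num",
                    "state", "pa", "20200301", "20210428")
query = parse_qs(urlsplit(url).query)
print(query["time_values"])  # ['20200301-20210428']

# Fetching requires network access, e.g.:
#   import json, urllib.request
#   response = json.load(urllib.request.urlopen(url))
```

Keeping URL construction separate from fetching makes the query easy to inspect and log, which matters once versioned queries enter the picture.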

R and Python packages
We also provide R and Python packages for API access

Highlights
• Easy API querying
  - Same specification structure (many default parameters)
  - Full support for data revisions (as of, issues, lag)
• Basic signal processing
  - Correlations sliced by location or by time
  - Data wrangling: preparing signals for analysis
• Plotting functionality
  - Choropleth and bubble maps, time series plots

R: cmu-delphi.github.io/covidcast/covidcastR/
Python: cmu-delphi.github.io/covidcast/covidcast-py/html

Have an idea? File an issue or contribute:
https://github.com/cmu-delphi/covidcast/
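To illustrate what "correlations sliced by location or by time" means, here is a small, dependency-free Python sketch. It is not the packages' implementation; the data layout (dicts keyed by (geo_value, time_value)) and the function names are hypothetical. Slicing by location gives one correlation per place, computed over time; slicing by time gives one correlation per day, computed across places.

```python
from collections import defaultdict
from math import sqrt, nan


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def sliced_correlation(x, y, by):
    """Correlate two signals keyed by (geo_value, time_value).

    by="geo_value":  one correlation per location, computed across time.
    by="time_value": one correlation per day, computed across locations.
    """
    slice_index = {"geo_value": 0, "time_value": 1}[by]
    groups = defaultdict(lambda: ([], []))
    for key in sorted(x.keys() & y.keys()):  # only keys observed in both signals
        xs, ys = groups[key[slice_index]]
        xs.append(x[key])
        ys.append(y[key])
    return {g: pearson(xs, ys) for g, (xs, ys) in groups.items() if len(xs) >= 2}


# Hypothetical daily values keyed by (geo_value, time_value)
cases = {("pa", 1): 10, ("pa", 2): 20, ("pa", 3): 30,
         ("ny", 1): 5, ("ny", 2): 4, ("ny", 3): 3}
deaths = {("pa", 1): 1, ("pa", 2): 2, ("pa", 3): 3,
          ("ny", 1): 2, ("ny", 2): 3, ("ny", 3): 4}

by_geo = sliced_correlation(cases, deaths, "geo_value")
by_time = sliced_correlation(cases, deaths, "time_value")
print(by_geo)  # {'ny': -1.0, 'pa': 1.0}
```

The two slices answer different questions: by location asks whether the signals move together over time within a place, while by time asks whether places that rank high on one signal also rank high on the other on a given day.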

List of indicators

Example: Deaths
How many COVID-19 deaths have been reported per day, in my state, since March 1?

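library(covidcast)

# Fetch the 7-day average of new COVID-19 deaths reported in Pennsylvania
deaths = covidcast_signal(
  data_source = "usa-facts",
  signal = "deaths_7dav_incidence_num",
  start_day = "2020-03-01",
  end_day = "2021-04-28",
  geo_type = "state",
  geo_values = "pa")

# Plot the daily values as a time series
plot(deaths, plot_type = "line",
     title = "New COVID-19 deaths in PA (7-day average)")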
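The signal name deaths_7dav_incidence_num indicates a 7-day average of new (incident) deaths. As a rough illustration, the Python sketch below computes a trailing 7-day average from daily counts, assuming each window ends on (and includes) the day being reported. The smoothing in the actual pipeline may differ in details, such as how the first few days are handled, and trailing_average is a hypothetical helper.

```python
def trailing_average(values, window=7):
    """Trailing moving average; days without a full window get None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


# Hypothetical daily death counts with a weekly reporting pattern
daily_deaths = [0, 7, 14, 7, 0, 7, 14, 21]
print(trailing_average(daily_deaths))  # [None, None, None, None, None, None, 7.0, 10.0]
```

A 7-day window spans exactly one weekly cycle, so day-of-week reporting artifacts (such as low weekend counts) largely cancel out.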

Example: Hospitalizations
What percentage of daily hospital admissions are due to COVID-19 in PA, NY, TX?

Example: Total cases
What does the current COVID-19 cumulative case rate look like, nationwide?

Example: Doctor visits
How do some cities compare in terms of doctor’s visits due to COVID-like illness?

Example: Symptoms
How do my county and my friend’s county compare in terms of COVID symptoms?

Example: Mask use
How do some states compare in terms of self-reported mask usage?

Example: Vaccines
How about vaccine uptake (self-reported), and willingness to take vaccine (if not yet vaccinated)?

Optional parameters
By default the API returns the most recent data for each time_value

We also provide access to all previous versions of the data using optional parameters (as_of, issues, lag)
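Here is a minimal sketch of what version-aware queries return, using a hypothetical in-memory log of (time_value, issue, value) rows, where issue is the date a version was published. It assumes that as_of selects, for each time_value, the latest version issued on or before a given date, and that lag selects versions issued a fixed number of days after their time_value; consult the API documentation for the exact semantics. The names LOG, as_of, and at_lag are illustrative only.

```python
from datetime import date

# Hypothetical version log: one (time_value, issue, value) row per published version
LOG = [
    (date(2021, 1, 1), date(2021, 1, 2), 80),
    (date(2021, 1, 1), date(2021, 1, 5), 95),
    (date(2021, 1, 1), date(2021, 1, 20), 100),
    (date(2021, 1, 2), date(2021, 1, 3), 70),
    (date(2021, 1, 2), date(2021, 1, 10), 90),
]


def as_of(log, when):
    """For each time_value, the value from the latest issue on or before `when`."""
    latest = {}
    for time_value, issue, value in log:
        if issue <= when and (time_value not in latest or issue > latest[time_value][0]):
            latest[time_value] = (issue, value)
    return {t: v for t, (_, v) in sorted(latest.items())}


def at_lag(log, lag_days):
    """Versions issued exactly `lag_days` days after their time_value."""
    return {t: v for t, issue, v in log if (issue - t).days == lag_days}


print(as_of(LOG, date(2021, 1, 6)))   # what a model would have seen on Jan 6
print(as_of(LOG, date(2021, 1, 31)))  # the latest (most revised) values
print(at_lag(LOG, 1))                 # reports issued one day after the fact
```

Training and evaluating a forecaster on as_of snapshots, rather than on the latest values, reproduces the information that was actually available at each forecast date.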

Data revisions
Why would we need this? Because data sources are subject to revisions
• Case and death counts frequently corrected/adjusted by authorities
• Medical claims data can take weeks to be submitted and/or processed
• Testing/lab data can be backlogged for a variety of reasons

This presents a challenge to modelers: e.g., we have to learn how to forecast using the data available at the time, not updates that arrive later. To accommodate this, we log revisions even when the original data source does not!
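One way to log revisions for a source that simply overwrites its numbers is to take a snapshot on every update and record only the values that are new or changed. The Python sketch below does this with a hypothetical log_snapshot helper. It is an assumption about how such a log could be built, not a description of Delphi's pipeline, and it does not track values that disappear from a snapshot.

```python
from datetime import date


def log_snapshot(log, snapshot, issue):
    """Record new or changed values from a source that keeps no history.

    log:      list of (time_value, issue, value) rows, appended in issue order
    snapshot: dict time_value -> value, as the source reports it today
    issue:    the date this snapshot was taken
    Returns the number of rows appended. (Hypothetical helper.)
    """
    current = {}
    for time_value, _, value in log:  # later rows overwrite earlier ones,
        current[time_value] = value   # leaving the latest known version
    added = 0
    for time_value, value in sorted(snapshot.items()):
        if time_value not in current or current[time_value] != value:
            log.append((time_value, issue, value))
            added += 1
    return added


log = []
print(log_snapshot(log, {date(2021, 1, 1): 80}, date(2021, 1, 2)))  # 1: first report
print(log_snapshot(log, {date(2021, 1, 1): 80, date(2021, 1, 2): 70},
                   date(2021, 1, 3)))  # 1: only the new day is logged
print(log_snapshot(log, {date(2021, 1, 1): 95, date(2021, 1, 2): 70},
                   date(2021, 1, 5)))  # 1: Jan 1 was revised upward
```

Storing only changes keeps the log compact, while still letting any past snapshot be reconstructed by taking the latest row per time_value up to a chosen issue date.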

Example: Backfill in doctor visits


Ongoing pandemic survey
Through a recruitment partnership with Facebook, we survey about 50,000 people daily in the US (over 20 million since the survey began in April 2020)

Topics include
• COVID symptoms
• COVID testing
• Mental health
• Social contacts and behavior
• Demographics

This is the largest non-census research survey ever conducted

Raw survey response data is available to researchers under a data use agreement: https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/

A parallel international effort by the University of Maryland reaches 100+ countries in 55 languages

Survey dashboard: https://delphi.cmu.edu/covidcast/survey-results/

Part 2: Lessons learned

Lessons and reflections
Lessons learned from the past year related to statistical modeling and machine
learning, broken down by three areas
A. Forecasting
B. Nowcasting
C. Risk-taking

Forecasting in a pandemic is hard
The COVID-19 Forecast Hub collects short-term forecasts of incident COVID-19 cases,
hospitalizations, and deaths; these are made by 50+ groups of “citizen scientists” and power the
CDC’s official communications on COVID-19 forecasting
This is not an easy problem
• Signal-to-noise is generally quite low (all we have is a small set of correlated examples)
• Nonstationarity is generally quite pronounced (the disease and the way we’re measuring it)
• Data problems are a constant struggle (delays, revisions, anomalies, changes in definitions)
All of this – plus an additional model-level nonstationarity – carries over to the task of building an
ensemble model!
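As an assumed illustration of one simple way to ensemble probabilistic forecasts, the sketch below combines quantile forecasts from several models by taking the median across models at each quantile level. It is not presented as the Forecast Hub's ensemble code; the model names and numbers are hypothetical. Using the median keeps a single outlying model from pulling the ensemble far off.

```python
from statistics import median


def quantile_median_ensemble(forecasts):
    """Median across models at each quantile level reported by every model.

    forecasts: dict model name -> {quantile level: predicted value}
    """
    shared_levels = set.intersection(*(set(f) for f in forecasts.values()))
    return {q: median(f[q] for f in forecasts.values()) for q in sorted(shared_levels)}


# Hypothetical weekly death forecasts from three models
forecasts = {
    "model_a": {0.1: 80, 0.5: 100, 0.9: 130},
    "model_b": {0.1: 90, 0.5: 110, 0.9: 150},
    "model_c": {0.1: 40, 0.5: 300, 0.9: 900},  # one outlying model
}
print(quantile_median_ensemble(forecasts))  # {0.1: 80, 0.5: 110, 0.9: 150}
```

Because each model's quantiles are non-decreasing in the quantile level, the quantile-wise median is too, so the combined forecast remains a valid set of quantiles.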

Forecasting in a pandemic is hard
Only a small handful of models consistently outperform the baseline (essentially a flat-line forecaster)
For example, from Cramer et al. (2021)
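A flat-line baseline simply repeats the last observed value at every horizon. The sketch below pairs it with a relative error: the model's mean absolute error divided by the baseline's, so that values below 1 mean the model beat the baseline. This is a simplified point-forecast illustration with hypothetical numbers; Forecast Hub evaluations also score full probabilistic forecasts (for example, with the weighted interval score).

```python
def flat_line_forecast(history, horizon):
    """Baseline: repeat the last observed value for each of the next `horizon` steps."""
    return [history[-1]] * horizon


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_mae(model_forecast, actual, history):
    """Model MAE divided by baseline MAE; below 1 means the model beat the baseline.

    Undefined (division by zero) if the baseline happens to be exactly right.
    """
    baseline = flat_line_forecast(history, len(actual))
    return mean_absolute_error(model_forecast, actual) / mean_absolute_error(baseline, actual)


history = [100, 110, 120]  # hypothetical weekly counts observed so far
actual = [130, 140]        # what was later observed 1 and 2 weeks ahead
model = [128, 136]         # a hypothetical model's point forecasts
print(flat_line_forecast(history, 2))        # [120, 120]
print(relative_mae(model, actual, history))  # 0.2
```

The baseline is hard to beat for a simple reason: over short horizons, "no change" is often close to the truth, and it never overreacts to noise.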

Forecasting in a pandemic is hard
LESSONS/REFLECTIONS

• Simple, robust models (mechanistic or statistical) tend to perform the best
• We (as a community) missed every surge and didn't clearly pass the "eyeball test"
• Continual, direct engagement with the end users of forecasts is critical

Nowcasting is an emerging candidate for MVP
Nowcasting: estimating the value of a signal that will only be fully observed at a
later date; current data is partial/noisy but progressively improves as time passes
Example: suppose we want to use medical insurance claims to estimate how many
people have some disease on some day (in some location)
• Claims are routinely submitted and processed late
• Holidays can be particularly troublesome
• Finalized values will only be available months later
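As one simple, assumed approach to the claims example (a multiplicative backfill correction, not necessarily what Delphi does), we can use past days whose counts have settled to estimate what fraction of the final count had typically arrived k days after the event, then divide today's partial count by that fraction. All names and numbers below are hypothetical.

```python
from statistics import mean


def completeness_by_lag(history):
    """Average fraction of the final count that had arrived at each lag.

    history: list of (reported, final) for past days that have settled, where
    reported[k] is the running count known k days after the day in question.
    """
    max_lag = min(len(reported) for reported, _ in history)
    return [mean(reported[k] / final for reported, final in history)
            for k in range(max_lag)]


def nowcast(partial_count, lag, completeness):
    """Scale up a count observed `lag` days after the fact to estimate its final value."""
    return partial_count / completeness[lag]


# Hypothetical settled claims counts: as reported 0, 1, 2 days later, and final
history = [([50, 80, 95], 100),
           ([30, 48, 57], 60)]
completeness = completeness_by_lag(history)  # roughly [0.5, 0.8, 0.95]
print(nowcast(40, 0, completeness))          # 80.0
```

This pools all days together; holidays and day-of-week effects, which slow submissions, would call for separate completeness estimates for those days.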

Nowcasting is an emerging candidate for MVP
Meanwhile, for COVID-19 it’s even more complicated
• We’d like to know how many infected individuals are currently present in
a population
• We’d settle for how many symptomatic infections are currently present
• We’d even settle for counting current symptomatic infections that eventually
get tested
• Instead, we get how many confirmed symptomatic infections happened to be
reported that day
Even when settling for the third case, we would be nowcasting a latent variable
(never observed)

Nowcasting is an emerging candidate for MVP
LESSONS/REFLECTIONS

• We need more people to work on nowcasting; it is extremely relevant
• Compared to forecasting, it is still technically quite underdeveloped
• Good answers could redefine ground truth (i.e., forecasting targets)

While time scales may change, nowcasting is not going away as a central problem
in public health

Taking risks (when you can afford to do so)
• The beginning of the pandemic created a clear pull for computational scientists:
fetch case and death data from JHU CSSE’s GitHub, learn about SIR modeling,
inject stochasticity, and start making forecasts
• We decided early on to swim against the stream; it’s not that this work wasn’t
important, but rather, we felt we could create greater value by working on the
data itself (to hopefully benefit many others)
• We wouldn’t/couldn’t have taken the risk if there weren’t so many strong
computational scientists who jumped in to work on forecasting

Taking risks (when you can afford to do so)
It can be hard to quantify the value of good data – we will be trying to do this for years to come
(not just us or our data; this is an important undertaking for the whole scientific community)
That said, we are starting to see (in retrospect) some encouraging results in problems where you
can quantify value, like forecasting and nowcasting

Thank you!
