
RESEARCH

BMJ: first published as 10.1136/bmj.m3731 on 20 October 2020. Downloaded from http://www.bmj.com/ on 18 November 2020 at UCL Library Services. Protected by copyright.
Living risk prediction algorithm (QCOVID) for risk of hospital
admission and mortality from coronavirus 19 in adults: national
derivation and validation cohort study
Ash K Clift,1 Carol A C Coupland,2 Ruth H Keogh,3 Karla Diaz-Ordaz,3 Elizabeth Williamson,3
Ewen M Harrison,4 Andrew Hayward,5 Harry Hemingway,6 Peter Horby,7 Nisha Mehta,8
Jonathan Benger,9 Kamlesh Khunti,10 David Spiegelhalter,11 Aziz Sheikh,4 Jonathan Valabhji,12
Ronan A Lyons,13 John Robson,14 Malcolm G Semple,15 Frank Kee,16 Peter Johnson,12
Susan Jebb,1 Tony Williams,17 Julia Hippisley-Cox1

For numbered affiliations see end of the article.
Correspondence to: J Hippisley-Cox julia.hippisley-cox@phc.ox.ac.uk (or @juliahcox on Twitter: ORCID 0000-0002-2479-7283)
Additional material is published online only. To view please visit the journal online.
Cite this as: BMJ 2020;371:m3731
http://dx.doi.org/10.1136/bmj.m3731
Accepted: 23 September 2020

Abstract
Objective
To derive and validate a risk prediction algorithm to estimate hospital admission and mortality outcomes from coronavirus disease 2019 (covid-19) in adults.
Design
Population based cohort study.
Setting and participants
QResearch database, comprising 1205 general practices in England with linkage to covid-19 test results, Hospital Episode Statistics, and death registry data. 6.08 million adults aged 19-100 years were included in the derivation dataset and 2.17 million in the validation dataset. The derivation and first validation cohort period was 24 January 2020 to 30 April 2020. The second temporal validation cohort covered the period 1 May 2020 to 30 June 2020.
Main outcome measures
The primary outcome was time to death from covid-19, defined as death due to confirmed or suspected covid-19 as per the death certification or death occurring in a person with confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in the period 24 January to 30 April 2020. The secondary outcome was time to hospital admission with confirmed SARS-CoV-2 infection. Models were fitted in the derivation cohort to derive risk equations using a range of predictor variables. Performance, including measures of discrimination and calibration, was evaluated in each validation time period.
Results
4384 deaths from covid-19 occurred in the derivation cohort during follow-up and 1722 in the first validation cohort period and 621 in the second validation cohort period. The final risk algorithms included age, ethnicity, deprivation, body mass index, and a range of comorbidities. The algorithm had good calibration in the first validation cohort. For deaths from covid-19 in men, it explained 73.1% (95% confidence interval 71.9% to 74.3%) of the variation in time to death (R²); the D statistic was 3.37 (95% confidence interval 3.27 to 3.47), and Harrell’s C was 0.928 (0.919 to 0.938). Similar results were obtained for women, for both outcomes, and in both time periods. In the top 5% of patients with the highest predicted risks of death, the sensitivity for identifying deaths within 97 days was 75.7%. People in the top 20% of predicted risk of death accounted for 94% of all deaths from covid-19.
Conclusion
The QCOVID population based risk algorithm performed well, showing very high levels of discrimination for deaths and hospital admissions due to covid-19. The absolute risks presented, however, will change over time in line with the prevailing SARS-CoV-2 infection rate and the extent of social distancing measures in place, so they should be interpreted with caution. The model can be recalibrated for different time periods, however, and has the potential to be dynamically updated as the pandemic evolves.
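The explained variation and the D statistic quoted in the abstract are closely linked, which allows a quick consistency check. The following is a minimal sketch, assuming the reported R² is Royston's D-based measure of explained variation with an error variance of π²/6 (the form used for proportional hazards models). The abstract does not state the formula, so this is an assumption, and the function name `r2_from_d` is purely illustrative.

```python
import math

# Assumption: R^2 is derived from D as in Royston's explained variation measure.
KAPPA_SQ = 8 / math.pi        # scaling constant used in the definition of D
SIGMA_SQ = math.pi ** 2 / 6   # error variance assumed for proportional hazards models


def r2_from_d(d: float) -> float:
    """Return the explained variation implied by a D statistic (assumed formula)."""
    signal = d ** 2 / KAPPA_SQ
    return signal / (SIGMA_SQ + signal)


# D statistic reported for deaths from covid-19 in men
print(round(100 * r2_from_d(3.37), 1))  # 73.1
```

Under this assumption a D statistic of 3.37 corresponds to an explained variation of about 73.1%, which agrees with the figure reported for men.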

What is already known on this topic
Public policy measures and clinical risk assessment relevant to covid-19 can be aided by rigorously developed and validated risk prediction models
Published risk prediction models for covid-19 are subject to a high risk of bias with optimistic reported performance, raising concern that these models may be unreliable when applied in practice
What this study adds
Novel clinical risk prediction models (QCOVID) have been developed and evaluated to identify risks of short term severe outcomes due to covid-19
The risk models have excellent discrimination and are well calibrated; they will be regularly updated as the absolute risks change over time
QCOVID has the potential to support public health policy by enabling shared decision making between clinicians and patients, targeted recruitment for clinical trials, and prioritisation for vaccination

Introduction
The first cases of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection were reported in the UK on 24 January 2020, with the first death from coronavirus disease 2019 (covid-19) on 28 February 2020. As of 18 August 2020, more than 41 000 deaths from covid-19 had occurred in the UK and more than 773 000 deaths globally.1 In the initial absence of any vaccination or prophylactic or curative treatments, the UK government implemented social distancing and shielding measures to suppress the rate of infection and protect vulnerable people, thereby trying to minimise the risk of serious adverse outcomes.2 3 Emerging evidence throughout the course of the pandemic, initially from case series and then from cohorts of patients with confirmed SARS-CoV-2



infection, has shown associations of age, sex, certain comorbidities, ethnicity, and obesity with adverse covid-19 outcomes such as hospital admission or death.4-11 The knowledge base regarding risk factors for severe covid-19 is growing. As many countries are cautiously attempting to ease “lockdown” measures or reintroduce measures if rates are rising, an opportunity exists to develop more nuanced guidance based on predictive algorithms to inform risk management decisions.12 Better knowledge of individuals’ risks could also help to guide decisions on mitigating occupational exposure and in targeting of vaccines to those most at risk. Although some prediction models have been developed, a recent systematic review found that they all have a high risk of bias and that their reported performance is optimistic.13
The use of primary care datasets with linkage to registries such as death records, hospital admissions data, and covid-19 testing results represents a novel approach to clinical risk prediction modelling for covid-19. It provides accurately coded, individual level data for very large numbers of people representative of the national population. This approach draws on the rich phenotyping of individuals with demographic, medical, and pharmacological predictors to allow robust statistical modelling and evaluation. Such linked datasets have an established track record for the development and evaluation of established clinical risk models, including those for cardiovascular disease, diabetes, and mortality.14-16 We aimed to develop and validate population based prediction models to estimate the risks of becoming infected with and subsequently dying from covid-19 and of becoming infected and subsequently admitted to hospital with covid-19. The model we have developed is designed to be applied across the adult population so that it can be used to enable risk stratification for public health purposes in the event of a “second wave” of the pandemic, to support shared management of risk and occupational exposure, and in early targeting of vaccines to people most at risk. An ongoing companion study will externally validate the models, using datasets across all four nations of the UK, and will be reported separately.

Methods
This study was commissioned by the Chief Medical Officer for England on behalf of the UK Government, who asked the New and Emerging Respiratory Virus Threats Advisory Group (NERVTAG) to establish whether a clinical risk prediction model for covid-19 could be developed in line with the emerging evidence. The protocol has been published.17 The study was conducted in adherence with TRIPOD18 and RECORD19 guidelines and with input from our patient advisory group.

Study design and data sources
We did a cohort study of primary care patients using the QResearch database (version 44). QResearch was established in 2002 and has been extensively used for the development of risk prediction algorithms across the National Health Service (NHS) and for epidemiological research. By April 2020, 1205 practices in England were contributing to QResearch, covering a population of 10.5 million patients. The database is linked at individual patient level, using a project specific pseudonymised NHS number, to hospital admissions data (including intensive care unit data), positive results from covid-19 real time reverse transcriptase polymerase chain reaction tests held by Public Health England, cancer registrations (including detailed radiotherapy and systemic chemotherapy records), the national covid-19 shielded patient list in England, and mortality records held by NHS Digital.
We identified a cohort of people aged 19-100 years registered with participating general practices in England on 24 January 2020. We excluded patients (approximately 0.1%) who did not have a valid NHS number. Patients entered the cohort on 24 January 2020 (date of first confirmed case of covid-19 in the UK) and were followed up until they had the outcome of interest or the end of the first study period (30 April 2020), which was the date up to which linked data were available at the time of the derivation of the model, or the second time period (1 May 2020 until 30 June 2020) for the temporal cohort validation.

Outcomes
The primary outcome was time to death from covid-19 (either in hospital or outside hospital), defined as confirmed or suspected death from covid-19 as per the death certification or death occurring in an individual with confirmed SARS-CoV-2 infection at any time in the period 24 January to 30 April 2020. The secondary outcome was time to hospital admission with covid-19, defined as an ICD-10 (International Classification of Diseases, 10th revision) code for either confirmed or suspected covid-19 or new hospital admission associated with a confirmed SARS-CoV-2 infection in the study period.

Predictor variables
We selected candidate predictor variables on the basis of the presence of existing clinical vulnerability group criteria (table 1), associations with outcomes in other respiratory diseases, or hypothesised to be linked to adverse outcomes on clinical/biological plausibility and likely to be available for implementation. They are summarised in box 1 and supplementary box A. We defined variables according to information recorded using Read Codes in general practices’ electronic health records at the start of the study period. The exception to this was information on chemotherapy, radiotherapy, and transplants, which was based on linked hospital records.

QCOVID model development
We randomly allocated 75% of practices to the derivation dataset, which we used to develop the models. We evaluated the models’ performance in the remaining 25% of practices (the validation set).
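The allocation described above is made at the level of the practice rather than the patient, so every patient follows their registered practice into one dataset. The following is a minimal sketch of such a cluster level split. The function names, the seed, and the use of an exact 75% cut after shuffling are all assumptions for illustration; the text does not describe the randomisation mechanism the authors used.

```python
import random


def split_practices(practice_ids, derivation_fraction=0.75, seed=2020):
    """Randomly allocate whole practices to a derivation or validation set.

    Hypothetical helper: shuffles practice identifiers and cuts at the
    requested fraction. The seed only makes the sketch reproducible.
    """
    ids = sorted(practice_ids)  # fixed starting order before shuffling
    random.Random(seed).shuffle(ids)
    n_derivation = round(derivation_fraction * len(ids))
    return set(ids[:n_derivation]), set(ids[n_derivation:])


def assign_patients(patient_to_practice, derivation_practices):
    """Label each patient according to the dataset of their practice."""
    return {
        patient: "derivation" if practice in derivation_practices else "validation"
        for patient, practice in patient_to_practice.items()
    }


# 1205 practices, identified here by the integers 0..1204 (illustrative only)
derivation, validation = split_practices(range(1205))
labels = assign_patients({"p1": 0, "p2": 0, "p3": 7}, derivation)
```

Splitting by practice keeps patients from the same practice out of both datasets, so validation is done on practices the model has not seen. An exact 75% cut of 1205 practices gives 904 and 301, whereas the Results report 910 and 295, so the authors' mechanism presumably differed in detail from this sketch.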

Table 1 | Original population level risk stratification method as exercised in UK*

| Clinical risk level | Advice | Criteria | Identification and inclusion |
|---|---|---|---|
| Clinically extremely vulnerable (high risk) | Shielding (stay at home and stringently avoid any personal/face-to-face contact) | High risk conditions as established by clinical expert group decisions based on available evidence at time. Dynamic group of approximately 2.2 million people in England | Method 1: core group of patients identified by NHS Digital and contacted centrally by NHS England. Method 2: additional patients in particular medical sub-specialties not identifiable centrally. Method 3: additional patients identified by secondary care specialists using clinical judgment. Method 4: additional patients identified in primary care using clinical judgment |
| Clinically vulnerable (medium risk) | Follow stringent social distancing measures | Vulnerable group of approximately 16 million people in England, based on eligibility for influenza vaccination due to age ≥70, pregnancy, or comorbidity | NA |
| Remainder of population (low risk) | Follow mandatory social distancing measures and “lockdown” measures, but no specific clinical advice | NA | NA |

NA=not applicable.
*Shielding and stringent social distancing are both interventions designed to reduce risk of exposure to SARS-CoV-2, but classification of risk relates to risk of complicated or fatal disease if infected and not risk of becoming infected.

All models were fitted separately in men and women. The outcomes of interest are subject to competing risks. For the primary outcome of death from covid-19, the competing risk is death due to other causes. For the secondary outcome of hospital admission, the competing risk is death from any cause before admission. We fitted a sub-distribution hazard (Fine and Gray21) model for each outcome to account for competing risks. Individuals who did not have the outcome of interest were censored at the study end date, including those who had a competing event. For all predictor variables, we used the most recently available value at the entry date (24 January 2020). We used second degree fractional polynomials to model non-linear relations for continuous variables (age, body mass index, and Townsend material deprivation score, an area level score based on postcode20). Initially, we fitted a complete case analysis by using a model within the derivation data to derive the fractional polynomial terms. For indicators of comorbidities and medication use, we assumed the absence of recorded information to mean absence of the factor in question. Data were missing in four variables: ethnicity, Townsend score, body mass index, and smoking status. We used multiple imputation with chained equations under the missing at random assumption to replace missing values for these variables. For computational efficiency, we used a combined imputation model for both outcomes. The imputation model was fitted in the derivation data and included predictor variables, the Nelson-Aalen estimators of the baseline cumulative sub-distribution hazard, and the outcome indicators (death from covid-19 and hospital admission with covid-19). We carried out five imputations. Each analysis model was fitted in each of the five imputed datasets. We used Rubin’s rules to combine the model parameter estimates and the baseline cumulative incidence estimates across the imputed datasets.
We initially sought to fit models using all predictor variables. Owing to sparse cells, some conditions were combined if clinically similar in nature (such as rare neurological disorders). We examined interactions between body mass index and ethnicity and interactions between predictor variables and age, focusing on predictor variables that apply across the age range (asthma, epilepsy, diabetes, severe mental illness). We explored the use of penalised models (LASSO) to screen variables for inclusion, but this retained all the predictor variables and most interaction terms.17 In line with the protocol, we subsequently removed a small number of variables with low numbers of events and adjusted (sub-distribution) hazard ratios close to 1 (as these will have minimal effect on predicted risks) or with uncertain clinical credibility, defined as counterintuitive results in light of the emerging literature. Lastly, we combined regression coefficients from the final models with estimates of the baseline cumulative incidence function evaluated at 97 days to derive risk equations for each outcome. We used all the available data in the database.

Model evaluation
We did all model evaluation using the validation data with two separate periods of follow-up. The first validation study period was the same as for the derivation cohort: 24 January to 30 April 2020. The second temporal validation covered the subsequent period of 1 May 2020 to 30 June 2020. This was carried out with the same validation cohort except for exclusion of patients who died during 24 January to 30 April 2020. In the validation cohort, we fitted an imputation model to replace missing values for ethnicity, body mass index, Townsend score, and smoking status. This excluded the outcome indicators and Nelson-Aalen terms, as the aim was to use covariate data to obtain a prediction as if the outcome had not been observed to reflect intended use.
We applied the final risk equations developed from the derivation dataset to men and women in the validation dataset and evaluated R² values, Brier scores, and measures of discrimination and calibration for the two time periods.22-24 R² values refer to the proportion of variation in survival time explained by the model. Brier scores measure predictive accuracy,

Box 1: Candidate predictor variables examined during model development*
Demographic
• Age in years (continuous)
• Townsend deprivation score (continuous)—This is an area level continuous score based on the patient’s postcode.20 Originally developed by
Townsend,20 it includes unemployment (as a percentage of those aged ≥16 who are economically active), non-car ownership (as a percentage
of all households), non-home ownership (as a percentage of all households), and household overcrowding. These variables are measured for a
given area of approximately 120 households, via the 2011 census, and combined to give a Townsend score for that area. A greater Townsend score
implies a greater level of deprivation
• Ethnicity in nine categories (White, Indian, Pakistani, Bangladeshi, Other Asian, Caribbean, Black African, Chinese, other ethnic group)
• Domicile in three categories: homeless, care home residence (nursing or residential), other
Lifestyle
• Smoking status in five categories (non-smoker, ex-smoker, 1-10 per day, 11-19 per day, ≥20 per day)
• Body mass index in kg/m2 (continuous)
• Crack cocaine and injected drug use
Conditions on current shielding patient list
• Solid organ transplant recipient on long term immune suppression treatment
• Cancers:
○○Active chemotherapy
○○Radical radiotherapy for lung cancer
○○Blood/bone marrow cancer at any treatment stage
○○Immunotherapy or continuing antibody treatment
○○Targeted cancer treatments that affect immune system (PARP inhibitor or PKI)
○○Bone marrow or stem cell transplant in previous 6 months or remain on immunosuppression
• Immunosuppression sufficiently increasing infection risk
• Severe respiratory disease:
○○Severe asthma (≥3 prescribed courses of steroids in preceding 12 months)
○○Severe COPD (≥3 prescribed courses of steroids in preceding 12 months)
○○Cystic fibrosis, interstitial lung disease, sarcoidosis, non-cystic fibrosis bronchiectasis, or pulmonary hypertension
• Rare diseases or inborn errors of metabolism:
○○Severe combined immunodeficiency
○○Homozygous sickle cell disease
• Pregnant with significant heart disease:
○○Acquired or congenital
Conditions moderately associated with increased risk of complications as per current NHS guidance
• Chronic, non-severe respiratory disease:
○○Asthma
○○COPD (emphysema and chronic bronchitis)
○○Extrinsic allergic alveolitis
• Chronic kidney disease (CKD):
○○CKD stage 3 or 4
○○End stage renal failure requiring dialysis
○○End stage renal failure ever undergoing a transplant
• Cardiac disease:
○○Congestive cardiac failure
○○Valvular heart disease
• Chronic liver disease:
○○Chronic infective hepatitis (hepatitis B or C)
○○Alcohol related liver disease
○○Primary biliary cirrhosis
○○Primary sclerosing cholangitis
○○Haemochromatosis
• Chronic neurological conditions:
○○Epilepsy
○○Parkinson’s disease
○○Motor neurone disease
○○Cerebral palsy
○○Dementia (Alzheimer’s, vascular, frontotemporal)
○○Down’s syndrome
• Diabetes mellitus:
○○Type 1
○○Type 2


• Conditions or treatments that predispose to infection (eg, steroid treatment):


○○Rheumatoid arthritis
○○Systemic lupus erythematosus
○○Ankylosing spondylitis or other inflammatory arthropathy (eg, psoriatic arthritis)
○○Connective tissue disease (eg, Ehlers-Danlos syndrome, scleroderma, Sjögren’s syndrome)
○○Polymyositis or dermatomyositis
○○Vasculitis (eg, giant cell arteritis, polyarteritis nodosa, Behçet’s syndrome)
Other medical conditions that investigators hypothesised to confer elevated risk
• Cardiovascular disease:
○○Atrial fibrillation
○○Cardiovascular events (myocardial infarction, stroke, angina, transient ischaemic attack)
○○Peripheral vascular disease
○○Treated hypertension
• Hyperthyroidism
• Chronic pancreatitis
• Cirrhosis (if not above; eg, non-alcoholic fatty liver disease)
• Malabsorption:
○○Coeliac disease
○○Steatorrhoea
○○Blind loop syndrome
• Peptic ulcer (gastric or duodenal)
• Learning disability
• Osteoporosis
• Fragility fracture (hip, spine, shoulder, or wrist fracture)
• Severe mental illness:
○○Bipolar affective disorder
○○Psychosis
○○Schizophrenia or schizoaffective disorder
○○Severe depression
• HIV infection
• Hyposplenism
• Sickle cell disease
• Sphingolipidoses (eg, Tay-Sachs disease)
• History of venous thromboembolism
• Tuberculosis
Concurrent medications
• Drugs affecting the immune response, including systemic chemotherapy based on hospital data
• Drugs affecting the immune system prescribed in primary care (focus on BNF chapter 8.2)
• Long acting β agonists
• Long acting muscarinic antagonists
• Inhaled corticosteroids
COPD=chronic obstructive pulmonary disease; PARP=poly ADP ribose polymerase; PKI=protein kinase A inhibitor.
*Based on data recorded in general practice record linked to hospital information on chemotherapy, radiotherapy, and transplants

where lower values indicate better accuracy.25 D statistics (a discrimination measure that quantifies the separation in survival between patients with different levels of predicted risks) and Harrell’s C statistics (a discrimination metric that quantifies the extent to which people with higher risk scores have earlier events) were evaluated at 97 days (the maximum follow-up period available at the time of the derivation of the model) and 60 days for the second temporal validation, with corresponding 95% confidence intervals.26
We assessed model calibration by comparing mean predicted risks with observed risks by twentieths of predicted risk for each of the validation cohorts. Observed risks were derived in each of the 20 groups by using non-parametric estimates of the cumulative incidences. Additionally, we did a recalibration for the mortality outcome, using the method proposed by Booth et al by updating the baseline survivor function based on the temporal validation cohort with the prognostic index as an offset term.27 We also applied the algorithms to the validation cohort for the first time period to define the centile thresholds based on absolute risk. We also defined centiles of relative risk (defined as the ratio of the individual’s predicted absolute risk to the predicted absolute risk for a person of the same age and sex with a white ethnicity, body mass index of 25, and mean deprivation score with no other risk factors).
We calculated the performance metrics in the whole validation cohort and in the following pre-specified

subgroups: within age groups (19-39, 40-49, 50-59, 60-69, 70-79, ≥80 years), within nine ethnic groups, and within each of the 10 English regions (where numbers allowed). In response to open peer review of the published protocol,17 we evaluated performance by calculating Harrell’s C statistics in individual general practices and combining the results using a random effects meta-analysis.28

Patient and public involvement
Patients were involved in setting the research question and in developing plans for design and implementation of the study. Patients were asked to aid in interpreting and disseminating the results.

Results
Overall study population
Overall, 1205 practices in England met our inclusion criteria. Of these, 910 practices were randomly assigned to the derivation dataset and 295 to the validation cohort. The practices had 8 256 158 registered patients aged 19-100 years on 24 January 2020. We included 6 083 102 of these in the derivation cohort, and the validation dataset comprised 2 173 056 people.

Baseline characteristics
Table 2 shows the baseline characteristics of patients in the derivation cohort. Of these patients, 3 035 409 (49.9%) were men and 990 799 (16.3%) were of black, Asian, or other minority ethnic (BAME) background. In the derivation cohort, 10 776 (0.18%) patients had a covid-19 related hospital admission and 4384 (0.07%) had a covid-19 related death during the 97 days’ follow-up, of which 4265 (97.3%) were recorded on the death certificate and 119 (2.71%) were based only on a positive test (and of these <15 were based on a test more than 28 days before death). Admissions and deaths due to covid-19 occurred across all regions, with the greatest numbers in London, which accounted for 3799 (35.3%) of admissions and 1287 (29.4%) of deaths. Of those who died, 2517 (57.4%) were male, 732 (16.7%) were BAME, 3616 (82.5%) were aged 70 and over, 1417 (32.3%) had type 2 diabetes, 1311 (29.9%) had dementia, and 1033 (23.6%) were identified as living in a care home.
The characteristics of the validation cohort were similar to those of the derivation cohort, as shown in supplementary tables A and B. In the first validation period (24 January to 30 April 2020), 1722 deaths and 3703 hospital admissions due to covid-19 occurred. In the second validation period (1 May to 30 June 2020), 621 deaths and 1002 admissions due to covid-19 occurred.

Predictor variables
The variables included in the final models were fractional polynomial terms for age and body mass index, Townsend score (linear), ethnic group, domicile (residential care, homeless, neither), and a range of conditions and treatments as shown in figure 1, figure 2, figure 3, and figure 4. These conditions and treatments were cardiovascular conditions (atrial fibrillation, heart failure, stroke, peripheral vascular disease, coronary heart disease, congenital heart disease), diabetes (type 1 and type 2 and interaction terms for type 2 diabetes with age), respiratory conditions (asthma, rare respiratory conditions (cystic fibrosis, bronchiectasis, or alveolitis), chronic obstructive pulmonary disease, pulmonary hypertension or pulmonary fibrosis), cancer (blood cancer, chemotherapy, lung or oral cancer, marrow transplant, radiotherapy), neurological conditions (cerebral palsy, Parkinson’s disease, rare neurological conditions (motor neurone disease, multiple sclerosis, myasthenia, Huntington’s chorea), epilepsy, dementia, learning disability, severe mental illness), other conditions (liver cirrhosis, osteoporotic fracture, rheumatoid arthritis or systemic lupus erythematosus, sickle cell disease, venous thromboembolism, solid organ transplant, renal failure (CKD3, CKD4, CKD5, with or without dialysis or transplant)), and medications (≥4 prescriptions from general practitioner in previous six months for oral steroids, long acting β agonists or leukotrienes, immunosuppressants).
Figure 1 and figure 2 show the adjusted hazard ratios in the final models for covid-19 related death in the derivation cohort in women and men. Figure 3 and figure 4 show the adjusted hazard ratios for the final models for covid-19 related hospital admission in the derivation cohort.
Supplementary figures A and B show graphs of the adjusted hazard ratios for body mass index, age, and the interaction between age and type 2 diabetes for deaths and hospital admissions due to covid-19 (which showed higher risks associated with younger ages). Supplementary figures C and D show fully adjusted hazard ratios for variables for the full model, including variables that were not retained in the final model (for example, adjusted hazard ratios close to one or those which lacked clinical credibility). Other variables with too few events for inclusion were HIV, sphingolipidoses, short bowel syndrome, polymyositis, dermatomyositis, Ehlers-Danlos syndrome, biliary cirrhosis, hepatitis B and C, haemochromatosis, non-alcoholic fatty liver disease, chronic pancreatitis, drug misuse, asplenia, cholangitis, scleroderma, Sjögren’s syndrome, and pregnancy. Supplementary figures E and F show fully adjusted hazard ratios for a combined outcome of either covid-19 related death or hospital admission. This gave very similar absolute risks to the hospital admission outcome.

Model evaluation
Discrimination
Table 3 shows the performance of the risk equations in the validation cohort for women and men over 97 days for the main study period and for the temporal validation cohort evaluated from 1 May 2020 to 30 June 2020. Overall, the values for the R², D, and C statistics were similar in women and men. Values for the mortality outcome tended to be higher than those for the hospital admission outcome. For example,

in the first validation period, the equation explained 74% of the variation in time to death from covid-19 in women; the D statistic was 3.46, and Harrell’s C statistic was 0.933. The corresponding values in men were 73.1%, 3.37, and 0.928. The results for the second validation period were similar except for covid-19 related admissions in women, for which the explained variation and discrimination were lower than for the first period (explained variation 45.4%, D statistic 1.87, and Harrell’s C statistic 0.776).

Supplementary tables C-F show the corresponding results by region, age band, and fifth of deprivation and within each ethnic group in men and women in both validation periods. Performance was generally similar to the overall results except for age, for which the values were lower within individual age bands.

Figure 5 shows funnel plots of Harrell’s C statistic for each general practice in the validation cohort versus the number of deaths in each practice in men and women in the first validation period. The summary (average) C statistic for women was 0.916 (95% confidence interval 0.908 to 0.924) from a random effects meta-analysis. The corresponding summary C statistic for men was 0.919 (0.912 to 0.926).

Calibration
Figure 6 (top two rows) shows the calibration plots for the covid-19 related hospital admission equation and for the covid-19 related death equation in the first validation period. These show that both sets of equations were well calibrated in the first time period except for a small degree of under-prediction in the highest risk category for mortality. In the second validation period, calibration was good for the hospital admission outcome but not for the mortality outcome at the high levels of risk (fig 6, third and fourth rows). The calibration was improved with recalibration, with observed risks more closely matching the predicted risks (fig 6, bottom row).

Risk stratification
Table 4 shows the sensitivity values for the mortality equation over 97 days evaluated at different thresholds based on the centiles of the predicted absolute risk in the validation cohort. For example, it shows that 75.73% of deaths occurred in people in the top 5% for predicted absolute risk of death from covid-19 (predicted absolute risks above 0.237%). People in the top 20% for predicted absolute risk of death accounted for 94% of deaths, and the top 25% accounted for 95.99% of deaths. Supplementary table G shows a similar table based on centiles of relative risk. This example shows that 50.93% of deaths occurred in people in the top 5% for predicted relative risk of covid-19 related death (predicted relative risk above 5.3). People in the top 20% for predicted relative risk of death accounted for 77.53% of deaths, and the top 25% accounted for 81.59% of deaths. As an example, table 5 shows characteristics of the 5% of patients with the highest predicted absolute risk of death for all individuals aged 19-100 years.

Supplementary figures G and H show two clinical examples from the web calculator (https://qcovid.org/BMJ/), showing the absolute and relative risk of catching and dying from covid-19 and the risk of hospital admission due to covid-19. It also shows a ranking of mortality risk based on centiles across the validation cohort. Supplementary figure G shows a 55 year old black African man with type 2 diabetes, a body mass index of 27.7, and no other risk factors. His absolute risk of catching and dying from covid-19 over the 90 day period is 0.1095% (or 1 in 913). His relative risk compared with a white man aged 55 years and a body mass index of 25 is 10.84. The graph shows that he is in the top 10% of the population at highest risk. Supplementary figure H shows a 30 year old white woman with Down’s syndrome with a body mass index of 40. Her absolute risk of catching and dying from covid-19 is 0.024% (or 1 in 4184). Her relative risk compared with a white woman aged 30 years with a body mass index of 25 and no other risk factors is 59.75, and the rank is 75. Her absolute risk of admission to hospital with covid-19 over 90 days is 1 in 272.

Discussion
We have developed and evaluated a novel clinical risk prediction model (QCOVID) to estimate risks of hospital admission and mortality due to covid-19. We have used national linked datasets from general practice and national SARS-CoV-2 testing, death registry, and hospital episode data for a sample of more than 8 million adults representative of the population of England. The risk models have excellent discrimination (Harrell’s C statistics >0.9 for the primary outcome). Although the calibration for the hospital admission outcome was good in both time periods, some under-prediction existed for the mortality outcome in the second validation cohort, which improved after recalibration. The recalibration method could be used to transport the risk models to other settings or time periods with different absolute risks of covid-19. QCOVID represents a new approach for risk stratification in the population. It could also be deployed in several health and care applications, either during the current phase of the pandemic or in subsequent “waves” of infection (with recalibration as needed). These could include supporting targeted recruitment for clinical trials, prioritisation for vaccination, and discussions between patients and clinicians on workplace or health risk mitigation, for example through weight reduction, as obesity may be an important modifiable risk factor for serious complications of covid-19 if a causal association is established.10 Although QCOVID has been specifically designed to inform UK health policy and interventions to manage covid-19 related risks, it also has international potential, subject to local validation. One of the variables in our model (the Townsend measure of deprivation) may need to be replaced with locally available equivalent measures, or some recalibration may be needed. Previous risk prediction models based

Table 2 | Demographic and medical characteristics of derivation cohort and cohort members with outcomes. Values are numbers (percentages) unless
stated otherwise
Characteristic | Derivation cohort, total (n=6 083 102) | Derivation cohort, covid-19 deaths (n=4384) | Derivation cohort, covid-19 admission (n=10 776)
Male sex 3 035 409 (49.90) 2517 (57.41) 5962 (55.33)
Mean (SD) age, years 48.21 (18.57) 80.27 (12.10) 69.63 (17.91)
Age band:
  19-29 years 1 139 120 (18.73) 12 (0.27) 282 (2.62)
  30-39 years 1 190 905 (19.58) 22 (0.50) 523 (4.85)
  40-49 years 1 021 643 (16.79) 51 (1.16) 828 (7.68)
  50-59 years 1 013 599 (16.66) 223 (5.09) 1371 (12.72)
  60-69 years 757 483 (12.45) 460 (10.49) 1677 (15.56)
  70-79 years 586 164 (9.64) 892 (20.35) 2135 (19.81)
  80-89 years 298 093 (4.90) 1722 (39.28) 2722 (25.26)
  ≥90 years 76 095 (1.25) 1002 (22.86) 1238 (11.49)
Geographical region:
  East Midlands 164 502 (2.70) 52 (1.19) 131 (1.22)
  East of England 186 673 (3.07) 136 (3.10) 317 (2.94)
 London 1 576 616 (25.92) 1287 (29.36) 3799 (35.25)
  North East 163 388 (2.69) 87 (1.98) 243 (2.26)
  North West 1 076 590 (17.70) 942 (21.49) 2055 (19.07)
  South Central 824 558 (13.55) 563 (12.84) 1293 (12.00)
  South East 685 960 (11.28) 462 (10.54) 960 (8.91)
  South West 581 014 (9.55) 198 (4.52) 501 (4.65)
  West Midlands 605 752 (9.96) 533 (12.16) 1197 (11.11)
  Yorkshire and Humber 218 049 (3.58) 124 (2.83) 280 (2.60)
Ethnicity:
 White 3 924 110 (64.51) 2947 (67.22) 6790 (63.01)
 Indian 175 909 (2.89) 131 (2.99) 423 (3.93)
 Pakistani 114 727 (1.89) 69 (1.57) 248 (2.30)
 Bangladeshi 87 491 (1.44) 69 (1.57) 173 (1.61)
  Other Asian 110 579 (1.82) 57 (1.30) 248 (2.30)
 Caribbean 69 166 (1.14) 152 (3.47) 392 (3.64)
  Black African 150 022 (2.47) 122 (2.78) 456 (4.23)
 Chinese 58 511 (0.96) 18 (0.41) 45 (0.42)
  Other ethnic group 224 394 (3.69) 114 (2.60) 436 (4.05)
  Not recorded 1 168 193 (19.20) 705 (16.08) 1565 (14.52)
Townsend deprivation fifth:
  1 (most affluent) 1 238 575 (20.36) 840 (19.16) 1799 (16.69)
 2 1 222 681 (20.10) 746 (17.02) 1886 (17.50)
 3 1 187 082 (19.51) 934 (21.30) 2114 (19.62)
 4 1 176 829 (19.35) 951 (21.69) 2338 (21.70)
  5 (most deprived) 1 231 431 (20.24) 905 (20.64) 2612 (24.24)
  Not recorded 26 504 (0.44) * 27 (0.25)
Accommodation:
  Neither homeless nor care home resident 6 036 288 (99.23) 3345 (76.30) 9895 (91.82)
  Care home or nursing home resident 35 813 (0.59) 1033 (23.56) 854 (7.93)
 Homeless 11 001 (0.18) * 27 (0.25)
Body mass index:
 <18.5 161 579 (2.66) 203 (4.63) 260 (2.41)
 18.5-24.99 2 033 809 (33.43) 1345 (30.68) 2708 (25.13)
 25-29.99 1 723 494 (28.33) 1291 (29.45) 3406 (31.61)
 30-34.99 800 857 (13.17) 738 (16.83) 2126 (19.73)
 ≥35 453 323 (7.45) 460 (10.49) 1549 (14.37)
  Not recorded 910 040 (14.96) 347 (7.92) 727 (6.75)
Smoking status:
 Non-smoker 3 482 456 (57.25) 2312 (52.74) 6073 (56.36)
 Ex-smoker 1 291 953 (21.24) 1735 (39.58) 3716 (34.48)
  Light smoker 803 783 (13.21) 199 (4.54) 668 (6.20)
  Moderate smoker 153 680 (2.53) 32 (0.73) 97 (0.90)
  Heavy smoker 70 215 (1.15) 18 (0.41) 62 (0.58)
  Not recorded 281 015 (4.62) 88 (2.01) 160 (1.48)
Chronic kidney disease (CKD):
  No CKD 5 843 919 (96.07) 2928 (66.79) 8156 (75.69)
  CKD3 214 193 (3.52) 1190 (27.14) 2010 (18.65)
 CKD4 12 654 (0.21) 141 (3.22) 252 (2.34)
  CKD5 only 7286 (0.12) 96 (2.19) 239 (2.22)
  CKD5 with dialysis 1676 (0.03) 14 (0.32) 46 (0.43)
  CKD5 with transplant 3374 (0.06) 15 (0.34) 73 (0.68)

Learning disability:
  No learning disability 5 972 982 (98.19) 4110 (93.75) 10 251 (95.13)
  Learning disability 107 107 (1.76) 255 (5.82) 498 (4.62)
  Down’s syndrome 3013 (0.05) 19 (0.43) 27 (0.25)
Chemotherapy:
  No chemotherapy in previous 12 months 6 059 236 (99.61) 4267 (97.33) 10 482 (97.27)
  Chemotherapy group A 9307 (0.15) 33 (0.75) 71 (0.66)
  Chemotherapy group B 13 600 (0.22) 75 (1.71) 200 (1.86)
  Chemotherapy group C 959 (0.02) * 23 (0.21)
Cancer and immunosuppression:
  Blood cancer 28 089 (0.46) 114 (2.60) 238 (2.21)
  Bone marrow or stem cell transplant in previous 6 months 194 (0.00) * *
  Respiratory cancer 12 792 (0.21) 61 (1.39) 130 (1.21)
  Radiotherapy in previous 6 months 12 129 (0.20) 56 (1.28) 125 (1.16)
  Solid organ transplant 3209 (0.05) 10 (0.23) 33 (0.31)
  GP prescribed immunosuppressant medication 7990 (0.13) 19 (0.43) 53 (0.49)
  Prescribed leukotriene or LABA 130 895 (2.15) 399 (9.10) 874 (8.11)
  Prescribed regular prednisolone 32 929 (0.54) 176 (4.01) 388 (3.60)
  Sickle cell disease 2125 (0.03) * 28 (0.26)
Other comorbidities:
  Type 1 diabetes 28 587 (0.47) 36 (0.82) 136 (1.26)
  Type 2 diabetes 394 562 (6.49) 1417 (32.32) 3017 (28.00)
  Chronic obstructive pulmonary disease 142 107 (2.34) 580 (13.23) 1155 (10.72)
 Asthma 825 422 (13.57) 584 (13.32) 1745 (16.19)
  Rare pulmonary diseases 33 433 (0.55) 96 (2.19) 240 (2.23)
  Pulmonary hypertension or pulmonary fibrosis 4940 (0.08) 40 (0.91) 83 (0.77)
  Coronary heart disease 215 069 (3.54) 1038 (23.68) 1779 (16.51)
 Stroke 129 699 (2.13) 809 (18.45) 1339 (12.43)
  Atrial fibrillation 147 528 (2.43) 832 (18.98) 1461 (13.56)
  Congestive cardiac failure 70 970 (1.17) 575 (13.12) 1005 (9.33)
  Venous thromboembolism 105 136 (1.73) 381 (8.69) 753 (6.99)
  Peripheral vascular disease 44 476 (0.73) 289 (6.59) 467 (4.33)
  Congenital heart disease 31 576 (0.52) 48 (1.09) 100 (0.93)
 Dementia 58 873 (0.97) 1311 (29.90) 1235 (11.46)
  Parkinson’s disease 15 315 (0.25) 137 (3.13) 218 (2.02)
 Epilepsy 80 064 (1.32) 159 (3.63) 348 (3.23)
  Rare neurological conditions 18 603 (0.31) 42 (0.96) 120 (1.11)
  Cerebral palsy 6481 (0.11) * 27 (0.25)
  Severe mental illness 672 494 (11.06) 745 (16.99) 1841 (17.08)
  Osteoporotic fracture 238 276 (3.92) 675 (15.40) 1154 (10.71)
  Rheumatoid arthritis or SLE 60 847 (1.00) 127 (2.90) 309 (2.87)
  Cirrhosis of liver 11 865 (0.20) 37 (0.84) 106 (0.98)
GP=general practitioner; LABA=long acting β agonist; SLE=systemic lupus erythematosus.
*Value suppressed owing to small numbers (<15).

on QResearch have been validated internationally and found to have good performance outside of the UK.29 30

Comparison with other studies
Although similarities exist between our study and the recently reported analysis of risk factors from another English general practice database using a different clinical computer system, our project had a different aim, namely, to develop and evaluate a risk prediction model. We used a more comprehensive outcome (including deaths in patients with positive tests for SARS-CoV-2), a much wider range of predictors, and a more granular assessment of ethnicity and body mass index. Our C statistic for mortality (>0.92) is substantially higher than the previous study’s reported value of 0.77.31 Other prediction models have been reported, although these focus on other outcomes of covid-19, including risk of admission to intensive care or death following a positive test, or clinical decision tools that integrate biochemical and imaging parameters to aid diagnosis.13 However, most such studies are at high risk of bias, as they have been developed in highly selected cohorts, have limited transparency, are likely to have optimistic reported performance, or did not use covid-19 specific data.13 This study represents a substantial improvement on previously developed risk algorithms in terms of the size and representativeness of the study population, the richness of data linkages enabling accurate ascertainment of cases (including both in-hospital and out of hospital deaths) across the health network, and the breadth of candidate predictor variables considered. Importantly, it analyses risks at the population level, rather than risks in people with confirmed or suspected infection, and may have relevance for shielding or other policies that seek to mitigate risk of viral exposure.
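For readers less familiar with the C statistic compared above, the following is a minimal sketch of how Harrell’s C can be computed for right censored data. It is an illustration only, not the code used in this study: the function name, the toy data, and the handling of ties (tied risk scores count as half concordant; pairs with tied times are skipped) are assumptions made for the example.

```python
def harrell_c(times, events, scores):
    """Pairwise concordance for right censored survival data.

    times  : follow-up time for each person
    events : 1 if the outcome occurred at that time, 0 if censored
    scores : predicted risk (higher means higher predicted risk)
    Illustrative assumptions: tied scores count 0.5, tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if person i had the event
            # strictly before person j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four people, one censored at time 15.
toy_c = harrell_c([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1])
```

A value of 0.5 corresponds to chance ordering and 1.0 to perfect ordering of risk, which is the scale on which the values of >0.92 and 0.77 are compared.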

Characteristic  Adjusted hazard ratio (95% CI)

Townsend material deprivation score (5 unit increase) 1.48 (1.37 to 1.61)

White 1.00 (1.00 to 1.00)


Indian 1.89 (1.43 to 2.51)
Pakistani 1.40 (0.91 to 2.14)
Bangladeshi 1.41 (0.88 to 2.26)
Other Asian 1.19 (0.72 to 1.97)
Caribbean 1.68 (1.29 to 2.20)
Black African 1.98 (1.39 to 2.83)
Chinese 1.21 (0.51 to 2.90)
Other ethnic group 1.73 (1.28 to 2.35)

Not in care home or homeless 1.00 (1.00 to 1.00)


Lives in residential or nursing home 3.61 (3.18 to 4.10)
Homeless according to GP records 1.48 (0.21 to 10.52)

No learning disability 1.00 (1.00 to 1.00)


Learning disability apart from Down's syndrome 1.36 (1.11 to 1.65)
Down's syndrome 32.55 (18.13 to 58.42)

No kidney failure 1.00 (1.00 to 1.00)


Chronic kidney disease stage 3 1.30 (1.17 to 1.45)
Chronic kidney disease stage 4 1.37 (1.05 to 1.80)
Chronic kidney disease stage 5 3.00 (2.19 to 4.12)
Chronic kidney disease stage 5 with dialysis 2.68 (0.86 to 8.36)
Chronic kidney disease stage 5 with transplant 7.84 (3.38 to 18.17)

Not on chemotherapy in past 12 months 1.00 (1.00 to 1.00)


Chemotherapy grade A 2.30 (1.35 to 3.94)
Chemotherapy grade B 3.52 (2.29 to 5.42)
Chemotherapy grade C 17.31 (6.52 to 45.98)

Blood cancer 1.50 (1.06 to 2.12)


Bone marrow or stem cell transplant in past 6 months 2.78 (0.22 to 34.55)
Respiratory tract cancer 1.70 (1.16 to 2.49)
Radiotherapy in past 6 months 2.11 (1.30 to 3.41)
Solid organ transplant (excluding kidney and bone marrow) 1.46 (0.36 to 5.92)
Immunosuppressant medication from GP 4+ scripts in past 6 months 1.09 (0.56 to 2.10)
Leukotriene or long acting β agonist 4+ scripts in past 6 months 1.23 (0.78 to 1.94)
Oral steroids 4+ scripts in past 6 months 1.83 (1.52 to 2.19)
Sickle cell disease or severe immunodeficiency 5.94 (1.89 to 18.67)

No diabetes 1.00 (1.00 to 1.00)


Type 1 diabetes 4.02 (2.07 to 7.82)
Type 2 diabetes 6.29 (4.08 to 9.70)

Chronic obstructive pulmonary disease 1.50 (1.29 to 1.74)


Asthma 0.84 (0.73 to 0.97)
Rare lung conditions (bronchiectasis, cystic fibrosis, or alveolitis) 0.85 (0.60 to 1.19)
Pulmonary hypertension or pulmonary fibrosis 1.55 (1.00 to 2.40)

Coronary heart disease 1.24 (1.10 to 1.40)


Stroke 1.34 (1.19 to 1.51)
Atrial fibrillation 1.18 (1.04 to 1.34)
Congestive cardiac failure 1.37 (1.18 to 1.60)
Thrombo-embolism 1.18 (1.01 to 1.38)
Peripheral vascular disease 1.42 (1.15 to 1.76)
Congenital heart disease 1.23 (0.75 to 2.03)

Dementia 2.91 (2.58 to 3.28)


Parkinson's disease 1.13 (0.79 to 1.62)
Epilepsy 1.58 (1.23 to 2.03)
Motor neurone disease, multiple sclerosis, myasthenia gravis, or Huntington's 2.75 (1.83 to 4.12)
Cerebral palsy 3.45 (1.10 to 10.78)

Severe mental illness 1.29 (1.15 to 1.45)


Osteoporotic fracture (hip, spine, wrist, humerus) 1.12 (1.00 to 1.26)
Rheumatoid arthritis or SLE 1.32 (1.06 to 1.65)
Cirrhosis of liver 1.85 (1.15 to 2.99)


Fig 1 | Adjusted hazard ratio (95% CI) of death from covid-19 in women in derivation cohort, adjusted for variables shown, deprivation, and
fractional polynomial terms for body mass index (BMI) and age. Model includes fractional polynomial terms for age (3 3) and BMI (0.5 0.5 ln(bmi))
and interaction terms between age terms and type 2 diabetes. Hazard ratio for type 2 diabetes reported at mean age. GP=general practitioner;
SLE=systemic lupus erythematosus. (QResearch database version 44; study period 24 January 2020 to 30 April 2020)
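The fractional polynomial notation in the caption can be read as a list of powers applied to the variable, where a power of 0 stands for the natural logarithm and a repeated power means the second term is multiplied by ln(x). The sketch below illustrates that convention only; the function name is hypothetical, and any centring or scaling of age and BMI applied in the published equations is not reproduced here.

```python
import math


def fp_terms(x, powers):
    """Return fractional polynomial terms of x for a sequence of powers.

    Convention assumed for this sketch: power 0 means ln(x), and a
    power equal to the previous one multiplies the previous term by ln(x).
    """
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    terms = []
    previous = None
    for p in powers:
        if p == previous:
            # Repeated power: previous term times ln(x).
            term = terms[-1] * math.log(x)
        elif p == 0:
            term = math.log(x)
        else:
            term = x ** p
        terms.append(term)
        previous = p
    return terms


# Powers quoted in the caption for the women's mortality model.
age_terms = fp_terms(55.0, (3, 3))       # age^3 and age^3 * ln(age)
bmi_terms = fp_terms(27.7, (0.5, 0.5))   # bmi^0.5 and bmi^0.5 * ln(bmi)
```

The age and BMI values in the usage lines are arbitrary inputs chosen for illustration.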

Characteristic  Adjusted hazard ratio (95% CI)

Townsend material deprivation score (5 unit increase) 1.50 (1.40 to 1.61)

White 1.00 (1.00 to 1.00)


Indian 1.59 (1.25 to 2.01)
Pakistani 1.84 (1.39 to 2.44)
Bangladeshi 2.27 (1.65 to 3.12)
Other Asian 2.02 (1.49 to 2.74)
Caribbean 2.06 (1.65 to 2.57)
Black African 3.03 (2.42 to 3.80)
Chinese 2.47 (1.49 to 4.09)
Other ethnic group 2.04 (1.60 to 2.58)

Not in care home or homeless 1.00 (1.00 to 1.00)


Lives in residential or nursing home 4.28 (3.80 to 4.83)
Homeless according to GP records 1.56 (0.65 to 3.76)

No learning disability 1.00 (1.00 to 1.00)


Learning disability apart from Down's syndrome 1.36 (1.14 to 1.60)
Down's syndrome 9.80 (4.62 to 20.78)

No kidney failure 1.00 (1.00 to 1.00)


Chronic kidney disease stage 3 1.18 (1.06 to 1.30)
Chronic kidney disease stage 4 1.83 (1.46 to 2.29)
Chronic kidney disease stage 5 2.40 (1.83 to 3.15)
Chronic kidney disease stage 5 with dialysis 3.67 (2.02 to 6.66)
Chronic kidney disease stage 5 with transplant 3.20 (1.62 to 6.33)

Not on chemotherapy in past 12 months 1.00 (1.00 to 1.00)


Chemotherapy grade A 1.74 (1.10 to 2.75)
Chemotherapy grade B 3.50 (2.54 to 4.82)
Chemotherapy grade C 3.37 (1.17 to 9.64)

Blood cancer 1.29 (0.97 to 1.71)


Bone marrow or stem cell transplant in last 6 months 6.10 (1.11 to 33.54)
Respiratory tract cancer 1.27 (0.89 to 1.81)
Radiotherapy in past 6 months 2.09 (1.48 to 2.96)
Solid organ transplant (excluding kidney and bone marrow) 1.72 (0.71 to 4.21)
Immunosuppressant medication from GP 4+ scripts in past 6 months 1.58 (0.95 to 2.62)
Leukotriene or long acting β agonist 4+ scripts in past 6 months 1.04 (0.64 to 1.70)
Oral steroids 4+ scripts in past 6 months 1.44 (1.19 to 1.73)
Sickle cell disease or severe immunodeficiency 4.41 (1.41 to 13.81)

No diabetes 1.00 (1.00 to 1.00)


Type 1 diabetes 5.84 (3.97 to 8.60)
Type 2 diabetes 4.74 (3.34 to 6.71)

Chronic obstructive pulmonary disease 1.25 (1.11 to 1.42)


Asthma 1.03 (0.91 to 1.17)
Rare lung conditions (bronchiectasis, cystic fibrosis, or alveolitis) 1.20 (0.93 to 1.56)
Pulmonary hypertension or pulmonary fibrosis 1.47 (0.93 to 2.32)

Coronary heart disease 1.13 (1.02 to 1.24)


Stroke 1.24 (1.11 to 1.38)
Atrial fibrillation 1.11 (1.00 to 1.24)
Congestive cardiac failure 1.40 (1.24 to 1.59)
Thrombo-embolism 1.36 (1.18 to 1.57)
Peripheral vascular disease 1.38 (1.19 to 1.61)
Congenital heart disease 1.03 (0.72 to 1.47)

Dementia 3.14 (2.81 to 3.50)


Parkinson's disease 1.93 (1.59 to 2.35)
Epilepsy 1.60 (1.30 to 1.97)
Motor neurone disease, multiple sclerosis, myasthenia gravis, or Huntington's 1.99 (1.24 to 3.18)
Cerebral palsy 2.77 (1.23 to 6.23)

Severe mental illness 1.26 (1.13 to 1.42)


Osteoporotic fracture (hip, spine, wrist, humerus) 1.41 (1.24 to 1.61)
Rheumatoid arthritis or SLE 1.02 (0.75 to 1.38)
Cirrhosis of liver 1.29 (0.83 to 2.02)


Fig 2 | Adjusted hazard ratio (95% CI) of death from covid-19 in men in derivation cohort, adjusted for variables shown, deprivation, and fractional
polynomial terms for body mass index (BMI) and age. Model includes fractional polynomial terms for age (1 3) and BMI (−0.5 −0.5 ln(age)) and
interaction terms between age terms and type 2 diabetes. Hazard ratio for type 2 diabetes reported at mean age. GP=general practitioner;
SLE=systemic lupus erythematosus. (QResearch database version 44; study period 24 January 2020 to 30 April 2020)

Characteristic  Adjusted hazard ratio (95% CI)

Townsend material deprivation score (5 unit increase) 1.52 (1.45 to 1.60)

White 1.00 (1.00 to 1.00)


Indian 1.89 (1.60 to 2.24)
Pakistani 1.52 (1.21 to 1.89)
Bangladeshi 1.41 (1.11 to 1.79)
Other Asian 2.14 (1.74 to 2.64)
Caribbean 2.01 (1.71 to 2.35)
Black African 2.30 (1.97 to 2.68)
Chinese 1.15 (0.71 to 1.85)
Other ethnic group 1.90 (1.64 to 2.21)

Not in care home or homeless 1.00 (1.00 to 1.00)


Lives in residential or nursing home 1.84 (1.64 to 2.07)
Homeless according to GP records 1.23 (0.55 to 2.74)

No learning disability 1.00 (1.00 to 1.00)


Learning disability apart from Down's syndrome 1.53 (1.34 to 1.76)
Down's syndrome 8.84 (5.37 to 14.55)

No kidney failure 1.00 (1.00 to 1.00)


Chronic kidney disease stage 3 1.35 (1.25 to 1.46)
Chronic kidney disease stage 4 1.79 (1.48 to 2.17)
Chronic kidney disease stage 5 4.17 (3.39 to 5.12)
Chronic kidney disease stage 5 with dialysis 3.72 (2.06 to 6.75)
Chronic kidney disease stage 5 with transplant 5.54 (3.55 to 8.67)

Not on chemotherapy in last 12 months 1.00 (1.00 to 1.00)


Chemotherapy grade A 2.11 (1.48 to 3.01)
Chemotherapy grade B 4.19 (3.28 to 5.37)
Chemotherapy grade C 15.53 (8.36 to 28.85)

Blood cancer 1.40 (1.10 to 1.78)


Bone marrow or stem cell transplant in past 6 months 1.21 (0.24 to 5.97)
Respiratory tract cancer 1.65 (1.25 to 2.17)
Radiotherapy in past 6 months 1.47 (1.07 to 2.04)
Solid organ transplant (excluding kidney and bone marrow) 1.57 (0.80 to 3.05)
Immunosuppressant medication from GP 4+ scripts in past 6 months 1.32 (0.94 to 1.84)
Leukotriene or long acting β agonist 4+ scripts in past 6 months 1.31 (1.04 to 1.64)
Oral steroids 4+ scripts in past 6 months 1.92 (1.71 to 2.17)
Sickle cell disease or severe immunodeficiency 6.68 (4.06 to 10.97)

No diabetes 1.00 (1.00 to 1.00)


Type 1 diabetes 4.03 (3.12 to 5.22)
Type 2 diabetes 2.64 (2.27 to 3.07)

Chronic obstructive pulmonary disease 1.34 (1.21 to 1.49)


Asthma 1.12 (1.04 to 1.21)
Rare lung conditions (bronchiectasis, cystic fibrosis, or alveolitis) 1.28 (1.06 to 1.55)
Pulmonary hypertension or pulmonary fibrosis 1.60 (1.19 to 2.14)

Coronary heart disease 1.11 (1.02 to 1.22)


Stroke 1.39 (1.27 to 1.53)
Atrial fibrillation 1.34 (1.22 to 1.47)
Congestive cardiac failure 1.38 (1.23 to 1.55)
Thrombo-embolism 1.34 (1.21 to 1.50)
Peripheral vascular disease 1.21 (1.03 to 1.44)
Congenital heart disease 1.20 (0.88 to 1.65)

Dementia 1.73 (1.56 to 1.92)


Parkinson's disease 1.70 (1.32 to 2.18)
Epilepsy 1.57 (1.33 to 1.86)
Motor neurone disease, multiple sclerosis, myasthenia gravis, or Huntington's 2.47 (1.90 to 3.22)
Cerebral palsy 2.66 (1.42 to 4.98)

Severe mental illness 1.37 (1.28 to 1.47)


Osteoporotic fracture (hip, spine, wrist, humerus) 1.35 (1.24 to 1.47)
Rheumatoid arthritis or SLE 1.35 (1.17 to 1.56)
Cirrhosis of the liver 1.83 (1.35 to 2.49)


Fig 3 | Adjusted hazard ratio (95% CI) of hospital admission for covid-19 in women in derivation cohort, adjusted for variables shown, deprivation, and
fractional polynomial terms for body mass index (BMI) and age. Model includes fractional polynomial terms for age (0.5 2) and BMI (−2 0) and
interaction terms between age terms and type 2 diabetes. Hazard ratio for type 2 diabetes reported at mean age. GP=general practitioner;
SLE=systemic lupus erythematosus. (QResearch database version 44; study period 24 January 2020 to 30 April 2020)

Characteristic  Adjusted hazard ratio (95% CI)

Townsend material deprivation score (5 unit increase) 1.46 (1.40 to 1.53)

White 1.00 (1.00 to 1.00)


Indian 2.15 (1.89 to 2.44)
Pakistani 2.01 (1.72 to 2.36)
Bangladeshi 1.71 (1.41 to 2.08)
Other Asian 2.29 (1.91 to 2.74)
Caribbean 2.29 (1.99 to 2.63)
Black African 2.59 (2.27 to 2.97)
Chinese 1.51 (1.03 to 2.20)
Other ethnic group 2.12 (1.83 to 2.46)

Not in care home or homeless 1.00 (1.00 to 1.00)


Lives in residential or nursing home 2.52 (2.25 to 2.82)
Homeless according to GP records 1.50 (0.97 to 2.30)

No learning disability 1.00 (1.00 to 1.00)


Learning disability apart from Down's Syndrome 1.38 (1.22 to 1.56)
Down's syndrome 4.36 (2.39 to 7.94)

No kidney failure 1.00 (1.00 to 1.00)


Chronic kidney disease stage 3 1.28 (1.19 to 1.38)
Chronic kidney disease stage 4 2.00 (1.67 to 2.39)
Chronic kidney disease stage 5 3.86 (3.25 to 4.58)
Chronic kidney disease stage 5 with dialysis 5.90 (4.22 to 8.25)
Chronic kidney disease stage 5 with transplant 7.09 (5.30 to 9.47)

Not on chemotherapy in past 12 months 1.00 (1.00 to 1.00)


Chemotherapy grade A 1.72 (1.24 to 2.37)
Chemotherapy grade B 3.64 (2.95 to 4.49)
Chemotherapy grade C 4.11 (2.20 to 7.68)

Blood cancer 1.29 (1.05 to 1.57)


Bone marrow or stem cell transplant in past 6 months 1.70 (0.49 to 5.94)
Respiratory tract cancer 1.44 (1.14 to 1.83)
Radiotherapy in past 6 months 2.02 (1.59 to 2.55)
Solid organ transplant (excluding kidney and bone marrow) 2.02 (1.27 to 3.21)
Immunosuppressant medication from GP 4+ scripts in past 6 months 1.12 (0.81 to 1.54)
Leukotriene or long acting β agonist 4+ scripts in past 6 months 1.18 (0.89 to 1.58)
Oral steroids 4+ scripts in past 6 months 1.42 (1.25 to 1.62)
Sickle cell disease or severe immunodeficiency 4.87 (2.67 to 8.87)

No diabetes 1.00 (1.00 to 1.00)


Type 1 diabetes 3.66 (2.90 to 4.62)
Type 2 diabetes (see note) 2.57 (2.27 to 2.91)

Chronic obstructive pulmonary disease 1.36 (1.25 to 1.49)


Asthma 1.10 (1.02 to 1.19)
Rare lung conditions (bronchiectasis, cystic fibrosis, or alveolitis) 1.29 (1.07 to 1.54)
Pulmonary hypertension or pulmonary fibrosis 1.56 (1.12 to 2.17)

Coronary heart disease 1.06 (0.99 to 1.14)


Stroke 1.31 (1.20 to 1.42)
Atrial fibrillation 1.19 (1.10 to 1.29)
Congestive cardiac failure 1.33 (1.21 to 1.46)
Thrombo-embolism 1.30 (1.17 to 1.44)
Peripheral vascular disease 1.27 (1.13 to 1.42)
Congenital heart disease 0.97 (0.75 to 1.25)

Dementia 2.12 (1.92 to 2.34)


Parkinson's disease 2.05 (1.74 to 2.41)
Epilepsy 1.75 (1.52 to 2.02)
Motor neurone disease, multiple sclerosis, myasthenia gravis, or Huntington's 3.34 (2.60 to 4.29)
Cerebral palsy 2.85 (1.76 to 4.62)

Severe mental illness 1.28 (1.19 to 1.38)


Osteoporotic fracture (hip, spine, wrist, humerus) 1.35 (1.22 to 1.50)
Rheumatoid arthritis or SLE 1.30 (1.07 to 1.57)
Cirrhosis of the liver 1.88 (1.46 to 2.41)


Fig 4 | Adjusted hazard ratio (95% CI) of hospital admission for covid-19 in men in derivation cohort, adjusted for variables shown, deprivation,
and fractional polynomial terms for body mass index (BMI) and age. Model includes fractional polynomial terms for age (−2 2) and BMI (−0.5 0)
and interaction terms between age terms and type 2 diabetes. Hazard ratio for type 2 diabetes reported at mean age. GP=general practitioner;
SLE=systemic lupus erythematosus. (QResearch database version 44; study period 24 January 2020 to 30 April 2020)

Table 3 | Performance of risk models to predict risk of death and hospital admission due to covid-19 in validation cohort
in first validation period (24 January to 30 April 2020) and second temporal validation (1 May to 30 June 2020). Values
are estimates (95% CIs) unless stated otherwise
Statistic | Covid-19 death, women | Covid-19 death, men | Covid-19 admission, women | Covid-19 admission, men
Period 1
R2 statistic (%) 74.0 (72.7 to 75.3) 73.1 (71.9 to 74.3) 57.1 (55.5 to 58.8) 58.1 (56.7 to 59.5)
D statistic 3.46 (3.34 to 3.57) 3.37 (3.27 to 3.47) 2.36 (2.28 to 2.44) 2.41 (2.34 to 2.48)
Harrell’s C 0.933 (0.923 to 0.944) 0.928 (0.919 to 0.938) 0.847 (0.836 to 0.857) 0.860 (0.852 to 0.868)
Brier score 0.0007 0.0009 0.0015 0.0019
Period 2
R2 statistic (%) 75.4 (73.5 to 77.4) 73.6 (71.6 to 75.6) 45.4 (41.7 to 49.1) 55.4 (52.2 to 58.5)
D statistic 3.59 (3.4 to 3.77) 3.42 (3.24 to 3.59) 1.87 (1.73 to 2) 2.28 (2.14 to 2.42)
Harrell’s C 0.952 (0.938 to 0.965) 0.933 (0.918 to 0.949) 0.776 (0.753 to 0.800) 0.833 (0.812 to 0.853)
Brier score 0.0002 0.0003 0.0004 0.0004
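The R2 and D statistic rows in table 3 move together, and the values appear consistent with the standard Royston and Sauerbrei relation between the two, R2 = (D²/κ²) / (π²/6 + D²/κ²) with κ² = 8/π. The sketch below applies that relation to the D statistics in the table as a check. That the study used exactly this formula is an assumption here, and the function name is hypothetical.

```python
import math

KAPPA_SQ = 8 / math.pi        # scaling constant in the D statistic
SIGMA_SQ = math.pi ** 2 / 6   # error variance term used for Cox models


def r2_from_d(d):
    """Explained variation (as a proportion) implied by a D statistic."""
    scaled = d * d / KAPPA_SQ
    return scaled / (SIGMA_SQ + scaled)


# D statistics from table 3, period 1, paired with the reported R2 (%).
reported = {
    "death, women": (3.46, 74.0),
    "death, men": (3.37, 73.1),
    "admission, women": (2.36, 57.1),
    "admission, men": (2.41, 58.1),
}
# Difference between implied and reported R2, in percentage points.
gaps = {k: 100 * r2_from_d(d) - r2 for k, (d, r2) in reported.items()}
```

Small gaps between the implied and reported percentages are expected, because the D statistics in the table are rounded to two decimal places.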

Fig 5 | Funnel plots of discrimination using Harrell’s C statistic for each general practice in validation cohort versus number of deaths in each practice in men (top panel) and women (bottom panel) in first validation period. [Plot not reproduced: y axis, Harrell’s C statistic (0.5 to 1.0); x axis, number of covid-19 deaths in each practice. Men: summary C=0.919, estimated prediction interval 0.83 to 1.00. Women: summary C=0.916, estimated prediction interval 0.81 to 1.00.]

Complexities of modelling
Several complexities of modelling adverse risks from covid-19 in the general population warrant discussion. We used a general population approach which, although not able to incorporate all determinants of being infected, offers an overall estimate of risk of adverse outcomes from covid-19 that could be used in discussions between clinicians and patients about adjustment of lifestyle or occupational and behavioural factors that could limit viral exposure. Our model predicts risks of “catching covid-19 and then having a severe outcome,” on the basis of data collected during the first peak of the pandemic. The endpoint in this study examines a risk trajectory that comprises two elements: becoming infected, which is predominantly a function of behavioural/environmental factors including occupation, local infection rate, and numbers of social interactions; and risk of hospital admission and death due to the infection, which is arguably primarily driven by “vulnerability” (that is, biological/physiological factors including age, sex, body mass index, comorbidities, and medications). Although producing a prediction model for risk of “death if infected” is feasible in principle, this approach is not yet possible owing to the approach to testing in the UK and the context of an as yet incompletely quantified degree of asymptomatic background transmission. Limited covid-19 testing data are available, but the difficulty is that no systematic community testing was done in the UK during the study period, so only patients unwell enough to attend hospital were tested. This means that a risk score developed in those who tested positive would overestimate risks of severe outcomes. As more widespread testing is done and those data become available, we will be able to update the model to take background infection rates into account and also model regional differences. Although the absolute risk levels will of course change over time, depending on the incidence of the disease, our analysis over two validation time periods indicates that the relative risk measures and discrimination are likely to remain stable.

Secondly, the model estimates the absolute risk for a non-infected individual in the general population of becoming infected and then dying (or needing to be admitted to hospital) from the virus over a 97 day period. Although many more than 40 000 people have died from covid-19 in the UK to date, when the denominator is a population of multi-millions, the absolute risk for most people may be low. Therefore, when conveying this type of risk score to an individual, due emphasis is needed on the different meanings of absolute and relative risk.

Thirdly, the absolute risk of catching covid-19 depends not only on the incidence of the infection but also on the number of people one gets close to. For this reason, non-pharmacological interventions such as social distancing and shielding were introduced in the UK during the study period. We have included some measures of multi-occupancy, as we have factored care homes into the analysis. The data generated during the study period will therefore be affected by the uptake of

Fig 6 | Predicted and observed risk of covid-19 related hospital admission and death in validation cohort in first study period (24 January to 30 April 2020) and in second study period (1 May to 30 June 2020), and recalibrated predicted and observed risk of covid-19 related death in validation cohort in second study period (1 May to 30 June 2020). [Plots not reproduced. Each row has one panel for women and one for men, showing observed and predicted risk (%) by twentieth of predicted risk. Rows: covid-19 admission, first study period (97 day predicted risk); covid-19 death, first study period (97 day predicted risk); covid-19 admission, second study period (60 day predicted risk); covid-19 death, second study period (60 day predicted risk); covid-19 death, second study period (recalibrated 60 day predicted risk)]
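
The calibration plots in Fig 6 compare predicted with observed risk within each twentieth of predicted risk. The grouping step might be sketched in Python as follows. This is a simplified illustration only: the function name and inputs are hypothetical, observed risk is taken as a plain proportion of events, and censoring (which a survival analysis such as the one in this study would need to handle) is ignored.

```python
from statistics import mean


def calibration_by_twentieths(predicted, observed, groups=20):
    """Return (mean predicted risk, observed event proportion) per group.

    predicted: predicted risks between 0 and 1 (hypothetical input).
    observed: matching 0/1 event indicators (hypothetical input).
    Patients are sorted by predicted risk and split into `groups`
    roughly equal-sized groups, lowest risk first.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    result = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if chunk:  # groups can be empty when there are fewer patients than groups
            mean_predicted = mean(p for p, _ in chunk)
            observed_risk = mean(o for _, o in chunk)
            result.append((mean_predicted, observed_risk))
    return result
```

A well calibrated model would give pairs in which the two values are close in every group; plotting them side by side gives a chart of the kind shown in Fig 6.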

Table 4 | Sensitivity for covid-19 related death over 97 days in validation cohort (24 January to 30 April 2020) comprising 2 173 056 patients with 1722 covid-19 related deaths at different absolute risk thresholds*
Top centile | Total patients in each centile | Absolute risk centile cut-off (%) | Total deaths in each absolute risk centile | Cumulative % deaths based on absolute risk (sensitivity†)
1 | 21 730 | 0.9093 | 708 | 41.11
2 | 21 731 | 0.5182 | 263 | 56.39
3 | 21 730 | 0.3703 | 136 | 64.29
4 | 21 731 | 0.2892 | 105 | 70.38
5 | 21 730 | 0.2369 | 92 | 75.73
6 | 21 731 | 0.1990 | 58 | 79.09
7 | 21 730 | 0.1702 | 35 | 81.13
8 | 21 731 | 0.1473 | 46 | 83.80
9 | 21 731 | 0.1288 | 26 | 85.31
10 | 21 730 | 0.1135 | 24 | 86.70
11 | 21 731 | 0.1004 | 18 | 87.75
12 | 21 730 | 0.0895 | 19 | 88.85
13 | 21 731 | 0.0800 | 19 | 89.95
14 | 21 730 | 0.0719 | 18 | 91.00
15 | 21 731 | 0.0647 | 7 | 91.41
16 | 21 730 | 0.0584 | 5 | 91.70
17 | 21 731 | 0.0528 | 14 | 92.51
18 | 21 731 | 0.0477 | 12 | 93.21
19 | 21 730 | 0.0432 | 9 | 93.73
20 | 21 731 | 0.0393 | 5 | 94.02
21 | 21 730 | 0.0357 | 6 | 94.37
22 | 21 731 | 0.0325 | 9 | 94.89
23 | 21 730 | 0.0296 | 6 | 95.24
24 | 21 731 | 0.0270 | 4 | 95.47
25 | 21 731 | 0.0246 | 9 | 95.99
*Centile value giving cut-off of predicted risk over 97 days for defining each group of absolute risk.
†Percentage of total deaths over 97 days that occurred within group of patients above predicted risk threshold.
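
The sensitivity column in Table 4 is the running total of deaths in the listed centiles divided by all 1722 deaths in the cohort. A minimal Python sketch of that arithmetic, using the deaths per centile from the table; the variable and function names are illustrative assumptions and are not taken from the study software:

```python
from itertools import accumulate

# Deaths in each of the top 25 centiles of predicted absolute risk (Table 4).
DEATHS_PER_CENTILE = [708, 263, 136, 105, 92, 58, 35, 46, 26, 24, 18, 19, 19,
                      18, 7, 5, 14, 12, 9, 5, 6, 9, 6, 4, 9]
TOTAL_DEATHS = 1722  # all covid-19 deaths in the first validation cohort


def cumulative_sensitivity(deaths, total):
    """Percentage of all deaths captured by the top 1, 2, 3, ... centiles."""
    return [round(100 * running / total, 2) for running in accumulate(deaths)]


sensitivity = cumulative_sensitivity(DEATHS_PER_CENTILE, TOTAL_DEATHS)
print(sensitivity[0], sensitivity[4], sensitivity[-1])  # top 1%, top 5%, top 25%
```

Read this way, the top 5% of predicted risk captures about three quarters of deaths, which is the basis for the threshold choice discussed in the text.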

interventions such as social distancing and shielding, intended to mitigate the risks of SARS-CoV-2 infection. This could result in underestimation of some model coefficients and hence underestimation of absolute risk in people who were shielded. Also, as this is a prediction model derived from an observational study, the associations estimated for individual predictor variables should not be interpreted as causal effects.

However, ethical questions must be considered regarding how the tools may be used. We have presented two ways of stratifying risk based on either absolute or relative risk measures with associated centile values, but the choice of whether to have a threshold (given that risk is a continuous measure), and if so what threshold, will depend on the purpose for which the risk assessment tool is to be used, the available resources, and the ethical framework for decision making. We have analysed this within the “four ethical principles” framework that is widely used in medical decision making. The four principles are autonomy, beneficence, justice, and non-maleficence.32 The new risk equations, when implemented in clinical software, are designed to provide more accurate information for patients and clinicians on which to base decisions, thereby promoting shared decision making and patient autonomy. They are intended to result in clinical benefit by identifying where changes in management are likely to benefit patients, thereby promoting the principle of beneficence. Justice can be achieved by ensuring that the use of the risk equations results in fair and equitable access to health services that is commensurate with patients’ level of risk. Lastly, the risk assessment must not be used in a way that causes harm either to the individual patient or to others (for example, by introducing or withdrawing treatments where this is not in the patient’s best interest), thereby supporting the non-maleficence principle. How this applies in clinical practice will naturally depend on many factors, especially the patient’s wishes, the evidence base for any interventions, the clinician’s experience, national priorities, and the available resources. The risk assessment equations therefore supplement clinical decision making and do not replace it. With these caveats, the predicted risk estimates can be used to identify people at higher risk, to inform shared decision making between healthcare professionals and service users, or for population level stratification.

Strengths and limitations of study
Our study has some major strengths, but some important limitations, which include the specific factors related to covid-19 along with others that are similar to those for a range of other widely used clinical risk prediction algorithms developed using the QResearch database.14-16 Key strengths include the use of a very large validated data source that has been used to develop other risk prediction tools; the wealth of candidate risk predictors; the prospective recording of outcomes and their ascertainment using multiple national level database linkage; lack of selection, recall and respondent biases; and robust statistical analysis. We have used non-linear terms for body mass index and age. We examined interaction terms, which

Table 5 | Summary characteristics for top 5% of patients with highest predicted absolute risks of covid-19 death. Table shows results for all members of validation cohort
Characteristic | Total population (n=2 173 056) | Total (column %) in top 5% predicted risk (n=108 652) | Top 5% predicted risk (row %)
Male sex | 1 075 788 | 63 755 (58.68) | 5.93
Age band:
  19-29 years | 424 125 | * | *
  30-39 years | 417 590 | * | *
  40-49 years | 358 695 | 97 (0.09) | 0.03
  50-59 years | 358 820 | 1028 (0.95) | 0.29
  60-69 years | 270 340 | 6428 (5.92) | 2.38
  70-79 years | 209 557 | 25 542 (23.51) | 12.19
  ≥80 years | 133 929 | 75 547 (69.53) | 56.41
Ethnicity:
  White | 1 780 507 | 90 958 (83.71) | 5.11
  Indian | 64 184 | 3034 (2.79) | 4.73
  Pakistani | 40 718 | 1863 (1.71) | 4.58
  Bangladeshi | 28 050 | 1247 (1.15) | 4.45
  Other Asian | 42 607 | 1489 (1.37) | 3.49
  Caribbean | 28 741 | 3702 (3.41) | 12.88
  Black African | 58 115 | 2884 (2.65) | 4.96
  Chinese | 29 972 | 603 (0.55) | 2.01
  Other ethnic group | 100 162 | 2872 (2.64) | 2.87
Townsend deprivation fifth:
  1 (most affluent) | 446 359 | 20 010 (18.42) | 4.48
  2 | 428 735 | 20 524 (18.89) | 4.79
  3 | 439 846 | 23 758 (21.87) | 5.40
  4 | 436 574 | 23 644 (21.76) | 5.42
  5 (most deprived) | 409 917 | 20 437 (18.81) | 4.99
  Townsend not recorded | 11 625 | 279 (0.26) | 2.40
Accommodation:
  Neither homeless nor care home resident | 2 155 199 | 97 210 (89.47) | 4.51
  Care home or nursing home resident | 14 057 | 11 269 (10.37) | 80.17
  Homeless | 3800 | 173 (0.16) | 4.55
Body mass index:
  <18.5 | 59 376 | 4188 (3.85) | 7.05
  18.5-24.99 | 711 186 | 33 122 (30.48) | 4.66
  25-29.99 | 596 942 | 34 044 (31.33) | 5.70
  30-34.99 | 278 830 | 18 762 (17.27) | 6.73
  ≥35 | 160 345 | 13 086 (12.04) | 8.16
  Not recorded | 366 377 | 5450 (5.02) | 1.49
Chronic kidney disease (CKD):
  No CKD | 2 087 614 | 68 710 (63.24) | 3.29
  CKD3 | 76 600 | 34 418 (31.68) | 44.93
  CKD4 | 4648 | 3194 (2.94) | 68.72
  CKD5 only | 2527 | 1722 (1.58) | 68.14
  CKD5 with dialysis | 585 | 274 (0.25) | 46.84
  CKD5 with transplant | 1082 | 334 (0.31) | 30.87
Learning disability:
  No learning disability | 2 137 759 | 103 919 (95.64) | 4.86
  Learning disability | 34 257 | 4473 (4.12) | 13.06
  Down’s syndrome | 1040 | 260 (0.24) | 25.00
Chemotherapy:
  No chemotherapy in previous 12 months | 2 164 341 | 105 131 (96.76) | 4.86
  Chemotherapy group A | 3343 | 1100 (1.01) | 32.90
  Chemotherapy group B | 5032 | 2223 (2.05) | 44.18
  Chemotherapy group C | 340 | 198 (0.18) | 58.24
Cancer and immunosuppression:
  Blood cancer | 10 359 | 3084 (2.84) | 29.77
  Bone marrow or stem cell transplant in previous 6 months | 73 | 56 (0.05) | 76.71
  Respiratory cancer | 4549 | 1722 (1.58) | 37.85
  Radiotherapy in previous 6 months | 4346 | 1709 (1.57) | 39.32
  Solid organ transplant | 1147 | 283 (0.26) | 24.67
  GP prescribed immunosuppressant medication | 2814 | 455 (0.42) | 16.17
  Prescribed leukotriene or LABA | 45 905 | 9591 (8.83) | 20.89
  Prescribed regular prednisolone | 11 617 | 4518 (4.16) | 38.89
  Sickle cell disease | 717 | 117 (0.11) | 16.32
Other comorbidities:
  Type 1 diabetes | 10 337 | 861 (0.79) | 8.33
  Type 2 diabetes | 137 092 | 40 674 (37.44) | 29.67
  Chronic obstructive pulmonary disease | 51 026 | 16 708 (15.38) | 32.74
  Asthma | 299 632 | 14 860 (13.68) | 4.96
  Rare pulmonary diseases | 11 748 | 2868 (2.64) | 24.41
  Pulmonary hypertension or pulmonary fibrosis | 1891 | 1061 (0.98) | 56.11
  Coronary heart disease | 77 035 | 29 476 (27.13) | 38.26
  Stroke | 47 513 | 20 384 (18.76) | 42.90
  Atrial fibrillation | 52 764 | 23 579 (21.70) | 44.69
  Congestive cardiac failure | 25 255 | 14 897 (13.71) | 58.99
  Venous thromboembolism | 38 962 | 10 114 (9.31) | 25.96
  Peripheral vascular disease | 16 463 | 8005 (7.37) | 48.62
  Congenital heart disease | 11 344 | 1288 (1.19) | 11.35
  Dementia | 21 984 | 19 829 (18.25) | 90.20
  Parkinson’s disease | 5736 | 2847 (2.62) | 49.63
  Epilepsy | 29 031 | 3503 (3.22) | 12.07
  Rare neurological conditions | 6804 | 1092 (1.01) | 16.05
  Cerebral palsy | 2433 | 233 (0.21) | 9.58
  Severe mental illness | 246 668 | 17 428 (16.04) | 7.07
  Osteoporotic fracture | 87 595 | 15 933 (14.66) | 18.19
  Rheumatoid arthritis or SLE | 21 391 | 3251 (2.99) | 15.20
  Cirrhosis of liver | 4442 | 1054 (0.97) | 23.73
GP=general practitioner; LABA=long acting β agonist; SLE=systemic lupus erythematosus.
*Values suppressed owing to small numbers <15.
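
The two percentages in Table 5 use different denominators: the column % divides by the 108 652 patients in the top 5% of predicted risk, whereas the row % divides by the total number of patients with that characteristic. A minimal Python sketch of the calculation, checked against two rows of the table; the function and constant names are illustrative assumptions:

```python
TOP5_TOTAL = 108_652  # patients in the top 5% of predicted risk (Table 5)


def table5_percentages(n_total, n_top5, top5_total=TOP5_TOTAL):
    """Return (column %, row %) for one characteristic in Table 5.

    n_total: all patients in the validation cohort with the characteristic.
    n_top5: those of them who fall in the top 5% of predicted risk.
    """
    column_pct = round(100 * n_top5 / top5_total, 2)  # share of the top 5% group
    row_pct = round(100 * n_top5 / n_total, 2)        # share of that characteristic
    return column_pct, row_pct


print(table5_percentages(1_075_788, 63_755))  # male sex
print(table5_percentages(21_984, 19_829))     # dementia
```

The dementia row illustrates the difference: patients with dementia make up under a fifth of the top 5% group, yet about nine in ten of them fall within it.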

show increased risks at younger ages for adults with type 2 diabetes. We also established a new linkage to the systemic anti-cancer therapy (SACT) database for chemotherapy prescribed and administered in secondary care (which may not be recorded well in general practice software) to circumvent possible missing data for this important variable.

Specific limitations include the occurrence of shielding during the study period and that the study was conducted during the first phase of the UK epidemic. We have accounted for many risk factors for covid-19 mortality, but risks may be conferred by some rare medical conditions or other factors such as occupation that have not yet been observed or are poorly recorded in general practice or hospital data. In particular, the model does not include two important predictors, namely prevailing infection rate and personal social distancing measures. A lack of comprehensive testing has led to some missing data on covid-19 admissions and/or deaths, which means that development of a valid model for predicting death in people infected with SARS-CoV-2 is not yet possible. We acknowledge that absolute risks are changing during the course of the pandemic, so these should be interpreted with caution. However, we would expect predictors of risk, relative risk measures, and discrimination to be more stable over time, which is consistent with the results from our temporal validation. Although this tool was modelled on the best available data from the first wave of the pandemic, it will be updated as further testing and outcome data accrue, immunity levels change, and (potentially) a vaccine becomes available. Nevertheless, having a risk score available at this stage of the pandemic may be useful to identify people at high risk before a vaccine or treatment is available.

We have reported a validation in each of two time periods using practices from QResearch, but these practices were completely separate from those used to develop the model. We have used this approach previously to develop and validate other widely used prediction models. When these have been further externally validated on completely different clinical databases, by ourselves and others, the results have been very similar.33-35 Work is already under way to evaluate the models in external datasets across all four nations of the UK and to integrate the algorithms within NHS clinical software systems.

Policy implication and conclusions
This study presents robust risk prediction models that could be used to stratify risk in populations for public health purposes in the event of a “second wave” of the pandemic and support shared management of risk. We anticipate that the algorithms will be updated regularly as understanding of covid-19 increases, as more data become available, as behaviour in the population changes, or in response to new policy interventions. It is important for patients/carers and clinicians that a common, appropriately developed, evidence based model exists that is consistently implemented and is supported by the academic, clinical, and patient communities. This will then help to ensure consistent policy and clear national communication between policy makers, professionals, employers, and the public.

Author affiliations
1 Nuffield Department of Primary Care Health Sciences, Radcliffe Observatory Quarter, Oxford OX2 6GG, UK
2 Division of Primary Care, School of Medicine, University of Nottingham, Nottingham, UK

3 Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
4 Usher Institute, University of Edinburgh, Edinburgh, UK
5 UCL Institute of Epidemiology and Health Care, University College London, London, UK
6 UCL Institute for Health Informatics, University College London, London, UK
7 Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK
8 Office of the Chief Medical Officer, Department of Health and Social Care, London, UK
9 NHS Digital, Leeds, UK
10 Diabetes Research Centre, University of Leicester, Leicester, UK
11 Winton Centre for Risk and Evidence Communication, Faculty of Mathematics, University of Cambridge, Cambridge, UK
12 NHS England, London, UK
13 Swansea University, Swansea, UK
14 Centre for Primary Care and Public Health, Queen Mary University of London, London, UK
15 Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
16 Centre for Public Health, Queen’s University Belfast, Belfast, UK
17 Association of Local Authority Medical Advisors, London, UK
18 Imperial College London, London, UK

We acknowledge the contribution of EMIS practices who contribute to QResearch and EMIS Health and the Universities of Nottingham and Oxford for expertise in establishing, developing, or supporting the QResearch database. This project involves data derived from patient level information collected by the NHS, as part of the care and support of cancer patients. The data are collated, maintained, and quality assured by the National Cancer Registration and Analysis Service, which is part of Public Health England (PHE). Access to the data was facilitated by the PHE Office for Data Release. The Hospital Episode Statistics data used in this analysis are reused by permission from NHS Digital, which retains the copyright in that data. We thank the Office for National Statistics (ONS) for providing the mortality data. NHS Digital, PHE, and the ONS bear no responsibility for the analysis or interpretation of the data. We express our gratitude to Anne Rigg, Nisha Shaunak, Tom Charlton, Ana Montes, Claire Harrison, Susan Robinson, David Wrench, Matthew Streetly, Omer BenGal, Doraid Alrifai, and Rajjinder Nijjar for aiding the authors (notably PJ and JHC) with the classification of agents on the SACT dataset linkage used in this study and to David Coggon for general comments on the study protocol and interpretation.

Contributors: JHC, CC, AKC, RK, KDO, PH, and NM led study conceptualisation. All authors contributed to the development of the research question and study design, with development of advanced statistical aspects led by JHC, CC, RK, KDO, and AKC. JHC, AKC, CC, JB, and PJ were involved in data specification, curation, and collection. JHC and AKC developed, checked, or updated clinical code groups. JHC did the statistical analyses, which were checked by CC. JHC developed the software for the web calculator. All authors contributed to the interpretation of the results. AKC and JHC wrote the first draft of the paper. All authors contributed to the critical revision of the manuscript for important intellectual content and approved the final version of the manuscript. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. JHC is the guarantor.

Funding: This study is funded by a grant from the National Institute for Health Research (NIHR) following a commission by the Chief Medical Officer for England, whose office contributed to the development of the study question, facilitated access to relevant national datasets, and contributed to interpretation of data and drafting of the report.

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: JHC has received grants from the National Institute for Health Research Biomedical Research Centre, Oxford, John Fell Oxford University Press Research Fund, Cancer Research UK (grant number C5255/A18085) through the Cancer Research UK Oxford Centre, and the Oxford Wellcome Institutional Strategic Support Fund (204826/Z/16/Z) during the conduct of the study, is an unpaid director of QResearch, a not-for-profit organisation which is a partnership between the University of Oxford and EMIS Health who supply the QResearch database used for this work, and is a founder and shareholder of ClinRisk Ltd and was its medical director until 31 May 2019; ClinRisk produces open and closed source software to implement clinical risk algorithms (outside this work) into clinical computer systems; CC reports receiving personal fees from ClinRisk, outside this work; AH is a member of the New and Emerging Respiratory Virus Threats Advisory Group; PJ was employed by NHS England during the conduct of the study and has received grants from Epizyme and Janssen and personal fees from Takeda, Bristol-Myers-Squibb, Novartis, Celgene, Boehringer Ingelheim, Kite Therapeutics, Genmab, and Incyte, all outside the submitted work; AKC has previously received personal fees from Huma Therapeutics, outside of the scope of the submitted work; RL has received grants from Health Data Research UK outside the submitted work; AS has received grants from the Medical Research Council (MRC) and Health Data Research UK during the conduct of the study; CS has received grants from the DHSC National Institute of Health Research UK, MRC UK, and the Health Protection Unit in Emerging and Zoonotic Infections (University of Liverpool) during the conduct of the study and is a minority owner in Integrum Scientific LLC (Greensboro, NC, USA) outside of the submitted work; KK has received grants from NIHR, is the national lead for ethnicity and diversity for the National Institute for Health Applied Research Collaborations, is director of the University of Leicester Centre for Black Minority Ethnic Health, was a steering group member of the Risk reduction Framework for NHS staff (chair) and for Adult care Staff, is a member of Independent SAGE, and is supported by the NIHR Applied Research Collaboration East Midlands (ARC EM) and the NIHR Leicester Biomedical Research Centre (BRC); RHK was supported by a UKRI Future Leaders Fellowship (MR/S017968/1); KDO was supported by a grant from the Alan Turing Institute Health Programme (EP/T001569/1); no other relationships or activities that could appear to have influenced the submitted work. The views expressed are those of the author(s) and not necessarily those of the NIHR, the NHS, or the Department of Health and Social Care.

Ethical approval: The QResearch ethics approval is with East Midlands-Derby Research Ethics Committee (reference 18/EM/0400).

Data sharing: To guarantee the confidentiality of personal and health information, only the authors have had access to the data during the study in accordance with the relevant licence agreements. Access to the QResearch data is according to the information on the QResearch website (www.qresearch.org).

The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: Patient representatives from the QResearch Advisory Board have advised on dissemination of studies using QResearch data, including the use of lay summaries describing the research and its findings.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/.

1  Johns Hopkins Coronavirus Resource Center. Global map. 2020. https://coronavirus.jhu.edu/map.html.
2  Cowling BJ, Ali ST, Ng TWY, et al. Impact assessment of non-pharmaceutical interventions against coronavirus disease 2019 and influenza in Hong Kong: an observational study. Lancet Public Health 2020;5:e279-88. doi:10.1016/S2468-2667(20)30090-6
3  Davies NG, Kucharski AJ, Eggo RM, Gimma A, Edmunds WJ, Centre for the Mathematical Modelling of Infectious Diseases COVID-19 working group. Effects of non-pharmaceutical interventions on COVID-19 cases, deaths, and demand for hospital services in the UK: a modelling study. Lancet Public Health 2020;5:e375-85. doi:10.1016/S2468-2667(20)30133-X
4  Zhou F, Yu T, Du R, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 2020;395:1054-62. doi:10.1016/S0140-6736(20)30566-3
5  Yancy CW. COVID-19 and African Americans. JAMA 2020;323:1891-2. doi:10.1001/jama.2020.6548
6  Chen T, Wu D, Chen H, et al. Clinical characteristics of 113 deceased patients with coronavirus disease 2019: retrospective study. BMJ 2020;368:m1091. doi:10.1136/bmj.m1091
7  Weiss P, Murdoch DR. Clinical course and mortality risk of severe COVID-19. Lancet 2020;395:1014-5. doi:10.1016/S0140-6736(20)30633-4

8  Wadhera RK, Wadhera P, Gaba P, et al. Variation in COVID-19 Hospitalizations and Deaths Across New York City Boroughs. JAMA 2020;323:2192-5. doi:10.1001/jama.2020.7197
9  Le Brocq S, Clare K, Bryant M, Roberts K, Tahrani AA, writing group form Obesity UK, Obesity Empowerment Network, UK Association for the Study of Obesity. Obesity and COVID-19: a call for action from people living with obesity. Lancet Diabetes Endocrinol 2020;8:652-4. doi:10.1016/S2213-8587(20)30236-9
10  Sattar N, McInnes IB, McMurray JJV. Obesity Is a Risk Factor for Severe COVID-19 Infection: Multiple Potential Mechanisms. Circulation 2020;142:4-6. doi:10.1161/CIRCULATIONAHA.120.047659
11  Singh AK, Gillies CL, Singh R, et al. Prevalence of co-morbidities and their association with mortality in patients with COVID-19: A systematic review and meta-analysis. Diabetes Obes Metab 2020; doi:10.1111/dom.14124
12  Smith GD, Spiegelhalter D. Shielding from covid-19 should be stratified by risk. BMJ 2020;369:m2063. doi:10.1136/bmj.m2063
13  Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ 2020;369:m1328. doi:10.1136/bmj.m1328
14  Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ 2017;357:j2099. doi:10.1136/bmj.j2099
15  Hippisley-Cox J, Coupland C. Development and validation of QDiabetes-2018 risk prediction algorithm to estimate future risk of type 2 diabetes: cohort study. BMJ 2017;359:j5019. doi:10.1136/bmj.j5019
16  Hippisley-Cox J, Coupland C. Development and validation of QMortality risk prediction algorithm to estimate short term risk of death and assess frailty: cohort study. BMJ 2017;358:j4208. doi:10.1136/bmj.j4208
17  Hippisley-Cox J, Clift AK, Coupland CAC, et al. Protocol for the development and evaluation of a tool for predicting risk of short-term adverse outcomes due to COVID-19 in the general UK population. medRxiv 2020:2020.06.28.20141986-2020.06.28.
18  Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55-63. doi:10.7326/M14-0697
19  Benchimol EI, Smeeth L, Guttmann A, et al, RECORD Working Committee. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. PLoS Med 2015;12:e1001885. doi:10.1371/journal.pmed.1001885
20  Townsend P, Davidson N. The Black report. Penguin, 1982.
21  Fine JP, Gray RJ. A Proportional Hazards Model for the Subdistribution of a Competing Risk. J Am Stat Assoc 1999;94:496-509. doi:10.1080/01621459.1999.10474144
22  Royston P. Explained variation for survival models. Stata J 2006;6:1-14. doi:10.1177/1536867X0600600105
23  Royston P, Sauerbrei W. A new measure of prognostic separation in survival data. Stat Med 2004;23:723-48. doi:10.1002/sim.1621
24  Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361-87. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4
25  Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010;21:128-38. doi:10.1097/EDE.0b013e3181c30fb2
26  Newson RB. Comparing the predictive powers of survival models using Harrell’s C or Somers’ D. Stata J 2010;10:339-58. doi:10.1177/1536867X1001000303
27  Booth S, Riley RD, Ensor J, Lambert PC, Rutherford MJ. Temporal recalibration for improving prognostic model development and risk predictions in settings where survival is improving over time. Int J Epidemiol 2020;dyaa030. doi:10.1093/ije/dyaa030
28  Riley RD, Ensor J, Snell KIE, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ 2016;353:i3140. doi:10.1136/bmj.i3140
29  Pike MM, Decker PA, Larson NB, et al. Improvement in Cardiovascular Risk Prediction with Electronic Health Records. J Cardiovasc Transl Res 2016;9:214-22. doi:10.1007/s12265-016-9687-z
30  Kengne AP, Beulens JW, Peelen LM, et al. Non-invasive risk scores for prediction of type 2 diabetes (EPIC-InterAct): a validation of existing models. Lancet Diabetes Endocrinol 2014;2:19-29. doi:10.1016/S2213-8587(13)70103-7
31  Williamson EJ, Walker AJ, Bhaskaran K, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature 2020;584:430-6. doi:10.1038/s41586-020-2521-4
32  Gillon R. Medical ethics: four principles plus attention to scope. BMJ 1994;309:184-8. doi:10.1136/bmj.309.6948.184
33  Collins GS, Altman DG. An independent external validation and evaluation of QRISK cardiovascular risk prediction: a prospective open cohort study. BMJ 2009;339:b2584. doi:10.1136/bmj.b2584
34  Collins GS, Altman DG. An independent and external validation of QRISK2 cardiovascular disease risk score: a prospective open cohort study. BMJ 2010;340:c2442. doi:10.1136/bmj.c2442
35  Collins GS, Altman DG. Predicting the 10 year risk of cardiovascular disease in the United Kingdom: independent and external validation of an updated version of QRISK2. BMJ 2012;344:e4181. doi:10.1136/bmj.e4181

Web appendix: Supplementary materials
