
Identifying COVID-19 patient profiles in the Basque Country: A clustering approach

Lander Rodriguez1*, Daniel Fernández2,3, José M. Quintana-Lopez4,5,6,7, Julia Garcia-Asensio8, Ane Villanueva4,5,6,7, Maria Jose Legarreta4,5,6,7, Nere Larrea4,5,6,7, Irantzu Barrio9,1

1 Applied Statistics Group, Basque Centre for Applied Mathematics (BCAM), Bilbao, Basque Country, Spain
2 Serra Húnter Fellow. Department of Statistics and Operations Research (DEIO). Universitat Politècnica de Catalunya · BarcelonaTech (UPC), Barcelona, Catalonia, Spain
3 Institute of Mathematics of UPC - BarcelonaTech (IMTech), Barcelona, Catalonia, Spain
4 Research Unit of the Galdakao-Usansolo University Hospital, Osakidetza Basque Health Service, Galdakao, Basque Country, Spain
5 Network for Research on Chronicity, Primary Care, and Health Promotion (RICAPPS)
6 Health Service Research Network on Chronic Diseases (REDISSEC), Bilbao, Basque Country, Spain
7 Kronikgune Institute for Health Services Research, Barakaldo, Basque Country, Spain
8 Office of Healthcare Planning, Organization and Evaluation, Basque Government Department of Health, Basque Country, Spain
9 Department of Mathematics, University of the Basque Country UPV/EHU, Leioa, Basque Country, Spain

*Corresponding author: Lander Rodriguez


E-mail: lrodriguez@bcamath.org (LR)
Abstract

Classifying patients is essential during the outbreak of a pandemic to identify those

with the worst prognosis. In this research our aim is to identify clinically useful profiles

with a novel clustering technique and to demonstrate their association with the adverse

evolution of the COVID-19 disease.

We implement a two-stage process in this retrospective cohort study: first we identify

the profiles of SARS-CoV-2 positive patients with the KAMILA clustering technique

and then we assess their association with adverse outcomes such as mortality, bad

progress (ICU or death) and hospital admission. The profiles are created for four

different periods of the pandemic using a population-based database containing

sociodemographic, comorbidity and baseline treatment data.

In general, four different groups have been identified: Very low, young patients with

almost no comorbidities; Low, middle-aged patients with few comorbidities; High, old

patients with varying numbers of comorbidities; and Very high, old patients with

multiple comorbidities. The variables that mainly segregate these clusters are age, the

Charlson index, diabetes, kidney disease, metastatic solid tumor and heart failure. In

addition, these profiles strongly associate with the adverse outcomes of COVID-19.

Finally, although the identified profiles remained stable throughout the pandemic, the rates of

hospital admission, bad progress and death decreased.

To the best of our knowledge, this is the first study to determine COVID-19 patient

profiles from population-wide positive cases and to assess their evolution over time. Our

findings suggest that clustering methods are appropriate for quickly classifying

the most vulnerable patients in new pandemics or other diseases, leading to improved

medical care.
Introduction

It is crucial to quickly identify the patients with the worst prognosis in the outbreak of a new

virulent virus. In a context with great uncertainty, high circulation of the virus and high

hospitalization and death rates, it is critical to rapidly segregate patients so that targeted

care intervention strategies can be developed for improved medical attention. In the

pages that follow, our purpose is to classify patients according to their clinical and

sociodemographic profiles and subsequently show that these profiles are related to the evolution of

the disease. In this case the profiles are related to the COVID-19 outcomes, although

they are independent of the virus and could be related to any disease or new pandemics.

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection began in

December 2019 [1] and the World Health Organization declared a global pandemic in

March 2020. The disease became a threat to public health [2] due to its ease of

transmission and the number of deaths caused throughout the world [3]. This prompted

governments worldwide to take urgent action to contain the spread of the virus and

mitigate its effects.

The virulence of this pandemic precipitated the creation of a vast number of models to

further understand the disease. These models were mainly developed with Machine

Learning (ML) and advanced statistical methods. Both types of methods are able to

extract relevant variables from electronic health records [4], which could be a valuable

tool for either predicting adverse outcomes or classifying patients based on their

similarities and differences in baseline clinical and sociodemographic variables. For the

latter, clustering techniques are appropriate as they discover hidden and inherent

patterns to organize data into groups without any a priori hypothesis [5] and have

already been successfully applied in medical research [6-8]. However, these methods

have received very little attention as a way to classify patients during the pandemic. Although patient

profiles were identified with clustering techniques in [9] and [10], the studies were

restricted to hospitalized patients.

One of the main challenges of clustering methods is their application to mixed-type data,

i.e., categorical and continuous variables. This unsupervised learning task is often

accomplished with either categorical or continuous variables although clinical research

usually involves both of them. In this regard, various techniques have been developed to

overcome the inherent difficulties of applying mathematical operations to both types of

variables. However, given a specific context, there is in general no clear guidance to

choose the most appropriate technique [11].

The novel KAMILA (KAy-means for MIxed LArge data) clustering approach [12,13] is

suitable when handling mixed-type data as suggested by different benchmarking studies

[11, 14]. Among the different methods, KAMILA in general offers superior

performance, which is emphasized when dealing with large datasets. In addition, it

provides the best performance in time efficiency thanks to the scalability of the

algorithm [11]. KAMILA overcomes different challenges of the methods employed for

clustering mixed-type data, such as the requirement of strong parametric assumptions, the

inability to minimize the contributions of individual variables and the need for an

arbitrary choice of weightings for the relative contribution of the variables [13].

In this research work we identify the COVID-19 patient profiles from the Basque

Country for the most important periods of the pandemic. This is accomplished through

the implementation of a two-stage process. First, we identify the COVID-19 patient

profiles of the positives from the Basque Country with KAMILA and later we assess

their association with the adverse outcomes of the disease. We hypothesize that the

obtained groups will be associated with the disease severity, leading to a clinically
useful segregation of the patients. In addition, we explore the differences among clusters

and investigate their evolution along the pandemic.

Materials and Methods

This is a retrospective study of a cohort based on data from the electronic database and

health records of the health service from the Basque Country.

Database

All the patients included in this study were residents of the Basque Country, a region

with a population of 2.18 million, who had SARS-CoV-2 between March 1, 2020 and

January 9, 2022. COVID-19 diagnosis was laboratory confirmed by a positive result on

the reverse transcriptase-polymerase chain reaction assay for SARS-CoV-2 or a positive

antigen test. Also, from March 1, 2020 to July 31, 2020 positive IgM or IgG antibody

tests performed on patients having symptoms suggestive of the disease or having had

contact with a positive case were included in the sample. The authors did not have

access to information that could identify individual participants during or after data

collection. The need for consent was waived by the ethics committee due to the

pandemic situation. The study protocol was approved by the Ethics Committee for our

area (reference PI2020123).

Patient data was included in a unified electronic database of the Basque Country health

service. The data includes sociodemographic data (age, sex, and nursing home

residents’ indicator); vaccination dates and doses; baseline comorbidities (all those

included in Charlson Comorbidity Index [15] plus angina, arrhythmia, arterial

hypertension, dyslipidemia, asthma, bronchiectasis, cystic fibrosis, interstitial lung

disease, lymphoma, leukemia, coagulopathy, inflammatory bowel disease and

gastrointestinal bleeding); and baseline treatments based on the Anatomical,

Therapeutic, Chemical (ATC) classification system [16].

We grouped comorbidities in the following way: myocardial infarction; angina;

arrhythmia; congestive heart failure; peripheral vascular disease; cerebrovascular

disease; hemiplegia and/or paraplegia; arterial hypertension; dyslipidemia; dementia;

interstitial pulmonary disease; cystic fibrosis; chronic obstructive pulmonary disease

(COPD); bronchiectasis; chronic bronchial infection; asthma; liver disease (mild liver,

moderate or severe liver disease); diabetes (diabetes with/without organ damage);

kidney disease; malignant tumor; metastatic solid tumor; lymphoma; rheumatic disease;

peptic ulcer; inflammatory bowel disease; and coagulopathies. Baseline treatment was

defined as any drug that was prescribed before the diagnosis of SARS-CoV-2 infection

and had no end date.

Vaccine doses were coded in the following manner: the first dose was considered 14

days after the inoculation of the vaccine whereas the second and third doses were

considered the day the inoculation occurred. There were no fourth doses in the period of

study. The vaccination variable was determined as a three-level categorical variable: no

dose or 1 dose, 2 doses, and 3 doses. This categorization was decided because in this

region the first dose was a transitional dose to the second one, the moment a patient was

considered to be protected against the virus, and there were only three weeks

between the first two doses. Thus, we considered that getting one dose was more similar

to having none than to being fully vaccinated. Apart from that, the third dose was

considered a booster dose, and thus, we decided to separate it from the full vaccination

indicator.
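
The dose-coding rule described above can be sketched in a few lines. This is only an illustration in Python (the study itself was run in R): the function name `vaccination_level`, its arguments and the example dates are hypothetical and are not taken from the study's code.

```python
from datetime import date, timedelta

# The first dose only counts 14 days after inoculation (rule stated in the text).
FIRST_DOSE_LAG = timedelta(days=14)


def vaccination_level(diagnosis, dose_dates):
    """Return the three-level vaccination category at the date of diagnosis.

    diagnosis  -- date of the first positive test
    dose_dates -- inoculation dates in chronological order (at most three)
    """
    effective = 0
    for index, given in enumerate(dose_dates):
        # Second and third doses count from the day of inoculation.
        counts_from = given + FIRST_DOSE_LAG if index == 0 else given
        if counts_from <= diagnosis:
            effective += 1
    if effective >= 3:
        return "3 doses"
    if effective == 2:
        return "2 doses"
    return "0-1 dose"  # no dose and a single dose are merged into one level
```

Under this sketch, a patient diagnosed ten days after a first inoculation still falls in the "0-1 dose" level, because the first dose is not yet counted.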

The outcomes of interest in the study were hospitalization, bad progress (ICU or death)

and death. Their definition is shown below:

• Hospitalization: when a patient tested positive for COVID-19 before

hospitalization, hospital admission was considered COVID-19 related if it

occurred within 15 days of the positive test. If the patient tested positive during

hospitalization, hospital admission was considered COVID-19 related up to 21

days after the positive test. This last definition was included to account for the

lack of testing capacity at the beginning of the pandemic.

• Death: if the patient died during the three months following COVID-19

diagnosis, during a hospitalization, or within three months of hospital discharge

after a COVID-19 admission.

• Bad progress (ICU or death): when the patient died (as defined above) or had

an ICU admission during a hospital admission related to a SARS-CoV-2

infection.
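
The outcome definitions above can be expressed as simple date rules. The following Python sketch is illustrative only: the function names are hypothetical, and the handling of a positive test obtained during the stay is one possible reading of the 21-day rule (the test falls within 21 days after admission), which the text does not spell out.

```python
from datetime import date


def admission_is_covid_related(positive_test, admission, discharge):
    """Apply the 15-day / 21-day rules to a single hospital admission."""
    if positive_test <= admission:
        # Positive before hospitalization: admission within 15 days of the test.
        return (admission - positive_test).days <= 15
    if positive_test <= discharge:
        # ASSUMPTION: positive during the stay, test at most 21 days after admission.
        return (positive_test - admission).days <= 21
    return False  # test after discharge: not linked to this admission


def bad_progress(died, icu_during_covid_admission):
    """Composite outcome: death (as defined in the text) or ICU admission."""
    return died or icu_during_covid_admission
```

Keeping the two date rules in separate branches makes it easy to change the second one if a different interpretation of the 21-day window is intended.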

The data of the study was collected from March 1, 2020 until April 9, 2022 and it was

accessed on April 18, 2022 for research purposes.

Statistical Analysis

The full period of collection of data was divided into 4 different periods: The first

period spans from March 1, 2020 until June 30, 2020 when the first wave of the

pandemic took place; the second one spans from July 1, 2020 until December 31, 2020

when the vaccination process started; the third one takes place from January 1, 2021

until December 13, 2021 when the Omicron wave started; and the last period covers the

Omicron wave until January 9, 2022. On January 9, 2022 the Basque Government

modified its protocol for collecting COVID-19 positive data, which restricted the time

span of data acquisition for the Omicron wave.
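
As an illustration, the four periods can be encoded as date ranges and each first positive test mapped to its period. This Python sketch is not the study's code; the names `PERIODS` and `period_of` are hypothetical, and placing December 13, 2021 in period 3 (so that the Omicron period begins on December 14) is an assumption, since the text does not say which period that boundary day belongs to.

```python
from datetime import date

# (period index, first day, last day), both ends inclusive.
PERIODS = [
    (1, date(2020, 3, 1), date(2020, 6, 30)),    # first wave
    (2, date(2020, 7, 1), date(2020, 12, 31)),   # until vaccination started
    (3, date(2021, 1, 1), date(2021, 12, 13)),   # until the Omicron wave (assumed inclusive)
    (4, date(2021, 12, 14), date(2022, 1, 9)),   # Omicron wave
]


def period_of(first_positive):
    """Return the period index of a first positive test, or None if out of range."""
    for index, start, end in PERIODS:
        if start <= first_positive <= end:
            return index
    return None
```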

Due to our interest in the first stages of the pandemic only the first positive from each

patient was included in the study. Additionally, only adult patients were considered.

Descriptive statistics included frequency tables for categorical variables and median and

interquartile ranges for numerical ones. Vaccination data was only available for periods

3 and 4 as the vaccination process started in the third period.

Patients were clustered based on KAMILA for the different periods of the pandemic.

Clusters were determined with all the available variables except the disease outcomes

and the nursing home residents’ indicator, whose association with the clusters was later

assessed. The numerical variables (i.e., age and the Charlson index) were standardized to

prevent their units from driving the clustering structure. The number of clusters studied

for each period was from a minimum of two to a maximum of five as more groups were

considered excessive for a correct clinical distinction of the patients. The optimal

number of clusters was selected based on the prediction strength method [17] with a

threshold of 0.8, as suggested by the authors. This procedure was accomplished with the

kamila R package [13]. All statistical analyses were performed using R (version 4.1.2)

[18].
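
Two of the steps above, standardizing the numerical variables and choosing the number of clusters from prediction strength values, can be sketched as follows. This is a minimal Python illustration rather than the kamila R workflow used in the study: the function names and the prediction strength values in the usage example are hypothetical, and the selection rule shown (take the largest number of clusters whose prediction strength reaches the threshold) is the rule usually associated with the prediction strength method.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score a numerical variable (sample standard deviation)."""
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]


def select_number_of_clusters(prediction_strength, threshold=0.8):
    """Pick the largest k whose prediction strength is at least the threshold.

    prediction_strength -- dict mapping k (here 2 to 5) to its prediction strength
    Returns None when no candidate reaches the threshold.
    """
    eligible = [k for k, ps in prediction_strength.items() if ps >= threshold]
    return max(eligible) if eligible else None


# Hypothetical prediction strength values, for illustration only.
example = {2: 0.95, 3: 0.91, 4: 0.84, 5: 0.62}
chosen_k = select_number_of_clusters(example)  # 4 with these made-up values
```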

The resulting clusters were labeled as Very low, Low, High and Very high

based on their defining characteristics. Presumably, this labeling would later indicate

the likelihood of the adverse outcomes of COVID-19, although this assumption should

be ratified following the clusters creation in an unsupervised manner.

Post-hoc Analysis

Apart from the descriptive characteristics of the different clusters two more aspects

were examined. First, we investigated if the clusters were significantly different for the

same period. Second, we studied if the same risk level clusters had evolved during the

pandemic. The variables were compared pairwise with the Pearson Chi-squared test for

the categorical variables and the two-sided Mann-Whitney U test for continuous

variables with an alpha of 0.01 to be considered statistically significant. The Shapiro-

Wilk test was previously done on the continuous variables to confirm their distribution

was not normal. Due to the large sample size, the effect size was measured with

Cramer’s V [19] for categorical variables and Vargha and Delaney’s A [20] for

continuous ones. Large effect sizes were considered as suggested by the previous

authors, respectively: Cramer’s V values greater than 0.5 for 1 degree of freedom

variables and higher than 0.35 for 2 degrees of freedom variables, and Vargha and

Delaney’s A values higher than 0.71 or lower than 0.29 for continuous variables.
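
Both effect sizes are simple to compute from their standard definitions. The Python sketch below is illustrative and not the study's R code: the function names are hypothetical, the chi-squared statistic is computed without continuity correction (an assumption), and the degrees of freedom in the denominator of Cramér's V are taken as the smaller of (rows - 1) and (columns - 1).

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n_rows, n_cols = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return sqrt(chi2 / (total * min(n_rows - 1, n_cols - 1)))


def vargha_delaney_a(x, y):
    """Probability that a value from x exceeds one from y, counting ties as half."""
    greater = sum(1 for a in x for b in y if a > b)
    ties = sum(1 for a in x for b in y if a == b)
    return (greater + 0.5 * ties) / (len(x) * len(y))


# Deaths vs survivors in Period 1: Very low (40 of 11148) and Very high (414 of 1262).
v_death = cramers_v([[40, 11148 - 40], [414, 1262 - 414]])
```

With the Period 1 counts reported in Table 2, `v_death` comes out slightly above the 0.5 threshold, which agrees with the large effect size reported for death between the Very low and Very high clusters in that period.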

Results

Summary statistics of the sociodemographic variables and the background

characteristics of the whole sample for the different periods can be seen in Table 1. The

rest of the variables can be found online in the S1 Table. Significant differences exist

among the variables for the different periods, with the first period being the most dissimilar. In

general, the proportion of the comorbidities, nursing home residents, and the median

and interquartile ranges of age and the Charlson index decrease with time. Of interest

here is the reduction of hospitalization, bad progress (ICU or death), and death

percentages from period 1 to 4 (Omicron).

Table 1. Descriptive characteristics of COVID-19 patients for the different pandemic

periods.

| Variables | Period 1 | Period 2 | Period 3 | Omicron |
|---|---|---|---|---|
| TOTAL | 20,457 (5.38%) | 79,942 (21.03%) | 140,672 (37.01%) | 139,018 (36.58%) |
| Sociodemographic variables | | | | |
| Gender^2-4, N (%) | | | | |
| Female | 12,529 (61.25) | 42,164 (52.74) | 71,345 (50.72) | 73,522 (52.89) |
| Male | 7,928 (38.75) | 37,778 (47.26) | 69,327 (49.28) | 65,496 (47.11) |
| Age^3-4, Median [Q1,Q3] | 57 [44,75] | 47 [33,61] | 44 [30,58] | 44 [32,55] |
| Nursing home, N (%) | 3,523 (17.22) | 2,530 (3.16) | 1,556 (1.11) | 1,011 (0.73) |
| Vaccines, N (%) | | | | |
| 0-1 dose | | | 103,867 (73.84) | 15,683 (11.28) |
| 2 doses | | | 35,382 (25.15) | 99,866 (71.84) |
| 3 doses | | | 1,423 (1.01) | 23,469 (16.88) |
| Comorbidities | | | | |
| Charlson index, Median [Q1,Q3] | 0 [0,2] | 0 [0,1] | 0 [0,1] | 0 [0,1] |
| Myocardial infarction, N (%) | 959 (4.69) | 1,800 (2.25) | 2,618 (1.86) | 2,127 (1.53) |
| Congestive heart failure, N (%) | 1,679 (8.21) | 2,619 (3.28) | 3,357 (2.39) | 2,281 (1.64) |
| Peripheral vascular disease, N (%) | 1,058 (5.17) | 2,234 (2.79) | 2,992 (2.13) | 2,363 (1.70) |
| Cerebrovascular disease, N (%) | 2,484 (12.14) | 4,922 (6.16) | 6,884 (4.89) | 5,809 (4.18) |
| Dementia, N (%) | 1,685 (8.24) | 2,042 (2.55) | 1,798 (1.28) | 1,104 (0.79) |
| COPD, N (%) | 3,693 (18.05) | 12,207 (15.27) | 22,251 (15.82) | 22,512 (16.19) |
| Rheumatic disease, N (%) | 638 (3.12) | 1,428 (1.79) | 2,114 (1.50) | 1,908 (1.37) |
| Peptic ulcer, N (%) | 700 (3.42) | 1,590 (1.99) | 2,400 (1.71) | 2,080 (1.50) |
| Liver disease, N (%) | | | | |
| Mild | 1,072 (5.24) | 2,800 (3.50) | 4,359 (3.10) | 3,994 (2.87) |
| Moderate/Severe | 140 (0.68) | 264 (0.33) | 335 (0.24) | 296 (0.21) |
| Diabetes, N (%) | | | | |
| Yes, without organ damage | 2,151 (10.51) | 4,967 (6.21) | 7,075 (5.03) | 5,312 (3.82) |
| Yes, with organ damage | 531 (2.60) | 924 (1.16) | 1,266 (0.90) | 853 (0.61) |
| Hemiplegia / Paraplegia, N (%) | 442 (2.16) | 773 (0.97) | 918 (0.65) | 694 (0.50) |
| Kidney, N (%) | 2,206 (10.78) | 4,606 (5.76) | 7,086 (5.04) | 6,055 (4.36) |
| Metastatic solid tumor, N (%) | 507 (2.48) | 1,193 (1.49) | 1,746 (1.24) | 1,381 (0.99) |
| Heart failure, N (%) | 1,679 (8.21) | 2,619 (3.28) | 3,357 (2.39) | 2,281 (1.64) |
| Angina, N (%) | 717 (3.50) | 1,402 (1.75) | 1,948 (1.38) | 1,523 (1.10) |
| Arterial hypertension, N (%) | 7,109 (34.75) | 17,614 (22.03) | 25,611 (18.21) | 21,390 (15.39) |
| Dyslipidemia, N (%) | 6,494 (31.74) | 17,367 (21.72) | 26,653 (18.95) | 24,842 (17.87) |
| Lymphoma, N (%) | 940 (4.60) | 4,055 (5.07) | 8,673 (6.17) | 9,712 (6.99) |
| Gastrointestinal bleeding, N (%) | 361 (1.76) | 605 (0.76) | 857 (0.61) | 661 (0.48) |
| Chronic bronchitis, N (%) | 1,441 (7.04) | 3,575 (4.47) | 5,810 (4.13) | 5,201 (3.74) |
| Cystic fibrosis, N (%) | 568 (2.78) | 830 (1.04) | 1,136 (0.81) | 884 (0.64) |
| Outcome variables | | | | |
| Hospitalization, N (%) | 5,486 (26.82) | 6,943 (8.69) | 10,951 (7.78) | 2,236 (1.61) |
| Death, N (%) | 1,678 (8.20) | 1,764 (2.21) | 1,901 (1.35) | 584 (0.42) |
| Bad progress, N (%) | 2,028 (9.91) | 2,334 (2.92) | 3,112 (2.21) | 761 (0.55) |

Footnote: The superscripts found in the variable names indicate the periods in which that variable

satisfies the independence test with an alpha of 0.01. In case no superscripts are shown, it means there

are significant differences among all the periods. Only sociodemographic variables, comorbidities with

significant differences among all periods and the outcomes are included.

Turning now to the clusters’ identification, four clusters, from Very low to Very high,

were identified in Periods 1 to 3, while three clusters were identified in the Omicron

period. In Fig 1 COVID-19 patients are plotted according to their age and the Charlson

index and are colored by the identified clusters. For each cluster, the median values of

age and the Charlson index are represented by bigger dots. In addition, tables with the

sample size of the different clusters are shown for each period. Age is the variable that

differentiates the Very low and Low clusters while the Charlson index is similar in these

clusters for all the periods. In contrast, the High and Very high clusters have similar

ages but different Charlson index values, which are higher for the Very high group. Of note

here is that in the last period there is just one High-Very high severity cluster with a

Charlson index between those obtained for the High and Very high clusters in the previous periods.

In order to analyze the differences of the clusters in more detail, in Fig 2 the age and the

Charlson index density plots corresponding to the clusters identified in Fig 1 are shown.

The Very low and Low clusters present different age distributions and similar Charlson

index density distributions. On the other hand, the High and Very high clusters have

similar age distributions but different Charlson index distributions. In the last period the

High-Very high cluster has a similar age distribution to these clusters in the previous

periods while the Charlson index density distribution is a combination of the

distributions of these two clusters from the previous periods.

Fig 3 shows the proportions of hospitalization, bad progress and death in each cluster

for all the periods. The clusters created in an unsupervised manner present a stepped

proportion of adverse outcomes of the disease. Actually, the proportion of the outcomes

increases with the risk level of the clusters for all the periods and outcomes.
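
The stepped pattern can be checked directly from the cluster sizes and outcome counts. The short Python sketch below uses the Period 1 figures reported in Table 2; the helper names `rates` and `is_stepped` are hypothetical and only serve this illustration.

```python
def rates(events, sizes):
    """Proportion of patients with the outcome in each cluster."""
    return [e / n for e, n in zip(events, sizes)]


def is_stepped(values):
    """True when each value is strictly larger than the previous one."""
    return all(a < b for a, b in zip(values, values[1:]))


# Period 1 clusters, ordered Very low, Low, High, Very high (counts from Table 2).
sizes = [11148, 4996, 3051, 1262]
outcomes = {
    "hospitalization": [1590, 1840, 1368, 688],
    "bad progress": [155, 609, 828, 436],
    "death": [40, 457, 767, 414],
}

# For each outcome, check that the proportion rises with the risk level.
stepped = {name: is_stepped(rates(counts, sizes)) for name, counts in outcomes.items()}
```

Note that the raw counts are not monotone (for example, fewer hospitalizations occur in the Very high cluster than in the Low one) because the clusters differ greatly in size; only the proportions increase with the risk level.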

In Table 2 the summary of the differences between clusters for the same period can be

seen. Only the variables with significant differences in the pairwise tests performed
between the different level clusters and the ones that have at least one large effect size

between clusters in those tests are shown in Table 2. The numeric superscripts attached to

the variable values that define the clusters indicate the cluster indexes that have

large effect sizes with that specific variable for that specific cluster. All the outcomes

were also included. The rest of the comparisons can be found online in the S2 Table.

Table 2. Cluster descriptive characteristics by period and differences for each period by

cluster.

| Variable | Very low [1] | Low [2] | High [3] | Very high [4] |
|---|---|---|---|---|
| Period 1 | N = 11148 | N = 4996 | N = 3051 | N = 1262 |
| Age, Median [Q1,Q3] | 46 [36-54]^2,3,4 | 71 [60-82]^1,3 | 85 [76-90]^1,2 | 80 [70-87]^1 |
| Nursing home, N (%) | 168 (1.51)^3,4 | 1295 (25.92) | 1557 (51.03)^1 | 503 (39.86)^1 |
| Charlson index, Median [Q1,Q3] | 0 [0-0]^3,4 | 1 [0-2]^3,4 | 3 [2-4]^1,2,4 | 7 [6-9]^1,2,3 |
| Congestive heart failure, N (%) | 36 (0.32)^3,4 | 23 (0.46)^4 | 1043 (34.19)^1 | 577 (45.72)^1,2 |
| Peripheral vascular disease, N (%) | 38 (0.34)^4 | 133 (2.66) | 489 (16.03) | 398 (31.54)^1 |
| Cerebrovascular disease, N (%) | 250 (2.24)^3,4 | 435 (8.71) | 1255 (41.13)^1 | 544 (43.11)^1 |
| Liver disease, N (%)^1-4 | | | | |
| Mild | 190 (1.70) | 369 (7.39) | 278 (9.11) | 235 (18.62) |
| Moderate/severe | 11 (0.10) | 16 (0.32) | 15 (0.49) | 98 (7.77) |
| Diabetes, N (%)^1-3,1-4,2-4,3-4 | | | | |
| Yes, without organ damage | 103 (0.92) | 818 (16.37) | 872 (28.58) | 358 (28.37) |
| Yes, with organ damage | 2 (0.02) | 36 (0.72) | 134 (4.39) | 359 (28.45) |
| Kidney disease, N (%) | 237 (2.13)^4 | 305 (6.1)^4 | 910 (29.83) | 754 (59.75)^1,2 |
| Metastatic solid tumor, N (%) | 47 (0.42)^3,4 | 22 (0.44)^3,4 | 0 (0)^1,2 | 438 (34.71)^1,2 |
| HIV, N (%) | 5 (0.04)^3 | 6 (0.12)^3 | 0 (0)^1,2 | 20 (1.58) |
| Heart failure, N (%) | 36 (0.32)^3,4 | 23 (0.46)^4 | 1043 (34.19)^1 | 577 (45.72)^1,2 |
| Arterial hypertension, N (%) | 476 (4.27)^2,3,4 | 3191 (63.87)^1 | 2456 (80.5)^1 | 986 (78.13)^1 |
| Antidiabetics, N (%) | 73 (0.65)^4 | 690 (13.81) | 779 (25.53) | 566 (44.85)^1 |
| Diuretics, N (%) | 35 (0.31)^3,4 | 468 (9.37) | 1281 (41.99)^1 | 568 (45.01)^1 |
| RAAS inhibitors, N (%) | 93 (0.83)^2,3,4 | 2227 (44.58)^1 | 1480 (48.51)^1 | 502 (39.78)^1 |
| Lipid lowering drugs/statins, N (%) | 164 (1.47)^3,4 | 1598 (31.99) | 1302 (42.67)^1 | 545 (43.19)^1 |
| Anticoagulants, N (%) | 134 (1.2)^3,4 | 715 (14.31)^3 | 2255 (73.91)^1,2 | 799 (63.31)^1 |
| Antiplatelets, N (%) | 59 (0.53)^3,4 | 395 (7.91) | 1333 (43.69)^1 | 438 (34.71)^1 |
| Hospitalization, N (%) | 1590 (14.26) | 1840 (36.83) | 1368 (44.84) | 688 (54.52) |
| Bad progress, N (%) | 155 (1.39) | 609 (12.19) | 828 (27.14) | 436 (34.55) |
| Death, N (%) | 40 (0.36)^4 | 457 (9.15) | 767 (25.14) | 414 (32.81)^1 |
| Period 2 | N = 44399 | N = 22857 | N = 9638 | N = 3048 |
| Age, Median [Q1,Q3] | 35 [26-43]^2,3,4 | 58 [53-65]^1,3,4 | 76 [65-85]^1,2 | 77 [66-86]^1,2 |
| Charlson index, Median [Q1,Q3] | 0 [0-0]^3,4 | 0 [0-0]^3,4 | 2 [1-3]^1,2,4 | 7 [5-8]^1,2,3 |
| Congestive heart failure, N (%) | 63 (0.14)^4 | 50 (0.22)^4 | 1386 (14.38) | 1120 (36.75)^1,2 |
| COPD, N (%) | 7118 (16.03) | 511 (2.24)^4 | 3255 (33.77) | 1323 (43.41)^2 |
| Liver disease, N (%)^1-4 | | | | |
| Mild | 430 (0.97) | 968 (4.24) | 884 (9.17) | 518 (16.99) |
| Moderate/severe | 20 (0.05) | 21 (0.09) | 25 (0.26) | 198 (6.50) |
| Diabetes, N (%)^1-3,1-4,2-3,2-4 | | | | |
| Yes, without organ damage | 252 (0.57) | 1042 (4.56) | 2779 (28.83) | 894 (29.33) |
| Yes, with organ damage | 21 (0.05) | 31 (0.14) | 235 (2.44) | 637 (20.9) |
| Kidney disease, N (%) | 864 (1.95)^4 | 463 (2.03)^4 | 1705 (17.69) | 1574 (51.64)^1,2 |
| Metastatic solid tumor, N (%) | 50 (0.11)^3,4 | 63 (0.28)^3,4 | 0 (0)^1,2 | 1080 (35.43)^1,2 |
| HIV, N (%) | 18 (0.04)^3 | 15 (0.07)^3 | 0 (0) | 38 (1.25) |
| Heart failure, N (%) | 63 (0.14)^4 | 50 (0.22)^4 | 1386 (14.38) | 1120 (36.75)^1,2 |
| Arterial hypertension, N (%) | 623 (1.4)^3,4 | 7347 (32.14) | 7372 (76.49)^1 | 2272 (74.54)^1 |
| Dyslipidemia, N (%) | 1230 (2.77)^3,4 | 8530 (37.32) | 5773 (59.9)^1 | 1834 (60.17)^1 |
| Antidiabetics, N (%) | 217 (0.49)^4 | 946 (4.14) | 2528 (26.23) | 1209 (39.67)^1 |
| Diuretics, N (%) | 37 (0.08)^4 | 629 (2.75) | 2250 (23.35) | 1182 (38.78)^1 |
| RAAS inhibitors, N (%) | 79 (0.18)^3,4 | 4581 (20.04) | 5377 (55.79)^1 | 1300 (42.65)^1 |
| Lipid lowering drugs/statins, N (%) | 93 (0.21)^3,4 | 3355 (14.68) | 4872 (50.55)^1 | 1418 (46.52)^1 |
| Anticoagulants, N (%) | 423 (0.95)^3,4 | 990 (4.33)^3,4 | 5002 (51.9)^1,2 | 1785 (58.56)^1,2 |
| Antiplatelets, N (%) | 124 (0.28)^3,4 | 257 (1.12) | 3054 (31.69)^1 | 971 (31.86)^1 |
| Hospitalization, N (%) | 1129 (2.54) | 2151 (9.41) | 2438 (25.30) | 1225 (40.19) |
| Bad progress, N (%) | 126 (0.28) | 497 (2.17) | 1036 (10.75) | 675 (22.15) |
| Death, N (%) | 18 (0.04) | 247 (1.08) | 871 (9.04) | 628 (20.60) |
| Period 3 | N = 69409 | N = 50401 | N = 16735 | N = 4127 |
| Age, Median [Q1,Q3] | 30 [23-38]^2,3,4 | 53 [48-61]^1,3,4 | 72 [63-81]^1,2 | 74 [63-84]^1,2 |
| Charlson index, Median [Q1,Q3] | 0 [0-1]^3,4 | 0 [0-0]^3,4 | 2 [1-3]^1,2,4 | 7 [5-8]^1,2,3 |
| Congestive heart failure, N (%) | 107 (0.15)^4 | 106 (0.21)^4 | 1756 (10.49) | 1388 (33.63)^1,2 |
| Liver disease, N (%)^1-4 | | | | |
| Mild | 581 (0.84) | 1365 (2.71) | 1703 (10.18) | 710 (17.20) |
| Moderate/severe | 26 (0.04) | 32 (0.06) | 36 (0.22) | 241 (5.84) |
| Diabetes, N (%)^1-3,1-4,2-3,2-4 | | | | |
| Yes, without organ damage | 398 (0.57) | 1070 (2.12) | 4523 (27.03) | 1084 (26.27) |
| Yes, with organ damage | 27 (0.04) | 45 (0.09) | 401 (2.4) | 793 (19.21) |
| Kidney disease, N (%) | 1837 (2.65) | 861 (1.71)^4 | 2449 (14.63) | 1939 (46.98)^2 |
| Metastatic solid tumor, N (%) | 12 (0.02)^3,4 | 78 (0.15)^3,4 | 0 (0)^1,2 | 1656 (40.13)^1,2 |
| HIV, N (%) | 12 (0.02)^3 | 19 (0.04)^3 | 0 (0)^1,2 | 99 (2.4) |
| Heart failure, N (%) | 107 (0.15)^4 | 106 (0.21)^4 | 1756 (10.49) | 1388 (33.63)^1,2 |
| Arterial hypertension, N (%) | 778 (1.12)^3,4 | 9613 (19.07)^3 | 12364 (73.88)^1,2 | 2856 (69.2)^1 |
| Dyslipidemia, N (%) | 1749 (2.52)^3,4 | 12682 (25.16) | 9936 (59.37)^1 | 2286 (55.39)^1 |
| Antidiabetics, N (%) | 329 (0.47)^4 | 921 (1.83) | 4262 (25.47) | 1525 (36.95)^1 |
| Diuretics, N (%) | 41 (0.06)^4 | 646 (1.28) | 3007 (17.97) | 1411 (34.19)^1 |
| RAAS inhibitors, N (%) | 93 (0.13)^3,4 | 5356 (10.63) | 9552 (57.08)^1 | 1770 (42.89)^1 |
| Lipid lowering drugs/statins, N (%) | 135 (0.19)^3,4 | 3716 (7.37) | 8681 (51.87)^1 | 1801 (43.64)^1 |
| Anticoagulants, N (%) | 784 (1.13)^3,4 | 1243 (2.47)^3,4 | 7487 (44.74)^1,2 | 2231 (54.06)^1,2 |
| Hospitalization, N (%) | 1843 (2.66) | 3831 (7.60) | 3734 (22.31) | 1543 (37.39) |
| Bad progress, N (%) | 218 (0.31) | 773 (1.53) | 1352 (8.08) | 769 (18.63) |
| Death, N (%) | 11 (0.02) | 231 (0.46) | 970 (5.80) | 689 (16.69) |
| Period 4 | N = 99899 | N = 31581 | | N = 7538 (High – Very high [4]) |
| Age, Median [Q1,Q3]^All | 39 [28-46] | 60 [55-68] | | 73 [63-84] |
| Vaccines, N (%)^1-2,1-4 | | | | |
| 2 doses | 83486 (83.57) | 14286 (45.24) | | 2094 (27.78) |
| 3 doses | 3035 (3.04) | 15371 (48.67) | | 5063 (67.17) |
| Charlson index, Median [Q1,Q3] | 0 [0-0]^4 | 0 [0-1]^4 | | 4 [3-5]^1,2 |
| Diabetes, N (%)^1-4,2-4 | | | | |
| Yes, without organ damage | 550 (0.55) | 2444 (7.74) | | 2318 (30.75) |
| Yes, with organ damage | 60 (0.06) | 61 (0.19) | | 732 (9.71) |
| Arterial hypertension, N (%) | 1930 (1.93)^2,4 | 14048 (44.48)^1 | | 5412 (71.8)^1 |
| Dyslipidemia, N (%) | 4958 (4.96)^4 | 15414 (48.81) | | 4470 (59.3)^1 |
| Antidiabetics, N (%) | 442 (0.44)^4 | 2265 (7.17) | | 2619 (34.74)^1 |
| RAAS inhibitors, N (%) | 225 (0.23)^4 | 9771 (30.94) | | 3883 (51.51)^1 |
| Lipid lowering drugs/statins, N (%) | 326 (0.33)^4 | 7969 (25.23) | | 4051 (53.74)^1 |
| Anticoagulants, N (%) | 1045 (1.05)^4 | 3187 (10.09) | | 4237 (56.21)^1 |
| Antiplatelets, N (%) | 283 (0.28)^4 | 2041 (6.46) | | 2671 (35.43)^1 |
| Hospitalization, N (%) | 604 (0.60) | 764 (2.42) | | 868 (11.51) |
| Bad progress, N (%) | 81 (0.08) | 234 (0.74) | | 446 (5.92) |
| Death, N (%) | 21 (0.02) | 156 (0.49) | | 407 (5.40) |


Footnote: The numeric superscripts attached to the variable values that define the clusters indicate

the cluster indexes that have large effect sizes with that specific variable for that specific cluster. Only

variables with large effect sizes, as suggested by [26] and [27], in the performed tests were included

together with the outcomes.

Apart from the Charlson index and age, the variables that differentiate the lower-risk

and higher-risk clusters in the first three periods are a higher percentage, in the

higher-risk clusters, of congestive heart failure, peripheral vascular disease, cerebrovascular

disease, liver disease, diabetes, kidney disease, metastatic solid tumor, heart failure,

arterial hypertension, and the baseline prescribed treatments of antidiabetics, diuretics,

RAAS inhibitors, statins, anticoagulants and antiplatelets in period 1; congestive heart

failure, COPD, liver disease, diabetes, kidney disease, metastatic solid tumor, heart

failure, arterial hypertension, dyslipidemia and baseline treatments of antidiabetics,

diuretics, RAAS inhibitors, statins, anticoagulants and antiplatelets in period 2; and

congestive heart failure, liver disease, diabetes, kidney disease, metastatic solid tumor,

heart failure, arterial hypertension, dyslipidemia and baseline treatments of

antidiabetics, diuretics, RAAS inhibitors, statins and anticoagulants in period 3. In the

Omicron period the proportion of vaccinated people, the presence of diabetes, arterial

hypertension, dyslipidemia and prescribed treatments of antidiabetics, RAAS inhibitors,

statins, anticoagulants and antiplatelets separate the lower-risk clusters from the

High-Very high one.

Even though the presence of comorbidities is further increased in the High and Very

high clusters, there are some comorbidities that gradually increase from the Very low to

the Low cluster differentiating these profiles: diabetes, arterial hypertension and

dyslipidemia. In addition, the High and Very high clusters are mainly segregated by

their proportions in heart failure, liver disease, diabetes, kidney disease and metastatic

solid tumor, all of them included and contributing to the Charlson index.

Finally, even though there are no large effect sizes in the COVID-19 outcomes

comparisons except for death between the Very low and Very high clusters in period 1,

we can see that for all the outcomes (hospitalization, bad progress and death) and

periods there are significant differences in their proportions among clusters, as was

observed in Fig 3.

In Table 3 a summary of the differences among periods for a specific risk level is

shown. Again, the variables shown in Table 3 are the ones that for a specific cluster

level have at least one large effect size in the tests performed for all the possible

combinations of periods. The numeric superscripts attached to the variable values

that define the clusters indicate the period indexes that have large effect sizes with that

specific variable for that specific period. The High-Very high cluster found in the last

period was compared with the Very high cluster from the previous periods. The

outcomes were also included in the table. The rest of the comparisons can be found

online in the S3 Table.

Table 3. Cluster descriptive characteristics by risk level and differences in time.

Period 1 Period 2 Period 3 Period 4


Very low
Age, Median [Q1, Q3] 46 [36-54]2,3 35 [26-43]1 30 [23-38]1 39 [28-46]
Vaccines, N (%)3-4
2 doses 9788 (14.1) 83486 (83.57)
3 doses 101 (0.15) 3035 (3.04)
Hospitalization, N (%) 1590 (14.26) 1129 (2.54) 1843 (2.66) 604 (0.60)
Bad progress, N (%) 155 (1.39) 126 (0.28) 218 (0.31) 81 (0.08)
Death, N (%) 40 (0.36) 18 (0.04) 11 (0.02) 21 (0.02)
Low
Age, Median [Q1, Q3] 71 [60-82]2,3 58 [53-65]1 53 [48-61]1 60 [55-68]
Vaccines, N (%)3-4
2 doses 18161 (36.03) 14286 (45.24)
3 doses 411 (0.82) 15371 (48.67)
Charlson index, Median [Q1, Q3] 1 [0-2]3 0 [0-0] 0 [0-0]1 0 [0-1]
Hospitalization, N (%) 1840 (36.83) 2151 (9.41) 3831 (7.60) 764 (2.42)
Bad progress, N (%) 609 (12.19) 497 (2.17) 773 (1.53) 234 (0.74)
Death, N (%) 457 (9.15) 247 (1.08) 231 (0.46) 156 (0.49)

14
Period 1 Period 2 Period 3 Period 4
High
Age, Median [Q1, Q3] 85 [76-90]3 76 [65-85] 72 [63-81]1
Charlson index, Median [Q1, Q3] 3 [2-4]2,3 2 [1-3]1 2 [1-3]1
Hospitalization, N (%) 1368 (44.84) 2438 (25.30) 3734 (22.31)
Bad progress, N (%) 828 (27.14) 1036 (10.75) 1352 (8.08)
Death, N (%) 767 (25.14) 871 (9.04) 970 (5.80)
Very high (High-Very high in
period 4)
Vaccines, N (%)3-4
2 doses 1464 (35.47) 2094 (27.78)
3 doses 211 (5.11) 5063 (67.17)
Charlson index, Median [Q1, Q3] 7 [6-9]4 7 [5-8]4 7 [5-8]4 4 [3-5]2,3,4
Hospitalization, N (%) 688 (54.52) 1225 (40.19) 1543 (37.39) 868 (11.51)
Bad progress, N (%) 436 (34.55) 675 (22.15) 769 (18.63) 446 (5.92)
1
Death, N (%) 414 (32.81) 628 (20.60) 689 (16.69) 407 (5.40)
Footnote: The numeric superscript found on top of the variable values that define the clusters indicate

the period indexes that have large effect sizes with that specific variable for that specific period. Only

variables with large effect sizes, as suggested by [26] and [27], in the performed tests were included

together with the outcomes.

In this case, the only variables with at least one large effect size are age, the Charlson index and the vaccine doses. With the exception of the Very high cluster, age decreases over the course of the pandemic in all the clusters. The Charlson index also decreases in every cluster except Very low. The same trend can be seen in the outcome proportions: although there are no large effect sizes in the comparisons, the proportions of death, bad progress and hospital admission decrease over the pandemic.
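As an illustration of how an effect size for one of these outcome comparisons could be computed, the sketch below evaluates Cramér's V for a 2x2 table using the death counts of the Very high cluster in periods 1 and 4 from Table 3. Two things are assumptions rather than facts reported in this study: that Cramér's V is the measure applied to categorical outcomes, and the cluster totals (1262 and 7538), which are back-calculated from the reported percentages. The function name `cramers_v_2x2` is hypothetical.

```python
from math import sqrt


def cramers_v_2x2(events_a, total_a, events_b, total_b):
    """Cramér's V for a 2x2 table (equal to |phi| when both variables are binary)."""
    a, b = events_a, total_a - events_a  # group A: events, non-events
    c, d = events_b, total_b - events_b  # group B: events, non-events
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return abs(a * d - b * c) / denom


# Deaths in the Very high cluster (Table 3): 414 in period 1, 407 in period 4.
# ASSUMPTION: the totals 1262 and 7538 are back-calculated from the reported
# percentages (32.81% and 5.40%); they are not figures stated in the text.
v = cramers_v_2x2(414, 1262, 407, 7538)
print(round(v, 2))
```

Under these assumptions the value is about 0.33, below the 0.5 threshold that Cohen's convention uses for a large effect in a 2x2 table, which is in line with the statement that the outcome comparisons do not reach large effect sizes.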

Discussion

This study used the KAMILA clustering technique to classify patients according to their clinical and sociodemographic characteristics in different periods of the COVID-19 pandemic, based on a large cohort of patients. We concentrated on the first periods because of the high hospitalization and death rates in the early stages of the pandemic.
As we hypothesized, the identified clusters are closely associated with the adverse outcomes of the disease, which yields a description of risk profiles. Indeed, the severity of the outcomes increases with the risk level of the clusters for all the periods and outcomes. Although the outcomes studied here relate to COVID-19, the procedure is independent of the virus and could be used in new pandemics or for other diseases. This could be valuable, especially in the early stages of a pandemic, to obtain a quick identification of risk groups and to provide targeted care to the most vulnerable patients.
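The monotone relationship between risk level and outcome severity can be checked directly against the outcome percentages reported in Table 3. The following minimal sketch does so; the dictionary layout and the helper name `increases_with_risk` are illustrative choices, and the period 4 High-Very high cluster is stored under the Very high entry, as in the table.

```python
# Outcome percentages by cluster and period (periods 1 to 4), copied from Table 3.
# The High cluster has only three values because it merges into the
# High-Very high cluster in period 4.
TABLE3 = {
    "hospitalization": {
        "Very low": [14.26, 2.54, 2.66, 0.60],
        "Low": [36.83, 9.41, 7.60, 2.42],
        "High": [44.84, 25.30, 22.31],
        "Very high": [54.52, 40.19, 37.39, 11.51],
    },
    "bad progress": {
        "Very low": [1.39, 0.28, 0.31, 0.08],
        "Low": [12.19, 2.17, 1.53, 0.74],
        "High": [27.14, 10.75, 8.08],
        "Very high": [34.55, 22.15, 18.63, 5.92],
    },
    "death": {
        "Very low": [0.36, 0.04, 0.02, 0.02],
        "Low": [9.15, 1.08, 0.46, 0.49],
        "High": [25.14, 9.04, 5.80],
        "Very high": [32.81, 20.60, 16.69, 5.40],
    },
}
ORDER = ["Very low", "Low", "High", "Very high"]  # increasing risk level


def increases_with_risk(by_cluster, period):
    """True if the percentage strictly increases with risk level in a period."""
    values = [by_cluster[c][period] for c in ORDER if len(by_cluster[c]) > period]
    return all(lower < higher for lower, higher in zip(values, values[1:]))


# One check per (outcome, period) pair; periods are reported 1-based.
checks = {
    (outcome, period + 1): increases_with_risk(by_cluster, period)
    for outcome, by_cluster in TABLE3.items()
    for period in range(4)
}
print(all(checks.values()))
```

Each of the twelve outcome and period combinations is tested separately, so a single violation of the ordering would show up as a `False` entry in `checks`.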

In summary, we found four profiles based on age and the presence or absence of comorbidities, as measured by the Charlson Comorbidity Index:

• Young patients with almost no comorbidities: Very low.

• Middle-aged patients with, generally, few comorbidities: Low.

• Old patients with a varying number of comorbidities: High.

• Old patients with multiple comorbidities: Very high.

Age and the Charlson index, a measure of the burden of a patient's comorbidities, emerged as important factors discriminating the profiles: age was more relevant for separating the Very low and Low profiles, while the Charlson index was more important for separating the High and Very high profiles. Although age, the Charlson index and the proportions of comorbidities decreased over the pandemic, there are no large differences, which suggests that the clusters have been stable over time. The decrease in adverse COVID-19 outcomes is presumably due to a combination of factors, including the reduced virulence of the virus, the initiation of vaccination programs, increased population immunity due to infections and reinfections, population-level containment measures, and more effective treatments at the hospital level.

The highest risk profiles are characterized by higher ages and higher Charlson index values, both of which are associated with poorer COVID-19 outcomes. On the one hand, age has a significant impact because it correlates with frailty and overall health condition, which have been shown in the literature to result in poorer COVID-19 outcomes [21-23]. On the other hand, the Charlson index mainly separates the higher risk profiles, suggesting that it influences a worse prognosis of the disease. This is consistent with the use of the Charlson index as an indicator of a patient's health status. In addition, prior studies have also associated the Charlson index with a worse COVID-19 prognosis [24].

The evolution of these variables over the pandemic is explained by the special case of the first period. The lower testing capacity at the initial stage of the pandemic [25] resulted in selective testing of the most severe cases, which may have biased the results for this period. For instance, the median age of the Low cluster in this period was 71, which is high considering its risk level. In the subsequent two periods, with increased testing capacity, the median age decreased significantly.

In the last period (Omicron variant), the Very high and High clusters merge into an intermediate cluster that we renamed High-Very high. This can be seen by looking at Fig 1 and Fig 2 together: the median Charlson index of the High-Very high cluster in the last period lies between the medians of the two clusters in the previous period, and its distribution is a combination of their density plots. Accordingly, the patients with the best prognosis in the High profile of the previous periods may have mixed with the patients from the Low profile, which in this case increases the median age of the Low profile in the last period.

Regarding the rest of the variables, there were no large differences among periods for clusters of the same risk level. This leads us to conclude that the clusters have been stable throughout the pandemic and that the reduction in the adverse outcomes of the disease is explained by the following factors: the vaccination process starting in 2021 and the effectiveness of the vaccines [26]; generally better self-protection of the population; improved treatments for COVID-19-positive patients; and the appearance of less virulent SARS-CoV-2 variants [27].

The proportions of comorbidities and baseline treatments increase in the clusters identified as higher risk profiles. Bearing in mind that the High and Very high profiles are distinguished by the Charlson index, it is interesting to note which comorbidities separate these clusters. As expected, these comorbidities contribute to the Charlson index: diabetes, liver disease, kidney disease, metastatic solid tumor and heart failure. These variables have already been reported in the literature as suggestive of a worse prognosis of COVID-19 [28-31]. Furthermore, other comorbidities, such as arterial hypertension, dyslipidemia and diabetes, increase gradually across all the clusters and even discriminate the Very low and Low profiles. These variables have also been associated with poorer outcomes of COVID-19 [32,33].

Regarding baseline treatments, the proportions of antidiabetics, diuretics, RAAS inhibitors, statins, anticoagulants and antiplatelets differ between the higher risk clusters and the lower risk ones. These treatments are usually employed to treat several comorbidities simultaneously; thus, rather than being essential in creating the profiles by themselves, baseline treatments can be considered an indirect measure of the severity of the associated comorbidities.

The strong correspondence of our results with those found in the literature reinforces that KAMILA can effectively classify patients. In fact, the variables that differ the most among the profiles are the ones that have been highlighted in the literature as suggestive of a worse COVID-19 prognosis. This also accords with earlier studies suggesting that KAMILA is suitable for mixed-type data and large databases.


This study has several strengths, including the large sample size, which encompasses all COVID-19-positive cases in the Basque Country over a period of almost two years. Additionally, the database contained a wide range of variables, including sociodemographic factors, comorbidities and baseline treatments, which allowed the identification of patient profiles. Furthermore, unlike some previous studies [9,10], the study was not restricted to hospitalized patients, and the profile identification was extended to the COVID-19-positive cases of a population. However, our study also has several limitations. First, there are no established standards for the statistical validation of unsupervised clustering results, although the acceptance of the clustering structures by clinicians with expertise in this topic helped to mitigate this issue. Second, our study identifies statistical associations and descriptions, but does not establish causality. Third, this study is time-limited, and further research may be needed to identify the profiles of COVID-19-positive cases in new waves. Finally, to enhance generalizability, the profiles of other regions should be identified and compared with the clusters created here.

Conclusions

The purpose of the present research was to classify patients according to their clinical and sociodemographic profiles and subsequently show their association with the evolution of the disease. Our initial hypothesis was tested, and the profiles created in an unsupervised manner could be associated with the adverse outcomes of COVID-19. In addition, the study's results are consistent with the literature: the variables that differentiate the profiles the most are those highlighted in the literature as indicative of a worse COVID-19 prognosis. In particular, age and the Charlson index have played a major role in the determination of the profiles, jointly with diabetes, kidney disease, metastatic solid tumor, and heart failure.

These findings suggest that this method can be used in new pandemics, with other chronic conditions or even with other populations, to segregate patients in a clinically useful way. This could lead to a quick classification of the patients with the worst prognosis, allowing for targeted care intervention strategies and resulting in an overall improvement of their medical attention.

Acknowledgments

We are grateful for the support of the Basque health service, Osakidetza, and the

Department of Health of the Basque Government. We also gratefully acknowledge the

patients who participated in the study. Open access funding provided by BCAM.

References

1. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, Zhao X, et al. A Novel

Coronavirus from Patients with Pneumonia in China, 2019. New England Journal of

Medicine. 2020; 382(8): 727-733. doi: 10.1056/NEJMoa2001017.

2. McCabe R, Schmit N, Christen P, D'Aeth JC, Løchen A, Rizmie D, et al. Adapting

hospital capacity to meet changing demands during the COVID-19 pandemic. BMC

Med. 2020; 18(1): 1-12. doi: https://doi.org/10.1186/s12916-020-01781-w.

3. World Health Organization. WHO Coronavirus (COVID-19) Dashboard. [Cited 2023

February 15]. Available from: https://covid19.who.int.

4. Garg A, Mago V. Role of machine learning in medical research: A survey. Computer

Science Review. 2021; 40: 100370. doi: https://doi.org/10.1016/j.cosrev.2021.100370.

5. Landau S, Leese M, Stahl D, Everitt BS. Cluster Analysis. 5th ed. John Wiley & Sons;

2011.

6. McLachlan GJ. Cluster analysis and related techniques in medical research. Stat

Methods Med Res. 1992;1(1): 27-48. doi: 10.1177/096228029200100103.

7. Rappoport N, Shamir R. Multi-omic and multi-view clustering algorithms: review

and cancer benchmark. Nucleic Acids Res. 2018;46(20):10546-10562. doi:

10.1093/nar/gky889.
8. Grant RW, McCloskey J, Hatfield M, Uratsu C, Ralston JD, Bayliss E, et al. Use of

Latent Class Analysis and k-means Clustering to Identify Complex Patient Profiles.

JAMA Netw Open. 2020; 3(12): e2029068-e2029068.

9. Bondeelle L, Chevret S, Cassonnet S, Harei S, Denis B, de Castro N, et al. Profiles

and outcomes in patients with COVID-19 admitted to wards of a French

oncohematological hospital: A clustering approach. PLoS One. Published online

2021:e0250569-e0250569.

10. Ye W, Lu W, Tang Y, Chen G, Li X, Ji C. Identification of COVID-19 Clinical

Phenotypes by Principal Component Analysis-Based Cluster Analysis. Front Med

(Lausanne). 2020; 7: 570614. doi: 10.3389/fmed.2020.570614.

11. Preud’homme G, Duarte K, Dalleau K, Lacomblez C, Bresso E, Smail-Tabbone M,

et al. Head-to-head comparison of clustering methods for heterogeneous data: a

simulation-driven benchmark. Scientific Reports. 2021; 11: 4202. doi:

https://doi.org/10.1038/s41598-021-83340-8.

12. Foss A, Markatou M, Ray B, Heching A. A semiparametric method for clustering

mixed data. Machine Learning. 2016; 105: 419-458. doi: 10.1007/s10994-016-5575-7.

13. Foss A, Markatou M. kamila: Clustering Mixed-Type Data in R and Hadoop.

Journal of Statistical Software. 2018; 83(13): 1-44. doi: 10.18637/jss.v083.i13.

14. Costa E, Papatsouma I, Markos A. Benchmarking distance-based partitioning

methods for mixed-type data. Advances in Data Analysis and Classification. 2022. doi:

https://doi.org/10.1007/s11634-022-00521-7.
15. Charlson ME, Sax FL, MacKenzie CR, Fields SD, Braham RL, Douglas Jr RG.

Assessing illness severity: does clinical judgement work? J Chronic Dis. 1986; 39(6):

439-452. doi: 10.1016/0021-9681(86)90111-6.

16. WHO Collaborating Centre for Drug Statistics Methodology. Guidelines for ATC

classification and DDD assignment, 16th ed. Oslo; 2013.

17. Tibshirani R, Walther G. Cluster Validation by Prediction Strength. Journal of

Computational and Graphical Statistics. 2005; 14(3): 511-528. doi:

10.1198/106186005X59243.

18. R Core Team. A language and environment for statistical computing. R Foundation

for Statistical Computing, Austria. 2021. Available from: https://www.R-project.org/.

19. Kim HY. Statistical notes for clinical researchers: Chi-squared test and Fisher’s

exact test. Restorative Dentistry & Endodontics. 2017; 42(2): 152-155. doi:

10.5395/rde.2017.42.2.152.

20. Vargha A, Delaney HD. A Critique and Improvement of the CL Common Language

Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral

Statistics. 2000; 25(2): 101-132. doi: 10.3102/10769986025002101.

21. Gupta RK, Marks M, Samuels THA, Luintel A, Rampling T, Chowdhury H, et al.

Systematic evaluation and external validation of 22 prognostic models among

hospitalised adults with COVID-19: an observational cohort study. Eur Respir J. 2020;

56(6): 2003498. doi: 10.1183/13993003.03498-2020.

22. Verity R, Okell LC, Dorigatti I, Winskill P, Whittaker C, Imai N, et al. Estimates of

the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious

Diseases. 2020; 20(6): 669-677. doi: https://doi.org/10.1016/S1473-3099(20)30243-7.


23. Sousa GJB, Garces TS, Cestari VRF, Florêncio RS, Moreira TMM, Pereira MLD.

Mortality and survival of COVID-19. Epidemiology & Infection. 2020; 148:e123. doi:

10.1017/S0950268820001405.

24. Tuty Kuswardhani RA, Henrina J, Pranata R, Lim MA, Lawrensia S, Suastika K.

Charlson comorbidity index and a composite of poor outcomes in COVID-19 patients: A

systematic review and meta-analysis. Diabetes & Metabolic Syndrome: Clinical Research

& Reviews. 2020; 14(6): 2103-2109. doi: https://doi.org/10.1016/j.dsx.2020.10.022.

25. Soriano V, Ganado-Pinilla P, Sanchez-Santos M, Gómez-Gallego F, Barreiro P, de

Mendoza C, et al. Main differences between the first and second waves of COVID-19 in

Madrid, Spain. International Journal of Infectious Diseases. 2021; 105: 374-376. doi:

10.1016/j.ijid.2021.02.115.

26. Lin D, Gu Y, Wheeler B, Young H, Holloway S, Sunny SK, et al. Effectiveness of

Covid-19 Vaccines over a 9-Month Period in North Carolina. The New England Journal

of Medicine. 2022; 386: 933-941. doi: 10.1056/NEJMoa2117128.

27. Shuai H, Chan JFW, Hu B, Chai Y, Yuen TTT, Yin F, et al. Attenuated replication

and pathogenicity of SARS-CoV-2 B.1.1.529 Omicron. Nature. 2022; 603(7902): 696-

699. doi: 10.1038/s41586-022-04442-5.

28. Huang I, Lim MA, Pranata R. Diabetes mellitus is associated with an increased

mortality and severity of disease in COVID-19 pneumonia – a systematic review, meta-

analysis and meta-regression. Diabetes & Metabolic Syndrome: Clinical Research &

Reviews. 2020; 14(4): 395-403. doi: https://doi.org/10.1016/j.dsx.2020.04.018.

29. Cheng Y, Luo R, Wang K, Zhang M, Wang Z, Dong L, et al. Kidney disease is

associated with in-hospital death of patients with COVID-19. Kidney International.

2020; 97(5): 829-838. doi: https://doi.org/10.1016/j.kint.2020.03.005.


30. Yoshida Y, Chu S, Fox S, Zu Y, Lovre D, Denson JL, et al. Sex differences in

determinants of COVID-19 severe outcomes – findings from the National COVID

Cohort Collaborative (N3C). BMC Infectious Diseases. 2022; 22: 784. doi:

https://doi.org/10.1186/s12879-022-07776-7.

31. Rey JR, Caro-Codón J, Rosillo SO, Iniesta AM, Castrejón-Castrejón S, Marco-

Clement I, et al. Heart failure in COVID-19 patients: prevalence, incidence and

prognostic implications. European Journal of Heart Failure. 2020; 22: 2205-2215. doi:

https://doi.org/10.1002/ejhf.1990.

32. Pranata R, Lim MA, Huang I, Raharjo SB. Hypertension is associated with an

increased mortality and severity of disease in COVID-19 pneumonia: A systematic

review, meta-analysis and meta-regression. Journal of the Renin-Angiotensin-

Aldosterone System. 2020; 21(2): 147032032092689. doi:

10.1177/1470320320926899.

33. Hariyanto TI, Kurniawan A. Dyslipidemia is associated with severe coronavirus

disease 2019 (COVID-19) infection. Diabetes & Metabolic Syndrome: Clinical

Research & Reviews. 2020; 14(5): 1463-1465. doi:

https://doi.org/10.1016/j.dsx.2020.07.054.

Figure captions

Fig 1. Overview of the clusters according to age and the Charlson index.

Fig 2. Charlson index density plots for all the periods and clusters.

Fig 3. Description of main outcomes differences by clusters and periods.

Supporting information captions


S1 Table. Complementary table for the descriptive characteristics in Table 1.

S2 Table. Complementary table for the cluster descriptive characteristics by period

found in Table 2.

S3 Table. Complementary table to the cluster descriptive characteristics by risk level

found in Table 3.

Declarations

Ethics Statement

The study protocol was approved by the Ethics Committee of the Basque Country

(reference PI2020123). The need for consent was waived by the ethics committee due to

the pandemic situation.

Funding

This work was supported in part by the health outcomes group from Galdakao-

Barrualde Health Organization; the Kronikgune Institute for Health Service Research;

Instituto de Salud Carlos III (ISCIII) through the project "RD16/0001/0001" (Red de

Investigación en Servicios de Salud en Enfermedades Crónicas) and the project

“RD21CIII/0003/0017” (Red de Investigación en Cronicidad, Atención Primaria y

Prevención y Promoción de la Salud) and co-funded by the European Union, and the

Basque Government through BMTF "Mathematical Modeling Applied to Health"

Project. The work of IB was financially supported in part by grants from the

Departamento de Educación, Política Lingüística y Cultura del Gobierno Vasco

[IT1456-22] and by the Ministry of Science and Innovation through BCAM Severo

Ochoa accreditation [CEX2021-001142-S / MICIN / AEI /10.13039/501100011033]

and through project [PID2020-115882RB-I00 / AEI /10.13039/501100011033] funded


by Agencia Estatal de Investigación and acronym "S3M1P4R" and also by the Basque

Government through the BERC 2022-2025 program. DF has been supported by the

Ministerio de Ciencia e Innovación (Spain) [PID2019-104830RB-I00/ DOI (AEI):

10.13039/501100011033], and by grant 2021 SGR 01421 (GRBIO) administered by

the Departament de Recerca i Universitats de la Generalitat de Catalunya (Spain). The

funders had no role in study design, data collection and analysis, decision to publish, or

preparation of the manuscript.
