STOBE: A Long-COVID Syndromic Study using Real-World data in Brazil

⁷Heitor Cavalini*, ⁷Victor Neves*, ¹Zhenni Qin*, ¹Yutian Zeng*, ³,⁴Ashish Shetty, ⁵,⁶Peter Phiri**, ¹,⁸Jian Qing Shi**, ²Gayathri Delanerolle**

Affiliations
¹Southern University of Science and Technology
²University of Oxford, Nuffield Department of Primary Health Care Sciences
³University College London Hospitals NHS Foundation Trust
⁴University College London
⁵Southern Health NHS Foundation Trust, Research & Innovation Department
⁶University of Southampton, Primary Care, Population Sciences and Medical Education, Faculty of Medicine Unit
⁷University of Pernambuco, Postgraduate Program in Rehabilitation and Functional Performance, Petrolina Campus
⁸Alan Turing Institute

Shared authorships
*First author
**Last author

Corresponding author:
Professor Jian Qing Shi,
Department of Statistics and Data Science
Southern University of Science and Technology
Shenzhen
China
shijq@sustech.edu.cn,
0086-755-88015796

This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=4016535

Abstract

Background
Some patients who test positive for COVID-19 continue to show acute COVID-19 symptoms well after infection. Those whose symptoms persist beyond 90 days are considered potential long-COVID-19 (LC) patients. Identifying and managing patients with LC remains a challenge, especially in areas where surveillance data and resources are scarce, such as Brazil. The primary aim of this study is to report the prevalence of LC in Petrolina, one of the largest municipalities in Northern Brazil.

Methods
A retrospective epidemiological study design was used with a real-world dataset. The sample comprised 1,164 LC patients. A study-specific database was created using Microsoft Excel. Comparative and subgroup analyses were conducted to evaluate demographics, comorbidities, clinical symptoms and mortality.

Findings
Pain and neuropsychological symptoms were commonly reported. The prevalence of physical, pain and autonomic symptoms increased with age. Male patients reported a higher prevalence of physical and pain symptoms than female patients. Patients belonging to Black and Caucasian racial groups were more likely to experience physical and pain symptoms than Pardo patients.

Interpretations
A potential correlation between physical symptoms and an increased number of comorbidities was identified within the sample. The severity of long-Covid tends to rise with the number of comorbidities, the number of medications taken and, in particular, the number of symptoms.

Funding
No funding

Research in context

Evidence before this study
Little was known about the presence and characteristics of long-COVID-19 (LC) patients in Brazil. To our knowledge, there is a lack of epidemiological studies reporting LC in Brazil.

Added value of this study
The results of this study can help strengthen our understanding of the prevalence and symptomatology of LC. The findings can also inform the management of healthcare system access and the development of new services to manage LC patients in the long term.

Implications of all the available evidence
Future studies should continue to build a phenotype for LC that could be specific to the Brazilian population and could have translational value for global healthcare systems in managing diverse populations.

Introduction

SARS-CoV-2 emerged in Wuhan, China in 2019 and has led to a catastrophic global COVID-19 pandemic with high levels of morbidity and mortality.¹ Rapid transmission of the virus meant that emergency measures had to be introduced across global healthcare systems, and the World Health Organization (WHO) declared a public health emergency. COVID-19 has led to the development of a new condition, long-COVID-19 (LC), in which COVID-19 symptoms continue more than 90 days after a positive test for SARS-CoV-2. Countries have adopted differing methods for managing LC patients who demonstrate acute symptoms.²

Del Rio and colleagues were among the first to report neurological, neuropsychiatric, autonomic and cardiac dysfunction persisting beyond the acute period of COVID-19, which leads to an LC diagnosis.³ Interestingly, a meta-analysis by Phiri and colleagues demonstrated the impact of COVID-19 on mental health, whilst another showed that 72·5% of COVID-19 patients had fatigue or shortness of breath that persisted 60 days or more after testing positive for the virus.⁴,⁵ These persistent symptoms appear to be common among LC patients as well. LC should therefore receive equal focus in this pandemic so that a clinical diagnosis and treatments become available to these patients. In addition, characterizing the phenotype and pathophysiology of LC should be a high priority, as a universally accepted definition is still unavailable. In a bid to better understand the disease, various research and surveillance studies are taking place globally.

Although some countries have managed to reduce COVID-19 incidence, new variants of SARS-CoV-2 continue to emerge, with sporadic outbreaks globally. Effective vaccination programs, social distancing and protective equipment such as masks have further supported the reduction of new COVID-19 cases, potentially giving healthcare systems more leeway to support LC patients. In Brazil, LC continues to affect local populations, and its impact is exacerbated by social constructs and socio-economic inequalities that could prevent patients from accessing necessary healthcare services. Brazil shares these characteristics with many other low- and middle-income countries (LMICs), such as India and Pakistan. The situation is further complicated by ineffective measures for obtaining accurate data from healthcare systems, an issue shared by many healthcare organisations globally.

Reports of LC within Brazil have increased exponentially, although there is a dearth of research evidence on prevalence. As a first step, our study aims to support this work by assessing the epidemiological significance of LC within a sub-population in Petrolina, a northern municipality situated in the state of Pernambuco, Brazil.

Methods

Study design
A retrospective, cross-sectional study was designed with a sample of 1164 patients collated from the Health Department of the Municipality of Petrolina database, held by the Municipality of Petrolina authorities. The city of Petrolina, in the State of Pernambuco, has a total population of 395,000. For the purpose of this study, recovery time was defined by a negative test following a confirmed positive COVID-19 test, with acute COVID-19 symptoms followed up to day 90 (Figure 5). All patients reporting acute symptoms were considered LC patients.

Aims and outcomes
Our primary aim was to report descriptive statistics from a dataset of long-Covid patients in Petrolina, Brazil. A comparative analysis was planned to report the differences in the prevalence of each symptom by clinical category. The relationship between the number of symptoms and recovery time was also investigated.

Our secondary aim was to conduct a subgroup analysis. The incidence of each symptom within the four themes was compared with the overall incidence of the four themes in subgroups defined by age, gender and skin color. Furthermore, the severity of long-Covid (disease severity) was assessed: the number of comorbidities and the number of medications patients had ever taken were used to cluster the patients and obtain the different levels of overall disease severity.

Eligibility criteria
All patients included in the sample had a positive SARS-CoV-2 test, confirmed by a reverse transcriptase-polymerase chain reaction (RT-PCR) test, between October 2020 and August 2021. The source data were confirmed with the local healthcare authorities. Electronic data were also obtained from a communicable diseases database specific to the municipality of Petrolina.

Data collection
Data from the electronic database were extracted and recorded in an Excel sheet. This included demographic and clinical data: age, gender, skin color, comorbidities, medication history, and neurological, neuropsychiatric, pain, respiratory and cardiac symptomatologies. The key features identified are shown in Table 1.

To describe the recovery status of patients following a SARS-CoV-2 infection, all symptoms that occurred during the recovery time were recorded individually and grouped into four main clinical categories: Pain, Autonomic, Neuropsychiatry and General. General symptomatologies included fever, cough, dyspnea and vomiting. The pain category comprised abdominal pain and body pains. The autonomic category comprised headaches, fatigue and vertigo. The neuropsychiatry category included anxiety, depression and brain fog. All patients with confirmed comorbidities and medication histories were included. Patients with missing data were excluded from the study group. In this context, recovery time is the number of days between the positive test and the subsequent negative test.

Pre-existing comorbidities and medications that each accounted for less than 1% of the total sample were labelled “Others” in the data sample pool. Patients without pre-existing comorbidities or medications were recorded as “None”.

Ethical considerations
The study protocol was submitted to and approved by the Research Ethics Committee of the Amaury de Medeiros Integrated Health Center, University of Pernambuco, Petrolina campus, under CAAE 48683521.8.0000.5191 and CAAE 42858321.5.0000.5191, on February 4, 2021, and June 29, 2021, respectively. This study was conducted in accordance with national and international research legislation, including but not limited to Good Clinical Practice.

Statistical analysis plan
Descriptive statistical analysis was conducted along with a clustering model. One-way ANOVA was used to test for differences in disease severity between subgroups, while the k-means clustering model was used to cluster the long-Covid patients by disease severity.⁶

A multinomial model was applied to quantify disease severity from the symptoms identified and to predict the different degrees of severity, given the variations in clinical management and treatment observed within Brazilian clinics.⁷ The outcome was therefore analyzed separately. We used a multinomial log-linear model for prediction, as we deemed the number of symptoms of a patient to be related to disease severity. The multinomial log-linear model is a generalised linear regression for ordered responses that only requires knowledge of the ordering of the response levels and not their magnitude, which suits our dataset.
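
As an illustration of how a multinomial log-linear model turns linear predictors into class probabilities, the sketch below uses the common baseline-category parameterisation, in which the reference level has a linear predictor of zero and each other level has its own intercept and coefficients. The coefficient values and the function name are hypothetical and chosen only for readability; they are not the fitted values from this study, and the code is written in Python for convenience rather than in R, which the authors used.

```python
import math


def multinomial_probabilities(coefs, x):
    """Return the probability of each response level for one patient.

    coefs: one list per non-reference level, [intercept, b1, b2, ...].
    x:     predictor values [x1, x2, ...] (e.g. symptom counts per theme).
    The reference level (index 0) has a linear predictor fixed at 0.
    """
    etas = [0.0]  # reference level
    for c in coefs:
        etas.append(c[0] + sum(b * v for b, v in zip(c[1:], x)))
    shift = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - shift) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients for levels 1..3:
# [intercept, pain, physical, autonomic]. Illustration only.
HYPOTHETICAL_COEFS = [
    [-1.0, 0.8, 0.9, 0.7],
    [-4.0, 1.6, 1.8, 1.5],
    [-8.0, 2.4, 2.6, 2.2],
]

# A hypothetical patient with 1 pain, 3 physical and 0 autonomic symptoms.
probs = multinomial_probabilities(HYPOTHETICAL_COEFS, [1, 3, 0])
predicted_level = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], predicted_level)
```

The predicted severity level is simply the level with the largest probability, which is how a fitted model of this kind would classify a patient in a held-out test set.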

The objective of the k-means clustering model is to minimize the squared error between the samples and the cluster centres. The sum of the squared distances between each cluster centre and the sample points within that cluster is referred to as the distortion. Effective clustering keeps the samples within each cluster as close together as possible and the samples in different clusters as far apart as possible.⁸ The distortion decreases as the number of clusters increases; for data with a certain degree of differentiation, it falls sharply until a certain number of clusters is reached and only slowly thereafter. This threshold can be taken as the number of clusters. This method of selecting the number of clusters is called the elbow method.⁹
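
The elbow method described above can be sketched in a few lines. The example below is a minimal pure-Python version of Lloyd's k-means algorithm that reports the distortion for each candidate number of clusters. The toy points, the function names and the number of random restarts are assumptions made for illustration; they are not the study data or the authors' R code.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm once; return (centres, distortion)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initialise from k sampled points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centres[j]))
            clusters[nearest].append(p)
        new_centres = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centres.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:
            break  # assignments have stabilised
        centres = new_centres
    distortion = sum(min(squared_distance(p, c) for c in centres) for p in points)
    return centres, distortion


def elbow_curve(points, max_k, n_restarts=10):
    """Best distortion over several restarts for k = 1..max_k."""
    return {
        k: min(kmeans(points, k, seed=s)[1] for s in range(n_restarts))
        for k in range(1, max_k + 1)
    }


# Hypothetical (symptoms, comorbidities, medications) counts. Illustration only.
TOY_POINTS = [
    (0, 0, 0), (0, 1, 0), (1, 0, 0),
    (3, 0, 0), (3, 1, 0), (2, 1, 1),
    (5, 1, 0), (4, 1, 1), (5, 0, 1),
    (7, 1, 1), (7, 2, 0), (6, 1, 1),
]

curve = elbow_curve(TOY_POINTS, max_k=5)
for k, distortion in curve.items():
    print(k, round(distortion, 2))
```

Plotting the distortion against k and looking for the point where the curve flattens gives the elbow. Several restarts are used because a single run of k-means can stop at a poor local optimum.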

All statistical analyses were performed in R. The packages “plyr”, “Rmisc”, “forcats” and “readxl” were used for data curation; “car”, “caret”, “cluster”, “nnet”, “stats” and “tidyverse” were applied in the statistical analysis; and “sjPlot”, “ggplot2”, “ggpubr”, “GGally”, “hrbrthemes” and “factoextra” were used for graphing and visualization.

Results

A total of 1164 patients with ongoing LC were included in the study. A total of 12 clinical variables [Table 1], broadly within the themes of Neuropsychiatry and Autonomic dysfunction, were identified along with key demographics. Comorbidities of Diabetes, Chronic cardiac disease, Obesity and High blood pressure were identified within the data pool. Pregnant women formed an additional category. Patients with comorbidities also reported the use of Atenolol, Insulin, Metformin, Pioglitazone and Diuretics.

Table 1. Characteristics of the Brazilian COVID-19 dataset.

Variable                      Sample size  Missing rate  Categories
Age                           1164         0%            Range: 0~106
Gender                        1164         0%            Female, Male
Skin color                    1164         0%            Pardo, Black, Caucasian, Asiatic, Indian
Positive test time            1156         0·69%
Negative test time            136          88·32%
Deaths                        1164         0%

Variable                      Sample size  Missing rate  Number of symptoms suffered  Categories
Comorbidities                 1163         0·09%         0~4                          Pregnant, Diabetes, Cardiac chronic disease, High blood pressure, Obesity, Smoker, Others
Medication history            1154         0·86%         0~2                          Beta blockers, Atenolol, Insulin, Metformin, Pioglitazone, Diuretics, Others
Physical symptoms             1164         0%            0~6                          Fever, Cough, Dyspnea, Saturation <95% vomit, Diarrhea, Pain, Fatigue, Nose running, Loss of smell and taste, Others
Neurological symptom          1164         0%            0~2                          Lack of concentration, Fatigue, Anxiety, Lack of memory, Migraine, Myalgia, Others
Pain symptom                  1152         1·03%         0~2                          Headache, Sore of throat, Abdominal pain, Myalgia, Joint pain, Migraine, Tender joint, Others
Autonomic disorders symptoms  1164         0%            0~2                          Fatigue, Vertigo, Others

Table 1 shows the number of patients included in the analysis of each variable (factor). Comorbidities, medication history and Pain symptoms have missing rates of 0·09%, 0·86% and 1·03%, respectively. Data for a total of 130 patients were available for assessing the recovery time, and these were analysed separately.

Descriptive analysis

The age of the cohort ranged from 0 to 106 [Table 2], with a mean of 42·95 and a standard deviation (SD) of 20·32. The mean age of female LC patients was 40·02 (SD: 19·98), while that of male patients was 46·56 (SD: 20·16). The overall death rate was 1·6% (19/1164), with deaths distributed mainly among men in the 40-49 and 50-59 age groups [Figure 1]. The death rates in the female and male groups were 0·78% (5/642) and 2·68% (14/522), respectively. A mean age of 61·74 and a median age of 59 were identified among the patients who died.
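
The death rates quoted above follow directly from the counts given in the text. The short check below recomputes them; the helper name is an assumption introduced for illustration.

```python
def rate_percent(events, total):
    """Return events / total as a percentage rounded to two decimals."""
    return round(100 * events / total, 2)


overall = rate_percent(19, 1164)  # all patients
female = rate_percent(5, 642)     # female patients
male = rate_percent(14, 522)      # male patients

print(overall, female, male)  # 1.63 0.78 2.68
```

The overall figure of 1·63% is reported in the text as 1·6%.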

Table 2. Demographic information of COVID-19 cases in Brazil.

Variable                      Categories           Sample size  Proportion
Age                           0~9                  75           6·44%
(Mean: 42·95, SD: 20·32)      10~19                62           5·33%
                              20~29                186          15·98%
                              30~39                198          17·01%
                              40~49                233          20·02%
                              50~59                203          17·44%
                              60~69                95           8·16%
                              70~79                64           5·50%
                              ≥80                  48           4·12%
Gender                        Female               642          55·10%
                              Male                 522          44·80%
Skin color                    Pardo                1003         86·10%
                              Black                47           4·00%
                              Caucasian            108          9·20%
                              Asiatic              5            0·40%
                              Indian               1            <0·1%
Deaths                        Not dead             1145         98·30%
                              Death                19           1·60%
Number of comorbidities       0                    634          54·51%
                              1                    403          34·65%
                              2                    90           7·74%
                              3                    33           2·84%
                              4                    3            0·26%
Number of medications used    0                    936          81·11%
                              1                    194          16·81%
                              2                    24           2·08%

Variable                      Number of diagnoses  Sample size  Proportion
Physical symptoms             0                    350          30·10%
                              1                    37           3·20%
                              2                    51           4·40%
                              3                    166          14·30%
                              4                    309          26·50%
                              5                    71           6·10%
                              6                    180          15·50%
Neurological symptom          0                    1069         91·80%
                              1                    80           6·90%
                              2                    15           1·30%
Pain symptom                  0                    644          55·90%
                              1                    300          26·04%
                              2                    208          18·06%
Autonomic disorders symptoms  0                    1064         91·40%
                              1                    83           7·10%
                              2                    17           1·50%

Figure 2 shows the results of the word cloud analysis of comorbidities, which includes a considerable number of pregnant women, followed by diabetes and cardiac chronic disease. Table 1 and Figure 3 present data on the 403 patients with at least one comorbidity, 37·47% of whom were pregnant. Cardiac chronic disease and Diabetes accounted for 25·06% and 11·66% of these 403 patients, respectively. Patients with two or more comorbidities were considered multimorbid. There were 90 patients out of 403 recorded with at least 2 comorbidities, namely Chronic Cardiac disease and Diabetes. Three patients reported Diabetes, Chronic Cardiac disease, High blood pressure and Obesity.

A heat map was developed to illustrate the correlation between the four categories [Figure 4]. No strong overall correlation was identified: the coefficients were below 0·3, suggesting that the health status of the Brazilian patients is multifaceted.

Statistical analysis

Recovery time

After removal of the missing data, a total of 126 patients had a recovery time in the dataset, as illustrated in Figure 5. Among these 126 patients, only a few had more than six kinds of symptoms at the same time [Figure 6]. Because the number of patients differs across symptom counts, the scatter distribution is uneven, so the median and interquartile range (IQR) were evaluated. The median (IQR in brackets) recovery time for patients with 0, 1, 2, 3, 4, 5 and ≥6 symptoms was 8·0 (4·00), 7·0 (5·00), 6·0 (5·00), 9·0 (5·50), 10·0 (7·00), 8·5 (9·25) and 9·0 (3·00), respectively. A one-way ANOVA performed on the seven groups gave a p-value of 2·94×10⁻², indicating only a slight difference between groups.
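
For readers who want to reproduce this kind of summary, the sketch below computes a median with IQR and a one-way ANOVA F statistic using only the Python standard library. The recovery times are invented for illustration, the function names are assumptions, and the quantile rule (`method="inclusive"`) is one of several conventions; the source does not state which rule the authors used in R.

```python
import statistics


def median_iqr(values):
    """Return (median, IQR) using the inclusive quartile convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical recovery times (days) grouped by number of symptoms.
hypothetical_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

print(median_iqr([4, 6, 8, 10, 12]))         # (8, 4.0)
print(one_way_anova_f(hypothetical_groups))  # (13.0, 2, 6)
```

The p-value is then the upper tail of the F distribution with the returned degrees of freedom, which a statistics package such as R provides directly.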
Prevalence

Table 3 reports prevalence rates for the four themes of physical, pain, autonomic and neurological symptoms. The prevalence of physical symptoms was 69·5%, and the most common physical symptom was fever, with a prevalence of 64·09%. The prevalence of pain symptoms was 43·64%, with headache the most common symptom (33·33%). The total prevalence of autonomic and neurological symptomatology was 8·59% and 8·16%, respectively (Table 3).
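
The total prevalence of a theme is the share of patients with at least one symptom in that theme. As a worked check, the sketch below recomputes the pain, autonomic and neurological totals from the counts in Table 2, using the full cohort of 1164 patients as the denominator. The dictionary layout and function name are assumptions made for illustration.

```python
TOTAL_PATIENTS = 1164

# Patients with one or two symptoms in each theme (counts from Table 2).
PATIENTS_WITH_SYMPTOMS = {
    "pain": {1: 300, 2: 208},
    "autonomic": {1: 83, 2: 17},
    "neurological": {1: 80, 2: 15},
}


def theme_prevalence(counts_by_number, total):
    """Percentage of patients with at least one symptom in the theme."""
    return round(100 * sum(counts_by_number.values()) / total, 2)


prevalence = {
    theme: theme_prevalence(counts, TOTAL_PATIENTS)
    for theme, counts in PATIENTS_WITH_SYMPTOMS.items()
}
print(prevalence)  # {'pain': 43.64, 'autonomic': 8.59, 'neurological': 8.16}
```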
Table 3. The prevalence of each symptom and the total prevalence of four themes.

Variable                      Number of diagnoses  Categories               Prevalence
Physical symptoms             0~6                  Total                    69·50%
                                                   Fever                    *64·09%
                                                   Cough                    *55·67%
                                                   Dyspnea                  *45·79%
                                                   Saturation <95% vomit    38·06%
                                                   Diarrhea                 22·34%
                                                   Nose running             8·25%
                                                   Loss of smell and taste  3·52%
                                                   Physical fatigue         3·44%
                                                   Others                   0·26%
Pain symptom                  0~2                  Total                    43·64%
                                                   Headache                 *33·33%
                                                   Abdominal pain           *11·25%
                                                   Sore of throat           *8·93%
                                                   Pain myalgia             3·69%
                                                   Tender joint             1·12%
                                                   Joint pain               0·77%
                                                   Pain migraine            0·77%
                                                   Others                   1·63%
Autonomic disorders symptoms  0~2                  Total                    8·59%
                                                   Autonomic fatigue        *7·90%
                                                   Vertigo                  0·86%
                                                   Others                   1·29%
Neurological symptom          0~2                  Total                    8·16%
                                                   Lack of concentration    *2·41%
                                                   Lack of memory           *1·46%
                                                   Neurological fatigue     *1·12%
                                                   Anxiety                  0·95%
                                                   Neurological migraine    0·86%
                                                   Neurological myalgia     0·60%
                                                   Others                   2·06%

Subgroup Analysis

A subgroup analysis was conducted by patient age, gender, race, number of comorbidities and medications used. The prevalence of the categories appears to be correlated with age (Figure 7). The prevalence of physical, pain and autonomic symptoms was lower in the 10-19 and 20-29 age groups. The prevalence of neuropsychiatric symptoms was lower in patients under 30 years and higher in the 30-39, 40-49, 50-59 and 60-69 age groups.

For autonomic disorders, there was no significant difference between the 0-9 age group and the groups over 40 (p-value = 0·84), but prevalence in these groups was significantly higher than in the 10-39 age group (p-value = 0·0001). For neuropsychiatric symptoms, there was no statistically significant difference among the groups over 30 (p-value = 0·92), but the difference between patients under 30 and over 30 was statistically significant (p-value < 0·0001). For pain symptoms, there was no statistically significant difference among the age groups over 40 (p-value = 0·60), but prevalence in these groups was significantly higher than in patients under 40 (p-value < 0·0001). For physical symptoms, the differences among the age groups over 40 were not statistically significant (p-value = 0·25), whereas the difference between patients over 40 and under 40 was significant (p-value < 0·0001).

Subgroup analysis by gender indicated that physical and pain symptoms were more common among male than female patients, with both p-values below 0·0001 [Figure 7]. Autonomic and neurological symptoms showed no statistically significant differences (p-value = 0·70 and p-value = 0·44, respectively).

[Figure 7]

By race, the largest subgroup was Pardo, with 1003 patients (86·10%). The dataset contained only five Asiatic patients and one Indian patient, so our comparisons focused on Caucasian, Black and Pardo patients. Pardo patients had a lower prevalence of physical and pain symptoms than Black and Caucasian patients, with p-values of 5·99×10⁻⁴ and 1·73×10⁻⁴, respectively. Black patients had fewer neuropsychiatric symptoms. There was no statistically significant difference between the three groups for autonomic symptoms, with p-values of 0·31 and 0·89, respectively.

A total of 529 patients had no comorbidity records. Subgroups were allocated by the number of comorbidities: one comorbidity (n=403), two comorbidities (n=90), three comorbidities (n=33) and four comorbidities (n=3) (Table 4). Figure 8 shows the trend between the number of comorbidities and the prevalence of each theme. The prevalence of physical symptoms was generally lower among patients with only one comorbidity, and significantly lower than among the other patients (p-value < 0·0001), but it was almost the same for patients with two and three comorbidities. Similar results were obtained for pain symptoms (p-value = 9·33×10⁻³). This finding did not hold for autonomic and neurological symptoms, where there was no significant difference across patients (p-values of 0·62 and 0·54). The latter may be related to the fact that the prevalence of each symptom under the neurological theme was below 1%.

Table 4. The prevalence of each symptom and total themes in subgroup of number of comorbidities.

                                Subgroups in terms of number of comorbidities
Variable / Symptoms             N=1ᵃ (n=403)ᵇ  N=2 (n=90)  N=3 (n=33)  N=4 (n=3)
Physical symptoms
  Total                         65·26%         91·11%      93·94%      100·00%
  Fever                         *59·06%        *82·22%     *78·79%     *100%
  Cough                         *51·61%        *78·89%     *75·76%     *100%
  Dyspnea                       *48·88%        *72·22%     *84·85%     33·33%
  Saturation <95% vomit         43·67%         62·22%      54·55%      33·33%
  Diarrhea                      21·09%         32·22%      27·27%      33·33%
  Nose running                  0·07%          12·22%      6·06%       *66·67%
  Loss of smell and taste       0·02%          4·44%       9·09%       0·00%
  Physical fatigue              0·03%          3·33%       0·00%       0·00%
  Others                        0·25%          1·11%       0·00%       0·00%
Pain symptom
  Total                         37·22%         56·67%      45·45%      100·00%
  Headache                      *27·3%         *42·22%     *42·42%     *100%
  Abdominal pain                *8·68%         *22·22%     *21·21%     0·00%
  Sore of throat                *7·44%         *11·11%     *12·12%     *100%
  Pain myalgia                  3·23%          2·22%       0·00%       0·00%
  Tender joint                  1·49%          0·00%       0·00%       0·00%
  Joint pain                    0·77%          1·11%       0·00%       0·00%
  Pain migraine                 0·25%          1·11%       0·00%       0·00%
  Others                        2·98%          2·22%       0·00%       0·00%
Autonomic disorders symptoms
  Total                         9·93%          11·11%      12·12%      0·00%
  Autonomic fatigue             9·43%          10·00%      12·12%      0·00%
  Vertigo                       0·99%          0·00%       0           0·00%
  Others                        0·99%          3·33%       0           0·00%
Neurological symptom
  Total                         9·93%          12·22%      3·03%       33·33%
  Lack of concentration         *3·23%         1·11%       *3·03%      0·00%
  Lack of memory                *1·49%         *4·44%      0·00%       0·00%
  Neurological fatigue          1·24%          1·11%       0·00%       0·00%
  Anxiety                       *1·74%         *2·22%      0·00%       0·00%
  Neurological migraine         0·50%          1·11%       0·00%       0·00%
  Neurological myalgia          0·25%          0·00%       0·00%       0·00%
  Others                        2·73%          4·44%       0·00%       33·33%

Note: a. N indicates the number of comorbidities; b. n indicates the sample size of each subgroup

[Figure 8]

A total of 235 patients were on medication, with 71 taking unspecified beta blockers. The others were taking Atenolol (32), Metformin (32), Insulin (26), Pioglitazone (10), Diuretics (10) and other medicines (54) (Table 5).

Table 5. The prevalence of each symptom and total themes in subgroup of medication history.

                                Subgroups in terms of medication history
Variable / Categories           Beta blockers  Atenolol  Insulin   Metformin  Pioglitazone  Diuretics  Others
                                (n=71)         (n=32)    (n=26)    (n=32)     (n=10)        (n=10)     (n=54)
Physical symptoms
  Total                         94·37%         93·75%    100·00%   90·62%     100·00%       100·00%    96·30%
  Fever                         *90·14%        *90·62%   *92·31%   *90·62%    *100·00%      *80·00%    *87·04%
  Cough                         *88·73%        *84·38%   *88·46%   *84·38%    *100·00%      *80·00%    *79·63%
  Dyspnea                       *77·46%        *84·38%   *69·23%   *75·00%    80·00%        *90·00%    *85·19%
  Saturation <95% vomit         61·97%         81·25%    57·69%    68·75%     *90·00%       70·00%     68·52%
  Diarrhea                      19·72%         46·88%    23·08%    28·12%     80·00%        60·00%     35·19%
  Nose running                  9·86%          6·25%     15·38%    12·50%     10·00%        10·00%     3·70%
  Loss of smell and taste       7·04%          0·00%     3·85%     0·00%      0·00%         0·00%      1·85%
  Physical fatigue              2·82%          0·00%     7·69%     0·00%      10·00%        0·00%      1·85%
  Others                        1·41%          0·00%     0·00%     0·00%      0·00%         0·00%      0·00%
Pain symptom
  Total                         40·85%         53·12%    46·15%    43·75%     80·00%        80·00%     48·15%
  Headache                      *29·58%        *37·50%   *26·92%   *40·62%    *50·00%       *70·00%    *33·33%
  Abdominal pain                *14·08%        *18·75%   *7·69%    *15·62%    *40·00%       *40·00%    *12·96%
  Sore of throat                *5·63%         3·12%     *15·38%   *9·38%     *20·00%       *20·00%    *16·67%
  Pain myalgia                  1·41%          3·12%     3·85%     0·00%      0·00%         0·00%      0·00%
  Tender joint                  0·00%          3·12%     0·00%     6·25%      10·00%        0·00%      1·85%
  Joint pain                    2·82%          0·00%     3·85%     0·00%      0·00%         0·00%      0·00%
  Pain migraine                 1·41%          0·00%     0·00%     0·00%      0·00%         0·00%      1·85%
  Others                        1·41%          9·38%     3·85%     3·12%      0·00%         0·00%      5·56%
Autonomic disorders symptoms
  Total                         16·90%         21·88%    11·54%    15·62%     20·00%        0·00%      14·81%
  Autonomic fatigue             15·49%         21·88%    11·54%    15·62%     20·00%        0·00%      14·81%
  Vertigo                       0·00%          0·00%     3·85%     3·12%      0·00%         0·00%      1·85%
  Others                        1·40%          6·25%     0·00%     9·38%      0·00%         0·00%      0·00%
Neurological symptom
  Total                         18·31%         6·25%     11·54%    9·38%      10·00%        0·00%      20·37%
  Lack of concentration         *5·63%         0·00%     0·00%     *3·12%     0·00%         0·00%      1·85%
  Lack of memory                *4·23%         *3·12%    0·00%     *6·25%     0·00%         0·00%      1·85%
  Neurological fatigue          0·00%          0·00%     *3·85%    0·00%      *10·00%       0·00%      0·00%
  Anxiety                       *5·63%         0·00%     *7·69%    0·00%      0·00%         0·00%      *3·70%
  Neurological migraine         1·41%          *3·12%    0·00%     0·00%      0·00%         0·00%      1·85%
  Neurological myalgia          0·00%          0·00%     0·00%     0·00%      0·00%         0·00%      0·00%
  Others                        2·82%          3·12%     0·00%     0·00%      0·00%         0·00%      14·81%

The prevalence of physical symptoms among patients on medication was high (at least 90%), and the prevalence of pain symptoms was at an intermediate level, ranging from 40% to 80%. In contrast, the prevalence of autonomic dysfunction and neuropsychiatric symptoms was lower, mostly below 20%.

Disease severity
Disease severity could be related to a combination of LC, comorbidities and medication history. These three factors were therefore used to quantify the severity stage of patients with LC. To assess disease severity quantitatively, we applied the k-means model to conduct a clustering analysis based on the total number of symptoms, the number of comorbidities and medication history. [Figure 9, Figure 10]

[Figure 8]
[Figure 9]

Table 6. Characteristics of patients based on the result of clustering.

Disease severity  Sample size  Average number of symptoms  Average number of comorbidities  Average number of medications used
0                 216          0·09                        0·45                             0·03
1                 383          2·70                        0·54                             0·19
2                 357          4·52                        0·67                             0·30
3                 208          6·75                        0·77                             0·37

Using the elbow method for the k-means model, we identified four categories of disease severity as the optimal number of clusters (see Figure 9). To visualise the results, we used the first two principal components as the coordinates,6 where the first component was a linear combination of the number of symptoms, the number of comorbidities and the number of medications with weights (loadings) of 0·51, 0·83 and 0·88, while the second component had weights of 0·85, -0·37 and -0·15. Together, these two components explained 87·30% of the total variance. Figure 10 shows the results, where each point stands for a patient (the number shown in the figure is the patient’s ID). The four clusters, 0 to 3, indicate the mild, moderate, severe and very severe groups respectively. The characteristics of each category are shown in Table 6: as the number of symptoms, the number of comorbidities and the number of medications increase, the severity of LC gradually increases. The number of symptoms has the strongest influence on disease severity. In addition, patients in the cohort are spread fairly evenly across the four severity levels, which suggests that the clustering results are satisfactory.
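
The elbow procedure described above can be illustrated with a minimal sketch. The paper does not state the software, initialisation or feature scaling that was used, so everything here is an assumption: the data are synthetic (not the study data), the group centres are hypothetical, and the function names `kmeans` and `make_synthetic_patients` are introduced only for illustration. The sketch runs a plain Lloyd's k-means for k = 1 to 6 and prints the within-cluster sum of squares (WCSS), the quantity one would plot to look for an elbow.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids (an assumption)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


def make_synthetic_patients(n_per_group=50, seed=42):
    """Synthetic (symptoms, comorbidities, medications) triples; NOT study data."""
    rng = random.Random(seed)
    centres = [(0.5, 0.5, 0.0), (3.0, 0.5, 0.2), (5.0, 0.7, 0.3), (7.0, 0.8, 0.4)]
    return [tuple(c + rng.gauss(0, 0.5) for c in centre)
            for centre in centres for _ in range(n_per_group)]


patients = make_synthetic_patients()
wcss_by_k = {k: kmeans(patients, k)[2] for k in range(1, 7)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 1))
```

In such a plot the elbow is the value of k after which the WCSS stops falling sharply; with a single random start the curve is not guaranteed to be monotone, so several restarts per k would be a sensible refinement.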

Finally, a multinomial log-linear model, allowing the response variable to be ordinal, was developed to statistically predict disease severity, with age, gender, race and the four symptom themes as predictors. After a random split of the data, we used 70% as the training set and 30% as the test set. Following subset selection, neurological symptoms, age, gender and race were not included in the model. This implies that age, gender and race provide little additional information for predicting disease severity. Neurological symptoms were removed because they were somewhat related to the other three themes, as shown in the heat map in Figure 4. The fitted model parameters are shown in Table 7. The estimated parameters for the selected themes were all positive, indicating a clear positive association between the number of symptoms in these three themes and the severity of the disease, i.e. a higher number of symptoms was indicative of more severe disease. The sample size of the test set was 349, and the number of correct classifications in each category was 114, 69, 101 and 54, giving an overall accuracy of 96·85%. Overall, the number of symptoms in these three themes can be used to effectively predict the severity of the disease.
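
As a quick arithmetic check of the figures reported above, the following sketch recomputes the overall accuracy from the per-category correct counts and compares the test-set size with 30% of the cohort in Table 6. It uses only numbers stated in the text; the variable names are illustrative.

```python
# Cluster sizes from Table 6 (severity level -> number of patients).
cluster_sizes = {0: 216, 1: 383, 2: 357, 3: 208}

# Correct test-set classifications per severity category, as reported.
correct_by_class = [114, 69, 101, 54]
test_size = 349

total_patients = sum(cluster_sizes.values())
expected_test = 0.30 * total_patients          # size implied by a 70/30 split
accuracy = sum(correct_by_class) / test_size

print(total_patients)             # 1164
print(round(expected_test, 1))    # 349.2
print(f"{accuracy:.2%}")          # 96.85%
```

The reported test-set size of 349 is consistent with a 30% share of the 1164 clustered patients, and 338 correct classifications out of 349 reproduce the stated accuracy.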

Table 7. Results of the multinomial log-linear model.

| Level | Fixed effects | Estimated coefficient | Standard error | P-value |
|---|---|---|---|---|
| 1 | Intercept | 20·91 | 29·02 | 0·47 |
| 1 | Pain | 14·69 | 24·42 | 0·55 |
| 1 | Phys | 18·53 | 29·04 | 0·52 |
| 1 | Auto | 19·78 | 29·05 | 0·50 |
| 2 | Intercept | 81·67 | 13·55 | <0·0001 |
| 2 | Pain | 34·39 | 11·39 | 2·54×10⁻³ |
| 2 | Phys | 37·65 | 15·00 | 6·88×10⁻³ |
| 2 | Auto | 40·68 | 15·05 | 1·21×10⁻² |
| 3 | Intercept | 113·41 | 13·65 | <0·0001 |
| 3 | Pain | 40·23 | 11·37 | 4·04×10⁻⁴ |
| 3 | Phys | 43·59 | 14·99 | 3·63×10⁻³ |
| 3 | Auto | 48·05 | 15·05 | 1·40×10⁻³ |

In addition, we have reported mortality data in Petrolina. Figure 11 shows the relationship between mortality and, respectively, the number of comorbidities, gender, age and race in the combined dataset. Mortality evidently increases with age. By combining the two datasets, we used a logistic regression model to predict mortality,6 using patients’ age, gender, race and number of comorbidities. As before, the training and test sets were randomly divided in a 7:3 ratio. The accuracy rate was 76·97% with a 95% CI of (73·11%, 80·52%). The results indicated that death depended mainly on the patients' age and number of comorbidities, with variation by gender and race.
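
A confidence interval for a classification accuracy, such as the one quoted above, can be obtained by treating the number of correct predictions as a binomial count. The sketch below uses the Wilson score interval, which is an assumption: the paper states neither the interval method nor the size of the mortality test set. The counts 381 and 495 are hypothetical, chosen only because 381/495 rounds to 76·97%, so the resulting bounds need not match the reported interval.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts (not reported in the paper), see the note above.
correct, n_test = 381, 495
accuracy = correct / n_test
low, high = wilson_interval(correct, n_test)
print(f"accuracy = {accuracy:.2%}, 95% CI = ({low:.2%}, {high:.2%})")
```

Other interval methods, for example an exact binomial interval, would give slightly different bounds for the same counts.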

[Figure 11]

Discussion
Our study demonstrates that LC patients appear to experience multi-organ issues. The six most common symptoms were fever (64·09%), cough (55·67%), dyspnea (45·79%), vomiting (38·06%), headache (33·33%) and diarrhea (22·34%).

Over 1000 deaths due to COVID-19 among pregnant women were reported in Brazil in July 2021.10 The REBRACO study in Brazil reported that only a single center of their 338 had implemented COVID-19 screening for childbirth and that 6 centers tested all suspected patients.11 Thus, limitations in testing and a lack of infrastructure, including human resources, have impacted maternity services in Brazil. Our long-COVID-19 findings are therefore aligned with the reported incidence of COVID-19-positive cases. To improve prognosis, better clinical strategies should be provided through clinical support, prenatal and postnatal services.

Pathological analysis of SARS-CoV-2 is vital, as our results indicate that a better understanding of its virulence could aid in diagnosing and treating LC patients. Genomics data would be an important indicator for developing novel clinical management strategies, in particular for pain and neuropsychiatric symptoms, and for delineating the need for a definitive clinical diagnosis of LC. The behaviour of COVID-19 and the characterization of new variants is another facet that complicates clinical diagnosis of LC, as does the conduct of pre-clinical work to isolate and sequence the genome at the point of an LC diagnosis. Timmers and colleagues (2021) carried out a structural analysis of COVID-19 from various regions of Brazil to better characterize novel mutations in comparison with other countries.12 Their study demonstrated a disproportionate distribution of several viral proteins within the sequences from the samples gathered, although the highest number of sequenced genomes obtained from the GISAID database came from Sao Paulo and Rio de Janeiro. However, no evaluations have been made to date of how these findings are associated with the development of LC symptoms. As a result, there remains a gap in understanding the physiological journey from COVID-19 to LC among the Brazilian population, as well as in explaining the symptoms reported within our study. Regional differences further support the need for merged databases that allow real-time statistical analysis, so that these findings can be better defined and translated into improved clinical management and treatment for LC patients.

The use of electronic medical records (EMR) ontology is another avenue that should be considered to generate more comprehensive data outputs. EMR ontologies would build on existing Portuguese COVID-19 ontologies and use our findings to formulate a more comprehensive basis for developing a phenotype that is aligned with the wider Brazilian population.

At present, long-COVID has attracted extensive attention, and the prevalence of its different symptoms has been used in many contexts. Our findings indicate a growing need to better understand LC in Brazil using longitudinal data, which could provide a better understanding of the phenotype and mortality as well as the impact on various ethnicities and races. This could assist with designing better clinical care plans for patients.

Limitations

We acknowledge that biases exist within this dataset and that longitudinal data are required to draw definitive conclusions. The records of patients in our study are from the local communicable diseases database, which has minimal information on patients who may have self-quarantined with COVID-19 and potential LC symptoms. As a result, our present results are not representative of all LC survivors. We have reported on patients who had pre-existing conditions, most of whom appear to fall within the multimorbidity category. Worsening of disease due to LC is a possibility; however, we did not have access to all prior medical records for these patients to comprehensively analyse an impact and/or risk score by way of a frailty index.

Conclusion
The demographic characteristics combined with the clinical symptoms identified provide insightful information pertinent to a Brazilian cohort that would help policymakers to plan better infection control protocols in the future and to manage clinical services for LC patients, especially those with pre-existing conditions. It is equally important to have a definitive diagnosis of LC recorded within the patient record, and not just the symptoms. This could potentially improve the prognosis and mortality among LC patients with comorbidities. The synchronization of genomics and epidemiological data could provide an important means of improving patient- and clinician-reported outcomes of LC within the Pernambuco municipality, Brazil. Our findings could be combined with other regional datasets to infer patterns of LC spread, prognosis and morbidity, including for multimorbid and pregnant patients. Rigorous scientific studies within Brazil would help to better prepare for future pandemics and the resulting potential disease sequelae.

References

1 Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 2020; 395(10223): 497–506.

2 World Health Organization. 2019 Novel Coronavirus (2019-nCoV): Strategic Preparedness and Response Plan. Available from: https://www.who.int/docs/default-source/coronaviruse/srp-04022020.pdf.

3 Del Rio C, Collins LF, Malani P. Long-term Health Consequences of COVID-19. JAMA 2020; 324(17): 1723–1724. doi:10.1001/jama.2020.19719.

4 Phiri P, Ramakrishnan R, Rathod S, Elliot K, Thayanandan T, Sandle N, Haque N, Chau SW, Wong OW, Chan SS, Wong EK, Raymont V, Au-Yeung SK, Kingdon D, Delanerolle G. An evaluation of the mental health impact of SARS-CoV-2 on patients, general public and healthcare professionals: A systematic review and meta-analysis. EClinicalMedicine 2021; 34: 100806. doi:10.1016/j.eclinm.2021.100806.

5 Nasserie T, Hittle M, Goodman SN. Assessment of the Frequency and Variety of Persistent Symptoms Among Patients With COVID-19: A Systematic Review. JAMA Netw Open 2021; 4(5): e2111417. doi:10.1001/jamanetworkopen.2021.11417.

6 Johnson RA, Wichern DW. Applied multivariate statistical analysis. London: Pearson; 2014.

7 Ripley BD. Modern applied statistics with S. New York: Springer; 2002.

8 Liu Y, Li Z, Xiong H, Gao X, Wu J. Understanding of internal clustering validation measures. 2010 IEEE International Conference on Data Mining. IEEE, 2010: 911–916. doi:10.1109/ICDM.2010.35.

9 Joshi KD, Nalwade PS. Modified k-means for better initial cluster centres. IJCSMC 2013; 2(7): 219–223. Available from: http://d.researchbib.com/f/6nnJcwp21wYzAioF9xo2AmY3OupTIlpl9XqJk5ZwNkZl9JZxx3ZwNkZmDkYaOxMt.pdf

10 BBC News. Brazil: Why are so many pregnant women dying from Covid? Available from: https://www.bbc.co.uk/news/av/world-latin-america-57974754

11 Costa ML, Souza RT, Pacagnella RC, et al. Facing the COVID-19 pandemic inside maternities in Brazil: A mixed-method study within the REBRACO initiative. PLoS One 2021; 16(7): e0254977. doi:10.1371/journal.pone.0254977.

12 Timmers LFSM, Peixoto JV, Ducati RG, et al. SARS-CoV-2 mutations in Brazil: from genomics to putative clinical conditions. Sci Rep 2021; 11: 11998. doi:10.1038/s41598-021-91585-6

Declaration of interests
GD was supported by National Institute for Health Research (NIHR) Research Capability Funding (RCF) and by Southern Health NHS Foundation Trust. AS is supported by industry funding.

Contributions made by authors

GD, HC, PP and JQS conceptualized the logic model of this paper. GD and HC wrote the first draft of the manuscript. The data analysis was conducted by ZQ, YZ, JQS, AS, HC and GD. The data extraction and review were completed by HC and VN. The full paper was critically appraised by all authors. All authors approved the final version of the manuscript.

Acknowledgements
We acknowledge that our STOBE collaborative effort spans China (Southern University of Science and Technology), the United Kingdom (University of Oxford, University of Southampton, University College London, Southern Health NHS Foundation Trust, University College London Hospitals NHS Foundation Trust and Alan Turing Institute) and Brazil (University of Pernambuco), making this a unique project to share best research practices and knowledge for public benefit. The work is not a reflection of the organisations.

Figure 1. Age-sex structure and mortality characteristics of patients in dataset.

Figure 2. Results of word cloud analysis for the feature of comorbidities.

Figure 3. Characteristics of comorbidities.
Note: CCD = Cardiac Chronic Disease.

Figure 4. Heat map of correlation between the four themes of symptoms.

Figure 5. A consort diagram of the recovery time.

Figure 6. Scatter plot of distribution of the recovery time in subgroups with different number of symptoms.

Figure 7. Prevalence of four themes in subgroups by age, gender, and skin color.

Figure 8. Line chart of the prevalence from total symptoms and the top symptoms in the four themes as the number of comorbidities change.

Figure 9. Optimal choice of the clusters by Within-cluster sum-of-squares.

Figure 10. Clustering results for disease severity. (0 indicates mild, 1 indicates moderate, 2 indicates fair severity, and 3 indicates very severe)

Figure 11. The relationship between the number of comorbidities, gender, age, race and mortality in the combined dataset.