
Journal of National Institute of Laboratory Medicine and Referral Centre Bangladesh ISSN (print) 2958-471X
Vol.2, No.1, June 2022, PP 04-10

Original Article

Exploratory Data Analysis in CBC datasets of COVID-19 positive and negative patients
Zohra Khatun1 2 3
,
4

5 6 7 8

1Associate Professor (Pathology), National Institute of Laboratory Medicine and Referral Center, Dhaka, Bangladesh;
2Department of CSE, Dhaka University of Engineering and Technology, Gazipur, Dhaka, Bangladesh;
3Consultants, Narsingdi District Hospital, Narsingdi, Dhaka, Bangladesh;
4Assistant Professor (Pathology), National Institute of Laboratory Medicine and Referral Center, Dhaka, Bangladesh;
5Assistant Professor (Hematology), National Institute of Laboratory Medicine and Referral Center, Dhaka, Bangladesh;
6 [...] Referral Center, Dhaka, Bangladesh;
7Professor (Pathology), Sher-E-Bangla Medical College, Barishal, Bangladesh;
8 [...] Biochemistry, National Institute of Laboratory Medicine and Referral Centre, Dhaka, Bangladesh.

[ 10 March 2022; 4 May 2022; 1 June 2022]

Background: COVID-19 (Coronavirus) disease is one of the most critical human diseases in the contemporary world and

Objective: Methodology:

Result:

Conclusion:

COVID-19; CBC; Hb; ESR; RBC; WBC; Data; Analysis.


Correspondence:

Copyright:

Introduction
[...] on January 12, 2020. Fever, a dry cough, exhaustion, and [...] patients.1 CBCs are a practical and useful laboratory test. In order to identify key indicators of disease progression stage and to give clinicians a basis for diagnosis and


[...] changed after disease onset.1

The unique SARS-CoV-2-caused coronavirus pandemic has had a severe negative impact on both human health and [...] resources for medical testing and treatment in Bangladesh. In this situation, machine learning techniques have been heavily used to assess various forms of medical data, aid [...] Numerous studies are using machine learning techniques to analyze clinical data, including complete blood counts [...] techniques to address the class imbalance. Our paper gives [...] produce the best results for several pertinent metrics and [...] provided in.2 [...] data sources, use them to determine and rank the patients [...] results to make further decisions. The most accessible [...] blood count (CBC), according to and (Ferrari et al., 2020), [...] diagnostic method for COVID-19.2

[...] examined at the department of pathology in National Institute of Cardiovascular Diseases & Hospital (NICVDH), Dhaka, Bangladesh from July 2, 2021 to July 10, 2021. Most of the patients complained of fever or chest pain, respiratory symptoms.

[...] collected from the department of pathology in National Institute of Cardiovascular Diseases & Hospital (NICVDH), Dhaka, Bangladesh from July 2, 2021 to July 10, 2021 and [...] the COVID-19 Negative group for subsequent statistical analysis.

Data scientists use [...] use of data visualization techniques, to examine and analyze CBC data sets and highlight their key properties.3 [...] of the formal modeling or hypothesis testing process. [...] methods you are thinking about using for data analysis are [...] American mathematician John Tukey in the 1970s, are still a popular approach in the data discovery process today.4

In the statistical analysis categorical variables are expressed as percentages, absolute numbers [...] used for comparisons among multiple variables5. Statistical [...] in R-Programming Language. Visualization to scatter plot using ggplot2 in R through local polynomial regression and datasets stored, sorted, grouped and organized by MySQL Database.

The [...] describes characteristics of a CBC dataset. min: minimum value of the CBC dataset, [...] sum: the sum of the values in the CBC dataset, mean: the average value in the CBC dataset, median: the middle value of the observations when sorted in ascending or descending order, which can be more descriptive of the CBC dataset than the average, [...] deviation of a random variable from its datasets mean. As

apart a set of numbers are from their mean value in CBC datasets. Variance has a central role in statistical analysis of CBC datasets. SE.mean: Standard error of the mean is an evaluation of the variability of the mean based on various [...] of variation that takes the variance's square root. coef.var: the standard deviation is divided by the mean to calculate [...] mean denotes the possibility, expressed as a percentage, [...] sample mean.

A correlation matrix is a table of [...] display (Figure-06) of correlation matrix using corrgram [...] panel of CBC datasets.

              RDW-CV   RDW-SD        WBC     Platelet
min             13.1     35.9       4000          205
max              161     64.8      26500       496000
range          147.9     28.9      22500       495795
sum           2042.7   5430.3    1018070      2807235
median          15.6    46.15       7800       234500
mean           17.61    46.81    8776.47       242003
SE.mean         1.27     0.45     348.56      8730.86
CI.mean.0.95    2.51      0.9     690.43      17294.2
var           185.95    23.71   14093343   8842431576
std.dev        13.64     4.87    3754.11      94034.2
coef.var        0.77      0.1       0.43         0.39

              Neutrophils  Lymphocytes  Monocytes  Eosinophils
min                    18            3          1            1
max                    92           76         12           30
range                  74           73         11           29
sum                  7521         2981        716          384
median                 64           26          5            3
mean                 64.8         25.7       6.17         3.31
SE.mean               1.2         1.13       0.19          0.3
CI.mean.0.95          2.4         2.24       0.39         0.59
var                 174.8          148       4.39        10.18
std.dev              13.2        12.17       2.09         3.19
coef.var              0.2         0.47       0.34         0.96

                 Age       Hb      ESR      RBC
min                2     3.08        2      1.4
max               80    20.38      140     6.25
range             78     17.3      138     4.85
sum             5715   1292.7     2792   492.84
median            52    11.11       20    4.165
mean           49.27    11.14    24.07    4.249
SE.mean         1.51      0.2     1.73    0.071
CI.mean.0.95       3     0.39     3.42     0.14
var           265.73     4.47   346.06    0.579
std.dev         16.3     2.11     18.6    0.761
coef.var        0.33     0.19     0.77    0.179
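
The summary rows reported in the tables above (min through coef.var) follow standard definitions. The following is a minimal Python sketch of those definitions, not the authors' code: the paper's analysis was carried out in R, and the function name `describe`, the sample values, and the use of a normal critical value (about 1.96) for CI.mean.0.95 are assumptions made for illustration. A t-based critical value would give a slightly wider interval.

```python
import math
import statistics
from statistics import NormalDist


def describe(values):
    """Return the summary rows used in the descriptive tables for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.variance(values)   # sample variance (n - 1 denominator)
    std_dev = math.sqrt(var)            # square root of the variance
    se_mean = std_dev / math.sqrt(n)    # standard error of the mean
    # Assumption: normal critical value instead of Student's t.
    z = NormalDist().inv_cdf(0.975)
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "sum": sum(values),
        "median": statistics.median(values),
        "mean": mean,
        "SE.mean": se_mean,
        "CI.mean.0.95": z * se_mean,    # half-width of the 95% interval
        "var": var,
        "std.dev": std_dev,
        "coef.var": std_dev / mean,     # standard deviation divided by the mean
    }


# Hypothetical hemoglobin values, not taken from the study data.
hb = [9.8, 11.2, 12.5, 10.4, 13.1, 11.9]
for name, value in describe(hb).items():
    print(f"{name:>12}: {value:.3f}")
```

As a consistency check against the Age column, the reported std.dev of 16.3 over 116 patients gives a standard error of about 16.3 / sqrt(116) ≈ 1.51, which matches the SE.mean row.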

The descriptive statistics of Complete Blood Cell of 116 [...] interval of the mean (3), variance (265.73), standard [...] of Hb (Hemoglobin), ESR (Westergren) & RBC (Red Blood Cells).

                 PCV      MCV      MCH     MCHC
min               10      8.9      2.2     28.3
max             49.8     98.5     30.9     35.7
range           39.8     89.6     28.7      7.4
sum           4073.7   9615.8   3024.2  3644.05
median         35.55     84.8     26.6     31.4
mean           35.12    82.89    26.07   31.414
SE.mean         0.54     0.94     0.33    0.099
CI.mean.0.95    1.07     1.86     0.65    0.195
var            33.85   101.74    12.49     1.13
std.dev         5.82    10.09     3.53    1.063
coef.var        0.17     0.12     0.14    0.034
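
The correlation matrix mentioned in the methods (displayed with corrgram in Figure-06) is built from pairwise correlation coefficients between CBC variables. The sketch below shows one way such a matrix could be computed in Python, assuming Pearson correlation; the source does not state which coefficient was used, and the function names and patient values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Map every pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical CBC values for five patients, not taken from the study data.
cbc = {
    "Hb": [11.2, 9.8, 13.1, 10.4, 12.5],
    "ESR": [20, 45, 8, 30, 12],
    "RBC": [4.2, 3.6, 4.9, 3.9, 4.6],
}
matrix = correlation_matrix(cbc)
for row_name, row in matrix.items():
    print(f"{row_name:>4}", " ".join(f"{row[c]:6.2f}" for c in cbc))
```

Each diagonal entry is 1 and the matrix is symmetric, so a display such as a corrgram only needs one triangle of it.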


Parameter                                          Reference range                 All patients   Covid-19 positive   Covid-19 negative   P-Value
Age                                                                                49.27 (2-80)   48.431 (5-70)       50.103 (2-80)       0.6075
Hb (Hemoglobin)                                                                    11.14          11.353              10.935              0.3403
ESR (Westergren)                                   Male: 0-10 mm;                  24.07          23.310              24.828              0.005195
RBC (Red Blood Cells)                              Male: 4.5 – 6.5;                4.249          4.237               4.261               0.3042
PCV (packed cell volume)                           Male: 43-50 %; Female: 38-46%   35.12          35.152              35.084              0.354
MCV (mean corpuscular volume)                      76-94 fL                        82.89          83.293              82.497              0.2985
MCH (Mean corpuscular hemoglobin)                  27-32 pg                        26.07          26.438              25.703              0.2312
MCHC (mean corpuscular hemoglobin concentration)                                   31.41          31.723              31.105              0.7352
RDW-CV (Red blood cell distribution width)         11-16 %                         17.61          18.603              16.616              0.7451
RDW-SD (Red blood cell distribution width)         35-56 fL                        46.81          47.147              46.479              0.1183
WBC (White Blood Cells)                            4000-11000 cumm                 8776           8540.862            9012.069            0.09247
Neutrophils                                        40-75 %                         64.84          64.431              65.241              0.5024
Lymphocytes                                        20-40 %                         25.7           25.500              25.897              0.01441
Monocytes                                          2-8 %                           6.172          6.931               5.414               0.8381
Eosinophils                                        1-6 %                           3.31           3.172               4.448               0.004929
Platelet                                           150000-400000 cumm              242003         222729.397          261275.82           0.2187
MPV (mean platelet volume)                         7.0-11.0 fL                     8.407          8.538               8.276               0.2985
Circulating Eosinophil (CE)                        40-4000 cumm                    276.0          248.172             303.862             0.1971
Data Source: Department of Pathology, National Institute of Cardiovascular Diseases (NICVD) & Hospital.

[...] hemoglobin), MCV (mean corpuscular volume), & MCHC (mean corpuscular hemoglobin concentration). Table-3 [...] of Neutrophils, Lymphocytes, Monocytes & Eosinophils in CBC datasets.

A baseline characteristic of Complete Blood Cell of [...] Covid-19 positive group patients (5-70 years old) and 58 Covid-19 negative group patients (2-80 years old). The average age of all patients was 49.27 years, and the average age in the Covid-19 negative group (50.103) was higher than in the Covid-19 positive group (48.431). [...] blood cells (8776), Neutrophils (64.84), Lymphocytes (25.7), Eosinophils (3.31), Platelet (242003), Circulating Eosinophil (276.0) of the patients in the Covid-19 Negative group were higher than in the Covid-19 positive group, and average Hemoglobin (11.14), Red Blood Cells (4.249), [...] volume (82.89), Mean corpuscular hemoglobin (26.07), mean corpuscular hemoglobin concentration (31.41), [...] Monocytes (6.172), mean platelet volume (8.407) of the patients in the Covid-19 positive group were higher than in the Covid-19 negative group.

[...] Covid-19 Positive and Covid-19 negative group of [...] moving average (local polynomial regression) line of [...] Red color points identify the Female patient & Green color points identify the Male patient. Small size points


Figure-01

Figure-02

Figure-03

Figure-04


Figure-05

Figure-06

identify [...] age (Figure-03), Platelet (Figure-04) and relatively big size points identify the relatively highest value of RBC, WBC, age, Platelet. The point of relations of Hb and ESR is relatively closer to the moving average line of covid-19 [...] Monocytes, Eosinophils of Covid-19 Positive group and [...] Lymphocytes, Monocytes, Eosinophils of Covid-19 Positive group and Covid-19 Negative group.
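
The moving average (local polynomial regression) line described for the scatter plots can be illustrated with a small smoother. The sketch below is a simplified stand-in, not the authors' method: it fits a tricube-weighted local linear regression (degree 1) at each point, whereas the plots in the paper were produced with ggplot2 in R, whose exact settings are not given in the source. The function name, the `span` default, and the Hb and ESR values are hypothetical.

```python
import math


def local_linear_smooth(x, y, span=0.75):
    """Fit a tricube-weighted local linear regression at every x value."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))  # neighbours used per local fit
    fitted = []
    for x0 in x:
        distances = sorted(abs(xi - x0) for xi in x)
        h = distances[k - 1]                 # bandwidth: distance to k-th nearest point
        if h == 0:
            weights = [1.0 if xi == x0 else 0.0 for xi in x]
        else:
            weights = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        total = sum(weights)
        mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
        mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
        s_xx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
        s_xy = sum(w * (xi - mean_x) * (yi - mean_y)
                   for w, xi, yi in zip(weights, x, y))
        slope = s_xy / s_xx if s_xx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


# Hypothetical Hb (x) and ESR (y) pairs, not taken from the study data.
hb = [8.5, 9.8, 10.4, 11.2, 11.9, 12.5, 13.1, 14.0]
esr = [60, 45, 30, 20, 18, 12, 8, 10]
for xi, yi, fi in zip(hb, esr, local_linear_smooth(hb, esr)):
    print(f"Hb={xi:5.1f}  ESR={yi:3d}  smoothed={fi:6.2f}")
```

Points that lie close to the smoothed values correspond to observations near the fitted line in the scatter plots; a smaller `span` follows the data more closely, a larger one gives a smoother line.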
Conclusions


[...] COVID-19 Positive and 58 COVID-19 Negative patients. We demonstrated and statistically analyzed some CBC parameters such as Hemoglobin, ESR, age, Gender, RBC, WBC, Neutrophils, Lymphocytes, Monocytes, Eosinophils, [...] and ESR in the covid negative group patients rather than covid positive group patients. Statistical analysis of a CBC dataset can give a preliminary idea about Covid-19 positive or Covid-19 negative.

References

1. Zhang H, Cao X, Kong M, Mao X, Huang L, He P, Pan S, Li J, Lu Z. Clinical and hematological characteristics of 88 [...] hematology. 2020 Dec;42(6):780-7.
2. Dorn M, Grisci BI, Narloch PH, Feltes BC, Avila E, Kahmann A, Alho CS. Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets. PeerJ Computer Science. 2021 Aug 12;7:e670.
3. [...]
4. Zhu B, Feng X, Jiang C, Mi S, Yang L, Zhao Z, Zhang Y, Zhang [...] and mortality in COVID-19 patients: a retrospective study. BMC Infectious Diseases. 2021 Dec;21(1):1-5.
5. Acik DY, Bankir M. Relationship of SARS-CoV-2 pandemic [...] 2021;48(3):161-7.
6. Haq AU, Li JP, Memon MH, Nazir S, Sun R. A hybrid intelligent [...] machine learning algorithms. Mobile Information Systems. 2018 Dec 2;2018.
7. [...] Blood routine test in mild and common 2019 coronavirus (COVID-19) patients. Bioscience reports. 2020 Aug 28;40(8).
8. Bellan M, Azzolina D, Hayden E, Gaidano G, Pirisi M, [...] R, Avanzi GC. Simple parameters from complete blood count predict in-hospital mortality in COVID-19. Disease Markers. 2021 May 13;2021.
9. Behrens JT. Principles and procedures of exploratory data analysis. Psychological Methods. 1997 Jun;2(2):131.
