Abstract—The Complete Blood Count (CBC) test is a very important test for evaluating the overall health of a person. It plays a significant role in detecting infection, leukemia, cancer, anemia, abnormal blood counts, and deficiencies of vitamins and minerals. Gujarat Vidyapith is a deemed-to-be university in which newly admitted students undergo a medical check-up, so that their health condition can be assessed, properly diagnosed and appropriately treated. The CBC test is considered one of the most important sources for this health analysis. However, CBC reports vary widely and are confidential, which makes them difficult to evaluate directly with machine learning algorithms. Therefore, this paper develops a model using the Naive Bayes classifier that predicts anemia by analyzing the probabilities of the effects of various CBC parameters. It will help to detect anemia at an early stage so that the probability of serious complications can be reduced and students can take precautions to improve their health. Ultimately, this affects the concentration, regularity and mental health of students, and therefore the level of education can be improved.

Keywords—Anemia, CBC Parameters, Naive Bayes algorithm, Prediction, Machine Learning, Healthcare.

I. INTRODUCTION

The Complete Blood Count (CBC) test is an important test for whole-body health screening. When diagnosing a disease, a doctor will often first prescribe a CBC test, which can help to diagnose infection, cancer, anemia, and deficiencies of vitamins and minerals. It also provides information on blood disorders and other abnormal conditions. The blood count test is therefore considered a very important source for analyzing overall health.

Gujarat Vidyapith is a deemed-to-be university in which newly admitted students are examined. If any disease or symptom is found, an attempt is made to treat it. Because the CBC reports collected from different laboratories vary and are confidential, they are difficult to collect. Given this diversity in the CBC reports, 16 parameters that play an important role in the classification of anemia have been selected.

Anemia is a shortage of healthy red blood cells in the body. It is not a disease in itself but a symptom that can lead to serious illness, and it is one of the most widespread and serious health problems. According to the World Health Organization, 1.62 billion people in the world suffer from anemia, with a prevalence of 47.4% in children and 41.8% in pregnant women. Therefore, if it is diagnosed at an early stage, its adverse effect on students' studies can be reduced. Symptoms such as tiredness, asthma, breathing problems, nausea, restlessness and pale nails are found in anemic students.

Clinically, anemia indicates a reduction of hemoglobin or red cells in human blood. This paper classifies students as anemic or non-anemic using different parameters of the blood count test, including age, gender, blood group, hemoglobin, hematocrit, MCV, MCH, and other parameters. The obtained values are compared with the normal range of each test to derive the conclusion.

A parser has been created to read the CBC data-set files collected from different laboratories and convert the data to a readable format. Anemia can then be predicted by analyzing that data.

The Naive Bayes algorithm gives a conditional probability based on the Bayesian theorem. The Bayesian theorem is used both as a supervised learning method and as a statistical method for classification, with a probability model underlying it. It allows uncertainty about the model to be captured in a principled way by determining the probabilities of the outcomes, thereby solving diagnostic and prediction problems.

II. METHODOLOGY

In this research study, the data-set provides the information for the prediction of anemia. Students aged 16 to 33 years studying at Gujarat Vidyapith are selected as the sample of the study. CBC samples of 2151 students from the last five years (2014 to 2018), collected from various pathology laboratories, are taken as the data set. There are many inconsistencies among the reports because of the different testing tools used. Not all dimensions of the CBC report are important for the prediction of anemia (research of the Department of Pathology at the University of Utah School of Medicine). Therefore, only the 16 parameters that are most important for the prediction of anemia are considered and tested in the present study.
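The parser described in the Introduction, which reads laboratory files and converts them to a common readable format, might look roughly like the following. This is only a minimal sketch: the paper does not give its file layout, so the CSV format, the column names in `SELECTED` and the helper names `parse_cbc` and `missing_counts` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical column names; the paper does not list its file layout.
SELECTED = ["age", "sex", "hb", "rbc", "pcv", "mcv", "mch", "mchc"]


def parse_cbc(text, selected=SELECTED):
    """Parse CSV text into a list of dicts, keeping only the selected columns.

    Numeric cells become floats; blank or unparsable cells become None so
    that missing values can be counted or handled in a later step.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name in selected:
            cell = (raw.get(name) or "").strip()
            if name == "sex":
                row[name] = cell.upper() or None
            else:
                try:
                    row[name] = float(cell)
                except ValueError:
                    row[name] = None
        rows.append(row)
    return rows


def missing_counts(rows):
    """Count missing (None) values per column."""
    return {name: sum(r[name] is None for r in rows) for name in rows[0]}


# Two made-up records; the second has no hemoglobin value.
sample = (
    "age,sex,hb,rbc,pcv,mcv,mch,mchc,lab\n"
    "19,f,11.2,4.1,35,78,26,31,A\n"
    "21,M,,5.0,44,88,29,33,B\n"
)
records = parse_cbc(sample)
print(records[0])
print(missing_counts(records))
```

Columns that are not selected (here the hypothetical `lab` column) are dropped, which mirrors the idea of keeping only the parameters that matter for the prediction.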
Authorized licensed use limited to: University of Exeter. Downloaded on June 22,2020 at 05:07:10 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the Fifth International Conference on Inventive Computation Technologies (ICICT-2020)
IEEE Xplore Part Number:CFP20F70-ART; ISBN:978-1-7281-4685-0
Here, anemia is predicted using the 16 parameters in the data set, as listed in the table above.

Fig. 2. Missing Value Parameter

Fig. 3. HB classification
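Comparing an obtained value with its normal range, as done for the HB classification, can be sketched as follows. The cutoffs used here are the WHO hemoglobin thresholds for non-pregnant adults (13.0 g/dL for men, 12.0 g/dL for women); whether the study used exactly these thresholds is an assumption, and `classify_hb` is a hypothetical helper name.

```python
# WHO hemoglobin cutoffs in g/dL for non-pregnant adults.
# Assumption: the paper does not state the reference ranges it used.
HB_CUTOFF = {"M": 13.0, "F": 12.0}


def classify_hb(hb, sex):
    """Label a hemoglobin value as 'Normal' or 'Abnormal' for the given sex."""
    cutoff = HB_CUTOFF[sex.upper()]
    return "Abnormal" if hb < cutoff else "Normal"


print(classify_hb(11.5, "F"))
print(classify_hb(13.4, "M"))
```

The same pattern extends to the other CBC parameters by adding one reference range per parameter.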
The table shows that respondents with the O+ve blood group are the most common (31.6%), followed by B+ve (28.9%) and A+ve (28.9%). Other blood groups were found in only 3.63% of the respondents.

In the present research study, a prediction model is developed using the Naive Bayes classifier to predict anemia from the CBC data-set. The whole sample of 2151 students is divided into two parts: a training set (80%) and a testing set (20%). Anemia in students is then predicted using the Anemia Test as the target variable and taking the 16 parameters into consideration.

Method: 10-fold Cross-Validation
'Positive' Class: Anemic

Model Evaluation

To test the efficiency of the model built from the training set, the model is applied to the test data set, and the confusion matrix and the accuracy statistics are prepared as follows.

TABLE IV. CONFUSION MATRIX

                              Predicted Value
                              Anemic       Non-Anemic
Actual Value   Anemic         193 (TP)     25 (FN)
               Non-Anemic     39 (FP)      173 (TN)

The confusion matrix above shows the values of TP (True Positive), TN (True Negative), FN (False Negative) and FP (False Positive). It relates the predicted values to the actual values, from which the accuracy of the model can be calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} = \frac{193 + 173}{193 + 39 + 173 + 25} = \frac{366}{430} \approx 0.85$$

TABLE V. MODEL EVALUATION STATISTICS

Model Evaluation with Statistics
Accuracy                  0.8512      Pos Pred Value          0.8853
P-Value [Acc > NIR]       < 2e-16     Neg Pred Value          0.8160
Kappa                     0.702       Prevalence              0.5395
Mcnemar's Test P-Value    0.1042      Detection Rate          0.4488
Sensitivity               0.8319      Detection Prevalence    0.5070
Specificity               0.8737      Balanced Accuracy       0.8528
'Positive' Class          Anemic

P-Value [Acc > NIR]: < 2e-16
Here, the p-value is less than 0.05. Hence the null hypothesis, that the accuracy of the model is no better than the no-information rate (NIR), is rejected at the 5% level of significance. Thus, it can be said that the CBC report can predict anemia [25].

Kappa: 0.702
Kappa values are always less than or equal to 1. A value of 1 shows perfect agreement and values less than 1 show less than perfect agreement. Here the Kappa value is 0.702, which is reasonably close to one and indicates substantial agreement between the predicted and actual classes [27].

Sensitivity: 0.8319
Sensitivity here means the ability of the test to detect anemia in students who have low hemoglobin. It is the proportion of anemic students whose test is positive for anemia [24].

Specificity: 0.8737
Specificity is the ability of the test to correctly rule out anemia in students who do not have it. It is the proportion of non-anemic students whose test is negative [24]. With a highly specific test, a positive result, that is, a low hemoglobin level, strongly indicates anemia, because low hemoglobin levels are rarely found in non-anemic students.

These results show that the Naive Bayes classifier model can predict whether a student has anemia or not with an accuracy of about 85%.

Bayesian Theorem

The Bayesian theorem provides a way to calculate the posterior probability P(a|b) from P(a), P(b) and P(b|a). The Naive Bayes classifier assumes that the effect of a predictor (b) on a given class (a) is independent of the values of the other predictors.

$$P(a \mid b) = \frac{P(b \mid a)\,P(a)}{P(b)} \tag{1}$$

where
a = anemia_status (Anemic | NonAnemic)
b1 = hb, b2 = rbc, b3 = pcv, b4 = mcv, b5 = mch, b6 = mchc, b7 = rdw, b8 = wbc, b9 = platelet.

P(a|b) is the posterior probability of the class (target) given the predictor (attribute).
P(a) is the prior probability of the class.
P(b|a) is the likelihood, which is the probability of the predictor given the class.
P(b) is the prior probability of the predictor.

Classification of the effect of CBC parameters by Naive Bayes

Naive Bayes gives the conditional probability. It shows whether a student is anemic or non-anemic on the basis of 9 parameters, each in a normal or abnormal condition, as given in the following table.

Bayesian Theorem Formula

With the denominator P(b1, ..., b9) omitted, since it is the same for both classes, the posterior is proportional to the product of the prior and the individual likelihoods:

$$P(a \mid b_1, \dots, b_9) \propto P(b_1 \mid a)\,P(b_2 \mid a)\,P(b_3 \mid a) \cdots P(b_9 \mid a)\,P(a)$$

TABLE VI. FEMALE ANEMIC PREDICTION PROBABILITY TABLE

Prediction of Female Anemic on Normal Condition
Parameter    Condition     Sample       Probability
Sex          Female        1129/2151    0.525
HB           Normal        399/1129     0.353
RBC          Normal        964/1129     0.854
PCV          Normal        520/1129     0.460
MCV          Microcytic    623/1129     0.551
MCH          Normal        330/1129     0.292
MCHC         Normal        113/1129     0.100
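The way Naive Bayes turns counts such as the Sample column above into a class probability can be illustrated with a small sketch. It estimates the prior P(a) and the likelihoods P(b_i | a) from counts, multiplies them as in the Bayesian theorem formula, and then divides by the sum over both classes, which plays the role of P(b) in equation (1). The six training rows are made up purely for illustration and are not taken from the study's data; `fit` and `posterior` are hypothetical helper names, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def fit(rows, labels):
    """Count classes and (class, feature) value occurrences."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature) -> Counter of values
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(label, feature)][value] += 1
    return class_counts, value_counts


def posterior(row, class_counts, value_counts):
    """Return P(class | row) for every class under the naive assumption."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(a)
        for feature, value in row.items():
            score *= value_counts[(label, feature)][value] / n  # P(b_i | a)
        scores[label] = score
    evidence = sum(scores.values())  # P(b), the normalizing constant
    if evidence == 0:
        raise ValueError("combination unseen in every class; smoothing needed")
    return {label: s / evidence for label, s in scores.items()}


# Illustrative toy data only (not from the paper).
rows = [
    {"hb": "Abnormal", "mcv": "Microcytic"},
    {"hb": "Abnormal", "mcv": "Normal"},
    {"hb": "Abnormal", "mcv": "Microcytic"},
    {"hb": "Normal", "mcv": "Normal"},
    {"hb": "Normal", "mcv": "Microcytic"},
    {"hb": "Abnormal", "mcv": "Normal"},
]
labels = ["Anemic", "Anemic", "Anemic", "NonAnemic", "NonAnemic", "NonAnemic"]

class_counts, value_counts = fit(rows, labels)
result = posterior({"hb": "Abnormal", "mcv": "Microcytic"}, class_counts, value_counts)
print(result)
```

With many parameters the raw product becomes very small, so practical implementations usually sum logarithms of the probabilities instead of multiplying them.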
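The statistics reported in Table V can be cross-checked from the four counts in Table IV with a few lines of arithmetic. One assumption is made here and should be read with care: Table V's sensitivity (0.8319) and positive predictive value (0.8853) are reproduced when 39 is taken as the false-negative count and 25 as the false-positive count, that is, when the rows of the matrix are read as predictions rather than actual values. Accuracy and Kappa are unaffected by this choice. `evaluation_stats` is a hypothetical helper name.

```python
def evaluation_stats(tp, fn, fp, tn):
    """Compute common evaluation statistics from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "kappa": (accuracy - expected) / (1 - expected),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Assumption: 39 read as false negatives and 25 as false positives,
# the assignment that reproduces the values in Table V.
stats = evaluation_stats(tp=193, fn=39, fp=25, tn=173)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```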
TABLE XI. ABNORMAL PARAMETER MALE ANEMIA PREDICTION

Abnormal Parameter Male Anemic Prediction
Parameter    Condition
Sex          M
HB           Abnormal
RBC          Abnormal
PCV          Abnormal
MCV          Microcytic
MCH          Abnormal
MCHC         Abnormal
RDW          Abnormal
WBC          Abnormal
Plt          Abnormal
Anemia       ?

P(anemia | M-Abnormal) = P(RBC-Ab | anemia) * P(PCV-Ab | anemia) * P(MCV-Microcytic | anemia) * … * P(M | Abnormal)
= 0.113 * 0.181 * 0.376 * 0.527 * 0.759 * 0.106 * 0.081 * 0.131 * 0.024 * 0.353 * 0.004 * 0.421
= 0.00000000211

P(anemia | M-Abnormal) = 0.00000000211 (2.112632e-8)

Fig. 6. Importance of parameter

The chart above shows the importance of the different CBC parameters, among which HB is the most important. Therefore, it can be said that HB plays a significant role in the prediction of anemia. The PCV, MCV, RDW, WBC and MCH parameters are also important, which agrees with the research of the Department of Pathology at the University of Utah School of Medicine [26]. These parameters play an important role in predicting different types of anemia such as iron deficiency, vitamin B12 deficiency, sickle cell anemia, aplastic anemia and anemia of chronic disease. Furthermore, the data include parameters such as neutrophil, lymph, eosin, mono and bgr (blood group), which are statistically significant but not medically significant.

IV. CONCLUSION

In this research study, an analysis has been conducted on the CBC data set of higher-education students at Gujarat Vidyapith, in which the proportions of various parameters have been tested. The analysis revealed that 54% (1161) of the students were anemic, while 46% (990) were non-anemic. Therefore, anemia may be one of the reasons for the reduction in the number of students. If anemia is detected at an early stage, serious illness can be prevented. The health of the students can be improved and the drop-out ratio can also be reduced.

V. REFERENCES

1. Jaiswal, Manish, Anima Srivastava, and Tanveer J. Siddiqui. "Machine Learning Algorithms for Anemia Disease Prediction." In Recent Trends in Communication, Computing, and Electronics, pp. 463-469. Springer, Singapore, 2019.
2. Induja, S. N., and C. G. Raji. "Computational Methods for Predicting Chronic Disease in Healthcare Communities." In 2019 International Conference on Data Science and Communication (IconDSC), pp. 1-6. IEEE, 2019.
3. Shetty, Badreesh. "Supervised Machine Learning: Classification." Towards Data Science, 2018.
4. Saxena, Shruti. "Precision vs Recall." Towards Data Science, 2018.
5. Jaitley, Urvashi. "Why Data Normalization is necessary for Machine Learning models." medium.com, 2018.
6. Abdullah, Manal, and Salma Al-Asmari. "Anemia type's prediction based on data mining classification algorithms." In Taylor & Francis Group, London, 2017.
7. Stecanella, Bruno. "A practical explanation of a Naive Bayes classifier." https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/, 2017.
8. Yu, C. H., M. Bhatnagar, R. Hogen, D. Mao, A. Farzindar, and K. Dhanireddy. "Anemic Status Prediction using Multilayer Perceptron Neural Network Model." In EPiC Series in Computing, vol. 50, pp. 213-220, 2017.
9. Joshi, Renuka. "Accuracy, Precision, Recall & F1 Score: Interpretation of Performance Measures." Exsilio Blog, 2016.
10. Brownlee, Jason. "What is a Confusion Matrix in Machine Learning." 2016.
11. Medhekar, Dhanashree S., Mayur P. Bote, and Shruti D. Deshmukh. "Heart Disease Prediction System using Naive Bayes." In IJERSTE, 2013.
12. Data School. "Simple guide to confusion matrix terminology." 2014.
13. Tefferi, Ayalew, Curtis A. Hanson, and David J. Inwards. "How to Interpret and Pursue an Abnormal Complete Blood Cell Count in Adults." Mayo Clin Proc, 2005.
14. Ogedegbe, Henry O., Laszlo Csury, and Byron H. Simmons. "Anemias: A Clinical Laboratory Perspective." Laboratory Medicine, 2004.
15. Pattekari, Shadab Adam, and Asma Parveen. "Prediction System For Heart Disease Using Naive Bayes."
16. Mehta, Vinod Kumar. "Anemia In Urban and Rural School Girls Aged 12-16 Years Shimla - A Comparative Study." National Institute Of Epidemiology, 2004.
17. Bessman, J. David, P. Ridgway Gilmer, Jr., and Frank H. Gardner. "Improved Classification of Anemias by MCV and RDW." American Society of Clinical Pathologists, 1983.
18. Bessman, J. David, and Randall K. Johnson. "Erythrocyte volume distribution in normal and abnormal subjects."
19. Kariyeva, G. K., A. Magtymova, and A. Sharman. "Anemia." Chapter 12.
20. Naive Bayesian. https://www.saedsayad.com/naive_bayesian.htm.
21. Narkhede, Sarang. "Understanding Confusion Matrix." Towards Data Science.
22. NHANES. Laboratory Procedure Manual.
23. Anand, Avati. "Evaluation Metrics (Classifiers)." CS229 Section.
24. "Sensitivity and Specificity." Wikipedia.
25. https://www.investopedia.com/terms/p/p-value.asp.
26. Agarwal, Archana M. "Diagnostic approach to Anemia."
27. http://www.pmean.com/definitions/kappa.htm.