An Effective Approach For Pneumonia Detection Using Convolution Vision Transformer
CHRIST (Deemed to be University), Pune Lavasa Campus, India. Oct 14-15, 2022
Abstract – Early detection of pneumonia in patients through effective medical imaging may enable timely remedial measures and reduce the severity of the infection. Cases have increased in recent years among new-borns, teenagers and people with existing health issues. The COVID-19 pandemic also revealed the major impact pneumonia has on the lungs and the consequences of delayed detection. The presence of the infection in the lungs is examined through chest X-ray images; however, for an early diagnosis of the infection, this paper proposes an automated model as a more effective alternative. The Convolutional Vision Transformer (CVT), a robust combination of Convolution and Vision Transformer (ViT) that gives an accuracy of 97.13%, is suggested in this paper as a potential model to detect pneumonia early in patients.

Keywords - Chest X-ray, Pneumonia, Detection, Vision Transformer, Convolution Neural Network, Convolutional Vision Transformer, Deep Learning.

I. INTRODUCTION

Developing countries have registered an increase in cases of pneumonia. Excess pollution, poor living conditions, a rise in population and a scarcity of medical infrastructure have contributed to the increasing rate of infection. Overcoming these setbacks is possible by looking at how technology can be used on a wide scale to detect the condition before it reaches a point of severity. Patients diagnosed early will be able to receive specialized treatments. Because pneumonia can be detected only from chest X-ray images, there is a setback: assessing the images is a challenging task and comes with a high health risk [1].

Increasing the chance of survival for pneumonia patients is possible through timely examination. The method of diagnosing the infection with chest X-ray images is risky and subject to several health concerns. An automated algorithm is required which will improve the quality of healthcare and help in better, more accurate diagnosis of the infection. This algorithm will prove to be both time-saving and life-saving for the stakeholders [2]. Among the growing technological advancements, computer vision is a precise way to identify infection with the help of deep learning frameworks.

As per the findings of studies in the health sector, Convolutional Neural Networks (CNNs) have been instrumental in the examination of chest X-ray images for the identification of the infection. However, the accuracy obtained with them is not satisfactory. The other deep learning models used for the same purpose are VGG-19 and ViT. ViT is an upcoming model for image processing which is still in the process of development. ViT has lately surpassed CNNs' performance for image classification; however, the expense of pre-training is still exorbitant given the huge external datasets required [3].

A combination of Vision Transformer along with convolutional and pooling layers has been employed in this study to examine chest X-ray images. The CVT system combines two widely used architectures, namely Convolutional Neural Networks (CNNs) and Transformer-based models, to overcome some important limitations of each approach on its own. By leveraging both techniques, the combined system can outperform existing architectures, especially at low data levels, while achieving similar performance in the large dataset regime. This occurs without compromising on accuracy or speed [4].

II. LITERATURE REVIEW

Debaditya Shome et al. [5] described a Vision Transformer (ViT) based deep learning approach for COVID-19 prediction from chest X-ray images. A collection of 30K chest X-ray images was gathered and used. The data set for the study was composed by mixing different open-source data sets. Accuracy scores of 98% and 99% were reported. A self-operating pneumonia detection model works effectively in the early prediction of pneumonia in many places, as reported by Khushal Tyagi et al. [6]. They implemented three different models, namely a Convolutional Neural Network (CNN), VGG16, and Vision Transformer (ViT), and compared the outcomes of all models. The outcomes suggested that ViT can recognize pneumonia with 96.45% accuracy. This model can be used to prevent unfortunate situations in far-off places.

Boyuan Wang et al. [7] show how early detection of COVID-19 through deep learning could help in curbing the spread of the virus to some extent and also decrease hospital costs. Swin Transformer (ST) is
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY CALICUT. Downloaded on August 26,2023 at 16:52:48 UTC from IEEE Xplore. Restrictions apply.
• Randomly zoomed some training images by 2%.
• Randomly flipped images horizontally.

• Convolutional Layers – In this layer, filters are applied to extract information about various features of the image. The output of this layer is termed the feature map, and it provides information about the edges and corners of the image.
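
As a rough illustration of how a convolutional filter turns an image into a feature map, the sketch below slides a small filter over a toy image in plain Python. The 5x5 image, the vertical-edge filter and the function name `conv2d` are assumptions made for this example and are not taken from the paper's implementation.

```python
def conv2d(image, kernel):
    """Slide the filter over the image (no padding, stride 1) and
    return the resulting feature map as a list of rows."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# Toy 5x5 "image" (assumption): dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical vertical-edge filter: responds to left-to-right brightness jumps.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = conv2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [3, 3, 0]
```

The response is largest where the filter window straddles the dark-to-bright boundary and zero where the window covers a uniform region, which is the sense in which a feature map carries edge information.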
Instead of adding the full CNN architecture, only the convolution and pooling layers are added before the positional embedding layer of the transformer. The image patches are first passed through the convolutional and pooling layers and only then embedded, followed by the transformer encoder and finally the output (as shown in Fig. 4).
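
The ordering described above (patches, then convolution and pooling, then positional embedding, then the encoder) can be sketched in plain Python as follows. This is a toy sketch under stated assumptions, not the paper's model: the 12x12 input, the single 3x3 filter, the 6x6 patch size, the fixed sinusoidal positional encoding and the single attention step with identity projections are all choices made for this example, and every function name is hypothetical.

```python
import math

def conv2d(image, kernel):
    """Valid (no padding) convolution of one channel with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def split_patches(image, patch):
    """Cut the image into non-overlapping patch x patch squares, row by row."""
    return [[row[c:c + patch] for row in image[r:r + patch]]
            for r in range(0, len(image), patch)
            for c in range(0, len(image[0]), patch)]

def add_positions(tokens):
    """Add a fixed sinusoidal positional encoding (assumption: the paper
    does not state which kind of positional embedding is used)."""
    dim = len(tokens[0])
    out = []
    for pos, tok in enumerate(tokens):
        enc = [math.sin(pos / 10000 ** (k / dim)) if k % 2 == 0
               else math.cos(pos / 10000 ** ((k - 1) / dim))
               for k in range(dim)]
        out.append([t + e for t, e in zip(tok, enc)])
    return out

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V
    projections, standing in for a full transformer encoder."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in tokens]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        outputs.append([sum(w[i] * tokens[i][d] for i in range(len(tokens)))
                        for d in range(dim)])
    return outputs, weights

def cvt_forward(image, kernel, patch=6):
    """Patches -> convolution -> pooling -> flatten -> positions -> attention."""
    tokens = []
    for p in split_patches(image, patch):
        pooled = max_pool(conv2d(p, kernel))
        tokens.append([v for row in pooled for v in row])
    tokens = add_positions(tokens)
    encoded, weights = self_attention(tokens)
    return tokens, encoded, weights

# Toy input (assumption): a 12x12 synthetic image and a vertical-edge filter.
image = [[(r * c) % 7 for c in range(12)] for r in range(12)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
tokens, encoded, weights = cvt_forward(image, kernel)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of dimension 4
```

With these toy sizes, each 6x6 patch becomes a 4x4 feature map after convolution and a 2x2 map after pooling, so the encoder sees four tokens of dimension four. A real model would use learned filters, learned projections, several encoder blocks and a classification head.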
V. RESULTS AND DISCUSSION

After training and testing the algorithm, the model's efficiency can be stated in terms of four performance metrics, namely Precision, Recall, F1 score and Accuracy.

$$\text{Accuracy} = \frac{\text{Correct predictions}}{\text{Total predictions}}$$

• Recall – An estimate of how many of all the positives in the data the model is able to predict correctly.

• F1 Score – The F1 score is another evaluation metric used to assess the model's performance. It is usually preferred over precision or recall alone when evaluating a model built on an imbalanced dataset. Because the F1 score combines precision and recall, the bias of relying on either metric alone is removed.

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Firstly, the training and testing accuracy obtained by the model is 97.13% and 93.43% respectively, which indicates that the model is performing very well on both training and testing data. Both accuracies have been plotted and are shown in Fig. 7.

The Confusion Matrix is essential as it aids in computing the False Positive (FP), True Positive (TP), False Negative (FN), and True Negative (TN) counts of the prediction, where:

FP indicates that the prediction is positive but in reality it is negative.

TP indicates that both the prediction and the reality are positive.

FN indicates that the prediction is negative but in reality it is positive.

TN indicates that both the prediction and the reality are negative.

Here the two classes defined are 'Pneumonia' and 'Normal' instead of Positive and Negative. The following can be inferred from the plotted confusion matrix:

• 368 images have been correctly predicted as Normal.

• 22 images have been predicted as Normal but actually belong to the Pneumonia class.
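
The metrics above follow directly from the four confusion-matrix counts, as the minimal sketch below shows. The function names are assumptions for this example, and precision is given its standard definition, TP / (TP + FP), which this excerpt does not spell out. Only the two counts reported in the text (368 and 22) are real; the `toy` counts are hypothetical values chosen purely to exercise the formulas.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by total predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Of everything predicted positive, the share that is truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Of everything truly positive, the share that was predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Counts reported in the text, treating 'Normal' as the class of interest:
# 368 images correctly predicted Normal, 22 Pneumonia images predicted Normal.
normal_precision = precision(tp=368, fp=22)
print(round(normal_precision, 4))  # 0.9436

# Hypothetical counts (NOT from the paper), used only to show the formulas.
toy = dict(tp=8, tn=9, fp=2, fn=1)
toy_p = precision(toy["tp"], toy["fp"])
toy_r = recall(toy["tp"], toy["fn"])
toy_f1 = f1_score(toy_p, toy_r)
toy_acc = accuracy(**toy)
print(toy_p, round(toy_r, 4), round(toy_f1, 4), toy_acc)
```

The two remaining cells of the matrix are not given in this part of the text, so recall, F1 and accuracy for the actual model cannot be recomputed here.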
indicates that the model is able to predict and deal with both classes effectively.

Table III depicts the comparison of the model with other existing models. It can be seen that CVT outperforms all the other models on every evaluation metric. Also, as mentioned before, CVT can perform well even with less data and therefore requires less computational power, which makes CVT better than the others.

TABLE III. COMPARISON ACROSS MODELS

VI. CONCLUSION

Employing the Convolution Vision Transformer (CVT) to examine chest X-ray images was proposed to arrive at a timely detection of pneumonia. CVT is a combination of Vision Transformer along with convolutional and pooling layers. This hybrid model not only outperforms both CNN and ViT, but also overcomes the limitations of both approaches. The final train and test accuracy obtained by the model is 97.13% and 93.34% respectively. Pneumonia is an extremely critical health issue and early detection can be life-saving for some patients. This model will help in faster and more accurate examination of pneumonia and will help the health care sector to function effectively.

REFERENCES

[1] R. Kundu, R. Das, Z. W. Geem, G. T. Han and R. Sarkar, "Pneumonia detection in chest X-ray images using an ensemble of deep learning models," Plos one, no. 1, 2021.
[2] B. Almaslukh, "A Lightweight Deep Learning-Based Pneumonia Detection Approach for Energy-Efficient Medical Systems," Wireless Communications and Mobile Computing, p. 14, 2021.
[3] S. d'Ascoli, H. Touvron, M. L. Leavitt, A. S. Morcos, G. Biroli and L. Sagun, "ConViT: Improving vision transformers with soft convolutional inductive biases," International Conference on Machine Learning, no. 1, pp. 2286-2296, 2021.
[4] A. Razzaq, "Marktechpost.com," 20 July 2021. [Online].
[5] D. Shome, T. Kar, S. N. Mohanty, P. Tiwari, K. Muhammad, A. AlTameem and A. K. J. Saudagar, "Covid-transformer: Interpretable covid-19 detection using vision transformer for healthcare," International Journal of Environmental Research and Public Health, no. 1, 2021.
[6] K. Tyagi, G. Pathak, R. Nijhawan and A. Mittal, "Detecting Pneumonia using Vision Transformer and comparing with other techniques," 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 12-16, 2021.
[7] B. Wang, D. Zhang and Z. Tian, "STCovidNet: Automatic Detection Model of Novel Coronavirus Pneumonia Based on Swin Transformer," no. 1, 2022.
[8] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner and N. Houlsby, "COVID-Transformer: Interpretable COVID-19 Detection Using Vision Transformer for Healthcare," no. 1, 2020.
[9] S. Park, G. Kim, Y. Oh, J. B. Seo, S. M. Lee, J. H. Kim and J. C. Ye, "AI can evolve without labels: self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation," 2022.
[10] T. Banerjee, S. Karthikeyan, A. Sharma, K. Charvi and S. Raman, "Attention-Based Discrimination of Mycoplasma Pneumonia," Proceedings of International Conference on Computational Intelligence and Data Engineering, no. 1, pp. 29-41, 2022.
[11] X. Gao, Y. Qian and A. Gao, "COVID-VIT: Classification of Covid-19 from CT chest images based on vision transformer models," p. 6, 2021.
[12] K. Sivarama Krishnan and K. Sivarama Krishnan, "Vision Transformer based COVID-19 Detection using Chest X-rays," p. 5, 2021.
[13] N. Nguyen and J. M. Chang, "COVID-19 Pneumonia Severity Prediction using Hybrid Convolution-Attention Neural Architectures," 2021.
[14] M. M. Al Rahhal, Y. Bazi, R. M. Jomaa, A. AlShibli, N. Alajlan, M. L. Mekhalfi and F. Melgani, "COVID-19 Detection in CT/X-ray Imagery Using Vision Transformers," Journal of Personalized Medicine, 2022.