
Proceedings of the Second International Conference on Artificial Intelligence and Smart Energy (ICAIS-2022)

IEEE Xplore Part Number: CFP22OAB-ART; ISBN: 978-1-6654-0052-7

Facemask Detection to Prevent COVID-19 Using YOLOv4 Deep Learning Model
Praagna Prasad, Akhil Chawla, Mohana
DOI: 10.1109/ICAIS53314.2022.9742863

Electronics and Telecommunication Engineering, RV College of Engineering®,
Bengaluru, Karnataka, India

Abstract— The ongoing global COVID-19 pandemic has impacted everyone's life. The World Health Organization (WHO) and governments all over the world have found that social distancing and donning a mask in public places are instrumental in reducing the rate of COVID-19 transmission. Stepping out of home in a face mask is a social obligation and a legal mandate that is often violated, and hence a face mask detection model that is accessible and efficient will aid in curbing the spread of the disease. Detecting and identifying a face mask on an individual in real time is a challenging task, but deep learning and computer vision make it possible to establish tech-based solutions that can help combat the COVID-19 pandemic. In this paper, a YOLOv4 deep learning model is designed and a deep transfer learning approach is applied to create a face mask detector that can be used in real time. Simulations were run and inferences drawn on a GPU via Google Colab. The proposed implementation considers three types of input data: an image dataset, a video dataset, and real-time data for face mask detection. Performance parameters are tabulated; the model obtained a mean average precision of 0.86 and an F1 score of 0.77 on the image dataset, and 90% accuracy on the video dataset. The real-time face mask detector, with an accuracy of 95%, successfully identifies a person with and without a facemask and reports whether they are wearing one.

Keywords—Convolutional Neural Network (CNN), Artificial Intelligence (AI), Machine Learning (ML), Video Surveillance, Computer Vision (CV), YOLOv4, Facemask detection.

I. INTRODUCTION

The COVID-19 pandemic has instilled fear among people, as this disease is transmitted through the respiratory system. The virus has killed more than a million people around the globe, and the toll is expected to keep rising, leading to the death of many more. India is on the verge of a second wave despite a large-scale vaccination drive [1]. Vaccinating the whole population will take a lot of time, perhaps years, and there is no conclusive evidence about reinfection or long-term protection against COVID-19. The disease can, however, be held back by basic safety measures: maintain social distance, wash your hands regularly, and wear a mask at all times. A proper mask that covers the mouth and nose is a very important preventive measure; a mask of any type gives up to 98% protection against virus droplets spreading from the mouth. However, we observe that a lot of people in public places don't wear their masks properly, and even when they are donning a face mask, it is not worn in a way that covers both nose and mouth. Thus, they endanger others around them. Much of this carelessness has contributed to the subsequent second wave, third wave, and so on in many countries. In India today, people still refuse to wear masks properly while stepping out, even though the healthcare system is overwhelmed and stepping out of our homes can be a dangerous act, even for an essential need. The proposed work assists in making sure that people abide by the basic rule, i.e., to wear a face mask in public at all times, through the design of a face mask detection model using a state-of-the-art (SOTA) deep learning technique [2]. The goal is to identify, with the help of the model, who is not wearing a facial mask in public places and not abiding by the rules. As stated by WHO, respiratory droplets and physical contact are the two ways the coronavirus spreads, and to prevent its spread, medical masks are the best available solution. A mask is important for two reasons: the virus spreads directly from an infected person's sneezes or coughs, and if he or she is masked, transmission to others is prevented; otherwise, natural spread of the disease is rampant. No one anticipated being engulfed by a worldwide pandemic, but with the power of technology we can innovate to prevent the spread of the disease and hopefully see an end to it. During the ongoing crisis, many companies and governments have used the power of technology, particularly artificial intelligence (AI), to help combat the disease. Robots have been seen delivering food, medicine, etc. to infected individuals in hospitals. AI has been used extensively to help predict the origin of the disease and to help find a cure, and many are using it to detect COVID-19 through X-rays and CT scans. AI is even used in building tracking software and wearable tech to make sure people are abiding by their quarantine rules. AI and technology have had a significant impact in combating the disease. The use of AI and deep learning (DL) in healthcare applications is growing rapidly, though it has not been able to eliminate human labor. However, as time progresses, AI will be used in mainstream applications and become an integral part of healthcare systems and governments around the world. This work makes use of deep learning to detect masks to prevent the spread of coronavirus. The proposed implementation is aimed at helping the government take action against those not wearing masks. A deep transfer learning approach is used to tackle the problem. Transfer learning is an approach in deep learning where knowledge is transferred from pre-trained, highly efficient models trained by researchers, leading to high efficiency and reduced computational complexity [5].

978-1-6654-0052-7/22/$31.00 ©2022 IEEE 382


We also made use of Python along with the cross-platform computer vision library OpenCV, which helps in developing real-time computer vision applications. TensorFlow and Keras provide a Python interface for artificial neural networks [6] [7] [10].

II. MOTIVATION, PROBLEM ANALYSIS AND METHODOLOGY

Coronavirus disease (COVID-19) is an infection caused by a novel coronavirus. It has affected the lives of millions of people, and in some cases people have lost their lives fighting it. Economies have been shut down and the unemployment rate has seen a sharp increase. It has impacted everyone in one way or another and completely disrupted the way institutes, organizations, and civilizations work. People with underlying clinical issues and co-morbidities are at particular risk, leaving them vulnerable to serious illness. Among the safety measures and protocols, using a face mask has proven to be a must, as data shows that it can stop the spread of the virus between two masked people. People still tend not to follow this rule in public and endanger others. A mask detection system will help local officials identify rule-breakers quickly and safely, without putting themselves at risk.

Fig.1. Proposed methodology

Figure 1 shows the proposed design methodology. Three types of input data, namely image data, video data, and real-time data, are considered for implementation. Preprocessed data were divided into training and testing sets on a 90:10 basis.

III. YOLOV4 DEEP LEARNING MODEL

A. Architecture

Fig.2. YOLOv4 Architecture

The architecture of YOLOv4 is shown in figure 2.
Backbone: the CSPDarknet53 backbone, shown in figure 3, is used for feature extraction; it consists of a series of convolutional and residual layers.

Fig.3. CSPDarknet53

Neck: used to add extra layers between the backbone and the head, which is also known as the dense prediction block. Dense prediction is used in deep learning algorithms with a single stage of detection; sparse prediction, by contrast, is mainly used in two-stage detection schemes [9].

Bag-of-Freebies (BoF): one of the distinctive techniques used in YOLOv4, it is a collection of training-time techniques that yield a better and more accurate outcome without adding any inference time. The bag of freebies is widely used in data augmentation. Primarily, data augmentation is used in object detection to increase the size of the training set, and it also helps accuracy by exposing the model to varied semantic conditions [11] [12]. One of the latest applications of data augmentation is mosaic data augmentation, which tiles four images from the dataset together and trains the model to identify and remember smaller objects.

Bag-of-Specials (BoS): another distinctive set of techniques used in YOLOv4. The difference between BoF and BoS is that BoS changes the network architecture and increases the inference cost slightly. One of the most widely used techniques is the mish activation, an activation function designed to drive signals left and right. In the drop block regularization technique, sections of the image are hidden from the first layer, which forces the network to learn features to recognize the object.

IV. DESIGN AND IMPLEMENTATION

Fig.4. Proposed implementation
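The mosaic tiling and mish activation described in Section III can be sketched in a few lines of NumPy. This is only an illustrative rendering of the two ideas, not the authors' implementation; a real mosaic step would also pick a random split point and remap the bounding-box labels into the mosaic's coordinate frame.

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * np.tanh(np.log1p(np.exp(x)))

def mosaic(images, out_size=416):
    """Tile four equally sized images into one mosaic training image.

    Each input is cropped to a quadrant of the output canvas; a full
    pipeline would also randomize the split point and remap labels.
    """
    assert len(images) == 4
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=images[0].dtype)
    for idx, img in enumerate(images):
        tile = img[:half, :half]           # crop to quadrant size
        r, c = divmod(idx, 2)              # quadrant row/column
        canvas[r * half:(r + 1) * half, c * half:(c + 1) * half] = tile
    return canvas
```

For example, four 416×416 images produce one 416×416 mosaic with each source occupying one quadrant, and `mish(0.0)` evaluates to 0, matching the smooth, non-monotonic shape the technique relies on.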


Figure 4 shows the block diagram of the proposed implementation; the process is split into the following parts.

A. Acquiring the Dataset
The process begins by finding or creating an appropriate dataset consisting of multiple images of people. The quality of the dataset contributes heavily to the final efficiency of the model, as this phase feeds the model all the scenarios it needs to handle during testing and later in real-time use. The image dataset used in this implementation includes images of people wearing different types of masks, frames with multiple people with and without masks, and people sitting in different poses with masks on. The number of variations the model sees during training directly impacts detection accuracy.

B. Cleaning the Dataset
This is the process of ensuring the data is correct, consistent, and usable. It involves keeping only the meaningful and required parts of the dataset that are relevant to the final prediction the model is trained for. This step is critical for the working of the model.

C. Data Pre-Processing
A major part of the processing is the data labelling process, which gets the dataset ready for training.

D. Data Labelling
Data labelling is an important step in the design and implementation process. It identifies raw data such as images and files and adds meaningful labels to them, providing better context for the deep learning algorithm. In object detection, data labelling refers to drawing a border that fully encloses an object in a digital image, known as a bounding box. Once the bounding box is drawn, its information, namely the center of the box, its height and width, and the class of the enclosed object, is stored and provided to the model during training. Labelling is carried out using the open labelling tool, which provides a GUI with features such as single-key switching between object classes and fast switching between images, thereby aiding the labelling process. The tool directly saves a text file in the YOLOv4 format used for training. The tool also supports automatic labelling using a frozen inference graph, taken from a pre-trained model for the same object detection task. The graph uses the saved weights of the pre-trained model it belongs to, labelling images with the accuracy acquired by that model. This saves a lot of time but requires a manual re-check afterwards, as the labelling might not be 100% accurate; corrections are easily made using the tool's GUI features.

E. Splitting the Dataset
The dataset is split into training and testing sets, with 90% of the data used for training and 10% used for testing.

F. Training Phase
In this phase, the model is provided with the training images along with their labels in YOLOv4 txt format. Training is done by creating random fixed-size batches of the input training data, having the model predict on them, and repeatedly updating the weights. The dataset is passed through the model for multiple iterations until the required accuracy is reached. At the end of training, a fine-tuned weights file is obtained that should give high-accuracy results when used for testing.

G. Testing Phase
Once the model reaches the desired accuracy in training, it is saved and run on the testing dataset to determine how well it performs on unseen data and images, giving an idea of its performance in real-time use. In this phase, random fixed-size batches of the input testing data are provided to the model, which predicts using the weights saved during training. Multiple performance parameters are also calculated during this phase, suggesting improvements where required.

H. Real-time Implementation
Once fine-tuning of the model is done, it is tested in real time, carrying out predictions on live video and predicting whether a person in the frame is masked or not.

I. Dataset Specifications
Datasets used:
 Joseph Nelson Roboflow
 Prasoonkottarathil Kaggle
 Ashishjangra27 Kaggle
 Andrewmvd Kaggle
Dataset specifications:
 Training images: 1359
 Test images: 151
 Total no. of images: 1510

J. Sample Images from the Dataset

Fig.5. A glimpse of dataset with mask

Figure 5 shows a glimpse of the dataset, which consists of 2000 images. A diverse dataset was considered, with people from different ethnicities, facial features, and demographics.
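The YOLOv4 txt label format and the 90:10 split described above can be sketched as follows. This is an illustrative Python rendering, not the authors' tooling, and the function names are ours; note that the split reproduces the paper's 1359/151 counts when given 1510 image paths.

```python
import random

def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box into one YOLO txt line:
    '<class> <x_center> <y_center> <width> <height>', normalized to [0, 1]."""
    xc = (x_min + x_max) / 2.0 / img_w
    yc = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

def split_dataset(paths, train_frac=0.9, seed=42):
    """Shuffle image paths and split them 90:10 into train/test lists."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_frac)
    return paths[:cut], paths[cut:]
```

For instance, a box covering a whole 416×416 image in class 0 serializes to `0 0.500000 0.500000 1.000000 1.000000`, which is the per-object line stored alongside each training image.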


Fig.6. A glimpse of dataset without mask

Figure 6 shows a glimpse of the dataset of people not wearing a mask. It has images of unmasked people facing in different directions and orientations so that the model can be trained accurately.

K. Performance Parameters

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)

Equations 1 to 3 show the mathematical formulas used to evaluate the model. Precision, also known as positive predictive value, is the fraction of retrieved instances that are relevant. Recall, also known as sensitivity, is the fraction of relevant instances that were retrieved. Accuracy is the sum of true positives and true negatives divided by the total number of predictions. The F1 score is 2 times precision multiplied by recall, divided by precision plus recall.

V. SIMULATION RESULTS AND ANALYSIS

This section elaborates on the obtained results and the analysis of performance metrics on unseen data.

A. Simulation Results with Mask
Results were obtained after running the model on the masked test images from the dataset. Detections were made with high accuracy, even when multiple people were sitting in different poses in an image.

Fig.7. Facemask detection with multiple people in frame

Figure 7 shows the result of facemask detection with multiple people in a single image. Facemask detection is done on the people visible in the frame with bounding-box prediction confidence scores of 0.87, 0.79, and 0.71. The image also has a blurred background.

Fig.8. Facemask detection with two people standing side by side

Figure 8 shows the result of facemask detection with two people standing side by side. Two people wearing facemasks are detected in a side-facing pose with confidence scores of 0.96 and 0.90. In this image the background is clear, with no objects present.

Fig.9. Facemask detection with multiple people in a frame

Figure 9 shows the result of facemask detection with multiple people in a single image frame. Detection is carried out on multiple people facing in different directions, with confidence scores of 0.97, 0.93, 0.89, and 0.87.
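Equations 1 to 3 can be sanity-checked with a short Python sketch using the raw counts reported later in Table I (TP = 295, FP = 131, FN = 44; TN is not tabulated there, so accuracy is omitted):

```python
def precision(tp, fp):
    """Eq. (2): fraction of retrieved detections that are relevant."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq. (3): fraction of relevant instances that were retrieved."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

tp, fp, fn = 295, 131, 44
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 2), round(r, 2), round(f1_score(p, r), 2))  # 0.69 0.87 0.77
```

The rounded values match the precision, recall, and F1 score reported in Table I.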
Fig.10. Facemask is detected on a masked person


Figure 10 shows the result of facemask detection on a masked person. Here, a facemask is detected on a person wearing a different type of mask than in the other example images, with a confidence score of 0.97.

B. Simulation Results without Mask
Results were obtained after running the model on the unmasked images from the dataset.

Fig.11. Facemask detection on unmasked person (sample output image one)

Fig.12. Facemask detection on unmasked person (sample output image two)

Fig.13. Facemask detection on unmasked person (sample output image three)

Figures 11, 12, and 13 show the results of facemask detection on unmasked people. In each, the person is detected without a facemask, with bounding-box confidence scores of 0.99, 0.85, and 0.96 respectively.

Fig.14. Facemask detection with multiple unmasked people in a single image frame

Figure 14 shows the result of facemask detection with multiple unmasked people in a single image frame. Two people are detected without facemasks: the first is detected with a confidence score of 0.83, and the second, a little blurred in the background, is detected with a confidence of 0.58.

C. Performance Analysis

TABLE I. PERFORMANCE PARAMETERS ANALYSIS

Images used for testing: 151
Detections count: 1487
Confidence threshold: 0.25
True positives: 295
False positives: 131
False negatives: 44
Average IoU: 53.15%
Recall: 0.87
Precision: 0.69
F1 score: 0.77
Mean average precision: 0.86

Table I shows the results obtained after running the model on test images. 151 images were used for testing, producing 1487 detections across the with-mask and without-mask classes. The detection threshold is set to 0.25, which lets a detection through even with a confidence score of 0.25. The average IoU measures the average extent of overlap between the bounding box generated by the detector and the ground-truth bounding box, which in our case came out to be 53.15%. Recall, the fraction of relevant instances that were retrieved, is 0.87. Precision, the fraction of retrieved detections that are relevant, is 0.69. The F1 score is 0.77, and the mean average precision of the model is 0.86.

D. Real-time Implementation
The designed YOLOv4 facemask detection model was simulated in real time to validate it. This section describes the model's real-time results with and without a mask. Figures 15 and 16 show sample results of facemask detection on a real-time video of a masked person. Results


show that the model was able to detect masked people correctly
with an accuracy of 0.95 and 0.99.

Fig.15. Facemask detection on a real time video of a masked person (sample output image one)

Fig.16. Facemask detection on a real time video of a masked person (sample output image two)

Fig.17. Facemask detection on a real time video of an unmasked person (sample output image one)

Fig.18. Facemask detection on a real time video of an unmasked person (sample output image two)

Figures 17 and 18 show sample results of facemask detection on a real-time video of an unmasked person. Results show that the model was able to detect unmasked people correctly with an accuracy of 0.98 and 0.92.

VI. CONCLUSION AND FUTURE WORK

The COVID-19 pandemic has impacted the way people interact with each other and has changed the social patterns that existed before it. Mask mandates have been in force since the beginning of the pandemic. According to data published by WHO and various researchers, COVID-19 transmission is significantly higher in crowded places and among groups not wearing face masks. Facemask detection is a technology that can be deployed in public places, malls, workspaces, corporate offices, co-working spaces, and stadiums. As the world opens up again and businesses and offices re-open, wearing a mask has been made compulsory and is proven to keep us safe against the virus. In this paper, a real-time facemask detection model was implemented using the YOLOv4 algorithm, a state-of-the-art object detection model, for image datasets, video datasets, and real-time video. Various model performance metrics were tabulated to draw conclusions and validate the model. The model was tested in three conditions: video form, image form, and real-time application. Real-time implementations are the most useful applications for the future. An accuracy of 95% was obtained for the real-time detection model, from which we conclude that the model is successful. Key players and industry leaders like NVIDIA are working on creating and developing new technology to combat the spread of COVID-19, and researchers, corporations, and public and private organizations can implement smart solutions to protect their employees and ensure a safe working environment. Facemask detection built on mobile phone technology could be made accessible on a larger scale and in public places; access at such a scale can help popularize the technology and aid in preventing the spread of the virus. There is also scope for integrating social distancing software to detect and prevent overcrowding in public places, and huge scope for integrating different algorithms, such as the GAN algorithm, which can be used for face mask detection through speech analysis and processing, by understanding the variation in sound parameters while talking with a mask on. Industry and innovators are working towards tech-based solutions to overcome the pandemic and can integrate the proposed work with research [13] [14] [15].

REFERENCES

[1] Sneha Sen et al., "Face Mask Detection System for COVID_19 Pandemic Precautions using Deep Learning Method," International Journal of Emerging Technologies and Innovative Research, Vol. 7, Issue 10, pp. 16-21.
[2] https://www.researchgate.net/publication/347439579_Social_Distance_Monitoring_and_Face_Mask_Detection_Using_Deep_Neural_Network
[3] I. B. Venkateswarlu et al., "Face mask detection using MobileNet and Global Pooling Block," IEEE 4th Conference on Information & Communication Technology (CICT), 2020, pp. 1-5.
[4] https://www.researchgate.net/publication/344239546_Comparative_Study_of_Deep_Learning_Methods_in_Detection_Face_Mask_Utilization
[5] G. Jignesh Chowdary et al., "Face Mask Detection using Transfer Learning of InceptionV3," arXiv:2009.08369, 2020.
[6] https://www.researchgate.net/publication/344725412_Covid19_Face_Mask_Detection_Using_TensorFlow_Keras_and_OpenCV
[7] A. Biswas et al., "Classification of Objects in Video Records using Neural Network Framework," International Conference on Smart Systems and Inventive Technology (ICSSIT), 2018, pp. 564-569.
[8] M. R. Nehashree et al., "Simulation and Performance Analysis of Feature Extraction and Matching Algorithms for Image Processing Applications," International Conference on Intelligent Sustainable Systems (ICISS), 2019, pp. 594-598.
[9] R. K. Meghana et al., "Background-modelling techniques for foreground detection and Tracking using Gaussian Mixture Model," International Conference on Computing Methodologies and Communication (ICCMC), 2019, pp. 1129-1134.
[10] C. V. Krishna et al., "A Review of Artificial Intelligence Methods for Data Science and Data Analytics: Applications and Research Challenges," International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2018, pp. 591-594.
[11] N. Jain et al., "Performance Analysis of Object Detection and Tracking Algorithms for Traffic Surveillance Applications using Neural Networks," Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2019, pp. 690-696.
[12] C. Kumar B et al., "Performance Analysis of Object Detection Algorithm for Intelligent Traffic Surveillance System," Second International Conference on Inventive Research in Computing Applications (ICIRCA), 2020, pp. 573-579.
[13] M. Rohith et al., "Comparative Analysis of Edge Computing and Edge Devices: Key Technology in IoT and Computer Vision Applications," International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), 2021, pp. 722-727.
[14] P. Pradhyumna et al., "Graph Neural Network (GNN) in Image and Video Understanding Using Deep Learning for Computer Vision Applications," Second International Conference on Electronics and Sustainable Communication Systems (ICESC), 2021, pp. 1183-1189.
[15] E. Shreyas et al., "3D Object Detection and Tracking Methods using Deep Learning for Computer Vision Applications," International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), 2021, pp. 735-738.
[16] I. Jacob et al., "Design of Deep Learning Algorithm for IoT Application by Image based Recognition," Journal of ISMAC, vol. 3, no. 03, 2021, pp. 276-290.
