
CALIFORNIA STATE UNIVERSITY SAN MARCOS

PROJECT SIGNATURE PAGE

PROJECT SUBMITTED IN PARTIAL FULFILLMENT


OF THE REQUIREMENTS FOR THE DEGREE

MASTER OF SCIENCE

IN

COMPUTER SCIENCE

PROJECT TITLE: Face Mask Detection using YOLOv5 for COVID-19

AUTHOR: Vinay Sharma

DATE OF SUCCESSFUL DEFENSE: 11/24/2020

THE PROJECT HAS BEEN ACCEPTED BY THE PROJECT COMMITTEE IN


PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF
SCIENCE IN COMPUTER SCIENCE.

Dr. Xin Ye Nov 26, 2020


PROJECT COMMITTEE CHAIR SIGNATURE DATE

Dr. Ahmad R Hadaegh Nov 25, 2020


PROJECT COMMITTEE MEMBER SIGNATURE DATE

Name of Committee Member


PROJECT COMMITTEE MEMBER SIGNATURE DATE
Face Mask Detection using YOLOv5 for COVID-19

In affiliation with
California State University, San Marcos

In partial fulfillment of the Requirements for the Degree of


Master of Computer Science
By

Vinay Sharma
November 24, 2020

ACKNOWLEDGEMENT

I express my deep gratitude to my advisor, Dr. Xin Ye, and to my committee member and program
coordinator, Dr. Ahmad Hadaegh, for their continued support of my project. Their guidance and
motivation helped me throughout the project and led me to implement it successfully, as I had
planned. I thank both of my professors for being with me and helping me in my journey through
the Master of Computer Science program at California State University – San Marcos.

Table of Contents

ACKNOWLEDGEMENT
LIST OF FIGURES
ABSTRACT
1. INTRODUCTION
1.1 SCOPE AND OBJECTIVES
1.2 INTRODUCTION TO SYSTEM
2. LITERATURE SURVEY
2.1 Image processing
2.2 TensorFlow
2.3 Object Detection
2.4 Face Mask Recognition
3. METHODOLOGY AND RESULTS
3.1 Libraries used
3.2 Data collection
3.3 Model Development
3.5 Model Training
3.6 Testing the Model
3.7 Result and Analysis
3.8 Recommendations
4. CONCLUSION AND FUTURE ENHANCEMENTS
REFERENCE

LIST OF FIGURES

Figure 1: TensorFlow sample graph
Figure 2: Deducted Results
Figure 3: Precision of the developed model
Figure 4: Recall graph of the developed model
Figure 5: mAP graph of the developed model

ABSTRACT

COVID-19 is a major threat to mankind, and the whole world is struggling to reduce the spread
of the virus. Wearing a mask is a good practice that helps to control COVID-19 effectively. The
results from China and South Korea make it clear that wearing a face mask reduces the spread of
the virus, and these countries have now returned to normal life. However, ensuring that everyone
wears a face mask is not easy. This paper attempts to develop a simple and effective model for
real-time monitoring. The proposed model successfully recognizes whether or not an individual is
wearing a face mask.

1. INTRODUCTION

1.1 SCOPE AND OBJECTIVES

The main aim of this project is to provide a service to patients that saves their time and
finds the nearest hospital. Patients get more time to learn about their diseases by submitting
the symptoms they are facing, and based on the disease they are treated in nearby hospitals.

1.2 INTRODUCTION TO SYSTEM

The coronavirus pandemic has produced an atmosphere of fear because the disease is transmitted
through the respiratory system. The virus has killed more than a million people around the
globe, and it is expected to have killed close to 400,000 people in the US by February 1st, 2021.
Currently, there is no specific medicine or vaccine at hand to fight the virus. Therefore, the
only option left is to take the utmost care to stay away from the disease, for example by
maintaining social distancing, washing hands regularly, and wearing a mask. To take part in the
protection against the pandemic, my aim is to design a face mask detection program using deep
learning. Once the trained model is deployed, this technique can be used to find out who is not
wearing a face mask. The WHO reports that the coronavirus spreads in two ways: through
respiratory droplets and through physical contact.

Droplets are produced by the respiratory system when an infected person coughs or sneezes.
Anyone closer than 4 feet has a high chance of inhaling these infection-causing droplets. The
droplets can also settle on surfaces, where the virus can survive for days. In this way the
infected person's surroundings can become a major source of virus spread. Medical masks are the
best bet for preventing this spread. In this research, medical masks means surgical as well as
procedure masks, which may be cup-shaped or folded and are attached to the head with cords. They
are tested for filtration, ease of breathing, and sometimes water resistance. The research
examines collections of videos and images to identify persons who are wearing medical masks that
comply with government guidelines. In this way, it can greatly assist the government in taking
action against people who are not wearing the right type of mask.

Wearing a mask in public has been normal in China and other Asian countries since the very
start of the pandemic. Currently, the USA is in the grip of a severe outbreak, and cases and
confirmed deaths are increasing day by day. The CDC (Centers for Disease Control and Prevention)
has cautioned that people must wear protective equipment such as masks. Studies have revealed
that many people, particularly young people, carry the virus without any symptoms and can
unknowingly spread it to many others. The same is true for people who eventually develop
symptoms but spread the disease before testing positive. In view of this, the CDC has issued an
advisory to wear masks in public gatherings where social distancing is impossible, in order to
slow community spread of the virus. This advisory is backed by various studies, including one
published in the New England Journal of Medicine.

Wearing a mask outdoors during the pandemic has been a great help in controlling the spread of
the coronavirus, and it is a sign of being a sensible citizen. Several countries, such as China
and Korea, controlled Covid-19 in a short time thanks to the habit of using masks regularly. The
recommendation was to use a mask no matter what type is available. The mask acts as a physical
barrier that prevents the entry of the virus. Many individuals have unknowingly infected other
people with Covid-19, and the irresponsibility of a few people has led to the death of many
others. A mask offers two benefits. First, it does not let the virus enter your mouth or nose
directly from an infected person's sneeze or cough. Secondly, if you touch a virus-contaminated
surface and then your mouth or nose, the mask stops the transmission of the virus. Many
governments made masks compulsory, and people were compelled and monitored to follow the order.
This paper studies the use of masks to prevent the spread of the deadly coronavirus and applies
deep learning, a technique in wide use today, to face mask detection.

Deep learning is a subfield of machine learning in which a hierarchy of features is learned
from the input data. Researchers use this technique intensively, for example in image
classification, speech recognition, signal processing, and natural language processing. Unlike
conventional machine learning, deep learning builds the feature hierarchy automatically:
features at every level are learned systematically, without the help of hand-crafted techniques.
The resulting structured features are also easier to understand and explain. A defining property
of deep learning models is their deep architecture. In contrast to a shallow architecture, which
has few hidden layers, a deep architecture has many. Regression, classification, dimensionality
reduction, modelling motion, modelling textures, information retrieval, natural language
processing, robotics, fault diagnosis, and road crack detection are some important fields that
benefit from deep architectures.

2. LITERATURE SURVEY

2.1 Image processing

One author discusses traffic lights and street signs, whose appearance differs from situation
to situation. The appearance of street signs is affected by partial occlusion, weather, and
changes in illumination, so a wide variety of sign appearances has to be handled. Many image
processing algorithms and neural networks are combined to increase human efficiency in different
tasks. Machine learning systems cannot readily deal with images of different types, sizes, and
aspect ratios in a dataset. The standard approach is to scale the images to a fixed size. Deep
learning techniques can be very helpful in this respect.

Scaling can cause problems when the aspect ratio differs between the original and the target
size. Moreover, it discards information in larger images, or introduces artefacts when very
small images are strongly magnified. Humans are well able to recognize traffic signs of
different sizes, even when they are viewed from sharp angles. In particular, the authors are
working on a benchmark dataset for the detection of traffic signs in full camera images.

Another author presented an approach that analyzes encoded video sequences before decoding.
The method uses the information contained in the DCT coefficients of MPEG- or JPEG-encoded video
sequences. The system has been tested successfully on various video sequences, including group
meetings, presentations, one-on-one sessions, and others.

Another author has presented a paper on image retrieval using a meta-feature design. Since the
number of image collections (for example personal or group photos, medical images, etc.) is
growing rapidly, the efficient management of these collections has become a major research
problem in image retrieval. Image retrieval techniques have been developed specifically to meet
industrial demand, for example to handle large-scale image collections. A powerful image
retrieval system is capable of indexing image databases efficiently and of retrieving images
with high precision as well as recall. In other words, given a query, the purpose of an image
retrieval system is to retrieve as many similar (or relevant) images as possible.

Another author proposed a procedure for a non-invasive identification method based on image
analysis. The system includes pre-processing, image segmentation, feature extraction,
classification, and a comparison of the constructed classifiers in terms of accuracy,
efficiency, and elapsed time. The reported results indicate that the framework is robust and
accurate and requires little processing time.

Another author examined an AI planning system that uses knowledge about image processing steps
and their requirements to assemble executable image processing scripts in support of high-level
science requests. The article describes a general AI planning approach to automation and its
application to a specific area of image processing for planetary science, including radiometric
correction and color triplets.

Another author examined how image classification based on collections of ordinary images works
successfully, with advanced results, when deep learning is used. By contrast, the image
classification of film defects has so far been left largely unexplored, because the low-quality
images contain very weak information about the objects and the classes are hard to separate. The
authors also used a GPU to accelerate both the image processing that supports defect recognition
and the machine learning. They proposed combining deep neural networks with random forest
classifiers for the image classification of film defects, which performed better than either of
the two methods alone.

They used a random forest as the classifier instead of the neural network's own classifier.
This gave an accuracy of at least 97.1%, which was higher than that of the other classifiers.

The same combination technique can be applied to various types of defect images. Because of
the different properties of the images, it was not easy to assign some defect images to a
definite class.

They tried to increase the overall classification accuracy by applying different methods. The
following three main ideas were applied to reach greater accuracy:

• Augmentation of the image data
• Modification of the neural network design
• Change of the layer parameters

Another author suggested a model that represents the relationship between the two modalities
of an image pair, with an analysis based on joint co-sparsity. The co-sparsity objective was
minimized to learn a pair of coupled analysis operators, using a conjugate gradient technique.

The proposed model was evaluated in two different applications. In the first, the goal was to
regularize inverse problems in imaging. In the second, the model was used to solve the problem
of bimodal image registration with a separate algorithm. The algorithm used pairs of bimodal
operators to register intensity, depth, and NIR images.

Another author proposed a system based on intelligent machine learning in which objects can be
learned without dependence on the actual environment. The work was inspired by the early stages
of human visual perception. It includes an algorithm designed for fast and robust salient object
detection, which benefits from photometric invariants.

The procedure has relatively low complexity, so it can run efficiently in real time, and the
same algorithm is used in today's machine vision systems.

In the overall design, salient object detection was used to build the second part of the
framework, a machine learning-based recognition and detection unit.

2.2 TensorFlow

TensorFlow is a great resource for machine learning. It is an open-source library platform,
originally from Google Brain, that offers an all-in-one plan: machine learning, models for
training, and efficient algorithms that aim to assist with the following:

• Processing data
• Testing models
• Getting precise results
• Refining the outcomes

The platform is designed to help developers in every way possible. It can generate a graph
consisting of nodes, in which each node represents a mathematical function and every connection
(edge) represents data. Developers therefore do not have to worry about minor details and
glitches. Instead, they can remain focused on the overall logic and functionality of the full
application.
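
The graph idea described above can be illustrated with a minimal sketch in plain Python. It is
not TensorFlow code; the `Node` class, the `run_graph` helper, and the example graph are
hypothetical names introduced only to show nodes acting as mathematical functions and edges
carrying data between them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One graph node: a named mathematical function plus its incoming edges."""
    name: str
    op: Callable[..., float]
    inputs: List[str] = field(default_factory=list)  # names of upstream nodes or fed values


def run_graph(nodes: Dict[str, Node], feeds: Dict[str, float], target: str) -> float:
    """Evaluate `target` by pulling values along the edges, caching intermediate results."""
    cache = dict(feeds)  # values that are already known (the fed inputs)

    def evaluate(name: str) -> float:
        if name not in cache:
            node = nodes[name]
            cache[name] = node.op(*(evaluate(i) for i in node.inputs))
        return cache[name]

    return evaluate(target)


# Hypothetical graph that computes (a + b) * b
graph = {
    "sum": Node("sum", lambda x, y: x + y, ["a", "b"]),
    "prod": Node("prod", lambda x, y: x * y, ["sum", "b"]),
}
result = run_graph(graph, {"a": 2.0, "b": 3.0}, "prod")
print(result)
```

The developer only declares which function each node applies and where its inputs come from;
the order of evaluation follows from the edges.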

This library is one of the most popular software libraries. Google Brain, the research team at
Google that specializes in AI and deep learning, released the platform in 2015. It has a Python
front end that runs on an optimized C++ back end. Its original purpose was internal use within
Google itself.

TensorFlow plays a major role in many categories of applications, including text-based
applications, image recognition, image captioning, and many others. This is because it is an
end-to-end open-source platform of libraries for deep learning. The platform also has many
community resources, useful tools, and comprehensive deployment options.

Voice recognition in digital assistants is one example of the kind of application TensorFlow
supports, and it is far from the only one: many applications, including a number of Google's own
products, are built with it. What is a tensor? A tensor can be thought of as a generalization of
a matrix to n dimensions. Each tensor consists of values that all have the same data type and
are arranged in a certain shape. The shape gives the overall dimensionality of the tensor.

Each tensor has the following three main features:

1. A unique name, referred to as a label
2. A specific dimension or shape
3. A data type

A tensor with zero dimensions is called a scalar, a tensor with one dimension is called a
vector, and a tensor with two dimensions is called a two-dimensional tensor (a matrix). In the
graph, nodes carry out the numerical computations, and edges carry tensors from the output of
one node to the input of another. Tensors are thus n-dimensional arrays that serve as inputs;
they pass through various computing operations that produce an output. A graph can be built to
hold all of these operations even when the shapes of some tensors are not yet known.
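
As a small illustration of rank and shape, the following sketch uses nested Python lists in
place of real tensors. The helper `shape_of` is a hypothetical name and assumes rectangular
nesting; a tensor library reports the same information through its own shape attribute.

```python
def shape_of(tensor):
    """Return the shape of a nested-list tensor as a tuple (assumes rectangular nesting)."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0] if tensor else None  # descend into the first element
    return tuple(shape)


scalar = 7.0                                    # zero dimensions
vector = [1.0, 2.0, 3.0]                        # one dimension
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # two dimensions

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    shape = shape_of(value)
    print(name, "dimensions:", len(shape), "shape:", shape)
```

The number of entries in the shape tuple is the number of dimensions, and all values in one
tensor share a single data type (floating point in this sketch).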

As mentioned earlier, TensorFlow is a large open-source library mainly intended for machine
learning. It uses Python for the front-end framework and runs applications built with it in
optimized C++.

TensorFlow's applications extend to a wide range of technologies, including word embeddings,
recurrent neural networks, machine translation, and language processing. Its most valuable
feature is that it supports production prediction at scale using the same models used for
training. This is a big opportunity for developers, who can build efficient dataflow graphs:
visual structures that describe how data moves through a graph, or sequence, of computing nodes.

The Python front end makes TensorFlow easy for developers to learn, as Python is one of the
most popular and easiest programming languages. It is comparatively easy to see how different
high-level concepts can be put together. Tensors and nodes are Python objects, and every
TensorFlow application is itself a Python application.

The numerical computation, however, is not performed in Python. The mathematical operations
are implemented as high-performance C++ binaries. In short, all the transformations are done in
C++, while Python connects the pieces and provides the high-level abstractions that hook them
together.

Applications built with TensorFlow can run on a CPU, a GPU, any other compatible machine, an
Android device, or iOS. They can also run in the cloud. If you run an application on Google's
cloud, a TPU (Tensor Processing Unit) can accelerate it. The final models can be deployed on any
of the devices mentioned above.

TensorFlow version 2.0 was released at the end of September 2019. This version has a simpler
framework and was designed with user feedback in mind. It also supports TensorFlow Lite, which
provides more opportunities to run on various platforms.

Code written for a previous version, however, has to be rewritten to make use of TensorFlow
2.0. Sometimes only slight changes are required, and sometimes a complete rewrite. Abstraction
is the main advantage of using TensorFlow for machine learning development. The developer can
focus on the main logic of the application and need not worry about the minor details of
handling input and output, as TensorFlow takes responsibility for these details behind the
scenes. In this respect the core of the library remains the same.

Developers can also build a separate graph for each operation and modify it independently;
they do not have to put all data in one graph and process it all together. The TensorBoard
visualization suite provides an efficient, interactive, web-based dashboard that lets developers
inspect the graphs. TensorFlow also has the backing of an A-list commercial outfit in Google.
Google has driven the rapid development of the project and created valuable offerings around
TensorFlow that make it easier to deploy, for example TPU silicon for accelerated performance on
Google's cloud.

2.3 Object Detection

Object detection is the process of detecting or recognizing instances of objects belonging to
a particular class in a video or an image. A class can be of any type, including humans,
animals, etc. Object detection frameworks typically generate a set of candidate windows, which
are then classified based on convolutional neural network (CNN) features. For example, one
method employs selective search to generate object proposals, computes CNN features for each
proposal, and then feeds these features to an SVM classifier.

A large number of other works aim to improve the performance of this approach of regions with
CNN features. Some of these methods reach very high accuracy. However, they are still unable to
determine the exact position of an object. These methods mostly follow a joint object detection
and segmentation approach. Most deep learning work on object detection uses some variation of
CNNs.

Other deep models have been used for object detection only to a limited extent. For instance,
one work proposes a coarse object locating technique that uses saliency mechanisms and a Deep
Belief Network (DBN) to identify objects in remote sensing images. Another introduces a new DBN
for 3D object recognition, in which the top-level model is a third-order Boltzmann machine
trained with a hybrid algorithm that combines discriminative and generative gradients. A further
work uses a fused deep learning method and examines the representation abilities of a deep model
in a semi-supervised setting. Lastly, stacked autoencoders have been employed to detect several
organs in medical images, and saliency-guided stacked autoencoders to detect objects in video.

Face recognition is one of the most popular computer vision applications and has attracted
great commercial interest. Several face recognition systems based on hand-crafted features have
been proposed; in these, a feature extractor extracts features from an aligned face to obtain a
low-dimensional representation, on which a classifier makes predictions. CNNs have brought about
a significant change in face recognition through feature learning and transformation
invariance. Recently, the VGG (Very Deep Convolutional Networks for Large-Scale Image
Recognition) Face Descriptor and Light CNN have been among the most advanced and best-known
approaches. A convolutional DBN has also performed well in face verification.

CNNs are used by both Facebook's DeepFace [29] and Google's FaceNet [28]. DeepFace models the
face in 3D and aligns it so that it appears as a frontal face. The aligned face is then fed to a
single convolution-pooling-convolution filter, followed by three locally connected layers and
two fully connected layers that produce the final prediction. Although DeepFace achieves strong
recognition performance, its representations are difficult to interpret, because faces
belonging to the same person are not clustered during training. In contrast, FaceNet applies a
triplet loss function to the representation, which makes the training process learn to cluster
the representations of one person. CNNs are also at the heart of OpenFace, an open-source face
recognition tool of comparable (although slightly lower) accuracy. It is suitable for mobile
computing because of its small size and fast execution time.
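
The triplet loss mentioned above for FaceNet can be written as
max(d(a, p) − d(a, n) + margin, 0), where d is the squared Euclidean distance between
embeddings, a is an anchor face, p a face of the same person, and n a face of a different
person. The sketch below evaluates it on hypothetical two-dimensional embeddings; the function
names and the margin value of 0.2 are assumptions chosen for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive by `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


# Hypothetical 2-D embeddings: `positive` shares the anchor's identity, `negative` does not.
anchor = [0.0, 1.0]
positive = [0.1, 0.9]
negative = [1.0, 0.0]

well_separated = triplet_loss(anchor, positive, negative)  # already clustered
violated = triplet_loss(anchor, negative, positive)        # roles swapped
print(well_separated, violated)
```

Minimizing this loss pulls embeddings of the same person together and pushes those of different
people apart, which is the clustering behaviour described above.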

2.4 Face Mask Recognition

Taking advantage of a webcam, the writer used OpenCV (Open Source Computer Vision Library) to
perform real-time face detection on a live stream. A video consists of frames, which are still
images, so face detection was carried out on every frame. There is no significant difference
between face detection in still images and in real-time video streams. For face detection, we
employ the YOLOv5 algorithm, a machine learning algorithm for object detection that is designed
to identify objects in a video or image. With a trained model in hand, the code of the first
section can be extended to detect faces and to identify whether or not a subject is wearing a
mask.

The mask detector model requires images of faces to work. We identify the frames that contain
faces using the methods illustrated in the first segment, preprocess them, and pass them to the
model. First, however, the necessary libraries have to be imported.

The faces variable holds, for each face, the top-left corner coordinates and the width and
height of the enclosing rectangle. These can be used to crop a face frame, which is then
preprocessed so that it can be fed into the model for prediction. The preprocessing procedure
is the same as the one used in the second segment when training the model. Next, a rectangle is
drawn over the face and a label is added according to the prediction. This concludes the
procedure. We have learned how to develop a model that can detect masked faces and how to
identify faces in real time. With this model, a face detector can be turned into a mask
detector.
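
The cropping and labelling steps described above can be sketched in plain Python. The frame is
a nested list standing in for an image, and the helper names, the 0.5 threshold, and the label
texts are hypothetical; an OpenCV-based implementation would operate on image arrays and draw
the rectangle on the frame instead.

```python
def face_box_to_corners(box, frame_width, frame_height):
    """Convert (x, y, w, h) with a top-left origin into corners clamped to the frame."""
    x, y, w, h = box
    x1, y1 = max(0, x), max(0, y)
    x2, y2 = min(frame_width, x + w), min(frame_height, y + h)
    return x1, y1, x2, y2


def crop(frame, corners):
    """Cut the face region out of a frame stored as rows of pixels."""
    x1, y1, x2, y2 = corners
    return [row[x1:x2] for row in frame[y1:y2]]


def annotation_for(mask_probability, threshold=0.5):
    """Choose label text and a BGR colour from the model's (assumed) mask probability."""
    if mask_probability >= threshold:
        return "Mask", (0, 255, 0)    # green in BGR order
    return "No Mask", (0, 0, 255)     # red in BGR order


# Hypothetical 6-row by 8-column frame whose pixels record their own (row, column) position.
frame = [[(r, c) for c in range(8)] for r in range(6)]
corners = face_box_to_corners((5, 2, 6, 3), frame_width=8, frame_height=6)
face = crop(frame, corners)
label, colour = annotation_for(0.9)
print(corners, len(face), len(face[0]), label, colour)
```

Clamping the rectangle to the frame avoids empty or out-of-range crops when a detected face
touches the image border.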

Face detection is a crucial task in object detection, and detection is the first step of
identity authentication and pattern recognition. Deep learning-based object detection
algorithms have evolved rapidly in recent years. They can be divided into two general groups:
one-stage detectors such as YOLO and two-stage detectors such as Faster R-CNN (Faster
Region-Based Convolutional Neural Networks). Although YOLO and its variants are not as accurate
as two-stage detectors, they outperform their counterparts in speed by a large margin. YOLO
does well on standard-sized objects but struggles to detect small objects, and its accuracy
drops significantly on faces, whose scale varies widely. We propose a face detector called
YOLO-face, which is based on Ultralytics' open-source object detection method YOLOv5. It can
cope with varying face scales and hence improves face detection performance. The technique uses
anchor boxes that are better suited to face detection and a more accurate regression loss
function. The enhanced detector improves accuracy significantly while maintaining a fast
detection speed.

3. METHODOLOGY AND RESULTS

This section gives a brief overview of the methodology used in this paper. The methodology has
four major stages: data collection, model development, model training, and testing of the
developed model. Along with the methodology, the different libraries and tools used in this
project are briefly described. The most important of them are discussed in this section: NumPy,
TensorFlow, and OpenCV.

3.1 Libraries used

3.1.1 NumPy

NumPy provides matrix and multi-dimensional array data structures. It supports mathematical
operations on arrays, such as statistical, algebraic, and trigonometric routines. It offers a
high-performance multidimensional array together with the essential tools for computing with
and manipulating arrays. SciPy is built on NumPy; it offers further high-performance routines
that operate on NumPy arrays and are essential for various engineering and scientific
applications. In the pre-processing stage, each image is resized to 224×224 pixels and then
converted into a NumPy array. Afterwards, the correct labels are attached to the dataset
images.
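
The pre-processing step described above can be sketched with NumPy alone. The nearest-neighbour
resize, the scaling of pixel values to the range 0 to 1, the random stand-in image, and the
label value are all assumptions made for illustration; in practice a library routine such as
OpenCV's `cv2.resize` would likely perform the resizing.

```python
import numpy as np


def resize_nearest(image, size=(224, 224)):
    """Nearest-neighbour resize of an H x W x C array to `size` given as (height, width)."""
    height, width = image.shape[:2]
    rows = np.arange(size[0]) * height // size[0]  # source row for each output row
    cols = np.arange(size[1]) * width // size[1]   # source column for each output column
    return image[rows[:, None], cols]


def preprocess(image):
    """Resize to 224x224 and scale pixel values to [0, 1] (the scaling is an assumption)."""
    resized = resize_nearest(np.asarray(image))
    return resized.astype(np.float32) / 255.0


# A random array stands in for a 640x480 RGB photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

batch = np.stack([preprocess(image)])  # shape: (number of images, 224, 224, 3)
labels = np.array([1])                 # hypothetical class index for this image
print(batch.shape, batch.dtype, labels)
```

Stacking the processed images into one array and keeping a parallel array of labels gives the
dataset layout that array-based training code typically expects.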

3.1.2 TensorFlow

TensorFlow is a Python library for fast numerical computation, released by Google. It is a
foundation library that can be used to create deep learning models directly or through wrapper
libraries built on top of TensorFlow. It is a math library that is also used for machine
learning applications such as neural networks.

Different types of deep learning models are available, and TensorFlow can be installed using the
pip command. TensorFlow helps with data augmentation before model training begins. It is also
used to improve the prediction efficiency of the algorithm and to download pre-trained ImageNet
weights.

Using the web camera of a PC, the TensorFlow model can identify whether a person is wearing a
mask. The same approach can be applied to a mobile phone camera.

TensorFlow advantages

• Data augmentation.
• Load the classifier.
• Build a new fully connected head
• Pre-processing
• Load all the image data
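
As an illustration of the data augmentation item above, the sketch below mirrors an image
horizontally and adjusts its bounding boxes to match. It is written with NumPy rather than
TensorFlow's own augmentation utilities, and the function name `horizontal_flip` and the
normalised (class, x_center, y_center, width, height) box layout are assumptions made for the
example.

```python
import numpy as np

def horizontal_flip(image, boxes):
    """Mirror an image left-to-right and adjust its boxes.

    `boxes` holds (class_id, x_center, y_center, width, height) tuples with
    coordinates normalised to the range 0..1.
    """
    flipped_image = image[:, ::-1]
    # Only the horizontal centre moves; the size and vertical centre stay the same.
    flipped_boxes = [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in boxes]
    return flipped_image, flipped_boxes

# Hypothetical 2 x 3 grayscale image and one box near the left edge.
image = np.array([[1, 2, 3],
                  [4, 5, 6]])
boxes = [(0, 0.25, 0.5, 0.2, 0.4)]
flipped_image, flipped_boxes = horizontal_flip(image, boxes)
print(flipped_image.tolist())
print(flipped_boxes)
```

Flipping the boxes together with the pixels matters for detection tasks, because a label that
still points at the original position would teach the model the wrong location.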

3.1.3 OpenCV

OpenCV, also known as the Open Source Computer Vision library, is used in computer vision and
deep learning programs. It provides modules that can be used for object detection and face mask
detection with computer vision and deep learning algorithms. This library is used in the
detection process of this model.

3.2 Data collection

Data collection is the first step in developing the face mask identification model. The accuracy
of the training data affects the final overall accuracy of the model. In our case, we have to
train the model to determine whether a person is wearing a facemask. We therefore downloaded a
large volume of images of people who wear a facemask, people who do not wear one, and people who
wear one improperly. The system should be able to classify whether a person is wearing a
facemask or not.

We cannot feed the training images to the model directly; the images must first be labelled.
Labelling is one of the most important processes in data collection. In this project, we used a
tool called Labelling. This tool allows users to create labels on the images and saves the data
for the training process. The labels can be saved in XML format using this tool.
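
The following sketch shows how one such XML label file could be read and converted into the
normalised text rows that YOLO-style training commonly expects. It assumes that the XML follows
the Pascal VOC layout written by common annotation tools; the report does not state the exact
schema. The function name `voc_to_yolo`, the class order, and the sample annotation are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Class names as they appear in the result figure; the order is an assumption.
CLASS_NAMES = ["with_mask", "without_mask"]

def voc_to_yolo(xml_text, class_names=CLASS_NAMES):
    """Convert one Pascal VOC annotation to YOLO-style rows.

    Each row is (class_id, x_center, y_center, width, height), normalised
    by the image size stored in the annotation.
    """
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        class_id = class_names.index(obj.findtext("name"))
        xmin = float(obj.findtext("bndbox/xmin"))
        ymin = float(obj.findtext("bndbox/ymin"))
        xmax = float(obj.findtext("bndbox/xmax"))
        ymax = float(obj.findtext("bndbox/ymax"))
        rows.append((
            class_id,
            (xmin + xmax) / 2 / img_w,
            (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w,
            (ymax - ymin) / img_h,
        ))
    return rows

# Hypothetical annotation for a 400 x 200 image with a single labelled face.
SAMPLE_XML = """
<annotation>
  <size><width>400</width><height>200</height><depth>3</depth></size>
  <object>
    <name>with_mask</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>
"""
print(voc_to_yolo(SAMPLE_XML))
```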

3.3 Model Development

TensorFlow is a well-known and commonly used open-source library developed by the Google Brain
team. It is widely used for image processing. This tool is used in this project for developing
the model. In our tasks, this library makes the entire process simpler and easier to implement.

Figure 1 TensorFlow sample graph

Scalability is the major reason this tool was selected for data processing. Model creation
started with the installation of TensorFlow for Python; the TensorFlow Python API was used, and
additional libraries were installed on the system. Data flow graphs are a central element of
TensorFlow. In such a graph, each node represents an instance of a mathematical operation, and
each edge represents a tensor, that is, a multidimensional array of data passed between
operations. All operations are performed on tensors. In this project, TensorFlow is used for the
object detection process.
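
To make the idea of a data flow graph concrete, the sketch below builds a tiny graph in plain
Python. It is not the TensorFlow API: the `Node` class and the `constant`, `add`, and `multiply`
helpers are hypothetical names, and nested lists stand in for tensors.

```python
class Node:
    """One operation in a data flow graph; its inputs are other nodes."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Evaluate upstream nodes first, then apply this node's operation
        # to the tensors that arrive along its incoming edges.
        return self.op(*(node.evaluate() for node in self.inputs))

def constant(value):
    """A source node that emits a fixed tensor (here a nested list)."""
    return Node(lambda: value)

def add(a, b):
    """Element-wise addition of two equally shaped 2-D tensors."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def multiply(a, b):
    """Element-wise multiplication of two equally shaped 2-D tensors."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

a = constant([[1, 2], [3, 4]])
b = constant([[10, 20], [30, 40]])
total = Node(add, a, b)            # edge a -> add, edge b -> add
result = Node(multiply, total, a)  # computes (a + b) * a element-wise
print(result.evaluate())
```

Nothing is computed while the graph is being built; the values flow through the edges only when
`evaluate` is called on the final node.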

YOLO (You Only Look Once) is a family of models used for real-time object detection
applications. In our application, this model is used to identify masks in the video. In this
project, YOLOv5s and YOLOv5x are both used so that their performance can be compared.

3.5 Model Training

After the model is set up, the next step is to train it. Training is a time-consuming and
resource-intensive process in deep learning, and the overall result depends mainly on the
quality of the training process. In this project, we had already prepared the training dataset,
which contains images of persons who wear the facemask correctly, persons who do not, and
persons who wear the facemask only partially. All the data were labelled with the help of a
separate software tool.

3.6 Testing the Model

Testing the developed model is the final part of the process. In this stage, the developed model
is tested using the test dataset. The system processes the test dataset in the same way as the
training dataset. It then calculates the coefficient value and compares it with the trained
value. Based on that comparison, the system classifies the object (facemask). The model was
trained to find persons with and without a facemask. OpenCV is used for this process; it allows
the model to load the images for testing.

3.7 Result and Analysis

This section discusses the results of the developed model. The developed model works fairly well
on the artificially developed test data. It accurately classified persons who wear a mask as
well as persons who do not. Figure 2 shows three persons: two are not wearing facemasks and one
woman is wearing a facemask. The developed model classified the people with and without masks
accurately.

Figure 2 Detected Results – without_mask & with_mask

We now look at the statistics of the results, starting with the precision of the developed
model. Precision is the ratio of the number of correctly predicted positive results to the total
number of positive results predicted by the classifier. The precision graph indicates that the
model is performing well, as the precision value increases over time during the training process.

Figure 3 Precision of the developed model

Figure 4 Recall graph of the developed model

Figure 4 shows the recall of the developed model. Recall is the ratio of the number of correctly
predicted positive results to the number of all samples that are actually positive.
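
Precision and recall can be computed from the counts of true positives, false positives, and
false negatives. The short sketch below shows the arithmetic; the function name
`precision_recall` and the example counts are hypothetical and are not results from this project.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for one class from raw detection counts."""
    predicted_positives = true_positives + false_positives
    actual_positives = true_positives + false_negatives
    # Guard against division by zero when there are no predictions or no positives.
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall

# Hypothetical counts: 8 correct detections, 2 false alarms, 8 missed faces.
precision, recall = precision_recall(true_positives=8,
                                     false_positives=2,
                                     false_negatives=8)
print(precision, recall)
```

A model can score high precision while missing many faces, which is why the two values are
reported side by side.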

Figure 5 mAP (Mean Average Precision) graph for the developed model

The model was tested using YOLOv5s as well as YOLOv5x. Training was completed by running the
model on Google Colab, a hosted virtual machine service from Google that provides the RAM and
GPU memory needed to execute machine learning programs. Using Google Colab was a necessity, as
my personal laptop is old and does not have enough RAM and GPU memory to run this model.
YOLOv5s is better than YOLOv5x in terms of performance and speed, while the mAP is quite similar
for both models. When processing speed is considered, YOLOv5s is slightly superior to YOLOv5x.

3.8 Recommendations

This section presents recommendations for deep learning projects. I have developed these
recommendations purely from the knowledge gathered during this project. In a deep learning
project, the input data matters most. In general, the larger the amount of training data, the
higher the accuracy of the results. Covering the different possible variations in the training
dataset yields a better-trained model. The choice of algorithm also plays a crucial role in the
accuracy and the processing time.

4. CONCLUSION AND FUTURE ENHANCEMENTS

I have developed a system that can monitor an area through a real-time camera without any
additional devices. The proposed system is a simple real-time video analyzer that can check
whether people are wearing masks. It can be installed in supermarkets and other public places.
This helps to limit the spread of the COVID-19 virus, because wearing masks reduces its
community spread. The system can also be used for other purposes, such as verifying that all
customers are wearing facemasks. For example, it can check the persons who enter through the
main gate: the recorded video is processed to find whether each person is wearing a facemask.
If the person is wearing a facemask, the door opens; otherwise the system may issue a message
such as "please wear your facemask".
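
The entrance scenario could be wired to the detector's output roughly as follows. This is a
sketch under stated assumptions rather than part of the implemented system: the function name
`gate_decision` is hypothetical, the class labels are taken from the result figure, and keeping
the door closed when nobody is detected is a policy chosen for the example.

```python
WITH_MASK = "with_mask"
WITHOUT_MASK = "without_mask"

def gate_decision(detected_labels):
    """Decide what the entrance should do for one video frame.

    `detected_labels` is the list of class labels the detector reported
    for the faces in the frame.
    """
    if not detected_labels:
        # Assumed policy: nothing to verify, so the door stays closed.
        return "closed", "no person detected"
    if all(label == WITH_MASK for label in detected_labels):
        return "open", ""
    return "closed", "please wear your facemask"

print(gate_decision([WITH_MASK]))
print(gate_decision([WITH_MASK, WITHOUT_MASK]))
print(gate_decision([]))
```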

The developed model uses the YOLOv5 and TensorFlow technologies for processing images and
real-time videos. The results show that the developed model is able to detect whether an
individual is wearing a facemask. The model learns the parameters quickly. It collects the video
from the camera, processes the video, identifies the objects, and determines whether a person is
wearing a mask.

This system has some limitations. For example, it sometimes detects accurately whether a person
is wearing a mask only if the person is directly facing the camera. It is still quite useful in
places such as supermarkets and airports. One open problem is to improve the system so that it
detects the faces of people who are not directly facing the camera.

REFERENCE

[1] A. Patel, D. R. Kasat, S. Jain and V. M. Thakare, "Performance Analysis of Various Feature
Detector and Descriptor for Real-Time Video based Face Tracking", International Journal
of Computer Applications, vol. 93, no. 1, pp. 37-41, 2014. Available: 10.5120/16183-5415.
[2] A. Mikolajczyk and M. Grochowski, "Data augmentation for improving deep learning in
image classification problem", 2018 International Interdisciplinary PhD Workshop
(IIPhDW), 2018. Available: 10.1109/iiphdw.2018.8388338 [Accessed 28 August 2020].
[3] A. Seghouane, N. Shokouhi and I. Koch, "Sparse Principal Component Analysis With
Preserved Sparsity Pattern", IEEE Transactions on Image Processing, vol. 28, no. 7, pp.
3274-3285, 2019. Available: 10.1109/tip.2019.2895464.
[4] B. Gupta, A. Chaube, A. Negi and U. Goel, "Study on Object Detection using Open CV -
Python", International Journal of Computer Applications, vol. 162, no. 8, pp. 17-21, 2017.
Available: 10.5120/ijca2017913391.
[5] C. Gershenson and D. Rosenblueth, "Self-organizing traffic lights at multiple-street
intersections", Complexity, vol. 17, no. 4, pp. 23-39, 2011. Available: 10.1002/cplx.20392.
[6] C. Popa, "Extended and constrained diagonal weighting algorithm with application to
inverse problems in image reconstruction", Inverse Problems, vol. 26, no. 6, p. 065004,
2010. Available: 10.1088/0266-5611/26/6/065004.
[7] "Face Mask Detection", Kaggle.com, 2020. [Online]. Available:
https://www.kaggle.com/andrewmvd/face-mask-detection. [Accessed: 05- Sep- 2020].
[8] G. Mangmang, "Face Mask Usage Detection Using Inception Network", Journal of
Advanced Research in Dynamical and Control Systems, vol. 12, no. 7, pp. 1660-1667,
2020. Available: 10.5373/jardcs/v12sp7/20202272.
[9] I. Riadi and A. Wirawan, "Network Packet Classification using Neural Network based on
Training Function and Hidden Layer Neuron Number Variation", International Journal of
Advanced Computer Science and Applications, vol. 8, no. 6, 2017. Available:
10.14569/ijacsa.2017.080631.
[10] J. Bamber and T. Christmas, "Covid-19: Each discarded face mask is a potential
biohazard", BMJ, p. m2012, 2020. Available: 10.1136/bmj.m2012.
[11] J. Cao, C. Song, S. Peng, F. Xiao and S. Song, "Improved Traffic Sign Detection and
Recognition Algorithm for Intelligent Vehicles", Sensors, vol. 19, no. 18, p. 4021, 2019.
Available: 10.3390/s19184021 [Accessed 28 August 2020].
[12] J. Johnson and T. Khoshgoftaar, "Survey on deep learning with class imbalance", Journal of
Big Data, vol. 6, no. 1, 2019. Available: 10.1186/s40537-019-0192-5 [Accessed 28 August
2020].

22
[13] J. Zhu, W. Zheng, J. Lai and S. Li, "Matching NIR Face to VIS Face Using
Transduction", IEEE Transactions on Information Forensics and Security, vol. 9, no. 3, pp.
501-514, 2014. Available: 10.1109/tifs.2014.2299977 [Accessed 28 August 2020].
[14] L. Rampasek and A. Goldenberg, "TensorFlow: Biology’s Gateway to Deep
Learning?", Cell Systems, vol. 2, no. 1, pp. 12-14, 2016. Available:
10.1016/j.cels.2016.01.009.
[15] M. BANSAL, "FACE RECOGNITION IMPLEMENTATION ON RASPBERRYPI
USING OPENCV AND PYTHON", INTERNATIONAL JOURNAL OF COMPUTER
ENGINEERING & TECHNOLOGY, vol. 10, no. 3, 2019. Available:
10.34218/ijcet.10.3.2019.016.
[16] M. Chanu, "A Deep Learning Approach for Object Detection and Instance Segmentation
using Mask RCNN", Journal of Advanced Research in Dynamical and Control Systems,
vol. 12, no. 3, pp. 95-104, 2020. Available: 10.5373/jardcs/v12sp3/20201242.
[17] M. Inamdar and N. Mehendale, "Real-Time Face Mask Identification Using Facemasknet
Deep Learning Network", SSRN Electronic Journal, 2020. Available:
10.2139/ssrn.3663305.
[18] M. Kiechle, T. Habigt, S. Hawe and M. Kleinsteuber, "A Bimodal Co-sparse Analysis
Model for Image Processing", International Journal of Computer Vision, vol. 114, no. 2-3,
pp. 233-247, 2014. Available: 10.1007/s11263-014-0786-5 [Accessed 28 August 2020].
[19] M. Loey, G. Manogaran, M. Taha and N. Khalifa, "A hybrid deep transfer learning model
with machine learning methods for face mask detection in the era of the COVID-19
pandemic", Measurement, vol. 167, p. 108288, 2020. Available:
10.1016/j.measurement.2020.108288.
[20] Ming-Hsuan Yang, D. Kriegman and N. Ahuja, "Detecting faces in images: a
survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1,
pp. 34-58, 2002. Available: 10.1109/34.982883 [Accessed 28 August 2020].
[21] P. Jhinkwan, V. Ingale and S. Chaturvedi, "Object Detection Using Convolution Neural
Networks", SSRN Electronic Journal, 2019. Available: 10.2139/ssrn.3422311.
[22] R. Cárdenas, C. Beltrán and J. Gutiérrez, "Small Face Detection Using Deep Learning on
Surveillance Videos", International Journal of Machine Learning and Computing, vol. 9,
no. 2, pp. 189-194, 2019. Available: 10.18178/ijmlc.2019.9.2.785.
[23] "Real-time Object Detection and Recognition Using Deep Learning with YOLO Algorithm
for Visually Impaired People", Journal of Xidian University, vol. 14, no. 4, 2020. Available:
10.37896/jxu14.4/261.
[24] S. Khan, A. Akram and N. Usman, "Real Time Automatic Attendance System for Face
Recognition Using Face API and OpenCV", Wireless Personal Communications, vol. 113,
no. 1, pp. 469-480, 2020. Available: 10.1007/s11277-020-07224-2.

23
[25] S. Khan and Z. Farooqui, "Face Recognition in Cross-spectral Environment using Deep
Learning", International Journal of Computer Applications, vol. 177, no. 19, pp. 21-25,
2019. Available: 10.5120/ijca2019919626.
[26] S. Sumit, J. Watada, A. Roy and D. Rambli, "In object detection deep learning methods,
YOLO shows supremum to Mask R-CNN", Journal of Physics: Conference Series, vol.
1529, p. 042086, 2020. Available: 10.1088/1742-6596/1529/4/042086.
[27] S. Yadav, "Deep Learning based Safe Social Distancing and Face Mask Detection in Public
Areas for COVID-19 Safety Guidelines Adherence", International Journal for Research in
Applied Science and Engineering Technology, vol. 8, no. 7, pp. 1368-1375, 2020.
Available: 10.22214/ijraset.2020.30560.
[28] Schroff, F., Kalenichenko, D., & Philbin, J. (2015, June 17). FaceNet: A Unified
Embedding for Face Recognition and Clustering. Retrieved November 22, 2020, from
https://arxiv.org/abs/1503.03832
[29] Taigman, B., Taigman, Y., Ranzato, M., & Wolf, L. (2014, June 24). DeepFace: Closing
the Gap to Human-Level Performance in Face Verification. Retrieved November 22, 2020,
from Deep Face (https://research.fb.com/publications/deepface-closing-the-gap-to-human-
level-performance-in-face-verification/)
[30] V. Dhar, "The Scope and Challenges for Deep Learning", Big Data, vol. 3, no. 3, pp. 127-
129, 2015. Available: 10.1089/big.2015.29000.vdb.
[31] V. Gunjan, R. Pathak and O. Singh, "Understanding Image Classification Using
TensorFlow Deep Learning - Convolution Neural Network", International Journal of
Hyperconnectivity and the Internet of Things, vol. 3, no. 2, pp. 19-37, 2019. Available:
10.4018/ijhiot.2019070103.
[32] V. S.V, M. Katti, A. Khatawkar and P. Kulkarni, "Face Detection and Tracking using
OpenCV", The SIJ Transactions on Computer Networks & Communication Engineering,
vol. 04, no. 03, pp. 01-06, 2016. Available: 10.9756/sijcnce/v4i3/0103540102.
[33] V. KumarB.V.P, N. S. Murthy Sharma and K. Lal Kishore, "A Technique to Reduce Glitch
Power during Physical Design Stage for Low Power and Less IR Drop", International
Journal of Computer Applications, vol. 39, no. 18, pp. 62-67, 2012. Available:
10.5120/5086-7450 [Accessed 28 August 2020].
[34] X. Sun, P. Wu and S. Hoi, "Face detection using deep learning: An improved faster RCNN
approach", Neurocomputing, vol. 299, pp. 42-50, 2018. Available:
10.1016/j.neucom.2018.03.030.
