DEEP LEARNING
DECLARATION
We, students of B.Tech (Electronics & Communication Engineering), hereby declare that the
report titled “FACE MASK DETECTION USING DEEP LEARNING”, submitted in
partial fulfilment of the requirement for the award of the degree of Bachelor of Technology,
comprises our original work and has not been submitted anywhere else for any other degree,
to the best of our knowledge.
CERTIFICATE
This is to certify that the minor project work done on “Face Mask Detection Using Deep
Learning”, submitted at Maharaja Surajmal Institute of Technology, Janakpuri, Delhi by
Abhishek Tomar (35196302817), Gautam Goel (42096302817) and Ritvik Bhatia (40596302817)
in partial fulfilment for the award of the degree of Bachelor of Technology, is a bona fide work
carried out by them under my supervision and guidance. This project work comprises
original work and has not been submitted anywhere else for any other degree, to the best of our
knowledge.
ACKNOWLEDGEMENT
Team effort, together with precious and mindful guidance, makes daunting tasks
achievable. It is a pleasure to acknowledge the direct and implied help we have received at
various stages while developing the project. It would not have been possible to develop such a
project without the assistance of numerous individuals. We find it impossible to express our
thanks to each one of them in words, for it seems too trivial when compared to the profound
encouragement that they extended to us.
We are grateful to Dr. Pradeep Sangwan (HOD ECE), for having given us opportunity to
do this project, which was of great interest to us.
Our sincere thanks to Ms Swati Malik for believing in us and providing motivation all through.
Without her guidance this project would not be such a success.
An undertaking of this nature could never have been attempted without reference to and
inspiration from the works of others, whose details are mentioned in the references section. We
acknowledge our indebtedness to all of them. Last but not the least, our sincere thanks to all
our friends who have patiently extended all sorts of help for accomplishing this undertaking.
INDEX
1.) Introduction
7.) Implementation in real time video streams
9.) Bibliography
ABSTRACT
Our project consists of a two-phase COVID-19 face mask detector; this report details how our
computer vision/deep learning pipeline is implemented.
From there, we review the dataset we use to train our custom face mask detector.
We then implement a Python script to train a face mask detector on our dataset using Keras
and TensorFlow, and review the results.
Given the trained COVID-19 face mask detector, we proceed to implement two
additional Python scripts used to:
1. Detect face masks in static images
2. Detect face masks in real-time video streams
Figure 1: Phases and individual steps for building a COVID-19 face mask detector with computer vision and deep learning using Python, OpenCV and TensorFlow.
In order to train a custom face mask detector, we need to break our project into two distinct
phases, each with its own respective sub-steps (as shown by Figure 1 ):
1. Training: Here we’ll focus on loading our face mask detection dataset from disk,
training a model (using Keras/TensorFlow) on this dataset, and then serializing the
face mask detector to disk
2. Deployment: Once the face mask detector is trained, we can then move on to loading
the mask detector, performing face detection, and then classifying each face
as with_mask or without_mask.
We’ll review each of these phases and the associated sub-steps in detail in the remainder of
this report, but in the meantime, let’s take a look at the dataset we’ll be using to train our
COVID-19 face mask detector.
CHAPTER 1
INTRODUCTION

1.1 NEED FOR THE HOUR
The trend of wearing face masks in public is rising due to the COVID-19 coronavirus epidemic
all over the world. Before COVID-19, people used to wear masks to protect their health from air
pollution, while others, self-conscious about their looks, hid their emotions from the public by
hiding their faces. Scientists have proved that wearing face masks helps in impeding COVID-19
transmission. COVID-19 (known as coronavirus) is the latest epidemic virus to hit human
health in the last century. In 2020, the rapid spread of COVID-19 forced the World Health
Organization to declare COVID-19 a global pandemic. More than five million cases of
infection were recorded in less than 6 months across 188 countries. The virus spreads through
close contact and in crowded areas.
The coronavirus epidemic has given rise to an extraordinary degree of worldwide scientific
cooperation. Artificial Intelligence (AI) based on machine learning and deep learning can
help to fight COVID-19 in many ways. Machine learning allows researchers and clinicians to
evaluate vast quantities of data to forecast the distribution of COVID-19, to serve as an early
warning mechanism for potential pandemics, and to classify vulnerable populations. The
provision of healthcare needs funding for emerging technologies such as artificial intelligence,
IoT, big data and machine learning to tackle and predict new diseases. In order to better
understand infection rates and to trace and quickly detect infections, AI's power is being
exploited to address the COVID-19 pandemic, for example in the detection of COVID-19 in
medical chest X-rays.
Policymakers face many challenges and risks in containing the spread and transmission
of COVID-19. In many countries, people are required by law to wear face masks in public.
These rules and laws were developed in response to the exponential growth in cases and deaths
in many areas. However, the process of monitoring large groups of people is becoming more
difficult. The monitoring process involves the detection of anyone who is not wearing a face
mask. In France, to guarantee that riders wear face masks, new AI software tools are integrated
into the Paris Metro system's surveillance cameras. The French startup DatakaLab, which
developed the software, reports that the goal is not to recognize or arrest people who do not
wear masks but to produce anonymous statistical data that can help the authorities predict
potential outbreaks of COVID-19.
In this project, we introduce a masked face detection model that is based on deep transfer learning
and classical machine learning classifiers. The proposed model can be integrated with
surveillance cameras to impede COVID-19 transmission by allowing the detection of
people who are not wearing face masks. The model integrates deep transfer learning with
classical machine learning algorithms: we use deep transfer learning for feature extraction
and combine it with three classical machine learning algorithms. We present a comparison
between them to find the most suitable algorithm, i.e. the one that achieves the highest
accuracy and consumes the least time in the process of training and detection.
The novelty of this research is the use of a proposed feature extraction model with an
end-to-end structure, without traditional hand-crafted techniques, combined with three
classical machine learning classifiers for masked face detection.
1.2 PROPOSED MODEL
The introduced model includes two main components: the first component is deep transfer
learning (ResNet50) as a feature extractor, and the second component is a classical machine
learning classifier such as decision trees, SVM, or an ensemble. ResNet-50 has been reported
to achieve better results when used as a feature extractor. Figure 2 illustrates the proposed
classical transfer learning model. The ResNet50 is used for the feature extraction phase, while
the traditional machine learning model is used in the training, validation, and testing phases.
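The two-component pipeline above can be sketched with scikit-learn. This is an illustrative sketch, not our actual training code: the 2048-dimensional vectors below are random stand-ins for real ResNet50 feature embeddings (with a synthetic class shift added so the toy comparison is meaningful), and the three classifiers mirror the SVM / decision tree / ensemble comparison described above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stand-in for ResNet50 output: in the real pipeline each image is passed
# through ResNet50 (with its classification head removed) to obtain a
# 2048-dimensional feature vector.
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 2048))
labels = rng.integers(0, 2, size=200)      # 0 = without_mask, 1 = with_mask
features[labels == 1, :10] += 3.0          # synthetic class separation

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

# the three classical classifiers compared in this research
classifiers = {
    "SVM": SVC(kernel="linear"),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Ensemble": RandomForestClassifier(random_state=42),
}
results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))
    print(name, round(results[name], 3))
```

In the real experiments the same comparison is run over the RMFD/SMFD/LFW datasets; here the loop simply reports held-out accuracy for each classifier.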
Figure 3: A face mask detection dataset consists of “with mask” and “without mask” images.
We will use the dataset to build a COVID-19 face mask detector with computer vision and
deep learning using Python, OpenCV, and TensorFlow/Keras.
• with_mask: 690 images
• without_mask: 686 images
Our goal is to train a custom deep learning model to detect whether a person is or is
not wearing a mask.
The second dataset is the Simulated Masked Face Dataset (SMFD). The SMFD dataset consists
of 1570 images: 785 simulated masked faces and 785 unmasked faces. Example images from
the SMFD are presented in Figure 4. The SMFD dataset is used for the training, validation, and
testing phases.
The third dataset used in this research is Labelled Faces in the Wild (LFW). It is a
simulated masked face dataset that contains 13,000 masked faces of celebrities from around the
world. Figure 5 illustrates samples of LFW images. The LFW dataset is used for the testing phase
only, as a benchmark testing dataset on which the proposed model was never trained.
Figure 5: Sample images from the LFW dataset.
CHAPTER 2
PROJECT STRUCTURE

2.1 DIRECTORY STRUCTURE
├── dataset
│ ├── with_mask
│ └── without_mask
├── face_detector
│ ├── deploy.prototxt
│ └── res10_300x300_ssd_iter_140000.caffemodel
├── detect_mask_image.py
├── detect_mask_video.py
├── mask_detector.model
├── plot.png
└── train_mask_detector.py
1) train_mask_detector.py: Accepts our input dataset and fine-tunes a classifier upon it,
serializing the resulting mask_detector.model to disk
2) detect_mask_image.py: Performs face mask detection in static images; three example images
are provided so that you can test the static image face mask detector
3) detect_mask_video.py: Using your webcam, this script applies face mask detection to every
frame in the stream
CHAPTER 3
IMPLEMENTATION

3.1 IMPORTING NECESSARY PACKAGES
Now that we’ve reviewed our face mask dataset, let’s learn how we can use Keras and
TensorFlow to train a classifier to automatically detect whether a person is wearing a mask or not.
To accomplish this task, we’ll be fine-tuning the MobileNet V2 architecture, a highly efficient
architecture that can be applied to embedded devices with limited computational capacity (e.g.,
Raspberry Pi, Google Coral, NVIDIA Jetson Nano, etc.). Deploying our face mask detector to
embedded devices could reduce the cost of manufacturing such face mask detection systems,
which is why we chose this architecture.
Our set of tensorflow.keras imports allows for:
• Data augmentation
• Loading the MobileNetV2 classifier (we will fine-tune this model with pre-trained
ImageNet weights)
• Building a new fully-connected (FC) head
• Pre-processing
• Loading image data
We’ll use scikit-learn (sklearn) for binarizing class labels, segmenting our dataset, and printing a
classification report. The imutils paths implementation will help us to find and list images in our
dataset. And we’ll use matplotlib to plot our training curves.
3.2 CONSTRUCTION OF ARGUMENT PARSER
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
help="path to output loss/accuracy plot")
ap.add_argument("-m", "--model", type=str,
default="mask_detector.model",
help="path to output face mask detector model")
args = vars(ap.parse_args())
--dataset: The path to the input dataset of faces and faces with masks
--plot: The path to your output training history plot, which will be generated using matplotlib
--model: The path to the resulting serialized face mask classification model
The above lines of code assume that your entire dataset is small enough to fit into memory. If
your dataset is larger than the memory you have available, we suggest using HDF5. Our data
preparation work isn’t done yet. Next, we’ll encode our labels, partition our dataset, and prepare
for data augmentation.
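The in-memory loading step can be sketched as follows. This is a simplified stand-in: the helper name load_dataset and the injected load_image callable are ours for illustration, whereas the actual script loads each image with Keras utilities resized to 224x224.

```python
import os
import numpy as np

def load_dataset(dataset_dir, load_image):
    """Walk the with_mask/without_mask sub-directories, load every image
    into memory, and return parallel data/label arrays."""
    data, labels = [], []
    for label in ("with_mask", "without_mask"):
        class_dir = os.path.join(dataset_dir, label)
        for name in sorted(os.listdir(class_dir)):
            data.append(load_image(os.path.join(class_dir, name)))
            labels.append(label)
    return np.array(data), np.array(labels)
```

The directory name passed via --dataset is what would be handed to dataset_dir here; keeping everything in two flat NumPy arrays is exactly the "fits in memory" assumption discussed above.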
CHAPTER 4
CONFIGURING CLASSIFIER

4.1 PREPARATION FOR DATA AUGMENTATION
Lines 67-69 one-hot encode our class labels, meaning that each element of our labels array
consists of an array in which only one index is “hot” (i.e., 1): for example, [1, 0] for one class
and [0, 1] for the other.
Using scikit-learn’s convenience method, Lines 73 and 74 segment our data into 80% training and
the remaining 20% for testing.
During training, we’ll be applying on-the-fly mutations to our images in an effort to improve
generalization. This is known as data augmentation, where the random rotation, zoom, shear,
shift, and flip parameters are established on Lines 77-84. We’ll use the aug object at training time.
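The encoding and splitting steps can be sketched as below. This is illustrative only: the actual script uses tensorflow.keras utilities for one-hot encoding and builds an ImageDataGenerator from the same augmentation ranges; here we stay within scikit-learn/NumPy and collect the ranges in a plain dict.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

labels = np.array(["with_mask", "without_mask"] * 50)
data = np.random.rand(100, 8)  # stand-in for the image tensors

# one-hot encode: for two classes LabelBinarizer yields a single 0/1 column,
# so we stack it with its complement to get [1, 0] / [0, 1] rows
lb = LabelBinarizer()
col = lb.fit_transform(labels)
onehot = np.hstack([1 - col, col])

# 80% training / 20% testing, stratified so both classes stay balanced
trainX, testX, trainY, testY = train_test_split(
    data, onehot, test_size=0.20, stratify=onehot, random_state=42)

# the on-the-fly augmentation ranges applied at training time
aug_params = dict(rotation_range=20, zoom_range=0.15, width_shift_range=0.2,
                  height_shift_range=0.2, shear_range=0.15,
                  horizontal_flip=True)
```

The specific augmentation values in aug_params are an assumption on our part (typical fine-tuning ranges); the principle, random rotation/zoom/shear/shift/flip established once and applied per batch, is what Lines 77-84 implement.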
4.2 MOBILENETV2 CONFIGURATION
1. Load MobileNet with pre-trained ImageNet weights, leaving off head of network (Lines
88 and 89)
2. Construct a new FC head, and append it to the base in place of the old head (Lines 93-102)
3. Freeze the base layers of the network (Lines 106 and 107). The weights of these base
layers will not be updated during the process of backpropagation, whereas the head layer
weights will be tuned.
Fine-tuning is a strategy we nearly always recommend to establish a baseline model while saving
considerable time.
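The three steps above can be sketched in Keras roughly as follows. This is a sketch, not the exact script: we pass weights=None so the example does not need to download anything, whereas the real script loads the pre-trained ImageNet weights; the head layer sizes follow the common MobileNetV2 fine-tuning recipe and are an assumption here.

```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import (AveragePooling2D, Dense, Dropout,
                                     Flatten, Input)
from tensorflow.keras.models import Model

# 1. load MobileNetV2 without its classification head (the real script uses
#    weights="imagenet"; weights=None keeps this sketch offline-friendly)
baseModel = MobileNetV2(weights=None, include_top=False,
                        input_tensor=Input(shape=(224, 224, 3)))

# 2. construct a new FC head and append it in place of the old head
headModel = AveragePooling2D(pool_size=(7, 7))(baseModel.output)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
model = Model(inputs=baseModel.input, outputs=headModel)

# 3. freeze the base layers: their weights are not updated during
#    backpropagation, only the new head is trained
for layer in baseModel.layers:
    layer.trainable = False
```

The two-unit softmax output corresponds to the with_mask/without_mask classes; with more classes the final Dense size and the loss (categorical cross-entropy) would change accordingly.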
Lines 111-113 compile our model with the Adam optimizer, a learning rate decay schedule, and
binary cross-entropy. If you’re building from this training script with > 2 classes, be sure to use
categorical cross-entropy.
Face mask training is launched via Lines 117-122. Notice how our data augmentation object (aug)
will be providing batches of mutated image data.
Once our plot is ready, Line 152 saves the figure to disk using the --plot filepath.
CHAPTER 5
5.1 TRAINING MODEL ON DATASET
We trained our face mask detector using Keras, TensorFlow, and Deep Learning.
1. train_mask_detector.py is the Python script that applies our model to the dataset we have acquired.
2. --dataset is the parameter where we specify the path of the directory that contains the dataset
used for training.
Epoch 1/20
34/34 [==============================] - 30s 885ms/step - loss:
0.6431 - accuracy: 0.6676 - val_loss: 0.3696 - val_accuracy: 0.8242
Epoch 2/20
34/34 [==============================] - 29s 853ms/step - loss:
0.3507 - accuracy: 0.8567 - val_loss: 0.1964 - val_accuracy: 0.9375
Epoch 3/20
34/34 [==============================] - 27s 800ms/step - loss:
0.2792 - accuracy: 0.8820 - val_loss: 0.1383 - val_accuracy: 0.9531
Epoch 4/20
34/34 [==============================] - 28s 814ms/step - loss:
0.2196 - accuracy: 0.9148 - val_loss: 0.1306 - val_accuracy: 0.9492
Epoch 5/20
34/34 [==============================] - 27s 792ms/step - loss:
0.2006 - accuracy: 0.9213 - val_loss: 0.0863 - val_accuracy: 0.9688
Here TensorFlow shows us the loss, accuracy, validation loss, and validation accuracy values for
each epoch.
5.2 ANALYZING CLASSIFICATION REPORT
The scikit-learn generates a classification report for us based on how our classifier performs on the
training and validation dataset. The classification report generated for our particular model is given
below.
CLASSIFICATION REPORT:
accuracy 0.99
The report shows the main classification metrics precision, recall and F1-score on a per-class basis.
The metrics are calculated using true and false positives, and true and false negatives. Positive and
negative in this case are generic names for the predicted classes. There are four ways a prediction
can be right or wrong: true positives (TP), false positives (FP), true negatives (TN), and false
negatives (FN).
Precision is the ability of a classifier not to label an instance positive that is actually negative. For
each class it is defined as the ratio of true positives to the sum of true and false positives.
TP – True Positives
FP – False Positives
Recall – What percent of the positive cases did you catch?
Recall is the ability of a classifier to find all positive instances. For each class it is defined as the
ratio of true positives to the sum of true positives and false negatives.
FN – False Negatives
The F1 score is a weighted harmonic mean of precision and recall such that the best score is 1.0 and
the worst is 0.0. Generally speaking, F1 scores are lower than accuracy measures as they embed
precision and recall into their computation. As a rule of thumb, the weighted average of F1 should
be used to compare classifier models, not global accuracy.
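The metric definitions above reduce to a few lines of arithmetic. This toy helper (ours, for illustration; scikit-learn computes the same quantities internally) derives all three metrics from raw counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute per-class metrics from the report's definitions:
    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 90 true positives, 10 false positives, 30 false negatives
p, r, f = precision_recall_f1(90, 10, 30)
print(round(p, 2), round(r, 2), round(f, 3))  # 0.9 0.75 0.818
```

Note how the F1 of 0.818 sits below both precision and recall here, matching the observation that F1 scores tend to be lower than raw accuracy measures.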
After we trained our model on the dataset that we had collected, matplotlib library helped us plot
the Loss v/s Accuracy graph.
A loss function is used to optimize a deep learning algorithm. The loss is calculated on training and
validation and its interpretation is based on how well the model is doing in these two sets. It is the
sum of errors made for each example in training or validation sets. Loss value implies how poorly
or well a model behaves after each iteration of optimization.
An accuracy metric is used to measure the algorithm’s performance in an interpretable way. The
accuracy of a model is usually determined after the model parameters have been learned and fixed,
and is expressed as a percentage. It is the measure of how accurate your model's predictions are
compared to the true data.
Example-
Suppose you have 1000 test samples and if your model is able to classify 990 of them correctly,
then the model’s accuracy will be 99.0%.
Our graph (Figure 6):
Looking at Figure 6, we can see there are some signs of overfitting around epochs 11 and 15, with
the validation loss slightly higher than the training loss. But both losses are roughly equal
elsewhere, which suggests that our model is fitted just right.
Given these results, we were hopeful that our model would generalize well to images outside our
training and testing sets.
CHAPTER 6
6.1 IMPORTS AND ARGUMENT PARSER
We open up the detect_mask_image.py file in our directory structure. The libraries we need to
import are given below:
Our driver script requires three TensorFlow/Keras imports to (1) load our model and (2) pre-process
the input image.
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
help="path to input image")
ap.add_argument("-f", "--face", type=str,
default="face_detector",
help="path to face detector model directory")
ap.add_argument("-m", "--model", type=str,
default="mask_detector.model",
help="path to trained face mask detector model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
help="minimum probability to filter weak detections")
args = vars(ap.parse_args())
Our four command line arguments include:
--image: The path to the input image containing faces for inference
--face: The path to the face detector model directory (we need to localize faces prior to classifying
them)
--model: The path to the face mask detector model that we trained earlier in this tutorial
--confidence: An optional probability threshold (default 50%) used to filter weak face
detections
We first loaded both our face detector and face mask classifier models using the following piece of
code:
Here we are first loading the prototxt file and then the caffemodel file using the path given as
argument --face.
These are the files needed to detect faces in images. This model is included with OpenCV. It is a
pretrained model that has been trained in Caffe.
Then we load the face mask detection model we trained on our dataset and serialized to the disk.
This is loaded from the path/filename passed in the argument --model.
6.3 LOAD INPUT IMAGE FROM DISK
With our deep learning models now in memory, our next step is to load and pre-process an input
image:
image = cv2.imread(args["image"])
orig = image.copy()
(h, w) = image.shape[:2]
# construct a blob from the image
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300),
(104.0, 177.0, 123.0))
# pass the blob through the network and obtain the face detections
print("[INFO] computing face detections...")
net.setInput(blob)
detections = net.forward()
Upon loading our --image from disk, we make a copy and grab frame dimensions for future scaling
and display purposes.
Pre-processing is handled by OpenCV’s blobFromImage function. This function can perform the
following processes:
Mean subtraction
Scaling
As shown in the parameters, we resize to 300×300 pixels and perform mean subtraction.
Then we perform face detection to localize where all the faces are in the image. Once we know
where each face is predicted to be, we’ll ensure the detections meet the --confidence threshold
before we extract the face ROIs.
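What blobFromImage does in this call can be approximated in plain NumPy. This is an illustrative stand-in, assuming the image is already a 300x300 BGR array; the real OpenCV function also handles resizing and optional channel swapping.

```python
import numpy as np

def blob_from_image(image_bgr, mean=(104.0, 177.0, 123.0), scale=1.0):
    """NumPy approximation of cv2.dnn.blobFromImage for a 300x300 BGR image:
    per-channel mean subtraction, scaling, and HWC -> NCHW reordering."""
    blob = (image_bgr.astype(np.float32) - np.array(mean, dtype=np.float32)) * scale
    blob = blob.transpose(2, 0, 1)[np.newaxis, ...]  # shape (1, 3, 300, 300)
    return blob

image = np.full((300, 300, 3), 128, dtype=np.uint8)  # dummy gray image
blob = blob_from_image(image)
print(blob.shape)  # (1, 3, 300, 300)
```

The (104.0, 177.0, 123.0) triple is the same per-channel mean passed in the script above; subtracting it centers the input the way the Caffe face detector expects.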
6.4 DETECT FACES
Here, we loop over our detections and extract the confidence to measure against the --confidence
threshold.
We then compute the bounding box values for a particular face and ensure that the box falls within
the boundaries of the image.
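The loop described here can be sketched as a small NumPy function. It is a stand-in for the script's own loop; the (1, 1, N, 7) detection layout, with the confidence at index 2 and the normalized box at indices 3-6, matches the output of OpenCV's SSD face detector.

```python
import numpy as np

def filter_detections(detections, w, h, min_confidence=0.5):
    """Loop over raw SSD detections of shape (1, 1, N, 7), keep those above
    the confidence threshold, scale the boxes to pixels, and clamp them to
    the image boundaries."""
    boxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence < min_confidence:
            continue
        # scale the normalized box coordinates back to image dimensions
        (startX, startY, endX, endY) = (detections[0, 0, i, 3:7] *
                                        np.array([w, h, w, h])).astype("int")
        # ensure the bounding box falls within the frame
        startX, startY = max(0, startX), max(0, startY)
        endX, endY = min(w - 1, endX), min(h - 1, endY)
        boxes.append((startX, startY, endX, endY))
    return boxes
```

Clamping matters because the detector can return boxes that spill past the frame edges, which would otherwise break the ROI slicing that follows.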
First, we determine the class label based on probabilities returned by the mask detector model and
assign an associated colour for the annotation. The colour will be “green” for with_mask and “red”
for without_mask.
We then draw the label text (including class and probability), as well as a bounding box rectangle
for the face, using OpenCV drawing functions.
Once all detections have been processed, we display the output image and wait for the user to press
a key.
Results on images:
Figure 8: With and without mask images (Gautam)
CHAPTER 7
7.1) Loading video and updating algorithm
We first open up the detect_mask_video.py file in our directory structure, and insert this code.
The algorithm for this script is the same, but it is pieced together in such a way to allow for
processing every frame of your webcam stream. Thus, the only difference when it comes to imports
is that we need a VideoStream class and time. Both of these will help us to work with the stream.
We’ll also take advantage of imutils for its aspect-aware resizing method.
This function detects faces and then applies our face mask classifier to each face ROI. Such a
function consolidates our code — it could even be moved to a separate Python file if you so choose.
Our detect_and_predict_mask function accepts three parameters:
1. frame: A frame from our video stream
2. faceNet: The model used to detect where in the image faces are
3. maskNet: Our trained face mask classifier model
Inside, we construct a blob, detect faces, and initialize lists, two of which the function is set to
return. These lists include our faces (i.e., ROIs), locs (the face locations), and preds (the list of
mask/no mask predictions).
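The shape of this function can be sketched with the face detector and mask classifier abstracted behind injected callables. This is a simplification of the real function, which calls cv2.dnn and model.predict directly; the batch-at-once prediction mirrors the speed argument made in the next section.

```python
import numpy as np

def detect_and_predict_mask(frame, detect_faces, predict_masks,
                            min_confidence=0.5):
    """Collect every confident face ROI, then classify the whole batch at
    once; returns the face locations and the matching mask predictions."""
    faces, locs = [], []
    for (startX, startY, endX, endY), confidence in detect_faces(frame):
        if confidence < min_confidence:
            continue                      # drop weak detections
        faces.append(frame[startY:endY, startX:endX])
        locs.append((startX, startY, endX, endY))
    # one batched call instead of a per-face loop keeps the pipeline fast
    preds = predict_masks(faces) if faces else []
    return locs, preds

# stubbed usage: one confident face, one weak detection that gets filtered
frame = np.zeros((100, 100, 3), dtype="uint8")
detect_faces = lambda f: [((10, 10, 50, 50), 0.9), ((0, 0, 20, 20), 0.3)]
predict_masks = lambda rois: [(0.9, 0.1)] * len(rois)
locs, preds = detect_and_predict_mask(frame, detect_faces, predict_masks)
print(locs, preds)  # [(10, 10, 50, 50)] [(0.9, 0.1)]
```

Returning locs and preds as parallel lists is what lets the display loop later unpack one bounding box together with one mask/no-mask prediction per face.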
7.5) Executing face mask predictor
The logic here is built for speed. First we ensure at least one face was detected (Line 63) — if not,
we’ll return empty preds.
Secondly, we are performing inference on our entire batch of faces in the frame so that our pipeline
is faster (Line 68). It wouldn’t make sense to write another loop to make predictions on each face
individually due to the overhead (especially if you are using a GPU that requires a lot of overhead
communication on your system bus). It is more efficient to perform predictions in batch. Line 72
returns our face bounding box locations and corresponding mask/not mask predictions to the caller.
With our imports, convenience function, and command line args ready to go, we can move on to
the initializations.
7.7) Running initializations and looping over frames
Here we initialize our:
1. Face detector
2. Face mask detector
3. Webcam video stream
We begin looping over frames on Line 103. Inside, we grab a frame from the stream and resize it
(Lines 106 and 107). From there, we put our convenience utility to use; Line 111 detects and
predicts whether people are wearing their masks or not.
7.8) Processing and displaying results
Inside our loop over the prediction results (beginning on Line 115), we:
1. Unpack a face bounding box and mask/not mask prediction (Lines 117 and 118)
2. Determine the label and colour, and annotate the frame with the bounding box and label text
After the frame is displayed, we capture key presses. If the user presses q (quit), we break out of
the loop and perform housekeeping.
CHAPTER 8
8.1) Running face mask detector in livestream
We launch the mask detector in real-time video streams with the command:
python detect_mask_video.py
8.2) Scope for error
A few reasons why we may fail to detect a face in the foreground:
1. The face is so heavily occluded by the covering that the face detector cannot locate it
2. Using a cloth, bandana or a shirt to cover the face would likely produce an error, as such
examples are not present in our dataset
Therefore, if a large portion of the face is occluded, our face detector will likely fail to detect the
face.
To reduce the scope of error, we can:
▸ Collect a large amount of data of people both wearing and not wearing masks, so that the
model has more examples for comparison
▸ Gather images of faces that may “confuse” our classifier into thinking the person is wearing
a mask when in fact they are not; potential examples include shirts wrapped around faces,
a bandana over the mouth, etc.
CONCLUSION AND FUTURE WORK
The coronavirus COVID-19 pandemic is causing a global health crisis. Governments all over the
world are struggling to stand against this type of virus. The protection from infection caused by
COVID-19 is a necessary countermeasure, according to the World Health Organization (WHO). In
this project, a hybrid model using deep and classical machine learning for face mask detection was
presented. The proposed model consists of two parts. The first part performs feature extraction
using ResNet50, one of the popular models in deep transfer learning. The second part performs
the detection of face masks using classical machine learning algorithms. Support Vector
Machines (SVM), decision trees, and ensemble algorithms were selected as the traditional
machine learning methods for investigation.
Three datasets were experimented on, and different training and testing strategies were adopted
throughout this research. These included training on a specific dataset while testing over the other
datasets, to prove the efficiency of the proposed model. The presented work concluded that the
SVM classifier achieved the highest accuracy with the least time consumed in the training
process. The SVM classifier achieved 99.64% testing accuracy on RMFD. On SMFD, it gained
99.49%, while on LFW, it reached 100% testing accuracy. A comparison with related works was
carried out, and the proposed model surpassed the related works in terms of testing accuracy.
The major drawback is that not all classical machine learning methods were tried in search of the
lowest training time and the highest accuracy. One possible future task is to use deeper transfer
learning models for feature extraction and to use the neutrosophic domain, as it shows promising
potential in classification and detection problems.
To create our face mask detector, we trained a two-class model of people wearing masks and people
not wearing masks.
We fine-tuned MobileNetV2 on our mask/no mask dataset and obtained a classifier that is ~99%
accurate.
We then took this face mask classifier and applied it to both images and real-time video streams by
detecting faces, extracting each face ROI, and applying the classifier to each ROI.
Our face mask detector is accurate, and since we used the MobileNetV2 architecture, it’s also
computationally efficient, making it easier to deploy the model to embedded systems (Raspberry
Pi, Google Coral, Jetson Nano, etc.).
BIBLIOGRAPHY
JOURNALS
[1] S. Feng, C. Shen, N. Xia, W. Song, M. Fan, B.J. Cowling, Rational use of face masks in the
COVID-19 pandemic, Lancet Respirat. Med. 8 (5) (2020) 434-436, doi: 10.1016/S2213-
2600(20)30134-X.
[2] X. Liu, S. Zhang, COVID-19: Face masks and human-to-human transmission, Influenza
Other Respirat. Viruses, doi: 10.1111/irv.12740.
[3] M. Loey, F. Smarandache, N.E.M. Khalifa, Within the lack of chest COVID-19 X-ray
dataset: a novel detection model based on GAN and deep transfer learning, Symmetry 12 (4)
(2020) 651.
[4] M.S. Ejaz, M.R. Islam, M. Sifatullah, A. Sarker, Implementation of principal component
analysis on masked and non-masked face recognition.
[5] Jeong-Seon Park, You Hwa Oh, Sang Chul Ahn, Seong-Whan Lee, Glasses removal from
facial image using recursive error compensation, IEEE Trans. Pattern Anal. Mach. Intell. 27 (5)
(2005) 805-811, doi: 10.1109/TPAMI.2005.103.
[6] D.M. Altmann, D.C. Douek, R.J. Boyton, What policy makers need to know about COVID-19
protective immunity.