
FACE MASK DETECTION USING DEEP LEARNING

A Minor Project Report submitted in partial fulfilment of the requirement for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Under the Supervision of
Ms Swati Malik
By
ABHISHEK TOMAR (35196302817)
GAUTAM GOEL (42096302817)
RITVIK BHATIA (40596302817)

MAHARAJA SURAJMAL INSTITUTE OF TECHNOLOGY
DECLARATION

We, students of B.Tech (Electronics & Communication Engineering), hereby declare that the
report titled “FACE MASK DETECTION USING DEEP LEARNING”, submitted in
partial fulfilment of the requirement for the award of the degree of Bachelor of Technology,
comprises our original work and has not been submitted anywhere else for any other
degree, to the best of our knowledge.

ABHISHEK TOMAR (35196302817)


GAUTAM GOEL (42096302817)
RITVIK BHATIA (40596302817)

CERTIFICATE

This is to certify that the minor project work done on “Face Mask Detection Using Deep
Learning”, submitted at Maharaja Surajmal Institute of Technology, Janakpuri, Delhi by
Abhishek Tomar (35196302817), Gautam Goel (42096302817) and Ritvik Bhatia (40596302817)
in partial fulfilment for the award of the degree of Bachelor of Technology, is a bona fide work
carried out by them under my supervision and guidance. This project work comprises
original work and has not been submitted anywhere else for any other degree, to the best of
our knowledge.

Ms Swati Malik                                        Dr. Pradeep Sangwan

(Project Supervisor)                                  (HOD, ECE)

ACKNOWLEDGEMENT

Team effort, together with proper and mindful guidance, makes daunting tasks
achievable. It is a pleasure to acknowledge the direct and implied help we have received at
various stages while developing the project. It would not have been possible to develop such
a project without the assistance of numerous individuals. We find it impossible to express our
thanks to each one of them in words, for it seems too trivial when compared to the profound
encouragement that they extended to us.

We are grateful to Dr. Pradeep Sangwan (HOD, ECE) for having given us the opportunity to
do this project, which was of great interest to us.

Our sincere thanks to Ms Swati Malik for believing in us and providing motivation all
through. Without her guidance, this project would not have been such a success.

An undertaking of this nature could never have been attempted without reference to and
inspiration from the works of others, whose details are mentioned in the references section.
We acknowledge our indebtedness to all of them. Last but not least, our sincere thanks to all
our friends who have patiently extended all sorts of help in accomplishing this undertaking.

ABHISHEK TOMAR (35196302817)


GAUTAM GOEL (42096302817)
RITVIK BHATIA (40596302817)

LIST OF FIGURES

Figure 1 – Steps for building a COVID-19 face mask detector

Figure 2 – The proposed deep transfer learning model

Figure 3 – Face mask detection dataset

Figure 4 – SMFD dataset image samples

Figure 5 – LFW dataset image samples

Figure 6 – COVID-19 face mask detector accuracy curves

Figure 7 – With and without mask image: Ritvik

Figure 8 – With and without mask image: Gautam

Figure 9 – With and without mask image: Abhishek

Figure 10 – With and without mask livestream results

INDEX
1.) Introduction
    1.1) Need of the hour
    1.2) Proposed model
    1.3) Dataset characteristics
2.) Project Structure
    2.1) Directory structure
    2.2) Briefing the structure
3.) Implementation
    3.1) Importing necessary packages
    3.2) Construction of argument parser
    3.3) Initialize the initial learning rate
    3.4) Load and pre-process data
4.) Configuring Classifier
    4.1) Preparation for data augmentation
    4.2) MobileNetV2 configuration
    4.3) Training head of network
    4.4) Loading model on test
    4.5) Plot loss vs accuracy
5.) Training face detection model
    5.1) Training model on dataset
    5.2) Analysing classification report
    5.3) Analysing loss v/s accuracy graph
6.) Implementation for images
    6.1) Imports and argument parser
    6.2) Load face detector and our model
    6.3) Load input image from disk
    6.4) Detect faces
    6.5) Apply detector on faces
7.) Implementation in real-time video streams
    7.1) Loading video and updating algorithm
    7.2) Loading face detection prediction logic function
    7.3) Creating loop over detections
    7.4) Extracting face ROI and preprocessing
    7.5) Executing face mask predictor
    7.6) Defining command line arguments
    7.7) Running initializations and looping over frames
    7.8) Processing and displaying results
8.) Real-time detection and further improvements
    8.1) Running face mask detector in livestream
    8.2) Scope for error
    8.3) Scope for improvements
9.) Conclusion & Future Works
10.) Bibliography

ABSTRACT

Our project consists of a two-phase COVID-19 face mask detector, detailing how our
computer vision/deep learning pipeline will be implemented.

From there, we’ll review the dataset we’ll be using to train our custom face mask detector.

We will then implement a Python script to train a face mask detector on our dataset using
Keras and TensorFlow.

We’ll use this Python script to train a face mask detector and review the results.

Given the trained COVID-19 face mask detector, we’ll proceed to implement two
additional Python scripts used to:

1. Detect COVID-19 face masks in images

2. Detect face masks in real-time video streams

Figure 1: Phases and individual steps for building a COVID-19 face mask detector with
computer vision and deep learning using Python, OpenCV and TensorFlow

In order to train a custom face mask detector, we need to break our project into two distinct
phases, each with its own respective sub-steps (as shown in Figure 1):

1. Training: Here we’ll focus on loading our face mask detection dataset from disk,
training a model (using Keras/TensorFlow) on this dataset, and then serializing the
face mask detector to disk

2. Deployment: Once the face mask detector is trained, we can then move on to loading
the mask detector, performing face detection, and then classifying each face
as with_mask or without_mask.

We’ll review each of these phases and associated subsets in detail in the remainder of this
tutorial, but in the meantime, let’s take a look at the dataset we’ll be using to train our
COVID-19 face mask detector.

CHAPTER 1

INTRODUCTION
1.1 NEED OF THE HOUR

1.2 PROPOSED MODEL

1.3 DATASET CHARACTERISTICS

1.1 NEED OF THE HOUR


The trend of wearing face masks in public is rising due to the COVID-19 coronavirus
epidemic all over the world. Before COVID-19, people used to wear masks to protect their
health from air pollution, while others, self-conscious about their looks, hid
their emotions from the public by hiding their faces. Scientists have proved that wearing face
masks helps impede COVID-19 transmission. COVID-19 (known as coronavirus) is the
latest epidemic virus to hit human health in the last century. In 2020, the rapid spreading
of COVID-19 forced the World Health Organization to declare COVID-19 a global
pandemic. More than five million people were infected by COVID-19 in less
than 6 months across 188 countries. The virus spreads through close contact and in crowded
areas.

The coronavirus epidemic has given rise to an extraordinary degree of worldwide scientific
cooperation. Artificial Intelligence (AI) based on machine learning and deep learning can
help fight COVID-19 in many ways. Machine learning allows researchers and clinicians
to evaluate vast quantities of data to forecast the distribution of COVID-19, to serve as an early
warning mechanism for potential pandemics, and to classify vulnerable populations. The
provision of healthcare needs funding for emerging technology such as artificial intelligence,
IoT, big data and machine learning to tackle and predict new diseases. In order to better
understand infection rates and to trace and quickly detect infections, AI's power is being
exploited to address the COVID-19 pandemic, for example in the detection of COVID-19 in
medical chest X-rays.

Policymakers face many challenges and risks in confronting the spread and transmission
of COVID-19. People are forced by law to wear face masks in public in many countries.
These rules and laws were developed in response to the exponential growth in cases and
deaths in many areas. However, the process of monitoring large groups of people is becoming
more difficult. The monitoring process involves the detection of anyone who is not wearing a
face mask. In France, to guarantee that riders wear face masks, new AI software tools are
integrated into the Paris Metro system's surveillance cameras. The French startup DatakaLab,
which developed the software, reports that the goal is not to recognize or arrest people who
do not wear masks but to produce anonymous statistical data that can help the authorities
predict potential outbreaks of COVID-19.
In this project, we introduce a masked face detection model that is based on deep transfer
learning and classical machine learning classifiers. The proposed model can be integrated
with surveillance cameras to impede COVID-19 transmission by allowing the detection of
people who are not wearing face masks. The model is an integration of deep transfer
learning and classical machine learning algorithms. We have used deep transfer learning for
feature extraction and combined it with three classical machine learning algorithms. We
introduce a comparison between them to find the most suitable algorithm, i.e. the one that
achieves the highest accuracy and consumes the least time in the process of training and
detection.

The novelty of this research is the use of a proposed feature extraction model that has an
end-to-end structure, without traditional hand-crafted feature techniques, combined with three
classical machine learning classifiers for masked face detection.

1.2 PROPOSED MODEL


The introduced model includes two main components: the first component is deep
transfer learning (ResNet50) as a feature extractor, and the second component is a classical
machine learning classifier such as decision trees, SVM, or an ensemble method. ResNet50
has been reported to achieve better results when used as a feature extractor. Figure 2
illustrates the proposed classical transfer learning model. Mainly, ResNet50 is used for the
feature extraction phase, while the traditional machine learning model is used in the training,
validation, and testing phases.

Figure 2. The proposed deep transfer learning model.

1.3 DATASET CHARACTERISTICS


The first dataset is the Real-World Masked Face Dataset (RMFD). The author of RMFD created
one of the biggest masked face datasets, which is used in this research.

Figure 3: A face mask detection dataset consisting of “with mask” and “without mask” images.
We will use the dataset to build a COVID-19 face mask detector with computer vision and
deep learning using Python, OpenCV, and TensorFlow/Keras.

This dataset consists of 1,376 images belonging to two classes:

 with_mask: 690 images

 without_mask: 686 images

Our goal is to train a custom deep learning model to detect whether a person is or is
not wearing a mask.

The second dataset is the Simulated Masked Face Dataset (SMFD). The SMFD dataset
consists of 1,570 images: 785 of simulated masked faces and 785 of unmasked faces. Example
images from the SMFD are presented in Fig. 4. The SMFD dataset is used for the training,
validation, and testing phases.

Fig. 4. SMFD dataset images samples.

The third dataset used in this research is Labelled Faces in the Wild (LFW). It is a
simulated masked face dataset that contains 13,000 masked faces of celebrities around the
world. Fig. 5 illustrates samples of LFW images. The LFW dataset is used for the testing phase
only, as a benchmark dataset on which the proposed model was never trained.

Fig. 5. LFW dataset images samples.

CHAPTER 2

PROJECT STRUCTURE
2.1 DIRECTORY STRUCTURE

2.2 BRIEFING THE STRUCTURE

2.1 DIRECTORY STRUCTURE
├── dataset
│ ├── with_mask
│ └── without_mask
├── face_detector
│ ├── deploy.prototxt
│ └── res10_300x300_ssd_iter_140000.caffemodel
├── detect_mask_image.py
├── detect_mask_video.py
├── mask_detector.model
├── plot.png
└── train_mask_detector.py

2.2 BRIEFING THE STRUCTURE


The dataset/ directory contains the data described in the “Dataset characteristics” section
(Section 1.3).

Three example images are provided so that you can test the static image face mask detector.

We’ll be reviewing three Python scripts in this tutorial:

1) train_mask_detector.py: Accepts our input dataset and fine-tunes MobileNetV2 on it to
create our mask_detector.model. A training history plot.png containing accuracy/loss curves is
also produced.

2) detect_mask_image.py: Performs face mask detection in static images.

3) detect_mask_video.py: Using your webcam, this script applies face mask detection to every
frame in the stream.

CHAPTER 3

IMPLEMENTATION
3.1 IMPORTING NECESSARY PACKAGES

3.2 CONSTRUCTION OF ARGUMENT PARSER

3.3 INITIALIZE THE INITIAL LEARNING RATE

3.4 LOAD AND PRE-PROCESS DATA

3.1 IMPORTING NECESSARY PACKAGES
Now that we’ve reviewed our face mask dataset, let’s learn how we can use Keras and
TensorFlow to train a classifier to automatically detect whether a person is wearing a mask or not.

To accomplish this task, we’ll be fine-tuning the MobileNetV2 architecture, a highly efficient
architecture that can be applied to embedded devices with limited computational capacity (e.g.,
Raspberry Pi, Google Coral, NVIDIA Jetson Nano, etc.). Deploying our face mask detector to
embedded devices could reduce the cost of manufacturing such face mask detection systems,
which is why we chose this architecture.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import os

Our set of tensorflow.keras imports allow for:

 Data augmentation
 Loading the MobileNetV2 classifier (we will fine-tune this model with pre-trained
ImageNet weights)
 Building a new fully-connected (FC) head
 Pre-processing
 Loading image data

We’ll use scikit-learn (sklearn) for binarizing class labels, segmenting our dataset, and printing a
classification report. The imutils paths implementation will help us find and list images in our
dataset. And we’ll use matplotlib to plot our training curves.

3.2 CONSTRUCTION OF ARGUMENT PARSER
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
help="path to output loss/accuracy plot")
ap.add_argument("-m", "--model", type=str,
default="mask_detector.model",
help="path to output face mask detector model")
args = vars(ap.parse_args())

Our command line arguments include:

--dataset: The path to the input dataset of faces and faces with masks

--plot: The path to your output training history plot, which will be generated using matplotlib

--model: The path to the resulting serialized face mask classification model

3.3 INITIALIZE THE INITIAL LEARNING RATE


INIT_LR = 1e-4
EPOCHS = 20
BS = 32
Here, we’ve specified hyperparameter constants including our initial learning rate, number of
training epochs, and batch size. Later, we will be applying a learning rate decay schedule, which
is why we’ve named the learning rate variable INIT_LR.

3.4 LOAD AND PRE-PROCESS DATA


At this point, we’re ready to load and pre-process our training data

print("[INFO] loading images...")


imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []
# loop over the image paths
for imagePath in imagePaths:
# extract the class label from the filename
label = imagePath.split(os.path.sep)[-2]
# load the input image (224x224) and preprocess it
image = load_img(imagePath, target_size=(224, 224))
image = img_to_array(image)
image = preprocess_input(image)
# update the data and labels lists, respectively
21
data.append(image)
labels.append(label)
# convert the data and labels to NumPy arrays
data = np.array(data, dtype="float32")
labels = np.array(labels)

In this block, we are:

 Grabbing all of the imagePaths in the dataset (Line 44)


 Initializing data and labels lists (Lines 45 and 46)
 Looping over the imagePaths and loading + pre-processing images (Lines 49-60). Pre-processing
steps include resizing to 224×224 pixels, conversion to array format, and scaling the pixel
intensities in the input image to the range [-1, 1] (via the preprocess_input convenience function)
 Appending the pre-processed image and associated label to the data and labels lists,
respectively (Lines 59 and 60)
 Ensuring our training data is in NumPy array format (Lines 63 and 64)

The above lines of code assume that your entire dataset is small enough to fit into memory. If
your dataset is larger than the memory you have available, we suggest using HDF5. Our data
preparation work isn’t done yet. Next, we’ll encode our labels, partition our dataset, and prepare
for data augmentation.

CHAPTER 4

CONFIGURING CLASSIFIER
4.1 PREPARATION FOR DATA AUGMENTATION

4.2 MOBILENETV2 CONFIGURATION

4.3 TRAINING HEAD OF NETWORK

4.4 LOADING MODEL ON TEST

4.5 PLOT LOSS VS ACCURACY

4.1 PREPARATION FOR DATA AUGMENTATION

Lines 67-69 one-hot encode our class labels, meaning that each element of our labels array
consists of an array in which only one index is “hot” (i.e., 1). For example, a with_mask label
might be encoded as [1, 0] and a without_mask label as [0, 1].

Using scikit-learn’s convenience method, Lines 73 and 74 segment our data into 80% training and
the remaining 20% for testing.

During training, we’ll be applying on-the-fly mutations to our images in an effort to improve
generalization. This is known as data augmentation, where the random rotation, zoom, shear,
shift, and flip parameters are established on Lines 77-84. We’ll use the aug object at training time.
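The listing itself is not reproduced above, so the following is a minimal sketch of what Lines 67-84 plausibly look like, assuming variable names (data, labels, aug) consistent with the rest of the script; the specific augmentation parameter values shown are illustrative assumptions:

from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# one-hot encode the class labels (Lines 67-69)
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)

# 80% training / 20% testing split (Lines 73 and 74)
(trainX, testX, trainY, testY) = train_test_split(data, labels,
    test_size=0.20, stratify=labels, random_state=42)

# on-the-fly data augmentation object used at training time (Lines 77-84)
aug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")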

4.2 MOBILENETV2 CONFIGURATION

Fine-tuning setup is a three-step process:

1. Load MobileNet with pre-trained ImageNet weights, leaving off head of network (Lines
88 and 89)
2. Construct a new FC head, and append it to the base in place of the old head (Lines 93-102)
3. Freeze the base layers of the network (Lines 106 and 107). The weights of these base
layers will not be updated during the process of backpropagation, whereas the head layer
weights will be tuned.

Fine-tuning is a strategy We nearly always recommend to establish a baseline model while saving
considerable time.
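A minimal sketch of this three-step setup, using the layer classes imported in Section 3.1 (the pooling size and the 128-unit head are assumptions for illustration):

# step 1: load MobileNetV2 with ImageNet weights, leaving off the head
baseModel = MobileNetV2(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(224, 224, 3)))

# step 2: construct a new fully-connected head on top of the base
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
model = Model(inputs=baseModel.input, outputs=headModel)

# step 3: freeze the base layers so backpropagation only tunes the head
for layer in baseModel.layers:
    layer.trainable = False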

4.3 TRAINING HEAD OF NETWORK


With our data prepared and model architecture in place for fine-tuning, we’re now ready to
compile and train our face mask detector network:

Lines 111-113 compile our model with the Adam optimizer, a learning rate decay schedule, and
binary cross-entropy. If you’re building from this training script with > 2 classes, be sure to use
categorical cross-entropy.

Face mask training is launched via Lines 117-122. Notice how our data augmentation object (aug)
will be providing batches of mutated image data.
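A sketch of the compile-and-train step described above; the decay argument implements the learning rate decay schedule mentioned in Section 3.3 (exact arguments are assumptions, and newer TensorFlow versions expect learning_rate instead of lr):

# compile with Adam, a learning rate decay schedule, and binary cross-entropy
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,
    metrics=["accuracy"])

# train the head of the network on augmented batches drawn from aug
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),
    epochs=EPOCHS)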

4.4 LOADING MODEL ON TEST


Here, Lines 126-130 make predictions on the test set, grabbing the highest probability class label
indices. Then, we print a classification report in the terminal for inspection.

Line 138 serializes our face mask classification model to disk.
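A sketch of these evaluation and serialization steps (argument names follow the parser from Section 3.2; details are assumptions):

# make predictions on the test set and take the most likely class index
predIdxs = model.predict(testX, batch_size=BS)
predIdxs = np.argmax(predIdxs, axis=1)

# print the classification report in the terminal
print(classification_report(testY.argmax(axis=1), predIdxs,
    target_names=lb.classes_))

# serialize the trained face mask classifier to disk
model.save(args["model"], save_format="h5")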

4.5 PLOT LOSS VS ACCURACY


Our last step is to plot our accuracy and loss curves:

Once our plot is ready, Line 152 saves the figure to disk using the --plot filepath.
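A sketch of the plotting code, using the training history object H returned by model.fit (the plot styling is an assumption):

# plot the training/validation loss and accuracy curves
N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])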

CHAPTER 5

TRAINING THE FACE DETECTION MODEL


5.1 TRAINING MODEL ON DATASET

5.2 ANALYZING CLASSIFICATION REPORT

5.3 ANALYZING LOSS V/S ACCURACY GRAPH

5.1 TRAINING MODEL ON DATASET

We trained our face mask detector using Keras, TensorFlow, and Deep Learning.

From there, we started a terminal, and executed the following command:

$ python train_mask_detector.py --dataset dataset


Here,

1. train_mask_detector.py is the Python script that trains our model on the dataset we have
acquired.
2. --dataset is the parameter where we specify the path of the directory that contains the dataset
used for training.

The resulting output looks like this:

Epoch 1/20
34/34 [==============================] - 30s 885ms/step - loss:
0.6431 - accuracy: 0.6676 - val_loss: 0.3696 - val_accuracy: 0.8242
Epoch 2/20
34/34 [==============================] - 29s 853ms/step - loss:
0.3507 - accuracy: 0.8567 - val_loss: 0.1964 - val_accuracy: 0.9375
Epoch 3/20
34/34 [==============================] - 27s 800ms/step - loss:
0.2792 - accuracy: 0.8820 - val_loss: 0.1383 - val_accuracy: 0.9531
Epoch 4/20
34/34 [==============================] - 28s 814ms/step - loss:
0.2196 - accuracy: 0.9148 - val_loss: 0.1306 - val_accuracy: 0.9492
Epoch 5/20
34/34 [==============================] - 27s 792ms/step - loss:
0.2006 - accuracy: 0.9213 - val_loss: 0.0863 - val_accuracy: 0.9688

Here TensorFlow shows us the loss, accuracy, validation loss, and validation accuracy values for
each epoch.

5.2 ANALYZING CLASSIFICATION REPORT

scikit-learn generates a classification report for us based on how our classifier performs on
the test set. The classification report generated for our particular model is
given below.

CLASSIFICATION REPORT:

              precision    recall    f1-score

   with_mask       0.99      1.00        0.99

without_mask       1.00      0.99        0.99

    accuracy                             0.99

   macro avg       0.99      0.99        0.99

weighted avg       0.99      0.99        0.99

The report shows the main classification metrics precision, recall and f1-score on a per-class
basis. The metrics are calculated using true and false positives, and true and false negatives.
Positive and negative in this case are generic names for the predicted classes. There are four ways
to check if the predictions are right or wrong:

TN / True Negative: when a case was negative and predicted negative

TP / True Positive: when a case was positive and predicted positive

FN / False Negative: when a case was positive but predicted negative

FP / False Positive: when a case was negative but predicted positive

Parameters of the report are explained below:

Precision – What percent of your predictions were correct?

Precision is the ability of a classifier not to label an instance positive that is actually negative. For
each class it is defined as the ratio of true positives to the sum of true and false positives.

TP – True Positives
FP – False Positives

Precision – Accuracy of positive predictions.


Precision = TP/(TP + FP)

Recall – What percent of the positive cases did you catch?

Recall is the ability of a classifier to find all positive instances. For each class it is defined as the
ratio of true positives to the sum of true positives and false negatives.

FN – False Negatives

Recall: Fraction of positives that were correctly identified.


Recall = TP/(TP+FN)

F1 score – The harmonic mean of precision and recall.

The F1 score is a weighted harmonic mean of precision and recall such that the best score is 1.0
and the worst is 0.0. Generally speaking, F1 scores are lower than accuracy measures as they
embed precision and recall into their computation. As a rule of thumb, the weighted average of
F1 should be used to compare classifier models, not global accuracy.

F1 Score = 2*(Recall * Precision) / (Recall + Precision)
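As a hypothetical worked example: if a classifier produces TP = 95, FP = 5 and FN = 10 for a
class, then precision = 95/(95 + 5) = 0.95, recall = 95/(95 + 10) ≈ 0.905, and
F1 = 2 × (0.95 × 0.905)/(0.95 + 0.905) ≈ 0.927.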

5.3 ANALYZING LOSS V/S ACCURACY GRAPH

After we trained our model on the dataset that we had collected, matplotlib library helped us plot
the Loss v/s Accuracy graph.

A loss function is used to optimize a deep learning algorithm. The loss is calculated on training
and validation and its interpretation is based on how well the model is doing in these two sets. It is
the sum of errors made for each example in training or validation sets. Loss value implies how
poorly or well a model behaves after each iteration of optimization.

An accuracy metric is used to measure the algorithm’s performance in an interpretable way. The
accuracy of a model is usually determined after the model parameters have been learned and
fixed, and is calculated in the form of a percentage. It is the measure of how accurate your
model's predictions are compared to the true data.

Example-
Suppose you have 1000 test samples and if your model is able to classify 990 of them correctly,
then the model’s accuracy will be 99.0%.

Our Graph:

Figure 6: COVID-19 face mask detector training accuracy/loss curves

As can be seen, we are obtaining ~99% accuracy on our test set.

Looking at Figure 6, we can see some signs of overfitting around epochs 11 and 15, where the
validation loss is slightly higher than the training loss. But the two losses are roughly equal
elsewhere, which suggests that our model is well fitted.

Given these results, we were hopeful that our model would generalize well to images outside our
training and testing sets.

CHAPTER 6

IMPLEMENTATION FOR IMAGES


6.1 IMPORTS AND ARGUMENT PARSER

6.2 LOAD FACE DETECTOR AND OUR MODEL

6.3 LOAD INPUT IMAGE FROM DISK

6.4 DETECT FACES

6.5 APPLY DETECTOR ON FACES

6.1 IMPORTS AND ARGUMENT PARSER

We open up the detect_mask_image.py file in our directory structure. The libraries we need to
import are given below:

from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import cv2
import os

Our driver script requires three TensorFlow/Keras imports to (1) load our model and (2)
pre-process the input image.

OpenCV is required for display and image manipulations.

The next step is to parse command line arguments:

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
help="path to input image")
ap.add_argument("-f", "--face", type=str,
default="face_detector",
help="path to face detector model directory")
ap.add_argument("-m", "--model", type=str,
default="mask_detector.model",
help="path to trained face mask detector model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

Our four command line arguments include:

--image: The path to the input image containing faces for inference

--face: The path to the face detector model directory (we need to localize faces prior to classifying
them)

--model: The path to the face mask detector model that we trained earlier in this tutorial

--confidence: An optional probability threshold, which can be set to override the default of 50%,
used to filter weak face detections

6.2 LOAD FACE DETECTOR AND OUR MODEL

We first loaded both our face detector and face mask classifier models using the following piece
of code:

# load our serialized face detector model from disk
print("[INFO] loading face detector model...")
prototxtPath = os.path.sep.join([args["face"], "deploy.prototxt"])
weightsPath = os.path.sep.join([args["face"],
"res10_300x300_ssd_iter_140000.caffemodel"])
net = cv2.dnn.readNet(prototxtPath, weightsPath)
# load the face mask detector model from disk
print("[INFO] loading face mask detector model...")
model = load_model(args["model"])

Here we are first loading the prototxt file and then the caffemodel file using the path given as
argument --face.

These are the files needed to detect faces in images. This model is included with OpenCV. It is a
pretrained model that has been trained in Caffe.

Then we load the face mask detection model we trained on our dataset and serialized to the disk.
This is loaded from the path/filename passed in the argument --model.

6.3 LOAD INPUT IMAGE FROM DISK

With our deep learning models now in memory, our next step is to load and pre-process an input
image:

image = cv2.imread(args["image"])
orig = image.copy()
(h, w) = image.shape[:2]
# construct a blob from the image
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300),
(104.0, 177.0, 123.0))
# pass the blob through the network and obtain the face detections
print("[INFO] computing face detections...")
net.setInput(blob)
detections = net.forward()

Upon loading our --image from disk, we make a copy and grab frame dimensions for future
scaling and display purposes.

Pre-processing is handled by OpenCV’s blobFromImage function. This function can perform
the following processes:

 Mean subtraction

 Scaling

 Optional channel swapping

As shown in the parameters, we resize to 300×300 pixels and perform mean subtraction.

Then we perform face detection to localize where in the image all faces are. Once we know where
each face is predicted to be, we’ll ensure the detections meet the --confidence threshold before we
extract the face ROIs.

6.4 DETECT FACES

for i in range(0, detections.shape[2]):
    # extract the confidence (i.e., probability) associated with
    # the detection
    confidence = detections[0, 0, i, 2]
    # filter out weak detections by ensuring the confidence is
    # greater than the minimum confidence
    if confidence > args["confidence"]:
        # compute the (x, y)-coordinates of the bounding box for
        # the object
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        # ensure the bounding boxes fall within the dimensions of
        # the frame
        (startX, startY) = (max(0, startX), max(0, startY))
        (endX, endY) = (min(w - 1, endX), min(h - 1, endY))

Here, we loop over our detections and extract the confidence to measure against the --confidence
threshold.

We then compute the bounding box values for a particular face and ensure that the box falls within
the boundaries of the image.

6.5 APPLY DETECTOR ON FACES

# extract the face ROI, convert it from BGR to RGB channel
# ordering, resize it to 224x224, and preprocess it
face = image[startY:endY, startX:endX]
face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
face = cv2.resize(face, (224, 224))
face = img_to_array(face)
face = preprocess_input(face)
face = np.expand_dims(face, axis=0)
# pass the face through the model to determine if the face
# has a mask or not
(mask, withoutMask) = model.predict(face)[0]

In this block, we:

Extract the face ROI via NumPy slicing

Pre-process the ROI the same way we did during training

Perform mask detection to predict with_mask or without_mask

From here, we displayed the result:

# determine the class label and color we'll use to draw
# the bounding box and text
label = "Mask" if mask > withoutMask else "No Mask"
color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
# include the probability in the label
label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
# display the label and bounding box rectangle on the output
# frame
cv2.putText(image, label, (startX, startY - 10),
cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
cv2.rectangle(image, (startX, startY), (endX, endY), color, 2)
# show the output image
cv2.imshow("Output", image)
cv2.waitKey(0)

First, we determine the class label based on probabilities returned by the mask detector model and
assign an associated colour for the annotation. The colour will be “green” for with_mask and
“red” for without_mask.

We then draw the label text (including class and probability), as well as a bounding box rectangle
for the face, using OpenCV drawing functions.

Once all detections have been processed, we display the output image and wait for the user to
press a key.

Results on images:

Figure 7: With and without mask image – Ritvik

Figure 8: With and without mask image – Gautam

Figure 9: With and without mask image – Abhishek

CHAPTER 7

IMPLEMENTATION IN REAL-TIME VIDEO STREAMS
7.1 Loading video and updating algorithm

7.2 Loading Face detection prediction logic function

7.3 Creating a loop over detections

7.4 Extracting Face ROI and preprocessing

7.5 Executing face mask predictor

7.6 Defining command line arguments

7.7 Running initializations and looping over frames

7.8 Processing and displaying results

7.1) Loading video and updating algorithm
We first open up the detect_mask_video.py file in our directory structure, and insert the
following code.

The algorithm for this script is the same, but it is pieced together in such a way to allow for
processing every frame of your webcam stream. Thus, the only difference when it comes to
imports is that we need a VideoStream class and time. Both of these will help us to work with the
stream. We’ll also take advantage of imutils for its aspect-aware resizing method.
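Concretely, the additional imports likely look like this (alongside the imports already shown in Section 6.1):

from imutils.video import VideoStream
import imutils
import time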

7.2) Loading Face detection prediction logic function

This function detects faces and then applies our face mask classifier to each face ROI. Such a
function consolidates our code; it could even be moved to a separate Python file if you so
choose. (A consolidated sketch of this function appears after Section 7.5.) Our
detect_and_predict_mask function accepts three parameters:

1. frame: A frame from our stream

2. faceNet: The model used to detect where in the image faces are

3. maskNet: Our COVID-19 face mask classifier model

Inside, we construct a blob, detect faces, and initialize lists, two of which the function is set to
return. These lists include our faces (i.e., ROIs), locs (the face locations), and preds (the list of
mask/no mask predictions).

7.3) Creating a loop over detections


From here, we’ll loop over the face detections. Inside the loop, we filter out weak detections
(Lines 34-38) and extract bounding boxes while ensuring the bounding box coordinates do not
fall outside the bounds of the image (Lines 41-47).

7.4) Extracting Face ROI and preprocessing


After extracting face ROIs and pre-processing them (Lines 51-56), we append the face ROIs and
bounding boxes to their respective lists.

7.5) Executing face mask predictor
The logic here is built for speed. First we ensure at least one face was detected (Line 63); if not,
we’ll return empty preds.

Secondly, we perform inference on our entire batch of faces in the frame so that our
pipeline is faster (Line 68). It wouldn’t make sense to write another loop to make predictions on
each face individually, due to the overhead (especially if you are using a GPU that requires a lot
of overhead communication on your system bus); it is more efficient to perform predictions in
batch. Line 72 returns our face bounding box locations and corresponding mask/not mask
predictions to the caller.
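Since the listing itself is not reproduced in this report, the following is a sketch of detect_and_predict_mask reconstructed from the descriptions in Sections 7.2-7.5; exact line numbers and minor details may differ from the actual script:

def detect_and_predict_mask(frame, faceNet, maskNet):
    # grab the frame dimensions and construct a blob from the frame
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300),
        (104.0, 177.0, 123.0))

    # pass the blob through the face detector network
    faceNet.setInput(blob)
    detections = faceNet.forward()

    # initialize the lists of faces, their locations, and predictions
    faces = []
    locs = []
    preds = []

    # loop over the detections (Section 7.3)
    for i in range(0, detections.shape[2]):
        # filter out weak detections
        confidence = detections[0, 0, i, 2]
        if confidence > args["confidence"]:
            # compute and clip the bounding box coordinates
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            (startX, startY) = (max(0, startX), max(0, startY))
            (endX, endY) = (min(w - 1, endX), min(h - 1, endY))

            # extract and pre-process the face ROI as in training (Section 7.4)
            face = frame[startY:endY, startX:endX]
            face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
            face = cv2.resize(face, (224, 224))
            face = img_to_array(face)
            face = preprocess_input(face)

            faces.append(face)
            locs.append((startX, startY, endX, endY))

    # predict on all detected faces at once, in a single batch (Section 7.5)
    if len(faces) > 0:
        faces = np.array(faces, dtype="float32")
        preds = maskNet.predict(faces, batch_size=32)

    return (locs, preds)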

7.6) Defining command line arguments


Our command line arguments include:

1. --face: The path to the face detector directory

2. --model: The path to our trained face mask classifier

3. --confidence: The minimum probability threshold to filter weak face detections

With our imports, convenience function, and command line args ready to go, we can initialize our models and video stream.

7.7) running initializations and looping over Frames

Here we have initialized our:

1. Face detector

2. COVID-19 face mask detector

3. Webcam video stream

We begin looping over frames on Line 103. Inside, we grab a frame from the stream and resize it
(Lines 106 and 107). From there, we put our convenience utility to use; Line 111 detects and
predicts whether people are wearing their masks or not.
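A sketch of these initializations and the frame loop, assuming the file names from Section 2.1 and the imports from Sections 6.1 and 7.1 (the camera source index and resize width are assumptions):

# load the serialized face detector and our trained mask detector
prototxtPath = os.path.sep.join([args["face"], "deploy.prototxt"])
weightsPath = os.path.sep.join([args["face"],
    "res10_300x300_ssd_iter_140000.caffemodel"])
faceNet = cv2.dnn.readNet(prototxtPath, weightsPath)
maskNet = load_model(args["model"])

# start the webcam video stream and let the camera sensor warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

# loop over the frames from the video stream
while True:
    # grab a frame from the stream and resize it
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

    # detect faces and predict mask/no mask for each of them
    (locs, preds) = detect_and_predict_mask(frame, faceNet, maskNet)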

7.8) Processing and displaying results

Inside our loop over the prediction results (beginning on Line 115), we:

1. Unpack a face bounding box and mask/not mask prediction (Lines 117 and 118)

2. Determine the label and color (Lines 122-126)

3. Annotate the label and face bounding box (Lines 130-132)

Finally, we display the results and perform cleanup:

After the frame is displayed, we capture key presses. If the user presses q (quit), we break out of
the loop and perform housekeeping.
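A sketch of this result-processing block; it continues the while loop from the previous sketch (the window title and text formatting are assumptions):

    # loop over the detected face locations and their predictions
    for (box, pred) in zip(locs, preds):
        # unpack the bounding box and the prediction
        (startX, startY, endX, endY) = box
        (mask, withoutMask) = pred

        # determine the class label and annotation colour
        label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)

        # annotate the frame with the label and bounding box
        cv2.putText(frame, label, (startX, startY - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)

    # show the frame; break out of the loop if 'q' is pressed
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        break

# perform housekeeping
cv2.destroyAllWindows()
vs.stop()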

CHAPTER 8

REAL-TIME DETECTION AND FURTHER IMPROVEMENTS
8.1 RUNNING FACE MASK DETECTOR IN LIVESTREAM

8.2 SCOPE FOR ERROR

8.3 SCOPE FOR IMPROVEMENTS

8.1) Running face mask detector in livestream

We launch the mask detector in real-time video streams using the following command:
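Based on the directory structure in Section 2.1, the command is likely:

$ python detect_mask_video.py

(the --face and --model arguments are assumed to default to face_detector and mask_detector.model, as in the image script from Section 6.1).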

Figure 10: With and without mask livestream results

8.2) Scope for error

A few reasons why we might fail to detect a face in the foreground are:

1. It’s too obscured by the mask.

2. Using a cloth, bandana, or a shirt to cover the face would likely produce an error, as such
examples are not present in our dataset.

Therefore, if a large portion of the face is occluded, our face detector will likely fail to detect the
face.

8.3) Scope for improvements


To improve our face mask detection model further:

▸ We can collect a larger amount of data of people wearing masks as well as not wearing any
mask; this could reduce the scope of error, as the model would have more data for
comparison.

▸ Secondly, we can also gather images of faces that may “confuse” our classifier into
thinking the person is wearing a mask when in fact they are not; potential examples
include shirts wrapped around faces, bandanas over the mouth, etc.

CONCLUSION AND FUTURE WORKS
The coronavirus COVID-19 pandemic is causing a global health crisis. Governments all over the
world are struggling to stand against this type of virus. Protection from infection caused by
COVID-19 is a necessary countermeasure, according to the World Health Organization (WHO).
In this report, a hybrid model using deep and classical machine learning for face mask detection
was presented. The proposed model consists of two parts. The first part is for feature
extraction using ResNet50, one of the popular models in deep transfer learning. The
second part is for the detection of face masks using classical machine learning
algorithms. The Support Vector Machine (SVM), decision trees, and ensemble algorithms were
selected as the traditional machine learning methods for investigation.

Three datasets were experimented on, and different training and testing strategies were adopted
in this research. These strategies include training on a specific dataset while testing over other
datasets, to prove the efficiency of the proposed model. The presented work concluded that the
SVM classifier achieved the highest accuracy with the least time consumed in the
training process. The SVM classifier on RMFD achieved 99.64% testing accuracy. On SMFD, it
gained 99.49%, while on LFW, it reached 100% testing accuracy. A comparison was
carried out with related works; the proposed model surpassed the related works in terms of
testing accuracy. The major drawback is that not all classical machine learning methods were
tried in the search for the lowest time consumption and highest accuracy. One of the possible
future tasks is to use deeper transfer learning models for feature extraction and to use the
neutrosophic domain, as it shows promising potential in classification and detection problems.

To create our face mask detector, we trained a two-class model of people wearing masks and
people not wearing masks.

We fine-tuned MobileNetV2 on our mask/no mask dataset and obtained a classifier that is ~99%
accurate.

We then took this face mask classifier and applied it to both images and real-time video streams
by:

 Detecting faces in images/video


 Extracting each individual face
 Applying our face mask classifier

Our face mask detector is accurate, and since we used the MobileNetV2 architecture, it’s also
computationally efficient, making it easier to deploy the model to embedded systems (Raspberry
Pi, Google Coral, Jetson Nano, etc.).

BIBLIOGRAPHY
JOURNALS

[1] S. Feng, C. Shen, N. Xia, W. Song, M. Fan, B.J. Cowling, Rational use of face masks in the
COVID-19 pandemic, Lancet Respirat. Med., 8 (5) (2020), pp. 434-436,
doi: 10.1016/S2213-2600(20)30134-X.

[2] X. Liu, S. Zhang, COVID-19: Face masks and human-to-human transmission, Influenza
Other Respirat. Viruses, doi: 10.1111/irv.12740.

[3] M. Loey, F. Smarandache, N.E.M. Khalifa, Within the lack of chest COVID-19 X-ray
dataset: a novel detection model based on GAN and deep transfer learning, Symmetry,
12 (4) (2020), p. 651.

[4] M.S. Ejaz, M.R. Islam, M. Sifatullah, A. Sarker, Implementation of principal component
analysis on masked and non-masked face recognition.

[5] Jeong-Seon Park, You Hwa Oh, Sang Chul Ahn, Seong-Whan Lee, Glasses removal from
facial image using recursive error compensation, IEEE Trans. Pattern Anal. Mach. Intell.,
27 (5) (2005), pp. 805-811, doi: 10.1109/TPAMI.2005.103.

[6] D.M. Altmann, D.C. Douek, R.J. Boyton, What policy makers need to know about
COVID-19 protective immunity, Lancet, 395 (10236) (2020), pp. 1527-1529.