
FACE MASK DETECTION USING

DEEP LEARNING

A Minor Project Report submitted in partial fulfilment of
the requirement for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Under the Supervision of
Ms Swati Malik
By
ABHISHEK TOMAR (35196302817)
GAUTAM GOEL (42096302817)
RITVIK BHATIA (40596302817)

MAHARAJA SURAJMAL INSTITUTE OF TECHNOLOGY

DECLARATION

We, students of B.Tech (Electronics & Communication Engineering), hereby declare that the
report titled “FACE MASK DETECTION USING DEEP LEARNING”, submitted in
partial fulfilment of the requirement for the award of the degree of Bachelor of Technology,
comprises our original work and has not been submitted anywhere else for any other degree,
to the best of our knowledge.

ABHISHEK TOMAR (35196302817)


GAUTAM GOEL (42096302817)
RITVIK BHATIA (40596302817)

CERTIFICATE

This is to certify that the minor project work done on “Face Mask Detection Using Deep
Learning”, submitted at Maharaja Surajmal Institute of Technology, Janakpuri, Delhi by
Abhishek Tomar (35196302817), Gautam Goel (42096302817) and Ritvik Bhatia (40596302817)
in partial fulfilment for the award of the degree of Bachelor of Technology, is a bona fide work
carried out by them under my supervision and guidance. This project work comprises original
work and has not been submitted anywhere else for any other degree, to the best of our
knowledge.

Ms Swati Malik                                        Dr. Pradeep Sangwan
(Project Supervisor)                                  (HOD, ECE)

ACKNOWLEDGEMENT

Team effort, together with proper and mindful guidance, makes daunting tasks
achievable. It is a pleasure to acknowledge the direct and implied help we have received at
various stages while developing the project. It would not have been possible to develop such a
project without the assistance of numerous individuals. We find it impossible to express our
thanks to each one of them in words, for it seems too trivial when compared to the profound
encouragement that they extended to us.

We are grateful to Dr. Pradeep Sangwan (HOD, ECE) for having given us the opportunity to
do this project, which was of great interest to us.

Our sincere thanks to Ms Swati Malik for believing in us and providing motivation all through.
Without her guidance, this project would not have been such a success.

An undertaking of this nature could never have been attempted without reference to and
inspiration from the works of others, whose details are mentioned in the references section. We
acknowledge our indebtedness to all of them. Last but not least, our sincere thanks to all our
friends who have patiently extended all sorts of help in accomplishing this undertaking.

ABHISHEK TOMAR (35196302817)


GAUTAM GOEL (42096302817)
RITVIK BHATIA (40596302817)

LIST OF FIGURES

Figure 1 – Steps for building a COVID-19 face mask detector…………………………………….8

Figure 2 – The proposed deep transfer learning model…………………….…………………..12

Figure 3 – Face mask detection dataset………………….…………………..…………………...12

Figure 4 – SMFD dataset image samples……………….………………..………………………13

Figure 5 – LFW dataset image samples…………………………………………........................14

Figure 6 – COVID-19 face mask detector accuracy curves………………….……………….....29

Figure 7 – With and without mask image: Ritvik………………….…………..……………….....37

Figure 8 – With and without mask image: Gautam……………………………….……………....38

Figure 9 – With and without mask image: Abhishek………………….…….………..……….....38

Figure 10 – With and without mask livestream results….…………..………………..……….....46

INDEX
1.) Introduction ...............................................................................................................................10

1.1) Need for the hour..........................................................................................11


1.2) Proposed model.............................................................................................12
1.3) Dataset characteristics.....................................................................................12
2.) Project Structure.........................................................................................................................15

2.1) Directory structure.........................................................................................16


2.2) Briefing the structure......................................................................................16
3.) Implementation ..........................................................................................................................17

3.1) Importing necessary packages.........................................................................18


3.2) Construction of argument parser.....................................................................19
3.3) Initialize the initial learning rate......................................................................19
3.4) Load and pre-process data..............................................................................19
4.) Configuring Classifier................................................................................................................21

4.1) Preparation for data augmentation...................................................................22


4.2) MOBILENETV2 configuration..........................................................................23
4.3) Training head of network................................................................................23
4.4) Loading model on test....................................................................................24
4.5) Plot loss vs accuracy........................................................................................24
5.) Training face detection model....................................................................................................25

5.1) Training model on dataset...............................................................................26


5.2) Analysing classification report.........................................................................27
5.3) Analysing loss v/s accuracy graph....................................................................28
6.) Implementation for images.........................................................................................................31

6.1) Imports and argument parser..........................................................................32


6.2) Load face detector and our model...................................................................33
6.3) Load input image from disk.............................................................................34
6.4) Detect faces....................................................................................................35
6.5) Apply detector on faces...................................................................................35

7.) Implementation in real time video streams...............................................................................39

7.1) Loading video and updating algorithm............................................................40


7.2) Loading face detection prediction logic function.............................................40
7.3) Creating loop over detections.........................................................................41
7.4) Extracting face ROI and preprocessing............................................................41
7.5) Executing face mask predictor.........................................................................42
7.6) Defining command line arguments..................................................................42
7.7) Running initializations and looping over frames...............................................43
7.8) Processing and displaying results....................................................................44
8.) Real time detection and further improvements........................................................................45

8.1) Running face detector in livestream ................................................................46


8.2) Scope for error................................................................................................47
8.3) Scope for improvements.................................................................................47
9.) Conclusion & Future Works......................................................................................................48

10.) Bibliography………………........................................................................................49

ABSTRACT

Our project consists of a two-phase COVID-19 face mask detector; this report details how our
computer vision/deep learning pipeline is implemented.

From there, we’ll review the dataset we’ll be using to train our custom face mask detector.

We will then implement a Python script to train a face mask detector on our dataset using Keras
and TensorFlow.

We’ll use this Python script to train a face mask detector and review the results.

Given the trained COVID-19 face mask detector, we’ll proceed to implement two additional
Python scripts used to:

1. Detect COVID-19 face masks in images

2. Detect face masks in real-time video streams

Figure 1: Phases and individual steps for building a COVID-19 face mask detector with computer
vision and deep learning using Python, OpenCV and TensorFlow
In order to train a custom face mask detector, we need to break our project into two distinct
phases, each with its own respective sub-steps (as shown by Figure 1 ):

1. Training: Here we’ll focus on loading our face mask detection dataset from disk,
training a model (using Keras/TensorFlow) on this dataset, and then serializing the
face mask detector to disk

2. Deployment: Once the face mask detector is trained, we can then move on to loading
the mask detector, performing face detection, and then classifying each face
as with_mask or without_mask. We’ll review each of these phases and associated
sub-steps in detail in the remainder of this report; in the meantime, let’s take a look
at the dataset we’ll be using to train our COVID-19 face mask detector.

CHAPTER 1

INTRODUCTION
1.1 NEED FOR THE HOUR

1.2 PROPOSED MODEL

1.3 DATASET CHARACTERISTICS

1.1 NEED FOR THE HOUR
The trend of wearing face masks in public is rising due to the COVID-19 pandemic all over the
world. Before COVID-19, people used to wear masks to protect their health from air pollution,
while others, self-conscious about their looks, hid their emotions from the public by covering
their faces. Scientists have proved that wearing face masks helps impede COVID-19
transmission. COVID-19 (known as coronavirus) is the latest epidemic virus to hit human
health in the last century. In 2020, the rapid spread of COVID-19 forced the World Health
Organization to declare COVID-19 a global pandemic. More than five million people were
infected by COVID-19 in less than 6 months across 188 countries. The virus spreads through
close contact and in crowded areas.

The coronavirus epidemic has given rise to an extraordinary degree of worldwide scientific
cooperation. Artificial Intelligence (AI) based on machine learning and deep learning can
help to fight COVID-19 in many ways. Machine learning allows researchers and clinicians
to evaluate vast quantities of data to forecast the distribution of COVID-19, to serve as an early
warning mechanism for potential pandemics, and to classify vulnerable populations. The
provision of healthcare needs funding for emerging technologies such as artificial intelligence,
IoT, big data and machine learning to tackle and predict new diseases. In order to better
understand infection rates and to trace and quickly detect infections, AI's power is being
exploited to address the COVID-19 pandemic, for example in the detection of COVID-19 in
medical chest X-rays.

Policymakers are facing many challenges and risks in containing the spread and transmission
of COVID-19. People are forced by law to wear face masks in public in many countries.
These rules and laws were developed as a response to the exponential growth in cases and deaths
in many areas. However, the process of monitoring large groups of people is becoming more
difficult. The monitoring process involves the detection of anyone who is not wearing a face
mask. In France, to guarantee that riders wear face masks, new AI software tools have been
integrated into the Paris Metro system's surveillance cameras. The French startup DatakaLab,
which developed the software, reports that the goal is not to recognize or arrest people who do
not wear masks but to produce anonymous statistical data that can help the authorities predict
potential outbreaks of COVID-19.
In this project, we introduce a masked face detection model that is based on deep transfer learning
and classical machine learning classifiers. The proposed model can be integrated with
surveillance cameras to impede COVID-19 transmission by allowing the detection of
people who are not wearing face masks. The model is an integration of deep transfer
learning and classical machine learning algorithms: we use deep transfer learning for
feature extraction and combine it with three classical machine learning algorithms. We
introduce a comparison between them to find the most suitable algorithm, i.e. the one that
achieves the highest accuracy and consumes the least time in the process of training and detection.

The novelty of this research is the use of a feature extraction model with an end-to-end
structure, free of traditional hand-crafted techniques, combined with three classical machine
learning classifiers for masked face detection.

1.2 PROPOSED MODEL
The introduced model includes two main components: the first component is a deep transfer
learning network (ResNet50) used as a feature extractor, and the second component is a classical
machine learning classifier such as decision trees, SVM, or an ensemble method. ResNet-50 has
been reported to achieve better results when used as a feature extractor. Figure 2 illustrates the
proposed classical transfer learning model. Mainly, ResNet50 is used for the feature extraction
phase, while the traditional machine learning model is used in the training, validation, and
testing phases.

Figure 2. The proposed deep transfer learning model.
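Although the report does not reproduce code for this hybrid pipeline, a minimal sketch of the idea
follows, assuming images already resized to 224x224; the helper name extract_features, the
variables trainX/trainY/testX/testY, and the linear SVM kernel are illustrative assumptions, not
taken from the report:

import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from sklearn.svm import SVC

# ResNet50 without its classification head acts as a fixed feature extractor
extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(images):
    # images: float array of shape (N, 224, 224, 3); returns (N, 2048) features
    return extractor.predict(preprocess_input(images.copy()))

# trainX/trainY and testX/testY are assumed to be prepared elsewhere:
# clf = SVC(kernel="linear").fit(extract_features(trainX), trainY)
# accuracy = clf.score(extract_features(testX), testY)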

1.3 DATASET CHARACTERISTICS

The first dataset is the Real-World Masked Face Dataset (RMFD), whose authors created one
of the biggest masked face datasets; it is used in this research.

Figure 3: A face mask detection dataset consisting of “with mask” and “without mask” images.

We will use the dataset to build a COVID-19 face mask detector with computer vision and
deep learning using Python, OpenCV, and TensorFlow/Keras.

This dataset consists of 1,376 images belonging to two classes:

• with_mask: 690 images

• without_mask: 686 images

Our goal is to train a custom deep learning model to detect whether a person is or is
not wearing a mask.

The second dataset is the Simulated Masked Face Dataset (SMFD). The SMFD dataset consists
of 1,570 images: 785 simulated masked faces and 785 unmasked faces. Example images from
SMFD are presented in Fig. 4. The SMFD dataset is used for the training, validation, and
testing phases.

Fig. 4. SMFD dataset image samples.

The third dataset used in this research is Labelled Faces in the Wild (LFW). It is a
simulated masked face dataset that contains 13,000 masked faces of celebrities from around the
world. Fig. 5 illustrates samples of LFW images. The LFW dataset is used for the testing phase
only, as a benchmark dataset on which the proposed model was never trained.

Fig. 5. LFW dataset image samples.

CHAPTER 2

PROJECT STRUCTURE
2.1 DIRECTORY STRUCTURE

2.2 BRIEFING THE STRUCTURE

2.1 DIRECTORY STRUCTURE
├── dataset
│ ├── with_mask
│ └── without_mask
├── face_detector
│ ├── deploy.prototxt
│ └── res10_300x300_ssd_iter_140000.caffemodel
├── detect_mask_image.py
├── detect_mask_video.py
├── mask_detector.model
├── plot.png
└── train_mask_detector.py

2.2 BRIEFING THE STRUCTURE


The dataset/ directory contains the data described in Section 1.3 (Dataset Characteristics).

Three example images are provided so that you can test the static image face mask detector.

We’ll be reviewing three Python scripts in this report:

1) train_mask_detector.py: Accepts our input dataset and fine-tunes MobileNetV2 on it to
create our mask_detector.model. A training history plot.png containing accuracy/loss curves is
also produced.

2) detect_mask_image.py: Performs face mask detection in static images.

3) detect_mask_video.py: Using your webcam, this script applies face mask detection to every
frame in the stream.

CHAPTER 3

IMPLEMENTATION
3.1 IMPORTING NECESSARY PACKAGES

3.2 CONSTRUCTION OF ARGUMENT PARSER

3.3 INITIALIZE THE INITIAL LEARNING RATE

3.4 LOAD AND PRE-PROCESS DATA

3.1 IMPORTING NECESSARY PACKAGES
Now that we’ve reviewed our face mask dataset, let’s learn how we can use Keras and
TensorFlow to train a classifier to automatically detect whether a person is wearing a mask or not.

To accomplish this task, we’ll be fine-tuning the MobileNetV2 architecture, a highly efficient
architecture that can be applied to embedded devices with limited computational capacity (e.g.,
Raspberry Pi, Google Coral, NVIDIA Jetson Nano). Deploying our face mask detector to
embedded devices could reduce the cost of manufacturing such face mask detection systems,
which is why we chose this architecture.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import os

Our set of tensorflow.keras imports allows for:

• Data augmentation
• Loading the MobileNetV2 classifier (we will fine-tune this model with pre-trained
ImageNet weights)
• Building a new fully-connected (FC) head
• Pre-processing
• Loading image data

We’ll use scikit-learn (sklearn) for binarizing class labels, segmenting our dataset, and printing a
classification report. The imutils paths implementation will help us find and list images in our
dataset. And we’ll use matplotlib to plot our training curves.

3.2 CONSTRUCTION OF ARGUMENT PARSER
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
help="path to output loss/accuracy plot")
ap.add_argument("-m", "--model", type=str,
default="mask_detector.model",
help="path to output face mask detector model")
args = vars(ap.parse_args())

Our command line arguments include:

--dataset: The path to the input dataset of faces and faces with masks

--plot: The path to your output training history plot, which will be generated using matplotlib

--model: The path to the resulting serialized face mask classification model

3.3 INITIALIZE THE INITIAL LEARNING RATE

INIT_LR = 1e-4
EPOCHS = 20
BS = 32

Here, we’ve specified hyperparameter constants: our initial learning rate, number of training
epochs, and batch size. Later, we will be applying a learning rate decay schedule, which is why
we’ve named the learning rate variable INIT_LR.

3.4 LOAD AND PRE-PROCESS DATA

At this point, we’re ready to load and pre-process our training data:

print("[INFO] loading images...")


imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []
# loop over the image paths
for imagePath in imagePaths:
    # extract the class label from the filename
    label = imagePath.split(os.path.sep)[-2]
    # load the input image (224x224) and preprocess it
    image = load_img(imagePath, target_size=(224, 224))
    image = img_to_array(image)
    image = preprocess_input(image)
    # update the data and labels lists, respectively
    data.append(image)
    labels.append(label)
# convert the data and labels to NumPy arrays
data = np.array(data, dtype="float32")
labels = np.array(labels)

In this block, we are:

• Grabbing all of the imagePaths in the dataset (Line 44)


• Initializing data and labels lists (Lines 45 and 46)
• Looping over the imagePaths and loading + pre-processing images (Lines 49-60). Pre-
processing steps include resizing to 224×224 pixels, conversion to array format, and
scaling the pixel intensities in the input image to the range [-1, 1] (via the
preprocess_input convenience function)
• Appending the pre-processed image and associated label to the data and labels lists,
respectively (Lines 59 and 60)
• Ensuring our training data is in NumPy array format (Lines 63 and 64)

The above lines of code assume that your entire dataset is small enough to fit into memory. If
your dataset is larger than the memory you have available, we suggest using HDF5. Our data
preparation work isn’t done yet. Next, we’ll encode our labels, partition our dataset, and prepare
for data augmentation.

CHAPTER 4

CONFIGURING CLASSIFIER
4.1 PREPARATION FOR DATA AUGMENTATION

4.2 MOBILENETV2 CONFIGURATION

4.3 TRAINING HEAD OF NETWORK

4.4 LOADING MODEL ON TEST

4.5 PLOT LOSS VS ACCURACY

4.1 PREPARATION FOR DATA AUGMENTATION

Lines 67-69 one-hot encode our class labels, so that each element of our labels array is an array
in which only one index is “hot” (i.e., 1): for example, [1, 0] for one class and [0, 1] for the
other.

Using scikit-learn’s convenience method, Lines 73 and 74 segment our data into 80% training and
the remaining 20% for testing.

During training, we’ll be applying on-the-fly mutations to our images in an effort to improve
generalization. This is known as data augmentation, where the random rotation, zoom, shear,
shift, and flip parameters are established on Lines 77-84. We’ll use the aug object at training time.
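The corresponding code is not reproduced in this report, so a sketch consistent with the steps just
described follows, using the imports from Section 3.1; the exact augmentation parameter values
are representative assumptions:

# one-hot encode the class labels ("with_mask"/"without_mask")
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)
# 80% training / 20% testing split, stratified by class
(trainX, testX, trainY, testY) = train_test_split(data, labels,
    test_size=0.20, stratify=labels, random_state=42)
# on-the-fly augmentation: random rotation, zoom, shear, shift, and flip
aug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")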

4.2 MOBILENETV2 CONFIGURATION

Fine-tuning setup is a three-step process:

1. Load MobileNetV2 with pre-trained ImageNet weights, leaving off the head of the network
(Lines 88 and 89)
2. Construct a new FC head, and append it to the base in place of the old head (Lines 93-102)
3. Freeze the base layers of the network (Lines 106 and 107). The weights of these base
layers will not be updated during the process of backpropagation, whereas the head layer
weights will be tuned.

Fine-tuning is a strategy we nearly always recommend to establish a baseline model while saving
considerable time.
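A sketch of this three-step setup follows; the head layer sizes (7x7 average pooling, a 128-unit
dense layer, 50% dropout) are representative choices rather than values mandated by the report:

# step 1: load MobileNetV2 with ImageNet weights, leaving off the head
baseModel = MobileNetV2(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(224, 224, 3)))
# step 2: construct a new FC head and place it on top of the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
model = Model(inputs=baseModel.input, outputs=headModel)
# step 3: freeze the base so only the head is updated by backpropagation
for layer in baseModel.layers:
    layer.trainable = False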

4.3 TRAINING HEAD OF NETWORK


With our data prepared and model architecture in place for fine-tuning, we’re now ready to
compile and train our face mask detector network:

Lines 111-113 compile our model with the Adam optimizer, a learning rate decay schedule, and
binary cross-entropy. If you’re building from this training script with more than 2 classes, be sure
to use categorical cross-entropy.

Face mask training is launched via Lines 117-122. Notice how our data augmentation object (aug)
will be providing batches of mutated image data.
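A sketch of the compile and train steps described above; the decay expression is a common
convention and an assumption here:

# Adam optimizer with a learning rate decay schedule
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,
    metrics=["accuracy"])
# train the network head on augmented batches supplied by aug
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),
    epochs=EPOCHS)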

4.4 LOADING MODEL ON TEST


Here, Lines 126-130 make predictions on the test set, grabbing the highest probability class label
indices. Then, we print a classification report in the terminal for inspection.

Line 138 serializes our face mask classification model to disk.
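A sketch of this evaluation and serialization logic:

# make predictions on the test set and take the highest-probability index
predIdxs = model.predict(testX, batch_size=BS)
predIdxs = np.argmax(predIdxs, axis=1)
# print a classification report in the terminal for inspection
print(classification_report(testY.argmax(axis=1), predIdxs,
    target_names=lb.classes_))
# serialize the face mask classification model to disk
model.save(args["model"], save_format="h5")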

4.5 PLOT LOSS VS ACCURACY


Our last step is to plot our accuracy and loss curves:

Once our plot is ready, Line 152 saves the figure to disk using the --plot filepath.
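A sketch of the plotting code, reading the standard history keys from the Keras History object
returned by model.fit:

plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, EPOCHS), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, EPOCHS), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, EPOCHS), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, EPOCHS), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
# save the figure to disk using the --plot filepath
plt.savefig(args["plot"])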

CHAPTER 5

TRAINING THE FACE DETECTION MODEL


5.1 TRAINING MODEL ON DATASET

5.2 ANALYZING CLASSIFICATION REPORT

5.3 ANALYZING LOSS V/S ACCURACY GRAPH

5.1 TRAINING MODEL ON DATASET

We trained our face mask detector using Keras, TensorFlow, and Deep Learning.

From there, we started a terminal, and executed the following command:

$ python train_mask_detector.py --dataset dataset


Here,

1. train_mask_detector.py is the Python script that trains our model on the dataset we have acquired.
2. --dataset is the parameter where we specify the path of the directory that contains the dataset
used for training.

The resulting output looks like this:

Epoch 1/20
34/34 [==============================] - 30s 885ms/step - loss: 0.6431 - accuracy: 0.6676 - val_loss: 0.3696 - val_accuracy: 0.8242
Epoch 2/20
34/34 [==============================] - 29s 853ms/step - loss: 0.3507 - accuracy: 0.8567 - val_loss: 0.1964 - val_accuracy: 0.9375
Epoch 3/20
34/34 [==============================] - 27s 800ms/step - loss: 0.2792 - accuracy: 0.8820 - val_loss: 0.1383 - val_accuracy: 0.9531
Epoch 4/20
34/34 [==============================] - 28s 814ms/step - loss: 0.2196 - accuracy: 0.9148 - val_loss: 0.1306 - val_accuracy: 0.9492
Epoch 5/20
34/34 [==============================] - 27s 792ms/step - loss: 0.2006 - accuracy: 0.9213 - val_loss: 0.0863 - val_accuracy: 0.9688

Here TensorFlow shows us the loss, accuracy, validation loss, and validation accuracy values for
each epoch.

5.2 ANALYZING CLASSIFICATION REPORT

scikit-learn generates a classification report based on how our classifier performs on the test set.
The classification report generated for our particular model is given below.

CLASSIFICATION REPORT:

              precision    recall  f1-score

   with_mask       0.99      1.00      0.99
without_mask       1.00      0.99      0.99

    accuracy                           0.99
   macro avg       0.99      0.99      0.99
weighted avg       0.99      0.99      0.99

The report shows the main classification metrics precision, recall and f1-score on a per-class basis.
The metrics are calculated using true and false positives and true and false negatives. “Positive”
and “negative” in this case are generic names for the predicted classes. There are four ways to
check if the predictions are right or wrong:

TN / True Negative: when a case was negative and predicted negative

TP / True Positive: when a case was positive and predicted positive

FN / False Negative: when a case was positive but predicted negative

FP / False Positive: when a case was negative but predicted positive

Parameters of the report are explained below:

Precision – What percent of your positive predictions were correct?

Precision is the ability of a classifier not to label an instance positive that is actually negative. For
each class it is defined as the ratio of true positives to the sum of true and false positives.

TP – True Positives
FP – False Positives

Precision – Accuracy of positive predictions.


Precision = TP/(TP + FP)

Recall – What percent of the positive cases did you catch?

Recall is the ability of a classifier to find all positive instances. For each class it is defined as the
ratio of true positives to the sum of true positives and false negatives.

FN – False Negatives

Recall: Fraction of positives that were correctly identified.


Recall = TP/(TP+FN)

F1 score – A single measure that combines precision and recall.

The F1 score is a weighted harmonic mean of precision and recall such that the best score is 1.0 and
the worst is 0.0. Generally speaking, F1 scores are lower than accuracy measures as they embed
precision and recall into their computation. As a rule of thumb, the weighted average of F1 should
be used to compare classifier models, not global accuracy.

F1 Score = 2*(Recall * Precision) / (Recall + Precision)
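As a worked example with hypothetical counts: suppose a classifier produces 95 true positives, 5
false positives, and 10 false negatives. Then Precision = 95/(95 + 5) = 0.95, Recall = 95/(95 + 10)
≈ 0.905, and F1 Score = 2 × (0.905 × 0.95)/(0.905 + 0.95) ≈ 0.927.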

5.3 ANALYZING LOSS V/S ACCURACY GRAPH

After we trained our model on the dataset that we had collected, the matplotlib library helped us
plot the loss v/s accuracy graph.

A loss function is used to optimize a deep learning algorithm. The loss is calculated on the training
and validation sets, and its interpretation is based on how well the model is doing on these two sets.
It is the sum of the errors made for each example in the training or validation set. The loss value
implies how poorly or well a model behaves after each iteration of optimization.

An accuracy metric is used to measure the algorithm’s performance in an interpretable way. The
accuracy of a model is usually determined after the model parameters are learned and fixed, and is
expressed as a percentage. It is a measure of how accurate the model’s predictions are compared to
the true data.

Example-
Suppose you have 1000 test samples and if your model is able to classify 990 of them correctly,
then the model’s accuracy will be 99.0%.

Our Graph:

Figure 6: COVID-19 face mask detector training accuracy/loss curves

As can be seen, we are obtaining ~99% accuracy on our test set.

Looking at Figure 6, we can see there are some signs of overfitting around epochs 11 and 15, with
the validation loss slightly higher than the training loss. But both losses are roughly equal
elsewhere, which tells us that our model is well fitted.

Given these results, we were hopeful that our model would generalize well to images outside our
training and testing sets.

CHAPTER 6

IMPLEMENTATION FOR IMAGES


6.1 IMPORTS AND ARGUMENT PARSER

6.2 LOAD FACE DETECTOR AND OUR MODEL

6.3 LOAD INPUT IMAGE FROM DISK

6.4 DETECT FACES

6.5 APPLY DETECTOR ON FACES

6.1 IMPORTS AND ARGUMENT PARSER

We open up the detect_mask_image.py file in our directory structure. The libraries we need to
import are given below:

from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import cv2
import os

Our driver script requires three TensorFlow/Keras imports to (1) load our model and (2) pre-process
the input image.

OpenCV is required for display and image manipulations.

The next step is to parse command line arguments:

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
help="path to input image")
ap.add_argument("-f", "--face", type=str,
default="face_detector",
help="path to face detector model directory")
ap.add_argument("-m", "--model", type=str,
default="mask_detector.model",
help="path to trained face mask detector model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

Our four command line arguments include:

--image: The path to the input image containing faces for inference

--face: The path to the face detector model directory (we need to localize faces prior to classifying
them)

--model: The path to the face mask detector model that we trained earlier in this tutorial

--confidence: An optional probability threshold (default: 50%) used to filter weak face
detections

6.2 LOAD FACE DETECTOR AND OUR MODEL

We first loaded both our face detector and face mask classifier models using the following piece of
code:

# load our serialized face detector model from disk
print("[INFO] loading face detector model...")
prototxtPath = os.path.sep.join([args["face"], "deploy.prototxt"])
weightsPath = os.path.sep.join([args["face"],
"res10_300x300_ssd_iter_140000.caffemodel"])
net = cv2.dnn.readNet(prototxtPath, weightsPath)
# load the face mask detector model from disk
print("[INFO] loading face mask detector model...")
model = load_model(args["model"])

Here we first load the prototxt file and then the caffemodel file, using the path given by the
--face argument.

These are the files needed to detect faces in images. This face detection model is included with
OpenCV; it is a pretrained model trained with Caffe.

Then we load the face mask detection model we trained on our dataset and serialized to disk.
This is loaded from the path/filename passed in the --model argument.

6.3 LOAD INPUT IMAGE FROM DISK

With our deep learning models now in memory, our next step is to load and pre-process an input
image:

image = cv2.imread(args["image"])
orig = image.copy()
(h, w) = image.shape[:2]
# construct a blob from the image
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300),
(104.0, 177.0, 123.0))
# pass the blob through the network and obtain the face detections
print("[INFO] computing face detections...")
net.setInput(blob)
detections = net.forward()

Upon loading our --image from disk, we make a copy and grab the frame dimensions for future
scaling and display purposes.

Pre-processing is handled by OpenCV’s blobFromImage function. This function can perform the
following operations:

• Mean subtraction

• Scaling

• And, optionally, channel swapping

As shown in the parameters, we resize to 300×300 pixels and perform mean subtraction.

Then we perform face detection to localize where in the image all the faces are. Once we know
where each face is predicted to be, we’ll ensure the predictions meet the --confidence threshold
before we extract the face ROIs.

6.4 DETECT FACES

for i in range(0, detections.shape[2]):
    # extract the confidence (i.e., probability) associated with
    # the detection
    confidence = detections[0, 0, i, 2]
    # filter out weak detections by ensuring the confidence is
    # greater than the minimum confidence
    if confidence > args["confidence"]:
        # compute the (x, y)-coordinates of the bounding box for
        # the object
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        # ensure the bounding boxes fall within the dimensions of
        # the frame
        (startX, startY) = (max(0, startX), max(0, startY))
        (endX, endY) = (min(w - 1, endX), min(h - 1, endY))

Here, we loop over our detections and extract the confidence, which is measured against the
--confidence threshold.

We then compute the bounding box coordinates for a particular face and ensure that the box falls
within the boundaries of the image.

6.5 APPLY DETECTOR ON FACES

        # (continuing inside the loop and confidence check from Section 6.4)
        # extract the face ROI, convert it from BGR to RGB channel
        # ordering, resize it to 224x224, and preprocess it
        face = image[startY:endY, startX:endX]
        face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
        face = cv2.resize(face, (224, 224))
        face = img_to_array(face)
        face = preprocess_input(face)
        face = np.expand_dims(face, axis=0)
        # pass the face through the model to determine if the face
        # has a mask or not
        (mask, withoutMask) = model.predict(face)[0]

In this block, we:

• Extract the face ROI via NumPy slicing

• Pre-process the ROI the same way we did during training

• Perform mask detection to predict with_mask or without_mask

From here, we displayed the result:

        # determine the class label and color we'll use to draw
        # the bounding box and text
        label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        # include the probability in the label
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
        # display the label and bounding box rectangle on the output
        # frame
        cv2.putText(image, label, (startX, startY - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(image, (startX, startY), (endX, endY), color, 2)
# show the output image
cv2.imshow("Output", image)
cv2.waitKey(0)

First, we determine the class label based on probabilities returned by the mask detector model and
assign an associated colour for the annotation. The colour will be “green” for with_mask and “red”
for without_mask.

We then draw the label text (including class and probability), as well as a bounding box rectangle
for the face, using OpenCV drawing functions.

Once all detections have been processed, we display the output image and wait for the user to press
a key.

Results on images:

Figure 7 – With and without mask image: Ritvik

Figure 8 – With and without mask image: Gautam

Figure 9 – With and without mask image: Abhishek

CHAPTER 7

IMPLEMENTATION IN REAL-TIME
VIDEO STREAMS
7.1 Loading video and updating algorithm

7.2 Loading Face detection prediction logic function

7.3 Creating a loop over detections

7.4 Extracting Face ROI and preprocessing

7.5 Executing face mask predictor

7.6 Defining command line arguments

7.7 Running initializations and looping over frames

7.8 Processing and displaying results

7.1) Loading video and updating algorithm
We first open up the detect_mask_video.py file in our directory structure.

The algorithm for this script is the same, but it is pieced together in such a way as to allow for
processing every frame of your webcam stream. Thus, the only difference when it comes to imports
is that we need a VideoStream class and time. Both of these will help us work with the stream.
We’ll also take advantage of imutils for its aspect-aware resizing method.
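A sketch of the additional imports this script needs, alongside the TensorFlow/Keras, NumPy,
argparse, cv2, and os imports already used in Section 6.1:

from imutils.video import VideoStream
import imutils
import time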

7.2) Loading Face detection prediction logic function

This function detects faces and then applies our face mask classifier to each face ROI. Such a
function consolidates our code — it could even be moved to a separate Python file if you so choose.
Our detect_and_predict_mask function accepts three parameters:

1. frame: A frame from our stream

2. faceNet: The model used to detect where in the image faces are

3. maskNet: Our COVID-19 face mask classifier model

Inside, we construct a blob, detect faces, and initialize lists, two of which the function is set to
return. These lists include our faces (i.e., ROIs), locs (the face locations), and preds (the list of
mask/no mask predictions).

7.3) Creating a loop over detections


From here, we’ll loop over the face detections. Inside the loop, we filter out weak detections (Lines
34-38) and extract bounding boxes while ensuring bounding box coordinates do not fall outside the
bounds of the image (Lines 41-47).

7.4) Extracting Face ROI and preprocessing


After extracting face ROIs and pre-processing (Lines 51-56), we append the face ROIs and
bounding boxes to their respective lists.

7.5) Executing face mask predictor
The logic here is built for speed. First we ensure at least one face was detected (Line 63) — if not,
we’ll return empty preds.

Secondly, we are performing inference on our entire batch of faces in the frame so that our pipeline
is faster (Line 68). It wouldn’t make sense to write another loop to make predictions on each face
individually due to the overhead (especially if you are using a GPU that requires a lot of overhead
communication on your system bus). It is more efficient to perform predictions in batch. Line 72
returns our face bounding box locations and corresponding mask/not mask predictions to the caller.
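A condensed sketch of detect_and_predict_mask, consolidating Sections 7.2-7.5 (this is a sketch
rather than the verbatim script; the confidence threshold is read from the command line arguments
defined next):

def detect_and_predict_mask(frame, faceNet, maskNet):
    # construct a blob from the frame and run the face detector
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300),
        (104.0, 177.0, 123.0))
    faceNet.setInput(blob)
    detections = faceNet.forward()
    faces, locs, preds = [], [], []
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        # filter out weak detections
        if confidence > args["confidence"]:
            # bounding box, clipped to the frame boundaries
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            (startX, startY) = (max(0, startX), max(0, startY))
            (endX, endY) = (min(w - 1, endX), min(h - 1, endY))
            # extract and pre-process the face ROI as during training
            face = frame[startY:endY, startX:endX]
            face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
            face = cv2.resize(face, (224, 224))
            face = preprocess_input(img_to_array(face))
            faces.append(face)
            locs.append((startX, startY, endX, endY))
    # predict on the whole batch of faces at once, for speed
    if len(faces) > 0:
        preds = maskNet.predict(np.array(faces, dtype="float32"),
            batch_size=32)
    return (locs, preds)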

7.6) Defining command line arguments


Our command line arguments include:

1. --face: The path to the face detector directory

2. --model: The path to our trained face mask classifier

3. --confidence: The minimum probability threshold to filter weak face detections

With our imports, convenience function, and command line args ready to go, we can initialize the
models and start the video stream.

7.7) Running initializations and looping over frames

Here we have initialized our:

1. Face detector

2. COVID-19 face mask detector

3. Webcam video stream

We begin looping over frames on Line 103. Inside, we grab a frame from the stream and resize it
(Lines 106 and 107). From there, we put our convenience utility to use; Line 111 detects and
predicts whether people are wearing their masks or not.

7.8) Processing and displaying results

Inside our loop over the prediction results (beginning on Line 115), we:

1. Unpack a face bounding box and mask/not mask prediction (Lines 117 and 118)

2. Determine the label and color (Lines 122-126)

3. Annotate the label and face bounding box (Lines 130-132)

Finally, we display the results and perform cleanup:

After the frame is displayed, we capture key presses. If the user presses q (quit), we break out of
the loop and perform housekeeping.
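A sketch of the initialization, frame loop, annotation, and cleanup described in Sections 7.7 and
7.8; faceNet and maskNet are assumed to be loaded as in Section 6.2:

# start the webcam stream and let the camera sensor warm up
vs = VideoStream(src=0).start()
time.sleep(2.0)
while True:
    # grab a frame and resize it (aspect-aware)
    frame = vs.read()
    frame = imutils.resize(frame, width=400)
    (locs, preds) = detect_and_predict_mask(frame, faceNet, maskNet)
    for (box, pred) in zip(locs, preds):
        (startX, startY, endX, endY) = box
        (mask, withoutMask) = pred
        # determine the label and colour, then annotate the frame
        label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
        cv2.putText(frame, label, (startX, startY - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)
    # display the frame; quit when the user presses 'q'
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cv2.destroyAllWindows()
vs.stop()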

CHAPTER 8

REAL TIME DETECTION AND
FURTHER IMPROVEMENTS
8.1 RUNNING FACE DETECTOR IN LIVESTREAM

8.2 SCOPE FOR ERROR

8.3 SCOPE FOR IMPROVEMENTS

8.1) Running face mask detector in livestream
We launch the mask detector in real-time video streams using the following command:
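$ python detect_mask_video.py

(This invocation assumes the default --face and --model values set by the script's argument
parser.)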

Figure 10 – With and without mask livestream results

8.2) Scope for error
A few reasons why our detector might fail to detect a face in the foreground:

1. The face is too obscured by the mask

2. Using a cloth, bandana, or shirt to cover the face would likely produce an error, as such
coverings are not present in our dataset.

Therefore, if a large portion of the face is occluded, our face detector will likely fail to detect
the face.

8.3) Scope for improvements


To improve our face mask detection model further:

▸ We can collect a larger amount of data of people both wearing and not wearing masks;
this would reduce the scope for error, since the model would learn from a wider variety
of examples.

▸ Secondly, we can also gather images of faces that may “confuse” our classifier into
thinking the person is wearing a mask when in fact they are not — potential examples
include shirts wrapped around faces, bandanas over the mouth, etc.

CONCLUSION AND
FUTURE WORKS
The coronavirus COVID-19 pandemic is causing a global health crisis. Governments all over the
world are struggling to stand against this type of virus. The protection from infection caused by
COVID-19 is a necessary countermeasure, according to the World Health Organization (WHO). In
this project, a hybrid model using deep and classical machine learning for face mask detection was
presented. The proposed model consists of two parts. The first part is for feature extraction,
using ResNet50, one of the popular models in deep transfer learning. The second part is for the
detection of face masks using classical machine learning algorithms. The Support Vector Machine
(SVM), decision trees, and ensemble algorithms were selected as the traditional machine learning
methods for investigation.

Three datasets were experimented on, and different training and testing strategies were adopted
through this research. These included training on a specific dataset while testing over other
datasets to prove the efficiency of the proposed model. The presented work concluded that the
SVM classifier achieved the highest accuracy possible with the least time consumed in the training
process. The SVM classifier on RMFD achieved 99.64% testing accuracy. On SMFD, it gained
99.49%, while on LFW, it reached 100% testing accuracy. A comparison was carried out with
related works; the proposed model surpassed the related works in terms of testing accuracy.
The major drawback is that not all classical machine learning methods were tried in the search for
the lowest training time and highest accuracy. One of the possible future tasks is to use deeper
transfer learning models for feature extraction and to use the neutrosophic domain, as it shows
promising potential in classification and detection problems.

To create our face mask detector, we trained a two-class model of people wearing masks and people
not wearing masks.

We fine-tuned MobileNetV2 on our mask/no mask dataset and obtained a classifier that is ~99%
accurate.

We then took this face mask classifier and applied it to both images and real-time video streams by:

• Detecting faces in images/video

• Extracting each individual face

• Applying our face mask classifier

Our face mask detector is accurate, and since we used the MobileNetV2 architecture, it’s also
computationally efficient, making it easier to deploy the model to embedded systems (Raspberry
Pi, Google Coral, NVIDIA Jetson Nano, etc.).

BIBLIOGRAPHY

JOURNALS

[1] S. Feng, C. Shen, N. Xia, W. Song, M. Fan, B.J. Cowling, Rational use of face masks in the
COVID-19 pandemic, Lancet Respirat. Med., 8 (5) (2020), pp. 434-436, doi: 10.1016/S2213-
2600(20)30134-X.

[2] X. Liu, S. Zhang, COVID-19: Face masks and human-to-human transmission, Influenza
Other Respirat. Viruses, doi: 10.1111/irv.12740.

[3] M. Loey, F. Smarandache, N.E.M. Khalifa, Within the lack of chest COVID-19 X-ray
dataset: a novel detection model based on GAN and deep transfer learning, Symmetry, 12 (4)
(2020), p. 651.

[4] M.S. Ejaz, M.R. Islam, M. Sifatullah, A. Sarker, Implementation of principal component
analysis on masked and non-masked face recognition.

[5] Jeong-Seon Park, You Hwa Oh, Sang Chul Ahn, Seong-Whan Lee, Glasses removal from
facial image using recursive error compensation, IEEE Trans. Pattern Anal. Mach. Intell., 27 (5)
(2005), pp. 805-811, doi: 10.1109/TPAMI.2005.103.

[6] D.M. Altmann, D.C. Douek, R.J. Boyton, What policy makers need to know about COVID-19
protective immunity, Lancet, 395 (10236) (2020), pp. 1527-1529.
