Learn OpenCV by implementing a face mask

A brief introduction to facial detection with OpenCV.

Brendan Ferris

Nov 18, 2020·5 min read

Photo by visuals on Unsplash

In this article, we are going to implement a pre-trained TensorFlow face mask detection model
originally developed by Hussain Mujtaba. Some of the code and TensorFlow model training
information can be found in his article here.

Installing Packages
To begin, let’s go through some of the basics of OpenCV.

First, make a new directory for the project files. Inside of the directory, let’s make a virtual
environment to download the necessary packages. If you do not have virtualenv you should run
the first line of code, otherwise, skip the first line.

python3 -m pip install --user -U virtualenv
python3 -m virtualenv your_env

Now that our environment is created we can activate it by typing:

source your_env/bin/activate

Inside of our virtual environment, let’s download the necessary packages:

pip install numpy
pip install pandas
pip install tensorflow
pip install opencv-python

Now that we have all of our packages installed, let’s add some files and folders to our directory.

mkdir with_mask
mkdir without_mask
mkdir models

Haar Cascade Classifiers
First introduced as an object detection method in 2001 by Paul Viola and Michael Jones in this
paper, the Viola-Jones algorithm is one of the most efficient and computationally inexpensive
face detection algorithms. Despite being almost 20 years old, it is still widely used. In fact, if
you have ever used a digital camera that drew bounding boxes around faces in the viewfinder,
chances are it was utilizing this algorithm.

The algorithm is trained on thousands of positive images (with a face) and negative images
(without a face), and uses Haar features to measure the difference in brightness between adjacent
regions of an image. Each feature is computed by subtracting the summed pixel values of one
rectangular region from those of a neighboring region within a detection window.

Let’s say we have an image. In order to reduce computation time we convert that image to
grayscale.

Souza, P., photographer. (2012) Official portrait of President Barack Obama in Oval Office /
Official White House photo by Pete Souza. 2012, Dec. 6. [Photograph] Retrieved from the
Library of Congress.

Let’s look at how our computer is ‘seeing’ the corner of Obama’s mouth:
Pixel values range from 0 (black) to 255 (white).

Image by author.
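
As a rough illustration of the subtraction described above, the sketch below computes a
two-rectangle (edge) Haar-like feature on a small grayscale patch. The patch values and the
function name two_rect_feature are made up for this example; they are not taken from OpenCV's
internals or from the trained cascade.

```python
import numpy as np


def two_rect_feature(patch):
    """Sum of the left half of the patch minus the sum of the right half."""
    patch = np.asarray(patch, dtype=np.int64)
    half = patch.shape[1] // 2
    left = patch[:, :half].sum()
    right = patch[:, half:2 * half].sum()
    return int(left - right)


# Hypothetical 4x4 grayscale patch: dark on the left, bright on the right.
patch = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
edge_response = two_rect_feature(patch)
```

A response with a large magnitude suggests a vertical edge inside the patch, while a flat patch
gives a response of zero. The sign only depends on which half is subtracted from which.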
Adding and subtracting all of these regions would be too computationally expensive to do in real
time. To solve this problem, the concept of integral images was introduced. Each pixel in an
integral image is the sum of all pixels above and to the left of that point in the original image.
The integral image can then be used to compute the sum over any rectangular area of the input
image from just four lookups, instead of re-adding every pixel in the region on each pass over
the image.

Image by author.
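
The integral image idea can be sketched in a few lines of NumPy. This is a minimal version written
for this article, not OpenCV's implementation: the helper names integral_image and region_sum are
assumptions, and the array is padded with a leading row and column of zeros so that the four
lookups need no special cases at the image border (cv2.integral returns an array padded the same
way).

```python
import numpy as np


def integral_image(gray):
    """Cumulative sum over rows and columns, padded with a zero row and column."""
    gray = np.asarray(gray, dtype=np.int64)
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.cumsum(axis=0).cumsum(axis=1)
    return ii


def region_sum(ii, top, left, height, width):
    """Sum of gray[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])


gray = np.arange(16).reshape(4, 4)   # toy 4x4 "image" with values 0..15
ii = integral_image(gray)
center_sum = region_sum(ii, 1, 1, 2, 2)   # same region as gray[1:3, 1:3]
```

Once the integral image is built, the cost of region_sum does not depend on the size of the
region, which is what makes evaluating thousands of Haar features per window practical.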

In order to make detections, each feature is tested against an input image. In the beginning,
the decision thresholds for the features are low, meaning that some faces will be detected, and
some other things will be detected as well. A typical Haar classifier will have around 6,000
features, and as the classifier goes further and further through the features, it gets more and
more picky. So it may let noise through when checking the first 10 features, but if the 11th
feature rejects the image, then the classifier also rejects the image. This allows the algorithm
to be fast, because most non-face regions are rejected after only a few checks.
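
The early-rejection behaviour can be sketched with a toy cascade. Everything here is a
simplification for illustration: the stage functions and thresholds are invented, and a real
Viola-Jones cascade groups many boosted Haar features into each stage rather than checking a
single number.

```python
def run_cascade(window, stages):
    """Return (accepted, stages_evaluated) for one candidate window.

    `stages` is a list of (feature_function, threshold) pairs. Evaluation
    stops at the first stage whose feature value falls below its threshold.
    """
    for index, (feature, threshold) in enumerate(stages, start=1):
        if feature(window) < threshold:
            return False, index
    return True, len(stages)


# Hypothetical stages operating on a window represented as a list of numbers.
toy_stages = [
    (sum, 10),                            # cheap, permissive check first
    (lambda w: max(w) - min(w), 5),       # stricter contrast check
    (lambda w: w[0], 3),                  # final check
]

accepted = run_cascade([4, 9, 2], toy_stages)
rejected_early = run_cascade([1, 2, 3], toy_stages)
rejected_late = run_cascade([5, 5, 5], toy_stages)
```

The second window is discarded after one stage and never pays for the later checks, which is the
property that keeps the detector fast on frames that are mostly background.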

Implementing a face mask detector.
To see this in action we are going to implement a face mask detector on a public IP camera
stream. These are typically older security/surveillance cameras connected to the internet either
without passwords, or with unchanged default passwords on devices with known security issues.
You can run computer vision on some of these streams because they transfer data in the
form of a .mjpg, which can be loaded into OpenCV with the following method:

video_capture = cv2.VideoCapture('')
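
Network streams drop out, so it helps to check that the capture opened and that each read
succeeded. The sketch below is one possible way to wrap that; open_stream and read_frame are
hypothetical helper names, and stream_url is a placeholder for the stream address, which is not
reproduced here.

```python
def read_frame(capture):
    """Return the next frame from a capture-like object, or None if the read failed."""
    ok, frame = capture.read()
    return frame if ok else None


def open_stream(stream_url):
    """Open an MJPEG stream (or any other source OpenCV can read)."""
    import cv2  # imported here so read_frame can be used without OpenCV loaded

    capture = cv2.VideoCapture(stream_url)
    if not capture.isOpened():
        raise RuntimeError(f"Could not open video source: {stream_url!r}")
    return capture
```

A typical use would be capture = open_stream(stream_url), then frame = read_frame(capture) inside
the loop, skipping the iteration when the result is None, and capture.release() when finished.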

Here is the stream we are going to use. It appears to be a doorbell camera facing towards the
street. We are going to write some functions that place screenshots from this video feed into the
folders with_mask and without_mask. When our program detects a person, it will take a
screenshot and place the resulting screenshot.jpg in the proper folder. It will also compile a .csv
with the relevant classification, time, prediction confidence (of the TensorFlow mask detection
model), and file path to the image of the observation.
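
One way to build that .csv is to append a row per observation. This sketch uses Python's built-in
csv module rather than pandas, and the column names and the log_observation helper are assumptions
chosen to match the fields listed above.

```python
import csv
from datetime import datetime
from pathlib import Path

# Assumed column names, mirroring the fields described in the text.
FIELDNAMES = ["classification", "time", "confidence", "image_path"]


def log_observation(csv_path, classification, confidence, image_path, when=None):
    """Append one observation to the CSV, writing the header if the file is new."""
    csv_path = Path(csv_path)
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0
    when = when or datetime.now()
    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "classification": classification,
            "time": when.isoformat(timespec="seconds"),
            "confidence": f"{confidence:.4f}",
            "image_path": str(image_path),
        })
```

For example, log_observation('observations.csv', 'mask', 0.93, 'with_mask/screenshot.jpg') would
add one line to a file named observations.csv; both file names here are illustrative.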

In your ‘models’ folder, you will need two things: the Haar Classifier (which is an XML
document that you then load into OpenCV), and the trained TensorFlow model, which can be
found here.

We will then add an infinite while loop that repeatedly grabs frames from the .mjpg
stream and searches for faces with the Haar Cascade Classifier. If the Haar Cascade matches a
face, the pre-trained TensorFlow model will predict whether the person is or isn’t wearing a
mask, and the cycle repeats.
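
The per-frame work inside that loop could look roughly like the sketch below. Several things in it
are assumptions rather than details of the original model: predict_mask_probability is a
hypothetical callable that wraps the TensorFlow model and returns the probability of a mask for
one face crop, the 224x224 input size should be replaced with whatever the trained model expects,
and the 0.5 threshold is arbitrary.

```python
def label_from_score(mask_probability, threshold=0.5):
    """Turn a mask probability into a (label, confidence) pair."""
    if mask_probability >= threshold:
        return "mask", mask_probability
    return "no mask", 1.0 - mask_probability


def classify_faces(frame, face_cascade, predict_mask_probability, input_size=(224, 224)):
    """Detect faces in one BGR frame and classify each crop.

    `face_cascade` is a loaded cv2.CascadeClassifier. `predict_mask_probability`
    and `input_size` are assumptions; adapt both to the trained model.
    """
    import cv2  # local import: only needed when real frames are processed

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Haar cascades run on grayscale
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    observations = []
    for (x, y, w, h) in faces:
        face = cv2.resize(frame[y:y + h, x:x + w], input_size)
        label, confidence = label_from_score(predict_mask_probability(face))
        observations.append(((int(x), int(y), int(w), int(h)), label, confidence))
    return observations
```

Calling classify_faces once per frame inside the while loop returns a list of bounding boxes with
their labels, which can then be saved as screenshots and logged.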

Current Performance.
If you run the code from this article, you will see that it does a reasonable job of classifying
people with and without masks. However, the detector often flags random areas of the screen as
either ‘mask’ or ‘no mask’ predictions, leading to a high number of false observations.

This is because the default Haar Cascade we are using was trained for frontal face positions.
There are other trained classifiers, including haarcascade_profileface.xml, which may offer
marginal improvements on performance. Note: the profileface.xml classifier linked above was
trained on left-side profiles.
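
Because the profile classifier only knows left-side profiles, one possible workaround is to run it
a second time on a mirrored copy of the frame and map the boxes back. This is a sketch of that
idea rather than something tested in this article; mirror_boxes and detect_profiles_both_sides
are hypothetical helper names, and the scaleFactor and minNeighbors values are starting points,
not tuned settings.

```python
def mirror_boxes(boxes, image_width):
    """Map (x, y, w, h) boxes found on a horizontally flipped image back to the original."""
    return [(image_width - x - w, y, w, h) for (x, y, w, h) in boxes]


def detect_profiles_both_sides(gray, profile_cascade):
    """Run a left-profile cascade on a grayscale frame and on its mirror image."""
    import cv2  # local import: only needed when real frames are processed

    def find(image):
        found = profile_cascade.detectMultiScale(image, scaleFactor=1.1, minNeighbors=5)
        return [tuple(int(v) for v in box) for box in found]

    left_facing = find(gray)
    # flipCode=1 mirrors the image around its vertical axis.
    right_facing = mirror_boxes(find(cv2.flip(gray, 1)), gray.shape[1])
    return left_facing + right_facing
```

Raising minNeighbors makes the cascade demand more overlapping detections before reporting a
face, which may also cut down on the random false observations described above at the cost of
missing some real faces.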

