
2022 9th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI2022) - 6-7 October 2022

Comparative Study Of Convolutional Neural Network And Haar Cascade Performance On Mask Detection Systems Using Matlab
Faisal Ali Mahaputra, Department of Electrical Engineering, Universitas Mercu Buana, Jakarta, Indonesia, faisalamp97@gmail.com
Imelda Uli Vistalina Simanjuntak, Department of Electrical Engineering, Universitas Mercu Buana, Jakarta, Indonesia, imelda.simanjuntak@mercubuana.ac.id
Yuliza, Department of Electrical Engineering, Universitas Mercu Buana, Jakarta, Indonesia, yuliza@mercubuana.ac.id
Heryanto, Department of Military Electrical Engineering, Universitas Pertahanan, Jawa Barat, Indonesia, heryanto@idu.ac.id
Agus Dendi Rochendi, Oceanographic physics, Lembaga Ilmu Pengetahuan Indonesia, Jakarta, Indonesia, agus.dendi.rochendi@lipi.go.id
Lukman Medriavin Silalahi, Department of Electrical Engineering, Universitas Mercu Buana, Jakarta, Indonesia, lukman.medriavin@mercubuana.ac.id

Abstract— Since the outbreak of the COVID-19 virus, various technologies have been developed to help prevent its spread; one of them is face mask detection. Many methods are used, such as the Convolutional Neural Network, the Haar cascade classifier, and others. This paper discusses how a face mask detection system works and how it performs against parameters that can occur during training or direct testing, by comparing two different methods. The test results are displayed as a line graph, and those of the Haar Cascade Classifier method in tabular form. The highest accuracy of the CNN method is 93%, while that of the Haar Cascade Classifier method is 96%.
Keywords— COVID-19, mask detection, Convolutional Neural Network, HAAR Cascade, image processing.

I. INTRODUCTION

COVID-19 is caused by a virus of the coronavirus family. The pandemic has put the world in a state of lockdown and caused a global recession. The virus spreads mainly through droplets generated by sneezing and coughing [1]. Washing hands, keeping a distance, and wearing masks are early preventive measures that can be taken to stop its spread. Many researchers focus on the use of medical face masks to prevent the spread of the virus; one approach, often found in public places, is automated mask detection technology.
Two methods widely used for face mask detection are the Convolutional Neural Network (CNN) and the Haar Cascade Classifier. The CNN traces back to the 1960s, when Hubel and Wiesel discovered the unique network structure on which it is based, which can effectively reduce the feedback complexity of neural networks [2]. In theory, this method has a high detection rate, with an accuracy of up to 97% [3]. Meanwhile, the Haar Cascade is an object detection method created by Paul Viola and Michael Jones, commonly called Viola-Jones. At the time of its release, this method was reported to be 15 times faster than previous approaches, with an accuracy of 95% [4].
Several studies have built automatic mask recognition systems. One of them is the Mask Usage Detection System Using the CNN method [5], which uses a Viola-Jones-based system as the image detection method and a CNN to classify the images. That research does not discuss performance against parameters other than distance. It also does not compare the CNN method with other methods, such as the Haar cascade classifier, on the mask detection system. Building on research that uses the same methods [1], [5], [6], [7], [8], this paper designs a mask detection system using two different methods and directly compares their performance. The aim is to determine how the two methods compare in mask detection, so that the most suitable method can be selected in future research and development.

II. METHODOLOGY

A. Convolutional Neural Network
For decades, the CNN has produced breakthroughs in pattern recognition [9], from image processing to voice processing. Compared with a conventional ANN (Artificial Neural Network), a CNN dramatically reduces the number of parameters. A CNN is built from many layers. CNNs perform well in machine learning, especially in applications that process image data, where they have achieved impressive results. Several supporting elements can be selected in CNN technology, including:
1. Stride
The step size used to traverse the input vertically and horizontally is defined as a vector [a b] of two positive integers, where “a” is the vertical step size and “b” is the horizontal step size. When creating a layer, the stride can be specified as a scalar to use the same value for both step sizes. Fig. 1 shows a 3x3 filter with a stride of 2 processing an image.

Fig. 1. Stride on CNN

Fig. 2. Padding on CNN

2. Padding
The convolution step has a weakness: information near the borders of the image may be lost, because the filter covers the border pixels less often than the interior pixels during its scan. To solve this problem, zero padding can be used to adjust the output size. Fig. 2 shows a 3x3 filter scanning the input with a padding size of 1.
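Stride and padding change the size of a convolution's output according to the usual relation out = floor((n + 2p − f) / s) + 1. Here n is the input size, f the filter size, p the padding and s the stride. The sketch below is a pure-Python illustration, not the paper's MATLAB implementation. The function names and the 5x5 test input are hypothetical. Like most CNN layers, it computes cross-correlation, so the filter is not flipped.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output length along one axis: floor((n + 2*pad - f) / stride) + 1."""
    return (n + 2 * pad - f) // stride + 1


def zero_pad(image, pad):
    """Surround a 2-D list-of-lists image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def conv2d(image, kernel, stride=(1, 1), pad=0):
    """Cross-correlation with stride [a b] (a = vertical, b = horizontal) and zero padding."""
    a, b = stride
    img = zero_pad(image, pad) if pad else image
    kh, kw = len(kernel), len(kernel[0])
    out_h = conv_output_size(len(image), kh, a, pad)
    out_w = conv_output_size(len(image[0]), kw, b, pad)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * a, j * b  # window position advances by the stride
            row.append(sum(img[top + u][left + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out


ones = [[1] * 5 for _ in range(5)]
kernel = [[1] * 3 for _ in range(3)]
print(conv2d(ones, kernel, stride=(2, 2)))  # [[9, 9], [9, 9]]: 3x3 filter, stride 2
print(conv2d(ones, kernel, pad=1)[0])       # [4, 6, 6, 6, 4]: padding 1 keeps 5 columns
```

With stride 2, the 5x5 input shrinks to 2x2. With padding 1 and stride 1, the output keeps the input size, and the border outputs are smaller because part of each window covers zeros.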

B. Haar Cascade Classifier


Fig. 3. CNN flowchart
The method developed by Paul Viola and Michael Jones is also called the Viola-Jones method. It is often used in image processing because it offers advantages such as high accuracy and fast computation [10]. The features of the Haar Cascade Classifier are simple: each feature value depends only on the sums of pixels within rectangular regions of the grid, not on individual pixel values of the image. The algorithm is built from Haar features [4]. In the first classification stage, each sub-image is classified using one feature. The result is T (True) for sub-images that match certain Haar features and F (False) otherwise. This stage leaves about 50% of the sub-images to be classified in the second stage. Later stages impose more specific requirements and therefore use more features, and the share of sub-images that pass the classification falls to about 2% [11]. In the final classification, T (True) is assigned to images that satisfy the AdaBoost-trained classifier and F (False) to those that do not.
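The two ideas in this paragraph, rectangle-sum features and stage-by-stage rejection, can be sketched in pure Python. The integral image (summed-area table) is the standard Viola-Jones device for evaluating any rectangle sum with four lookups. The weak classifiers here are simplified to "feature ≥ threshold" votes with no polarity term. The stage layout, thresholds and function names are hypothetical. This illustrates the cascade idea; it is not MATLAB's trained detector.

```python
def integral_image(img):
    """Summed-area table with an extra zero row/column:
    ii[y][x] = sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar feature: left-half sum minus right-half sum."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def cascade_classify(ii, stages):
    """T if the window passes every stage, F as soon as one stage rejects it."""
    for weak_classifiers, stage_threshold in stages:
        # simplified AdaBoost-style vote: each weak classifier adds its weight
        # when its feature value reaches its threshold
        score = sum(weight for feature, threshold, weight in weak_classifiers
                    if feature(ii) >= threshold)
        if score < stage_threshold:
            return False  # rejected early; later, costlier stages never run
    return True


# Hypothetical one-stage cascade: "left half brighter than right half"
stages = [([(lambda ii: two_rect_feature(ii, 0, 0, 4, 4), 50, 1.0)], 0.5)]
bright_left = [[10, 10, 0, 0] for _ in range(4)]
dark_left = [[0, 0, 10, 10] for _ in range(4)]
print(cascade_classify(integral_image(bright_left), stages))  # True
print(cascade_classify(integral_image(dark_left), stages))    # False
```

Early rejection makes the cascade fast. Most windows fail the cheap first stage, so only the few that survive pay for the later stages, which use more features.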

C. Digital Image
Fig. 4. Haar cascade classifier flowchart
A digital image is an array of real or complex values, represented as a sequence of bits for computer processing. A sampled digital image is represented by a two-dimensional matrix (x, y) of columns and rows, where the intersection of a column and a row is called a pixel, the smallest element of an image [12]. Image processing operates on the values in this array, for example by computing pixel values or counting pixels, depending on which method is used.
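The matrix view of an image described above maps directly onto a nested array indexed by row (y) and column (x). The sketch below uses plain Python lists for a 256x256x3 RGB image, the input size used later in the paper. The helper names are hypothetical. The grayscale conversion uses the common ITU-R BT.601 luma weights, as one example of computing on pixel values.

```python
HEIGHT, WIDTH, CHANNELS = 256, 256, 3  # 256x256x3 RGB, as used for the CNN input


def blank_rgb(height, width):
    """Black RGB image: image[y][x] is the [R, G, B] pixel at row y, column x."""
    return [[[0] * CHANNELS for _ in range(width)] for _ in range(height)]


def to_gray(rgb):
    """One intensity per pixel from a weighted sum of R, G and B (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def count_pixels_above(gray, threshold):
    """Count pixels brighter than `threshold`, a simple per-pixel operation."""
    return sum(1 for row in gray for value in row if value > threshold)


image = blank_rgb(HEIGHT, WIDTH)
image[10][20] = [255, 255, 255]  # one white pixel at row 10, column 20
gray = to_gray(image)
print(len(gray), len(gray[0]), gray[10][20], count_pixels_above(gray, 128))  # 256 256 255 1
```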
III. RESULT AND DISCUSSION
Fig. 3 and Fig. 4 show the flow of each system. In both methods, the camera takes a picture and the resulting image is passed to that method's own preprocessing stage.
In the CNN method, images are acquired through a GUI (graphical user interface) containing several pushbuttons, one of which functions as the camera acquisition button. After the image is acquired, the next step is preprocessing. The Viola-Jones method obtains a bounding box on the image, and the image is then cropped to that bounding box to remove objects or background not needed for mask detection. The cropped image is set to 256x256 pixels. If no bounding box appears, the image is re-acquired until detection succeeds, and the final result is saved in the image database. The GUI is explained in Fig. 5.
In the Haar Cascade Classifier method, images are acquired with the same GUI as in the CNN method, but the process differs. After the camera acquires an image, there is no cropping step. Instead, the image is saved into the database and labeled using the ROI label application as the preprocessing stage, as explained in Fig. 6.
Fig. 6. ROI label
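The CNN preprocessing loop described above can be sketched as follows: detect a face, crop to the bounding box, resize the crop to 256x256, and re-acquire when no box is found. This is a hypothetical pure-Python outline, not the paper's MATLAB code. `capture` and `detect` are stand-ins for the camera and the Viola-Jones detector, and nearest-neighbour resizing is only one simple choice. The `max_attempts` cap is an added assumption that stops the loop from running forever; the text re-acquires until detection succeeds.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # grayscale rows for brevity; RGB works per channel
Box = Tuple[int, int, int, int]  # (x, y, width, height) of a detected face


def crop(image: Image, box: Box) -> Image:
    """Keep only the pixels inside the bounding box."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def resize_nearest(image: Image, size: int = 256) -> Image:
    """Nearest-neighbour resize to a size x size image."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def acquire_face(capture: Callable[[], Image],
                 detect: Callable[[Image], Optional[Box]],
                 size: int = 256,
                 max_attempts: int = 10) -> Optional[Image]:
    """Capture frames until the detector returns a box, then crop and resize."""
    for _ in range(max_attempts):
        frame = capture()
        box = detect(frame)
        if box is not None:
            return resize_nearest(crop(frame, box), size)
    return None  # no face found within max_attempts (hypothetical cap)


# Usage with stand-ins: the first frame has no face, the second one does.
frame = [[(r + c) % 256 for c in range(64)] for r in range(48)]
detections = iter([None, (8, 4, 32, 32)])
face = acquire_face(lambda: frame, lambda f: next(detections))
print(len(face), len(face[0]))  # 256 256
```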

A. Graphical User Interface (GUI)


The user interface is designed to make the mask detection system easier to build and operate. Axes1 displays the camera feed, which is turned on with the Start Camera pushbutton. Axes2 displays the output of the preprocessing stage, i.e. the cropping performed by pressing the Face Detection pushbutton. Three other pushbuttons are provided: Capture, to capture the image shown on Axes1; Save, to save the cropped image into the database; and Restart GUI, to re-open the GUI so it can be reused for the preprocessing stage.
Fig. 7. Wrong position of wearing a mask

B. ROI Label
After all images are imported from the database, they are labeled with ROI labels using the Image Labeler, which can be accessed directly from Matlab. At this stage, the labeled images are treated as Positive Instances data. These data can also serve as test data, while image data that are not assigned a label are treated as negative (validation) data. After ROI labeling is complete, all these images must be exported to the Matlab workspace as table data, after which they can be trained and tested using the Haar Cascade Classifier.
Fig. 5. GUI design

C. Dataset and Experimental environment
In this part, two different datasets, each a set of images, are used for training and testing. The images were taken with a camera under the following proposed parameters and conditions:
1. Range
The images stored in the database were captured at distances of 30 cm, 60 cm, and 100 cm.
2. Time
The images stored in the database were captured at different times, from morning to midnight.
3. Lights
Images were captured under different light conditions. During the daytime, sunlight is the primary light source; at night, lamp light is the light source; and there is also a condition around noon when the sky is cloudy.
4. The position of wearing a mask
People are often seen wearing their face masks incorrectly, as illustrated in Fig. 7.
The dataset used for the Convolutional Neural Network contains 1400 different images of 5 individuals, divided into eight datasets. Six datasets cover the range and light conditions, and two contain non-face-mask images for different light conditions. Each dataset has 175 images: 130 are used as training data and 45 as validation data. Meanwhile, the dataset for the Haar cascade classifier contains 2000 different images of 5 individuals, with 75 labeled images used as validation data. The Haar Cascade Classifier method needs a larger image dataset than the CNN method because it must reach more CascadeStages. Each stage requires twice the number of validation samples for training, i.e., 150 images per stage.

IV. DISCUSSION
For the first proposed method, the first step is to determine the convolutional layers and the size of the input image. Input images were set to 256x256x3 (3 for the RGB channels), and maxEpoch was set to 60 epochs. The experiment was run 10 times for both daytime and night conditions to obtain more data for an accurate analysis.

TABLE I. CNN MASK DETECTION DAYTIME ACCURACY
Test | Wrong detection, 30 cm | Wrong detection, 60 cm | Wrong detection, 100 cm | Accuracy (%)
1 | 4 | 1 | 3 | 92.5
2 | 6 | 5 | 6 | 89.63
3 | 4 | 2 | 4 | 92.22
4 | 5 | 4 | 6 | 90.56
5 | 4 | 3 | 2 | 93.89
6 | 7 | 2 | 5 | 91.67
7 | 2 | 5 | 7 | 91.67
8 | 7 | 3 | 7 | 88.89
9 | 6 | 1 | 5 | 92.78
10 | 8 | 4 | 5 | 90
Total | 53 | 30 | 50 |
Accuracy (%) | 96.97 | 98.29 | 97.14 | 91.381 (average)

From Table II, the average accuracy of the CNN for face mask detection under night conditions is 83.06%, with a peak accuracy of 87.78%. The most suitable range is 30 cm, with good light from the lamp. Masks worn in the wrong position produced more than ten wrong detections at each range. This appears to affect the testing and suggests that light conditions may lower the accuracy of this method.
In Tables I and II, the wrong detection columns show the number of incorrect detections that occurred in each test at each range. The accuracy column shows the total accuracy of the system, obtained by dividing the training data by the number of elements in the dataset array. The accuracy for each range is calculated as:
Accuracy per range (%) = (total dataset used − total wrong detections) / total dataset used × 100%
The second method uses the Haar Cascade Classifier with the same 256x256-pixel input image size as the CNN, labeled using the ROI label, and with maxStages set to 50 stages.

TABLE III. HAAR CASCADE CLASSIFIER DAYTIME ACCURACY
Test | Wrong detection, 30 cm | Wrong detection, 60 cm | Wrong detection, 100 cm | Accuracy (%)
1 | 0 | 2 | 1 | 96.00

TABLE IV. HAAR CASCADE CLASSIFIER NIGHT ACCURACY
Test | Wrong detection, 30 cm | Wrong detection, 60 cm | Wrong detection, 100 cm | Accuracy (%)
1 | 2 | 5 | 4 | 85.33
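The bottom rows of Tables I and II and the single-run Haar accuracies in Tables III and IV can be re-derived from the wrong-detection counts using the per-range formula stated above. In this sketch, the value of "total dataset used" is an assumption. Taking 175 images per dataset times 10 test runs (1750) reproduces the printed per-range values exactly. Taking the 75 labeled validation images for the Haar tables reproduces 96.00% and 85.33%. The function names are illustrative only.

```python
def range_accuracy(total: int, wrong: int) -> float:
    """(total dataset used - total wrong detections) / total dataset used x 100."""
    return 100.0 * (total - wrong) / total


def mean(values):
    return sum(values) / len(values)


# Wrong detections per test at (30 cm, 60 cm, 100 cm) and per-test accuracy (%)
day_wrong = [(4, 1, 3), (6, 5, 6), (4, 2, 4), (5, 4, 6), (4, 3, 2),
             (7, 2, 5), (2, 5, 7), (7, 3, 7), (6, 1, 5), (8, 4, 5)]        # Table I
day_acc = [92.5, 89.63, 92.22, 90.56, 93.89, 91.67, 91.67, 88.89, 92.78, 90]
night_wrong = [(2, 15, 14), (20, 6, 13), (3, 13, 2), (5, 7, 5), (10, 14, 12),
               (8, 12, 14), (8, 12, 8), (5, 12, 7), (11, 12, 9), (3, 11, 12)]  # Table II
night_acc = [83.33, 77.78, 86.67, 87.78, 77.22, 80.56, 84.44, 86.11, 81.11, 85.56]

CNN_TOTAL = 175 * 10  # assumption: 175 images per dataset x 10 test runs
HAAR_TOTAL = 75       # assumption: the 75 labeled validation images

for name, wrong, acc in [("day", day_wrong, day_acc), ("night", night_wrong, night_acc)]:
    totals = [sum(column) for column in zip(*wrong)]  # wrong detections per range
    per_range = [round(range_accuracy(CNN_TOTAL, w), 2) for w in totals]
    print(name, totals, per_range, round(mean(acc), 2))
# day [53, 30, 50] [96.97, 98.29, 97.14] 91.38
# night [75, 114, 96] [95.71, 93.49, 94.51] 83.06

print(round(range_accuracy(HAAR_TOTAL, 0 + 2 + 1), 2))  # Table III: 96.0
print(round(range_accuracy(HAAR_TOTAL, 2 + 5 + 4), 2))  # Table IV: 85.33
```

The per-test accuracy column is not derived from the wrong-detection counts. It is the validation accuracy reported by training, so only its average is recomputed here.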
From Table I, the average accuracy of the CNN for face mask detection under daytime conditions is 91%, with a peak of 93.89%. The most suitable range is 60 cm under good light conditions, and the wrong position of wearing a mask occurred only once during the test.

TABLE II. CNN MASK DETECTION NIGHT ACCURACY
Test | Wrong detection, 30 cm | Wrong detection, 60 cm | Wrong detection, 100 cm | Accuracy (%)
1 | 2 | 15 | 14 | 83.33
2 | 20 | 6 | 13 | 77.78
3 | 3 | 13 | 2 | 86.67
4 | 5 | 7 | 5 | 87.78
5 | 10 | 14 | 12 | 77.22
6 | 8 | 12 | 14 | 80.56
7 | 8 | 12 | 8 | 84.44
8 | 5 | 12 | 7 | 86.11
9 | 11 | 12 | 9 | 81.11
10 | 3 | 11 | 12 | 85.56
Total | 75 | 114 | 96 |
Accuracy (%) | 95.71 | 93.49 | 94.51 | 83.06 (average)

Table III shows that the Haar cascade achieves 96% accuracy during training and testing, with 30 cm as the optimal range for daytime conditions. No masks worn in the wrong position were detected during the process. This result is consistent: several tests were run, and the result was always the same.
Table IV shows 85.33% accuracy under the same conditions as before, with 30 cm as the optimal range and only two wrong detections. Under night conditions, two wrong detections were also found at 60 cm and 100 cm where the ROI label did not appear. One of them is an image in which the person is not wearing the mask correctly, which means that the Haar Cascade Classifier can differentiate between face-mask and non-mask images.

V. CONCLUSION
Across all tests of both methods under the same parameters, light conditions have a consistent impact on the results. Both methods perform well for face mask detection, but each has its own weaknesses and strengths. The CNN method has a bias in the training and testing process. The system randomly picks which 130 of the 175 images are used for training and which 45 for validation, so each test gives a different result. The Haar Cascade, by contrast, gives a consistent result no matter how many times the test is run. Another weakness of the CNN method is that its convolutional layers must match the input image size, so it takes time to understand and create suitable layer settings. On the other hand, the CNN method with the fullyConnectedLayer feature can divide the data by each parameter, train on all of these data at once, and validate the result. The Haar Cascade Classifier, however, must validate all the labeled images one by one.

VI. ACKNOWLEDGMENT
Special thanks go first to Mercu Buana University, which supported this domestic collaborative research. We also thank our partners at Lembaga Ilmu Pengetahuan Indonesia and Universitas Pertahanan for their assistance and cooperation during this research. We hope this collaboration will lead to further papers in future research.
REFERENCES
[1] S. Meivel, K. Indira Devi, S. Uma Maheswari, and J. Vijaya Menaka, “Real time data analysis of face mask detection and social distance measurement using Matlab,” Mater. Today Proc., no. March, 2021, doi: 10.1016/j.matpr.2020.12.1042.
[2] D. Wang, H. Yu, D. Wang, and G. Li, “Face recognition system based on CNN,” Proc. - 2020 Int. Conf. Comput. Inf. Big Data Appl. CIBDA 2020, pp. 470–473, 2020, doi: 10.1109/CIBDA50819.2020.00111.
[3] Y. Su, S. Shan, X. Chen, and W. Gao, “Hierarchical ensemble of global and local classifiers for face recognition,” IEEE Trans. Image Process., vol. 18, no. 8, pp. 1885–1896, 2009, doi: 10.1109/TIP.2009.2021737.
[4] L. Shi and J. H. Lv, “Face detection system based on AdaBoost algorithm,” Appl. Mech. Mater., vol. 380–384, no. 4, pp. 3917–3920, 2013, doi: 10.4028/www.scientific.net/AMM.380-384.3917.
[5] F. A. Hermawati and R. A. Zai, “Sistem Deteksi Pemakaian Masker Menggunakan Metode Viola-Jones dan Convolutional Neural Networks (CNN),” Proceeding KONIK (Konferensi Nas. Ilmu Komputer), vol. 5, pp. 182–187, 2021.
[6] N. Heryana, R. Mayasari, and K. A. Baihaqi, “Penerapan Haar Cascade Classification Model Untuk Deteksi Wajah, Hidung, Mulut, dan Mata Menggunakan Algoritma Viola-Jones,” Techno Xplore J. Ilmu Komput. dan Teknol. Inf., vol. 5, no. 1, pp. 21–25, 2020, doi: 10.36805/technoxplore.v5i1.1064.
[7] G. Aprilian Anarki, K. Auliasari, and M. Orisa, “Penerapan Metode Haar Cascade Pada Aplikasi Deteksi Masker,” JATI (Jurnal Mhs. Tek. Inform.), vol. 5, no. 1, pp. 179–186, 2021, doi: 10.36040/jati.v5i1.3214.
[8] M. S. Ejaz, M. R. Islam, M. Sifatullah, and A. Sarker, “Implementation of Principal Component Analysis on Masked and Non-masked Face Recognition,” 1st Int. Conf. Adv. Sci. Eng. Robot. Technol. 2019, ICASERT 2019, vol. 2019, no. Icasert, pp. 1–5, 2019, doi: 10.1109/ICASERT.2019.8934543.
[9] S. Albawi, T. A. M. Mohammed, and S. Alzawi, “Layers of a Convolutional Neural Network,” Ieee, p. 16, 2017.
[10] R. Yustiawati et al., “Analyzing of Different Features Using Haar Cascade Classifier,” Proc. 2018 Int. Conf. Electr. Eng. Comput. Sci. ICECOS 2018, vol. 17, pp. 129–134, 2019, doi: 10.1109/ICECOS.2018.8605266.
[11] M. Jones, “Robust Real-time Object Detection,” no. January 2001, 2014.
[12] K. RD and A. N. Tompunu, “Pengolahan Citra Digital Untuk Mendeteksi Obyek Menggunakan Pengolahan Warna Model Normalisasi Rgb,” Semin. Nas. Teknol. Inf. Komun. Terap. 2011 (Semantik 2011), vol. 17, no. C, pp. 329–332, 2011.
