
Real-time Face Mask Detection Using Machine Learning / Deep Feature-Based Classifiers for Face Mask Recognition
Sweety Reddy, Computer Science Engineering, University of Petroleum and Energy Studies, Dehradun, India, sweetyr2000@gmail.com
Silky Goel, Computer Science Engineering, University of Petroleum and Energy Studies, Dehradun, India, silkygoel90@gmail.com
Rahul Nijhawan, Computer Science Engineering, University of Petroleum and Energy Studies, Dehradun, India, rahulnijhawan@ddn.upes.ac.in

Abstract — In this research, we put forward an architecture that combines recent deep learning algorithms with geometry techniques to create robust models that can handle aspects like detection, tracking and validation. This work focuses on creating a model that efficiently utilizes a mix of conventional machine learning and deep learning techniques to categorize facemasks effectively. Our model comprises two aspects in this hour of need: for dimensionality reduction through feature extraction, the initial element is produced with InceptionV3, and the facemask classification procedure is developed with the Logistic Regression (LR) method. Deep learning (DL) is used to efficiently train an architecture that extracts features from a dataset of photos of people's faces with full, partial and no face masks. The retrieved traits are then passed into various classification algorithms, namely Random Forest, Logistic Regression, CNN, Support Vector Machine, AdaBoost and K-Nearest Neighbors, to classify the mask position appropriately. Hence, we can project that employing Transfer Learning (TL) and Deep Learning together can detect a properly or improperly worn face mask with high accuracy. This system design helps stop transmission of this fatal virus by effectively detecting individuals in urban areas who are not wearing facemasks.

Keywords — Face Mask Detection, Machine Learning, Deep Learning, Classification, VGG16, VGG19

I. INTRODUCTION

SARS-CoV-2, a coronavirus discovered in 2019, has triggered a worldwide pandemic with unthinkable consequences in our daily lives, turning the world upside down. As of July 13, 2021, there were 187,086,096 confirmed COVID-19 cases worldwide with 4,042,921 fatalities [1]. The World Health Organization (WHO) labeled it a public health emergency of international concern, the greatest degree of alarm under international law. As time passed and we learned more about the SARS-CoV-2 virus, it became clear that face coverings, alongside other social distancing measures, were stemming the spread of the virus. Wearing protective face masks is now part of the new normal, since we believe that prevention is better than cure. Even in the near future, face masks will remain part of the survival recommendations given to the public, as COVID-19 is believed to remain around us for quite some years before it may disappear completely [2]. Consequently, identifying face masks has become an important responsibility in assisting global civilization.

In order to implement this, our research applies a sophisticated approach. The suggested model accurately recognizes the face in the image and then determines whether a mask is worn correctly, partially or not at all. Machine learning algorithms with various pre-trained deep learning architectures have been trained on 66% of our dataset and evaluated on the remaining 34%, i.e. our test dataset. For each of the algorithms, we obtained the Area Under Curve (AUC), Recall, F1, Precision and Classification Accuracy (CA), which varied with the popular models used for feature extraction, and visualized the obtained results in the form of ROC curves and confusion matrices. A principal challenge confronted by our team in this research was finding and amalgamating the images belonging to the category of partial masks; this category consists of a mixture of mouth-chin, nose-mouth and only chin-covered images that help the model produce a better and more accurate analysis of whether a mask is worn correctly or not, as shown in Fig. 1. Another major problem was finding images of faces with masks having the right kind of orientation.

Fig. 1. Sample images of no mask, partial mask and full mask.

The following is how the paper is structured: Section II provides information on our study and a few references related to our work. Section III covers a wide range of techniques such as K-Nearest Neighbors, Logistic Regression, Support Vector Machine, Random Forest, etc., and further explains the approach involved and the three distinct feature extraction models: Inception V3, VGG-16 and VGG-19. The outcomes acquired by employing the above procedure are compared through confusion matrices and other parameters such as AUC, Recall, F1-score, Precision and Accuracy in Section IV. Finally, Section V summarizes our findings.

II. RELATED WORK

Facemask detection models, which became critical during the COVID-19 epidemic, have received a lot of attention in recent months. Since personally checking whether individuals are wearing facemasks in public and busy places is cumbersome, researchers developed a real-time automated model that can determine whether people are using facemasks, helping maintain social distancing at crowded places by utilizing computer vision and a Raspberry Pi that can generate accurate feedback through reports. After analysis, we found that this study uses three datasets: the first is the Simulated Masked Face Dataset (SMFD), the second is Labeled Faces in the Wild (LFW) and the third is the Real-World Masked Face Dataset (RWMFD) [2]. On SMFD, the SVM learning algorithm obtained 99.49 percent accuracy; on LFW it obtained 100 percent testing accuracy, whereas on RWMFD it scored 99.64 percent.

Another significant contribution in this domain is the efficient Retina Face Mask detector, a face mask analyzer that detects whether individuals are wearing their masks or not [3]. This architecture is a single-stage detector that includes a novel context attention module focused on face mask identification and a feature pyramid network that integrates semantic data with multiple feature maps; an accompanying algorithmic technique removes background artifacts and predictions with high intersection over union and weak confidences. Retina Face Mask achieves state-of-the-art results on facemask datasets, with face detection precision and recall 2.3 percent and 1.5 percent higher than the baseline respectively, and mask detection precision and recall 11.0 percent and 5.9 percent better than the baseline.

Fig. 2. The architecture of the proposed model for mask detection using pre-trained CNN models.

III. METHODOLOGY

We imported our dataset, which was divided into images with full masks, partial masks (mouth-chin / only chin) and normal faces without masks, as shown in Fig. 1. The following feature extraction models were then applied to our dataset: Inception V3, VGG-16 and VGG-19. The resulting data was sampled with the training and testing dataset ratio kept constant at 66:34. Two experiments were conducted to establish a system for identifying appropriately worn face masks. First, a comparison of three different deep features and six machine learning algorithms was carried out. Following that, a comparison of the best tuples against the average accuracies under the 5-fold cross-validation technique was computed for all efficacy trials (one partition for the training set and one for the test set) [4].
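The following minimal sketch illustrates this sampling protocol with scikit-learn. It is not taken from our implementation: the feature files, the random seed and the choice of Logistic Regression as the fitted classifier are illustrative assumptions.

```python
# Minimal sketch of the 66:34 hold-out split and the 5-fold cross-validation
# described above. It assumes the deep feature vectors X and integer labels y
# (0 = no mask, 1 = partial mask, 2 = full mask) have already been extracted;
# the .npy file names are illustrative placeholders.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

X = np.load("features_inceptionv3.npy")   # hypothetical file of extracted deep features
y = np.load("labels.npy")                 # hypothetical file of class labels

# 66% training / 34% testing, stratified so that class proportions are preserved
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.34, stratify=y, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("Hold-out accuracy:", clf.score(X_test, y_test))

# 5-fold cross-validation accuracy on the training partition
cv_scores = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy")
print("5-fold CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))
```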
We utilized Neural Network (NN), Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), AdaBoost and Random Forest techniques in the image classification process. For each of the three feature extraction models, LR showed the most remarkable accuracy in our observations [5].

In this statistical analysis, conducted for image classification [6], performance measures such as Recall, F1-Score, Precision and Accuracy have been calculated using the following equations (1), (2), (3) and (4):

Precision = TP / (TP + FP)                          (1)

Recall = TP / (TP + FN)                             (2)

F-Measure = 2TP / (2TP + FP + FN)                   (3)

Accuracy = (TP + TN) / (TP + TN + FP + FN)          (4)

The AUC score is generally discussed after defining the ROC curves shown in Fig. 3, Fig. 4 and Fig. 5. This chart visualizes the trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR), whose values are given by equations (5) and (6):

True Positive Rate (TPR) = TP / (TP + FN)           (5)

False Positive Rate (FPR) = FP / (FP + TN)          (6)

Here, TP = True Positive, TN = True Negative, FP = False Positive and FN = False Negative.
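A minimal sketch of equations (1)-(6), computed from raw counts for a single (one-vs-rest) class, is given below; the counts used in the example call are placeholders rather than results from our experiments.

```python
# Equations (1)-(6) computed from raw counts for one (one-vs-rest) class.
# The counts passed in the example call are placeholders, not the paper's results.
def classification_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)                   # Eq. (1)
    recall = tp / (tp + fn)                      # Eq. (2); identical to the TPR of Eq. (5)
    f_measure = 2 * tp / (2 * tp + fp + fn)      # Eq. (3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (4)
    fpr = fp / (fp + tn)                         # Eq. (6)
    return precision, recall, f_measure, accuracy, fpr

print(classification_metrics(tp=442, tn=300, fp=10, fn=7))   # illustrative counts
```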
CNN [7] has been built mainly for image classification tasks. In this architecture, convolution layers, pooling layers and activation function layers are stacked to a certain depth, one over another. Finally, when performing the classification, an output layer is added to a fully connected layer, which calculates the classification probabilities of the picture provided as input to the CNN and further uses a SoftMax layer to map them onto the specific classes. The convolution operations are performed by the convolutional layers on these images by utilizing many filters of very small size relative to the provided photos (such as 3 x 3, 5 x 5, etc.). Training is used to learn the weights associated with these filters. CNN-based network training takes a long time, measured in hours or days.
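As a concrete illustration of this layered structure (and only as an illustration: the layer counts, filter sizes and input resolution below are assumptions, not the configuration used in our experiments), a small Keras model of this kind can be written as follows.

```python
# Illustrative stacked CNN: convolution / pooling / activation layers followed by a
# fully connected layer and a SoftMax output for the three mask classes.
# All layer sizes and filter counts here are assumptions for demonstration only.
import tensorflow as tf
from tensorflow.keras import layers, models

cnn = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # small 3 x 3 filters whose weights are learned
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),           # fully connected layer
    layers.Dense(3, activation="softmax"),          # maps to per-class probabilities
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
cnn.summary()
```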
Additionally, CNN models require a large amount of data to produce informative feature maps. The introduction of Transfer Learning (TL) [8, 9] has effectively overcome this problem. In these instances, the frozen initial-layer weights of pre-trained models are used, and our dataset is utilized to retrain only the final few layers to extract the abstract features from the images, since the core traits are consistent across all of them. Cutting-edge CNN architectures such as InceptionV3 and the Visual Geometry Group's VGG16 and VGG19 employ this principle.
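A minimal sketch of this transfer-learning style of feature extraction, using a frozen ImageNet-pre-trained InceptionV3 from Keras, is shown below (the VGG16 and VGG19 variants follow the same pattern). The directory layout, batch size and variable names are assumptions for illustration, not our exact implementation.

```python
# Deep feature extraction with a frozen, ImageNet-pre-trained InceptionV3.
# The dataset path and batch size are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

# Convolutional base only; global average pooling yields one 2048-d vector per image.
base = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False   # keep the pre-trained weights frozen

dataset = tf.keras.utils.image_dataset_from_directory(
    "dataset/",                # hypothetical folder with one sub-folder per class
    image_size=(299, 299),     # InceptionV3's expected input resolution
    batch_size=32,
    shuffle=False)

features, labels = [], []
for images, batch_labels in dataset:
    features.append(base(preprocess_input(images), training=False).numpy())
    labels.append(batch_labels.numpy())

X = np.concatenate(features)   # deep feature matrix passed to the classical classifiers
y = np.concatenate(labels)
```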

IV. PROPOSED APPROACH

Our study uses a mix of conventional machine learning and advanced deep learning techniques to detect facemasks. There are two components to the model: the first element is built using deep convolutional, pre-trained neural network models based on the VGG16, VGG19 and Inception V3 architectures for the object detection task, to detect the presence of masks correctly during feature extraction without causing over-fitting; the second component is constructed using a number of machine learning methods, namely AdaBoost, LR, CNN, KNN and SVM. This technique achieves a maximum accuracy of 96.3% using the Inception V3 model and the Logistic Regression classifier.

The proposed architecture, as shown in Fig. 2, utilizes a fine-tuned feature extraction model, Inception V3, that had been pre-trained on the normal faces without masks. This image recognition algorithm has been demonstrated to achieve better than 78.1 percent accuracy on the ImageNet dataset and is the culmination of numerous ideas explored by researchers over the years. The Inception V3 convolutional neural network architecture improves on previous versions through label smoothing, factorized 7 x 7 convolutions, and the use of an auxiliary classifier that transmits label information farther down the network [10]. This proposed work employs Logistic Regression as the primary classifier, as illustrated in Table I. Our images are categorized as normal face, entirely masked or partially masked.
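To make the two-component design concrete, the sketch below shows how a single image could be classified at inference time by chaining the frozen Inception V3 extractor with the fitted Logistic Regression classifier. It reuses the base and clf objects from the earlier sketches; the file path and label coding are illustrative assumptions.

```python
# Inference sketch for the hybrid pipeline: InceptionV3 deep features followed by
# the Logistic Regression classifier. Reuses `base` and `clf` from the earlier
# sketches; the image path and the class-name ordering are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import preprocess_input

CLASS_NAMES = ["no mask", "partial mask", "full mask"]   # assumed label coding 0, 1, 2

def predict_mask_state(image_path):
    img = tf.keras.utils.load_img(image_path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))
    feature_vector = base(x, training=False).numpy()        # 1 x 2048 deep features
    probabilities = clf.predict_proba(feature_vector)[0]    # one probability per class
    return CLASS_NAMES[int(np.argmax(probabilities))], probabilities

label, probs = predict_mask_state("example_face.jpg")       # hypothetical image file
print(label, probs)
```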
V. EXPERIMENTAL RESULTS

Our proposed model was trained on 767 rigorously selected images, which were classified into three categories/classes: normal face without a mask (127 images), face with a complete mask (449 images) and face with a partial mask [mouth-chin / nose-mouth / only chin] (191 images) [11]. Configuration: Core i3, 1.60 GHz CPU, 4 GB RAM.

A. Models Used

We employed popular image classification classifiers, namely Logistic Regression, SVM [12], K-Nearest Neighbors, Random Forest and AdaBoost. To obtain alternative findings, we experimented with three distinct feature extraction models [13]: Inception V3, VGG-16 and VGG-19. For all three feature extraction models, we observed our results through the ROC curve and various parameters like AUC, Accuracy, F1 score, Precision and Recall [14].

• Inception V3 feature model

The following Fig. 3 demonstrates the ROC curve for the Inception V3 model [15] and the different machine learning classifiers.

Fig. 3. ROC curve representing Inception V3 and different classifiers.

TABLE I: MACHINE LEARNING ALGORITHMS UTILIZED IN THE INCEPTION V3 MODEL

Model    | AUC   | Accuracy | F1    | Precision | Recall
LR       | 99.4% | 96.3%    | 95.3% | 96.2%     | 96.3%
NN       | 99.2% | 94.5%    | 94.3% | 94.7%     | 94.8%
SVM      | 98.7% | 94.8%    | 94.7% | 94.7%     | 94.7%
KNN      | 97.0% | 91.1%    | 90.8% | 91.3%     | 90.7%
RF       | 96.7% | 91.3%    | 91.0% | 90.6%     | 90.7%
AdaBoost | 87.7% | 86.3%    | 86.3% | 85.8%     | 85.5%

Through the observed outcomes in Table I and the ROC curves shown in Fig. 3, the Logistic Regression technique provides the best accuracy of 96.3% and an AUC value of 99.4%.
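ROC curves of the kind shown in Fig. 3 (and in Fig. 4 and Fig. 5 below) can be generated per class in a one-vs-rest fashion; the following sketch indicates one way to do this with scikit-learn, reusing X_test, y_test and clf from the earlier split sketch. The class names and label coding are illustrative assumptions.

```python
# One-vs-rest ROC curves and a macro AUC for a single classifier, in the spirit of
# Fig. 3-5 and the AUC column of the tables. Reuses X_test, y_test and `clf` from
# the earlier sketches; label coding and class names are illustrative assumptions.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.preprocessing import label_binarize

classes = [0, 1, 2]                                    # assumed: no mask, partial mask, full mask
y_test_bin = label_binarize(y_test, classes=classes)
scores = clf.predict_proba(X_test)                     # per-class probabilities

for i, name in enumerate(["no mask", "partial mask", "full mask"]):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], scores[:, i])
    plt.plot(fpr, tpr, label=name)

print("Macro one-vs-rest AUC:", roc_auc_score(y_test, scores, multi_class="ovr"))
plt.plot([0, 1], [0, 1], linestyle="--")               # chance level
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```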

• VGG16 feature model [15]

Fig. 4. ROC curve representing the VGG16 model and several classifiers.

TABLE II: COMPARISON OF MACHINE LEARNING ALGORITHMS UTILIZED IN VGG-16

Model    | AUC   | Accuracy | F1    | Precision | Recall
LR       | 99.1% | 94.1%    | 94.1% | 94.1%     | 94.1%
NN       | 96.0% | 89.0%    | 88.9% | 89.1%     | 89.0%
SVM      | 96.6% | 88.7%    | 88.5% | 88.6%     | 88.7%
KNN      | 87.4% | 79.1%    | 78.0% | 77.8%     | 79.1%
RF       | 88.0% | 79.1%    | 78.0% | 77.8%     | 79.2%
AdaBoost | 75.4% | 71.6%    | 71.8% | 72.2%     | 71.6%

Through the observed outcomes in Table II and the ROC curve shown in Fig. 4, the Logistic Regression technique provides the best accuracy of 94.1% and an AUC of 99.1%.
• VGG19 feature model

Fig. 5. ROC curve representing the VGG19 model and several classifiers.

TABLE III: COMPARISON OF MACHINE LEARNING ALGORITHMS UTILIZED IN VGG-19

Model    | AUC   | Accuracy | F1    | Precision | Recall
LR       | 99.2% | 94.6%    | 94.6% | 94.7%     | 94.7%
NN       | 97.6% | 92.7%    | 92.6% | 92.7%     | 92.7%
SVM      | 96.9% | 90.6%    | 90.4% | 91.0%     | 90.6%
KNN      | 89.9% | 84.0%    | 82.9% | 83.7%     | 84.0%
RF       | 92.2% | 84.6%    | 83.8% | 84.4%     | 84.6%
AdaBoost | 80.4% | 71.7%    | 77.8% | 77.9%     | 77.7%

Through the observed outcomes in Table III and the ROC curve shown in Fig. 5, the Logistic Regression technique provides an accuracy of 94.7% and an AUC of 99.2%.

Finally, based on our observations, we determined that the Logistic Regression method is best suited for classification in our model, with Inception V3 as the feature extraction model. The highest accuracy achieved was 96.1%.
VI. RESULT

We applied the proposed hybrid model using the previously described deep learning architectures and compared the outcomes of images across the various methods and feature extraction models, after which we obtained the following results.

Confusion Matrix [16]: A confusion matrix is a tabular breakdown of a classifier's accurate and erroneous predictions. It is used to assess the effectiveness of a categorization model. Confusion matrices are commonly utilized because they provide a more accurate picture of a model's performance than classification accuracy alone.
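The confusion matrices reported in Tables IV-VI can be produced directly from the hold-out predictions; a minimal sketch with scikit-learn is given below, reusing clf, X_test and y_test from the earlier sketches (class names again being illustrative assumptions).

```python
# Confusion matrix and per-class report from the hold-out predictions.
# Reuses `clf`, X_test and y_test from the earlier sketches; class names are
# illustrative assumptions.
from sklearn.metrics import confusion_matrix, classification_report

y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)   # rows = actual class, columns = predicted class
print(cm)
print(classification_report(
    y_test, y_pred, target_names=["no mask", "partial mask", "full mask"]))
```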

TABLE IV: CONFUSION MATRIX FOR THE LOGISTIC REGRESSION CLASSIFIER WITH THE INCEPTION V3 MODEL

Model for feature extraction: Inception V3. Table IV reveals that of the 449 mask-wearing faces, 7 are incorrectly categorized as partially masked pictures. Three of the 127 pictures of exposed faces without masks are misclassified as partially masked faces. At the same time, one of the 191 partially masked faces is misclassified as wearing no mask and 17 as being fully masked [17, 18].

TABLE V: CONFUSION MATRIX FOR THE LOGISTIC REGRESSION CLASSIFIER WITH THE VGG16 MODEL

Model for feature extraction: VGG-16 [19]. Table V displays 449 faces with masks, 11 of which have been misclassified as partially masked faces. Three of the 127 pictures of exposed faces without masks are misclassified as partially masked faces. Of the 191 partially masked faces, some are misclassified as faces without any mask and 6 are misclassified as fully masked [20].

TABLE VI: CONFUSION MATRIX FOR THE LOGISTIC REGRESSION CLASSIFIER WITH THE VGG19 MODEL

Model for feature extraction: VGG-19. Table VI displays 449 faces with masks, 11 of which have been misclassified as partially masked pictures and 1 as a face without a mask. Three pictures of exposed faces without masks are misclassified as partially masked faces, while one image is misclassified as fully masked. Of the 191 partially masked faces, some are misclassified as no mask and 3 are misclassified as totally masked [21].

VII. CONCLUSION

We presented a hybrid method in this paper that makes use of deep learning architectures linked with the transfer learning methodology; the combination of Inception V3 and Logistic Regression obtained the highest accuracy of 96.1 percent among the two other popular deep learning architectures (VGG-16 and VGG-19) and the several different predictive analytics approaches that were evaluated. On the assessed pictures, our implementation outperformed other state-of-the-art algorithms for the task of face mask recognition. We analyzed pictures of three different mask detection scenarios by extracting bottleneck characteristics, utilizing a dataset of 767 pictures in which 66% of the images were used for training the model while the remaining 34% were used to assess it. Since our approach differentiates a particular image into one of three groups, it computes distinct categorization probabilities for the image according to each class involved; the image is thus categorized into the class with the highest probability. The hybrid technique used here to identify face masks is novel. In general, there are not enough instances to train a deep architecture from scratch; our technique provides a realistic way to employ CNNs while eliminating the need to generate hand-crafted features.

In reality, this model can be employed in several settings to determine if a person is correctly wearing a mask. There are numerous applications for face mask detection: for example, in airports, visitors' faces can be promptly and effectively captured by the system at entry. Hospitals can use a face mask detection system to determine whether or not their employees are wearing masks. Furthermore, for quarantined individuals who are required to wear a mask, the system can keep watch and identify whether or not the mask is present. It may also be used in offices to determine whether employees are adhering to workplace safety regulations. This framework can achieve greater heights by expanding the collection of images, which can make this CNN network more powerful in the near future.

Finally, this research opens up new avenues for future work. The suggested approach is not restricted to mask detection and may be implemented in any high-resolution video surveillance system. Secondly, this framework may be extended to recognize facial landmarks while wearing a facemask for biometric purposes.
REFERENCES

[1] https://www.who.int/emergencies/diseases/novel-coronavirus-2019/events-as-they-happen
[2] K. Suresh, M. Palangappa and S. Bhuvan, "Face Mask Detection by using Optimistic Convolutional Neural Network," 2021 6th International Conference on Inventive Computation Technologies (ICICT), 2021, pp. 1084-1089, doi: 10.1109/ICICT50816.2021.9358653.
[3] M. Jiang, X. Fan and H. Yan, "RetinaMask: A face mask detector," arXiv preprint arXiv:2005.03950, 2020.
[4] R. Boutaba, M. A. Salahuddin, N. Limam, S. Ayoubi, N. Shahriar, F. Estrada-Solano and O. M. Caicedo, "A comprehensive survey on machine learning for networking: evolution, applications and research opportunities," Journal of Internet Services and Applications, vol. 9, no. 1, pp. 1-99, 2018.
[5] R. Nijhawan, B. Raman and J. Das, "Meta-classifier approach with ANN, SVM, rotation forest, and random forest for snow cover mapping," in Proc. of 2nd International Conference on Computer Vision & Image Processing, pp. 279-287, Springer, Singapore.
[6] S. Gupta, A. Panwar, S. Goel, A. Mittal, R. Nijhawan and A. K. Singh, "Classification of Lesions in Retinal Fundus Images for Diabetic Retinopathy Using Transfer Learning," International Conference on Information Technology (ICIT), 2019, pp. 342-347.
[7] A. Chavda, J. Dsouza, S. Badgujar and A. Damani, "Multi-Stage CNN Architecture for Face Mask Detection," 2021 6th International Conference for Convergence in Technology (I2CT), 2021, pp. 1-8.
[8] G. J. Chowdary, N. S. Punn, S. K. Sonbhadra and S. Agarwal, "Face mask detection using transfer learning of InceptionV3," in International Conference on Big Data Analytics, 2020, pp. 81-90.
[9] M. E. H. Chowdhury et al., "Can AI Help in Screening Viral and COVID-19 Pneumonia?," IEEE Access, vol. 8, pp. 132665-132676, 2020.
[10] M. Tripathi, "Analysis of Convolutional Neural Network based Image Classification Techniques," Journal of Innovative Image Processing (JIIP), vol. 3, no. 2, pp. 100-117, 2018.
[11] A. Cabani, K. Hammoudi, H. Benhabiles and M. Melkemi, "MaskedFace-Net – A dataset of correctly/incorrectly masked face images in the context of COVID-19," Smart Health, pp. 1-5, 2021.
[12] D. Varshni, K. Thakral, L. Agarwal, R. Nijhawan and A. Mittal, "Pneumonia Detection Using CNN based Feature Extraction," 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), 2019, pp. 1-7.
[13] R. Nijhawan, H. Sharma, H. Sahni and A. Batra, "A Deep Learning Hybrid CNN Framework Approach for Vegetation Cover Mapping Using Deep Features," 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), 2017, pp. 192-196.
[14] S. S. Rawat, A. Bisht and R. Nijhawan, "A Classifier Approach using Deep Learning for Human Activity Recognition," 2019 Fifth International Conference on Image Information Processing (ICIIP), 2019, pp. 486-490.
[15] Y.-C. Hsieh, C.-L. Chin, C.-S. Wei, I.-M. Chen, P.-Y. Yeh and R.-J. Tseng, "Combining VGG16, Mask R-CNN and Inception V3 to identify the benign and malignant of breast microcalcification clusters," 2020 International Conference on Fuzzy Theory and Its Applications (iFUZZY), 2020, pp. 1-4.
[16] S. Gupta, A. Panwar and D. Rawat, "A Comparison among Distinct Deep Learning Techniques for Real-Time Testing of Covid-19 Infected Patient using Chest Radiography," 2021 2nd International Conference on Secure Cyber Computing and Communications (ICSCCC), 2021, pp. 394-398.
[17] H. S. Chhabra, A. K. Srivastava and R. Nijhawan, "A hybrid deep learning approach for automatic fish classification," in Proceedings of ICETIT 2019, 2020, pp. 427-436.
[18] S. Gupta, A. Panwar and K. Mishra, "Skin Disease Classification using Dermoscopy Images through Deep Feature Learning Models and Machine Learning Classifiers," IEEE EUROCON 2021 - 19th International Conference on Smart Technologies, 2021, pp. 170-174.
[19] S. Visa, B. Ramsay, A. L. Ralescu and E. Van Der Knaap, "Confusion matrix-based feature selection," in Proc. MAICS, 2011, pp. 120-127.
[20] A. Panwar, R. Yadav, K. Mishra and S. Gupta, "Deep Learning Techniques for the Real Time Detection of Covid19 and Pneumonia using Chest Radiographs," IEEE EUROCON 2021 - 19th International Conference on Smart Technologies, 2021, pp. 250-253.
[21] S. Gupta, P. Aggarwal, N. Chaubey and A. Panwar, "Accurate prognosis of Covid-19 using CT scan images with deep learning model and machine learning classifiers," Indian Journal of Radio & Space Physics (IJRSP), vol. 50, no. 1, pp. 19-24, 2021.