Professional Documents
Culture Documents
Untitled Document
Untitled Document
Corona Virus (coronavirus) According to the World Health Organization, the COVID-19
pandemic is producing a global health crisis, and the most effective prevention strategy is
wearing a face mask in public places (WHO). The COVID-19 epidemic compelled governments
all around the world to enact lockdowns in order to prevent virus transmission. According to
reports, wearing a facemask at work significantly minimises the chance of transmission. An
effective and cost-effective method of applying AI to establish a safe environment in a
manufacturing environment. For face mask detection, a hybrid model combining deep and
traditional machine learning will be shown.We'll utilise OpenCV to accomplish real-time face
detection from a live feed via our webcam using a face mask detection dataset that consists of
with mask and without mask photos. Using Python, OpenCV, Tensor Flow, and Keras, we will
use the dataset to create a COVID-19 face mask detector with computer vision. Our goal is to
use computer vision and deep learning to determine whether or not the individual in the
image/video stream is wearing a face mask.
Due of the worldwide COVID-19 corona virus outbreak, the wearing of face masks in public is
becoming more popular. People used to wear masks to protect their health from air pollution
before Covid-19. Other people hide their feelings from the world by masking their faces, while
others are self-conscious about their appearance. COVID-19 transmission is slowed by wearing
face masks, according to scientists. COVID19 (also known as the corona virus) is the most
recent epidemic virus to strike human health in the twentieth century. COVID-19 has been
declared a global pandemic by the World Health Organization (WHO) in 2020 due to its rapid
spread.
COVID-19 infected over five million people in 188 countries in less than six months. The virus
spreads through close contact, as well as in congested and overcrowded environments.
The corona virus outbreak has resulted in unprecedented levels of international scientific
cooperation. In many ways, artificial intelligence (AI) based on machine learning and deep
learning can aid in the fight against Covid-19. Researchers and physicians can use machine
learning to analyse large amounts of data to estimate the spread of COVID-19, serve as an
early warning system for possible pandemics, and classify vulnerable people. To combat and
anticipate new diseases, investment for emerging technology such as artificial intelligence, IoT,
big data, and machine learning is required.The AI's capacity is being used to confront the
Covid-19 pandemic by better understanding infection rates and tracing and swiftly detecting
infections. Many countries have regulations requiring people to wear face masks in public.
These guidelines and legislation were created in response to the exponential increase in
instances and deaths in a variety of domains. The method of monitoring huge groups of
individuals, on the other hand, is growing increasingly complicated. Anyone who is not wearing
a face mask is detected during the monitoring process.
A mask face detection technique based on computer vision and deep learning is presented
here. The proposed model can be used in conjunction with surveillance cameras to prevent
COVID-19 transmission by detecting people who aren't wearing face masks. With opencv,
tensor flow, and keras, the model combines deep learning and traditional machine learning
approaches. For feature extraction, we integrated deep transfer learning with three traditional
machine learning techniques. We conducted a comparison between them in order to identify the
most appropriate algorithm that produced the maximum accuracy while consuming the least
amount of time throughout the training and detection processes.
Related work:
A face is detected from an image that has various attributes in the face detection method. Facial
detection research necessitates expression recognition, face tracking, and position estimation,
according to [21]. The objective is to recognise the face in a single photograph. Face
identification is a tough task due to the fact that faces alter in size, shape, colour, and other
characteristics and are not immutable. It becomes a difficult task when an opaque image is
obstructed by something not facing the camera, and so on. Occlusive face detection, according
to the authors of [22], faces two key challenges: 1) the lack of large datasets containing both
masked and unmasked faces, and 2) the exclusion of facial expression in the covered
area.Several misplaced expressions can be recovered and the dominance of facial cues can be
minimised to a large amount using the locally linear embedding (LLE) technique and dictionaries
trained on an enormous pool of masked faces, synthetic banal faces. The size of the input
image is a stringent constraint for convolutional neural networks (CNNs) in computer vision,
according to research published in [11]. To overcome the inhibition, the common approach is to
reorganise the images before fitting them into the network.
The task's key problem is to correctly detect the face from the image and then determine
whether or not it has a mask on it. The proposed approach should be able to detect a face and
a mask in motion in order to execute surveillance activities.
DATASET:
For the sake of testing the current approach, two datasets were employed. Dataset 1 has 1376
photos, 690 of which feature people wearing face masks and the remaining 686 featuring those
who do not. Figure below shows a front face posture with a single face in the frame and the
same type of mask in a white hue.
Dataset 2 from Kaggle has 853 photos, each of which has its countenance clarified with or
without a mask. In the below Figure, certain face collections include head turn, tilt, and slant
with many faces in the frame, as well as various types of masks in various colours.
Incorporated Packages
A. TensorFlow
To pursue research, TensorFlow, an interface for expressing machine learning algorithms, is
used to implement ML systems into fabrication across a variety of computer science areas,
including sentiment analysis, voice recognition, geographic information extraction, computer
vision, text summarization, information retrieval, computational drug discovery, and flaw
detection. TensorFlow is used at the backend of the proposed model's Sequential CNN
architecture (which consists of numerous layers). It's also used in data processing to restructure
the data (picture).
B. Keras
Keras is a deep learning API written in Python, running on top of the machine learning
platform TensorFlow. It was developed with a focus on enabling fast experimentation.
Being able to go from idea to result as fast as possible is key to doing good research.
Keras is:
Simple -- but not simplistic. Keras reduces developer cognitive load to free you to focus
on the parts of the problem that really matter.
Flexible -- Keras adopts the principle of progressive disclosure of complexity: simple
workflows should be quick and easy, while arbitrarily advanced workflows should be
possible via a clear path that builds upon what you've already learned.
Powerful -- Keras provides industry-strength performance and scalability: it is used by
organizations and companies including NASA, YouTube, or Waymo.
C. OpenCV
OpenCV (Open Source Computer Vision Library), an open-source computer vision and machine
learning software library, is used to distinguish and recognise faces, recognise objects, group
movements in recordings, trace progressive modules, follow eye gestures, track camera
actions, remove red eyes from photos taken with flash, find comparative pictures from an image
database, perceive landscape and overlay it with enhanced reality, and so on . In order to resize
and colour convert data images, the proposed technique makes use of OpenCV's
characteristics.
A. Data Processing
Data preparation entails converting data from one format to another that is more
user-friendly, desirable, and meaningful. It can take any shape, including tables,
photographs, movies, graphs, and so on. These ordered data are part of an information
model or composition that represents relationships between various entities. Numpy and
OpenCV are used in the suggested method to deal with image and video data.
a) Data Visualization
The total number of images in the dataset is visualized in both categories – ‘with mask’ and
‘without mask’.
We use the function cv2.cvtColor(input_image, flag) for changing the color space. Here flag
determines the type of conversion [9]. In this case, the flag cv2.COLOR_BGR2GRAY is used
for gray conversion.
Deep CNNs require a fixed-size input image. Therefore we need a fixed common size for all
the images in the dataset. Using cv2.resize() the gray scale image is resized into 100 x 100.
c) Image Reshaping
The input during relegation of an image is a three-dimensional tensor, where each channel
has a prominent unique pixel. All the images must have identically tantamount size
corresponding to 3D feature tensor. However, neither images are customarily coextensive
nor their corresponding feature tensors. Most CNNs can only accept fine-tuned images. This
engenders several problems throughout data collection and implementation of model.
However, reconfiguring the input images before augmenting them into the network can help
to surmount this constraint. .
The images are normalized to converge the pixel range between 0 and 1. Then they are
converted to 4 dimensional arrays using data=np.reshape(data,(data.shape[0],
img_size,img_size,1)) where 1 indicates the Grayscale image. As, the final layer of the
neural network has 2 outputs – with mask and without mask i.e. it has categorical
representation, the data is converted to categorical labels.
B. Training of Model
CNN has become ascendant in miscellaneous computer vision tasks [12]. The current
method makes use of Sequential CNN.
The First Convolution layer is followed by Rectified Linear Unit (ReLU) and MaxPooling
layers. The Convolution layer learns from 200 filters. Kernel size is set to 3 x 3 which
specifies the height and width of the 2D convolution window. As the model should be aware
of the shape of the input expected, the first layer in the model needs to be provided with
information about input shape. Following layers can perform instinctive shape reckoning
[13]. In this case, input_shape is specified as data.shape[1:] which returns the dimensions
of the data array from index 1. Default padding is “valid” where the spatial dimensions are
sanctioned to truncate and the input volume is non-zero padded. The activation parameter
to the Conv2D class is set as “relu”. It represents an approximately linear function that
possesses all the assets of linear models that can easily be optimized with gradient-descent
methods. Considering the performance and generalization in deep learning, it is better
compared to other activation functions [14]. Max Pooling is used to reduce the spatial
dimensions of the output volume. Pool_size is set to 3 x 3 and the resulting output has a
shape (number of rows or columns) of: shape_of_output = (input_shape – pool_size + 1) /
strides), where strides has default value (1,1) [15].
As shown in fig, 4, the second Convolution layer has 100 filters and Kernel size is set to 3 x
3. It is followed by ReLu and MaxPooling layers. To insert the data into CNN, the long
vector of input is passed through a Flatten layer which transforms matrix of features into a
vector that can be fed into a fully connected neural network classifier. To reduce overfitting
a Dropout layer with a 50% chance of setting inputs to zero is added to the model. Then a
Dense layer of 64 neurons with a ReLu activation function is added. The final layer (Dense)
with two outputs for two categories uses the Softmax activation function.
The learning process needs to be configured first with the compile method [13]. Here
“adam” optimizer is used. categorical_crossentropy which is also known as multiclass log
loss is used as a loss function (the objective that the model tries to minimize). As the
problem is a classification problem, metrics is set to “accuracy”.
Show All
The main challenges faced by the method mainly comprise of varying angles and lack of
clarity. Indistinct moving faces in the video stream make it more difficult. However,
following the trajectories of several frames of the video helps to create a better decision –
“with mask” or “without mask”.
Conclusions
At the outset of this paper, we briefly discussed the work's motivation. The model's learning
and performance task was then demonstrated. The method has attained a reasonable level
of accuracy using basic machine learning tools and simplified techniques. It can be used for
a wide range of purposes. Given the Covid-19 situation, wearing a mask may become
mandatory in the near future. Many government agencies will require clients to wear masks
correctly in order to use their services. The implemented model will make a significant
contribution to the public health care system. It could be extended in the future to detect
whether or not a person is wearing the mask properly. The model should be enhanced
further to recognise whether the mask is virus-prone or not, i.e. whether it is surgical, N95,
or not.