
Driver Drowsiness Detection using Machine Learning and Deep Learning


Raj Dua, Vishal Patel, Melvyn Mungroo, Simran Chadda
McMaster University, Hamilton, Ontario, Canada

Abstract – This document is the team project report on Driver Drowsiness Detection for SEP769, the cyber-physical systems course offered by McMaster University's School of Graduate Studies. It describes the data pre-processing and the methodology used in the project. Two deep learning models are also analysed in detail, with their advantages and disadvantages listed.

Keywords— Drowsiness, Open, Closed, Yawn, No-Yawn, Deep Learning.

I. Introduction

Drowsiness, a state in which one needs rest, can result in symptoms that significantly affect work performance, such as reduced response time, intermittent loss of consciousness, or microsleeps. In fact, chronic weariness can impair performance to levels comparable to those produced by drinking alcohol. These symptoms are particularly dangerous when driving, since they increase the likelihood of drivers missing road signs or exits, drifting into other lanes, or even crashing their vehicle and causing an accident. A self-driving car, also known as an autonomous car, is a vehicle that travels between locations without the assistance of a human operator by utilising sensors, cameras, radar, and artificial intelligence (AI). Driver sleepiness detection is one of the functions of such advanced cars: it is an automobile safety technology that helps prevent accidents caused by sleepy driving.

II. Problem Statement

According to several studies, weariness is responsible for around 20% of all traffic accidents, and up to 50% on particular highways. There are several ways of detecting driver sleepiness, including steering pattern monitoring, monitoring of the vehicle's position in the lane, and driver eye/face monitoring [1].

Fatigue is one of the top five driving safety issues, along with speeding, drug/alcohol use, failure to wear a seat belt, and driver distraction. To date, all of these except driving fatigue have been addressed by legislative mandates such as speed limits, speed cameras, alcohol limits, seat belt requirements, and restrictions on phone use while driving. Studies have shown, and it has been widely publicised, that drowsy driving has the same consequences as driving over the legal alcohol limit. As a result, monitoring tired driving is necessary to make driving safer and has emerged as a top priority in reducing road deaths.

Driver fatigue is clearly still a major risk to road safety. Despite more than a decade of research on detection and prediction methods, this issue has yet to be properly solved.

III. Dataset Description

For this project, the team members used the drowsiness images dataset from Kaggle. The dataset is divided into four sub-folders labelled “Open”, “Closed”, “Yawn”, and “No-Yawn”. The project team used this dataset to create machine learning and deep learning models that classify whether a driver is drowsy and unfit to drive.
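Because each sub-folder name is the label of every image it contains, the dataset can be indexed directly from the folder layout. The following is a minimal standard-library sketch. It assumes the Kaggle archive has been extracted to a local directory whose sub-folders are spelled exactly “Open”, “Closed”, “Yawn”, and “No-Yawn” (the on-disk spelling is an assumption). The function `index_dataset` and the set of accepted extensions are hypothetical.

```python
from pathlib import Path

# Sub-folder names as listed in the report; the exact spelling on disk is an assumption.
CLASS_NAMES = ["Open", "Closed", "Yawn", "No-Yawn"]
# Hypothetical set of accepted image extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_dataset(root):
    """Return (image_path, label) pairs, grouped by class and sorted by file name."""
    samples = []
    for label in CLASS_NAMES:
        folder = Path(root) / label
        for path in sorted(folder.iterdir()):
            # Skip anything that is not an image file (e.g. stray text files).
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```

A call such as `index_dataset("drowsiness_dataset")` (hypothetical path) yields one `(path, label)` pair per image. Deriving labels from folder names in this way is also the convention used by directory-based loaders in common deep learning libraries.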
IV. Task Description

Given the dataset and the project goals, the team members concluded that this is a classification task.

The output of a classification task is a discrete-valued label, while the output of a regression task is a real-valued label, i.e. a continuous quantity. In other words, the output of classification is categorical or discrete, while that of regression is continuous or numerical. The aim of the project is to predict whether the person in a given image is drowsy or not drowsy, which are discrete values.

Classification also separates the data into classes, while regression fits a function to the data. In this case, our major aim is to separate drowsy from non-drowsy persons, which again points to a classification problem. Therefore, this is a classification task, as there are two possible classes that we can predict for a particular image: DROWSY or NOT DROWSY. The block diagram below illustrates the methodology for the project.
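The four image sub-folders have to be reduced to the two classes named above. The report does not state this mapping explicitly. This hypothetical sketch makes a natural assumption: “Closed” and “Yawn” images indicate DROWSY, and “Open” and “No-Yawn” images indicate NOT DROWSY. The mapping could be applied either to the labels before training or to the predictions of a four-class model afterwards.

```python
# Assumption (not stated in the report): which sub-folders count as drowsy.
DROWSY_CLASSES = {"Closed", "Yawn"}
NOT_DROWSY_CLASSES = {"Open", "No-Yawn"}


def to_binary_label(folder_label: str) -> str:
    """Map one of the four sub-folder labels to DROWSY / NOT DROWSY."""
    if folder_label in DROWSY_CLASSES:
        return "DROWSY"
    if folder_label in NOT_DROWSY_CLASSES:
        return "NOT DROWSY"
    # Fail loudly on unexpected labels instead of silently guessing.
    raise ValueError(f"unknown class: {folder_label!r}")
```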

V. Data visualization

As a first step of the project, the team members explored the data and counted the number of images in each sub-folder. The result is shown below.

Number of Images in each Sub-folder

The team members then viewed some random images from each sub-folder: Closed, Open, No-Yawn and Yawn.

Sample images: Closed Eyes, Open Eyes, No-Yawn, Yawn
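The per-folder counts and the percentage distribution used for the bar and pie charts can be computed from a list of per-image labels. The following is a minimal sketch. The labels in the example are hypothetical and are not the dataset's real counts.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, percentage)} for a list of per-image labels, sorted by label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in sorted(counts.items())}


# Hypothetical labels for illustration only.
labels = ["Open"] * 3 + ["Closed"] * 3 + ["Yawn"] * 2 + ["No-Yawn"] * 2
dist = class_distribution(labels)
```

One way to draw the pie chart from the counts is matplotlib's `plt.pie(counts, labels=names, autopct="%1.1f%%")`. The report does not say which plotting library was used.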
A pie chart was also developed to visualize the distribution of the dataset.

Distribution of dataset

VI. Data Pre-processing

To prepare the dataset for modelling, the team members first resized the images for better feature extraction and then grouped them into batches of 32.

VII. Training and testing dataset

The team members split the data into a training set and a validation set in a ratio of 90% to 10%. A summary is illustrated below. A total of 2610 images are used for training and a total of 290 images are used for validation.

Splitting the data
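The 90/10 split and the batches of 32 described above can be sketched in plain Python. This is a hedged illustration and does not reproduce the team's notebook. It assumes a random shuffle with a fixed seed (the seed value 42 is hypothetical), and it uses the 2900 images implied by the 2610 + 290 split.

```python
import math
import random


def split_train_val(samples, train_fraction=0.9, seed=42):
    """Shuffle a copy of `samples` and split it into (train, validation) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def batches(items, batch_size=32):
    """Yield consecutive batches; the last one may hold fewer than batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 2900 = 2610 + 290, the dataset size implied by the report's split.
train, val = split_train_val(range(2900))
print(len(train), len(val))                                  # 2610 290
print(math.ceil(len(train) / 32), math.ceil(len(val) / 32))  # 82 10
```

With batches of 32, the training set gives 81 full batches plus one batch of 18 images, and the validation set gives 9 full batches plus one batch of 2 images. If Keras was used, `tf.keras.utils.image_dataset_from_directory` can perform the same steps through its `validation_split`, `subset`, `seed`, `image_size` and `batch_size` arguments. The report does not state which tool was used.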
After splitting the data, 25 random images from the training dataset are visualized along with their corresponding class labels.

The data must also be pre-processed before training the network. The pixel values are initially in the range 0 to 255. These values are scaled to the range 0 to 1 by dividing each value by 255 before they are fed to the neural network model. Both the training and testing sets were pre-processed in the same way.
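The scaling step is a single element-wise division. In the following minimal NumPy sketch, the function name `scale_to_unit_range` and the example pixels are hypothetical. The images are cast to `float32` before dividing so that the results are fractional values rather than 8-bit integers.

```python
import numpy as np


def scale_to_unit_range(images: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel values in [0, 255] to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0


# A hypothetical 1x1x3 RGB "image" holding the extreme and middle pixel values.
pixels = np.array([[[0, 128, 255]]], dtype=np.uint8)
scaled = scale_to_unit_range(pixels)
```

To match the report, the same function should be applied to both the training and the testing images. In recent TensorFlow versions, a `tf.keras.layers.Rescaling(1./255)` layer inside the model is an equivalent alternative.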
VIII. Deep Learning Model

The team members developed a CNN model, as illustrated below.

Convolutional Neural Network (CNN)

A convolutional neural network (CNN) is a type of deep neural network. Rather than multiplying the entire input by one large weight matrix, each convolutional layer slides a small matrix of learnable weights, called a kernel or filter, across its input and computes a weighted sum at every position. This procedure is called convolution, which is why this kind of neural network is known as a convolutional neural network. A convolutional layer is configured by specifying its kernel size and the number of filters to be used. Convolutions can be applied in one, two, or three dimensions. Mathematically, convolution is an operation on two functions that yields a third function expressing how the shape of one is modified by the other.
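The sliding weighted sum described above can be written in a few lines of NumPy. The following is a minimal sketch of a single-channel 2D convolution with stride 1 and no padding. Like the convolution layers in common deep learning libraries, it does not flip the kernel, so strictly speaking it computes cross-correlation. The toy image and the hand-set edge-detecting kernel are hypothetical; in a CNN, the kernel values are learned during training.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 4x4 toy image with a vertical edge between columns 1 and 2.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
# A hand-made 2x2 vertical-edge detector.
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
feature_map = conv2d_valid(image, kernel)  # every row is [0, 2, 0]
```

The 3 × 3 feature map responds only in the middle column, where the kernel straddles the edge. A convolutional layer with several filters repeats this computation once per filter, producing one feature map for each.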

Fig 1: A CNN model (Google Image source)

PROS:
➢ High accuracy on image recognition problems, which demand a high level of precision.
➢ It automatically detects the relevant features without any human intervention.
➢ Weight sharing: the same kernel weights are reused at every position in the input, which greatly reduces the number of parameters.

CONS:
➢ A CNN does not encode the position and orientation of objects.
➢ It is not inherently invariant to spatial transformations of the input, such as rotation or scaling.
➢ A large amount of training data is necessary.
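The weight-sharing advantage becomes concrete when parameters are counted. The following sketch uses the standard counting formulas for layers with biases. It assumes a hypothetical 224 × 224 × 3 input (the report does not state the size the images were resized to). It compares a 3 × 3 convolutional layer with 32 filters against a fully connected layer with 32 units applied to the same flattened input.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias,
    # and the same weights are shared across every spatial position.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_param_count(in_features, units):
    # Every input feature connects to every unit, plus one bias per unit.
    return (in_features + 1) * units


conv = conv2d_param_count(3, 3, 3, 32)        # 896
dense = dense_param_count(224 * 224 * 3, 32)  # 4816928
print(conv, dense)
```

The convolutional layer's parameter count does not depend on the image size at all, while the fully connected layer needs over five thousand times as many parameters for this input.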

This model achieved 97.89% accuracy on the training set and 90.34% accuracy on the validation (test) set.

CNN Training loss vs Validation loss

CNN Training Accuracy vs Validation Accuracy

ResNet-152
ResNet-152 is a convolutional neural network that is 152 layers deep. A pretrained version of the network, trained on more than a million images from the ImageNet database, is available. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network accepts input images with a resolution of 224 by 224.

The ResNet architecture uses skip connections to solve the vanishing gradient problem in very deep networks while achieving excellent performance. ResNets were initially employed for image recognition tasks, but they can also be used in deep learning projects unrelated to computer vision to gain higher accuracy.

Fig 2: ResNet-152 architecture (Google Image source)
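A skip connection computes y = ReLU(x + F(x)), where F is the block's stack of weight layers. The following simplified NumPy sketch illustrates the idea under stated assumptions. Here, F is two small fully connected (matrix) layers rather than the convolution and batch-normalisation layers of the real ResNet-152 bottleneck blocks, and all weights are hypothetical. With every weight set to zero, F(x) = 0, and the block passes its non-negative input through unchanged. This is why a residual block can easily learn the identity function.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Return relu(x + F(x)) with F(x) = w2 @ relu(w1 @ x)."""
    residual = w2 @ relu(w1 @ x)  # the learned residual branch F(x)
    return relu(x + residual)     # skip connection: add the input back


rng = np.random.default_rng(0)
x = relu(rng.normal(size=4))      # non-negative, like the output of an earlier ReLU

# With zero weights the residual branch outputs 0, so the block is the identity.
zeros = np.zeros((4, 4))
y_identity = residual_block(x, zeros, zeros)

# With hypothetical random weights the block adds a learned correction to x.
w1 = rng.normal(scale=0.1, size=(4, 4))
w2 = rng.normal(scale=0.1, size=(4, 4))
y = residual_block(x, w1, w2)
```

During backpropagation, the addition `x + residual` passes gradients straight back to `x`. This provides the additional shortcut path through which the gradient can flow.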
PROS:
➢ ResNet's skip connections address the vanishing gradient problem in deep neural networks by providing an additional shortcut path through which the gradient can flow.

➢ If the model is allowed to learn identity functions, a deeper layer will perform at least as well as the shallower layer before it, if not better.

CONS:
➢ Detecting errors becomes challenging for deeper networks.
➢ Learning can be quite ineffective if the network is too small.

The project team used a learning rate of 0.001 to train ResNet-152 to classify the drowsiness image data. This model reached 99.66% accuracy on the training set and 98.97% on the validation set.

ResNet-152 Training loss vs Validation loss

ResNet-152 Training Accuracy vs Validation Accuracy

IX. Models Comparison

First, the team compares the CNN and ResNet-152 models using the confusion matrix.

A confusion matrix is an M × M matrix used to evaluate the performance of a classification model, where M is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives a comprehensive picture of how well the classification model is performing and what kinds of errors it is making.

• True Positive (TP)
The predicted value matches the actual value: the actual result was positive, and the model predicted positive.
• True Negative (TN)
The predicted value matches the actual value: the actual result was negative, and the model predicted negative.
• False Positive (FP)
The prediction was incorrect: the model predicted positive, but the actual result was negative. Also referred to as a Type I error.
• False Negative (FN)
The prediction was incorrect: the model predicted negative, but the actual result was positive. Also referred to as a Type II error.
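For a four-class problem such as this one, TP, TN, FP, and FN are defined per class. One class is treated as positive and the other three as negative (one-vs-rest). The following is a minimal pure-Python sketch with hypothetical labels, not the project's predictions. Rows are actual classes and columns are predicted classes, which is also the convention of scikit-learn's `confusion_matrix`.

```python
def confusion_matrix(y_true, y_pred, labels):
    """M x M counts: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def one_vs_rest_counts(matrix, k):
    """Return (TP, FP, FN, TN) for class k, treating k as positive and the rest as negative."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually another class
    fn = sum(matrix[k]) - tp                 # actually k, predicted another class
    tn = total - tp - fp - fn                # neither actual nor predicted k
    return tp, fp, fn, tn


labels = ["Closed", "Open", "No-Yawn", "Yawn"]
# Hypothetical predictions, not the project's results.
y_true = ["Open", "Open", "Closed", "Yawn", "Yawn", "No-Yawn"]
y_pred = ["Open", "Closed", "Closed", "Yawn", "No-Yawn", "No-Yawn"]
cm = confusion_matrix(y_true, y_pred, labels)
```

Off-diagonal cells show which class was confused with which. For example, `cm[3][2]` counts “Yawn” images predicted as “No-Yawn”.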
Fig 3: Illustration of a Confusion Matrix (Google Image source)

CNN Confusion Matrix

ResNet-152 Confusion Matrix

When comparing the two confusion matrices above, the project team concludes that ResNet-152 is a better model than the CNN, since its false positive and false negative counts are lower.

For the CNN model, errors occurred when the input image was a “yawn” image but the model predicted a “no-yawn” image. A total of 10 images were misclassified in this way.

For the ResNet-152 model, errors occurred when the input image was an “open” image but the model predicted a “closed” image. A total of 2 images were misclassified in this way.

In addition to the confusion matrix, the project team also explored precision, recall and F1-score.

Precision is defined as the ratio of true positives to all predicted positives. Precision, also known as positive predictive value, is the fraction of relevant instances among the retrieved instances: of all positive-class predictions, it is the proportion that are actually positive.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall is the fraction of all actual positive cases in the dataset that the model correctly predicted as positive. It is also referred to as sensitivity, the percentage of relevant cases that were found. Recall measures how well the model identifies true positives.

$$\text{Recall} = \frac{TP}{TP + FN}$$

The F1 score combines precision and recall into a single rating by taking their harmonic mean. It is also intended to remain informative on imbalanced data.

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
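The three formulas translate directly into code. In the following minimal sketch, the counts are hypothetical values for a single class, not the project's results. Zero denominators are mapped to 0.0, which is a common convention and an assumption here.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for one class (not the project's results).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Here, precision is 0.9, recall is 0.75, and F1 is 9/11 ≈ 0.818. Because F1 is a harmonic mean, it sits closer to the weaker of the two scores than an arithmetic mean would. scikit-learn's `precision_recall_fscore_support` and `classification_report` compute the same per-class values directly from label lists.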
CNN – Precision, Recall, F1-Score

ResNet-152 – Precision, Recall, F1-Score

When comparing the tables above, the project team concludes that ResNet-152 is a better model than the CNN, since it obtains a higher precision, recall and F1-score.

X. Conclusion

Driver drowsiness is a major cause of thousands of traffic accidents worldwide. Driver drowsiness detection is an automobile safety technology that helps prevent accidents caused by drowsy driving. The project's goal is to provide a solution for detecting driver drowsiness using CNNs and image processing.

In this project, two separate deep learning models were used to identify driver drowsiness from four different sets of images. Prediction performance is quite promising, with both models achieving an accuracy above 90%. Other services, such as traffic and weather information, might be added in the future to improve performance, since these factors can also affect the driver's mental state. However, because eyelid and head movements are difficult to record in a real automobile, the emphasis should be on building a model based on data supplied by sensors in the car.

XI. Future Scope

Moving ahead, we can do a few things to improve the outcomes and fine-tune the models.

First, we must account for any movement of the person when an image is captured by including the distances between facial landmarks. In reality, participants will not be static on the screen, and we believe that sudden movements may indicate tiredness or waking up from a micro-sleep.

Second, we wish to improve outcomes by tuning the parameters of our more complex models.

Third, the model could be built so that only the important parts of the images are used for training. Focusing on the eyes and mouth and disregarding the background would increase the model's accuracy.

Finally, we would like to collect our own training data from a larger sample of people while incorporating new distinguishing signs of sleepiness, such as abrupt head movement, hand movement, or eye-movement tracking.

XII. GitHub repo
https://github.com/rajdua/sep_769_deep_learning/blob/main/cps_project___Group5.ipynb
