
2021 2nd Global Conference for Advancement in Technology (GCAT)

Bangalore, India. Oct 1-3, 2021

Driver Drowsiness Detection Using Deep Learning


Rupali Pawar, Saloni Wamburkar, Rutuja Deshmukh, Nikita Awalkar
Department of Electronics and Telecommunication, Cummins College of Engineering, Pune, India
{rupali.pawar, saloni.wamburkar, rutuja.deshmukh, nikita.awalkar}@cumminscollege.in

Abstract—Drowsy driving is one of the reasons for automobile accidents. We propose a ‘Driver Drowsiness Detection System’ which can help to reduce automobile accidents caused by drowsy driving. We propose a Convolutional Neural Network (CNN) model that is capable of detecting drowsiness based on the closing of the driver’s eyelids, and, as future scope, a cost-effective and low-power stand-alone system that can be installed inside the vehicle, consisting of the CNN model interfaced with a Raspberry Pi microcontroller and a webcam to capture facial images of the driver. Based on the time duration for which the eyes are closed, a score is calculated. When this score crosses a predetermined threshold, it prompts the software to play a beeping alarm and alert the driver. The score remains zero for the duration when the eyes remain open. When integrated with a Raspberry Pi and powered by the vehicle’s battery, the system can easily be placed inside a vehicle and act as a constant monitor for the driver.

Keywords—Haar Cascade, CNN, Deep Learning, Raspberry Pi

I. INTRODUCTION

Driving while drowsy is a major issue with significant implications that must be tackled. Drowsy driving is responsible for one out of every four traffic collisions, and one out of every 25 adult drivers has fallen asleep behind the wheel in the last 30 days. Because of the severity of the problem at hand and its implications, we propose a solution which detects drowsy driving.

Here, we aim to develop a system that is capable of detecting closed and open eyes and sounding an alarm if they are closed for a certain amount of time. This will help in reducing the number of accidents and increase the driver’s safety while driving.

The system will capture image frames from a continuous video stream taken by a camera as input and then establish a Region of Interest (ROI) around the eyes of the person’s face in the image. Using this ROI, a Haar Cascade classifier will be used to detect the eyes and feed this data as input to a Convolutional Neural Network (CNN). The CNN model is trained using a dataset comprising images of the right and left eyes under varied lighting conditions. While the webcam captures a continuous video of the driver, a loop in the code ensures that successive image frames of the driver’s face are fed as input to the Haar classifier, whose detections are in turn classified into open or closed categories by the CNN model. If the eyes remain closed beyond a certain threshold value, an alarm will ring, alerting the driver.

II. RELATED WORK

Drowsiness is a biological phase of the body indicating a narrow margin between a wakeful state and a sleeping state. There are behavioral signs which imply that a driver is drowsy, such as yawning often, difficulty in keeping the eyes open, and swinging the head forward and backward. The performance of behavioral methods is constrained by varying illumination, camera distance, and the number of frames per second used to capture facial images of the driver. The use of infra-red (IR) cameras can solve light variation problems to some extent.

To measure levels of drowsiness, cameras mounted in the vehicle are used to monitor facial features such as eye state (open/closed), head oscillations, eye blink rate, and yawning frequency. Facial features are extracted from the camera feed, and post-processing is then performed to establish the drowsiness level, using ML classifiers such as Support Vector Machines (SVM), Hidden Markov Models (HMM), Random Forest (RF), or sometimes Convolutional Neural Networks (CNN).

Pauly et al. [1] presented a drowsiness detection method based on eye tracking with the Haar cascade classifier. They used Histogram of Oriented Gradients (HOG) features combined with an SVM classifier for detecting eye blinks, from which the PERCLOS (percentage of eyelid closure over the pupil over time) was determined. If this value exceeded the threshold, the person was categorized as drowsy. The accuracy obtained was 91.6% under normal lighting conditions.

A. Punitha et al. [2] presented a real-time fatigue monitoring system for drivers. They used the Viola-Jones algorithm for detecting the driver’s face. An SVM classifier was trained on 2500 images to classify the face as either normal or fatigued. The overall accuracy achieved by the system was 93.5%.

B. N. Manu [3] described a method for drowsiness detection using three phases, viz. detecting facial features using the Viola-Jones algorithm, eye tracking, and yawn detection. The features generated from each of the three phases are fed to a binary linear SVM classifier to classify the frames into fatigue and non-fatigue states. Training was done on 100 templates, and the overall accuracy obtained was 94.58%.



G. Pan et al. [4] presented real-time liveness detection against photograph spoofing based on eye blinks. The work was done on 80 video clips in a blinking-video database collected from 20 individuals. They achieved a one-eye detection rate of 88.8% and an average two-eye rate of 95.7%.

Y. Sun et al. [5] introduced on-line blink detection based on a combination of Hidden Markov Models (HMM) and SVM. This hybrid approach was used to model the temporal dynamics of eye blinks and improved the blink detection accuracy to 90.99%.

S. Mehta et al. [6] proposed a real-time driver drowsiness detection system. The face landmarks identified by the system were fed to ML classifiers; the accuracy obtained was 84% for the Random Forest (RF) classifier.

K. Dwivedi et al. [7] proposed a vision-based system to detect driver drowsiness. Earlier methods had used hand-crafted facial features such as blink rate, eye closure, yawning, and eyebrow shape; they instead proposed a CNN-based algorithm which automatically learns various latent facial features, with the driver state classified as drowsy or non-drowsy by a softmax layer. The accuracy obtained was 78%.

F. Zhang et al. [8] presented driver fatigue detection by recognizing eye state from infrared videos. The approach was based on a convolutional neural network (CNN), with an accuracy of 95.81% for persons not wearing glasses and 91.45% for persons wearing glasses.

B. Reddy et al. [9] introduced driver drowsiness detection for embedded systems. They proposed three model types, viz. a baseline 4-stream drowsiness detection model, a 2-stream drowsiness detection model, and its compressed version. The face recognition and alignment task was performed using Multi-Task Cascaded Convolutional Networks (MTCNN). The accuracy achieved was 91.3%.

R. Jabbar et al. [10] proposed a CNN-based model for a real-time driver drowsiness detection system for embedded systems and Android devices. The accuracy obtained was 83%.

We propose here a CNN-based model integrated with a Raspberry Pi, a web camera, and an alarm. This standalone system can easily be installed inside a vehicle and will operate on the vehicle’s battery. It will continuously fetch facial images of the driver through a webcam and sound an alarm upon detecting a drowsy condition. Lastly, since our method is based on a CNN model trained on a large dataset, it results in faster detection and is more suitable for a real-time environment.

III. METHODOLOGY

The process adopted here can be described in five stages; illustrative code sketches of the resulting detection loop are given after Fig. 1.

1 - Taking video as input:
A webcam captures a live video of the person in the vehicle, and an infinite loop captures images or frames from the video. Using OpenCV methods, each frame is read and assigned to a variable.

2 - Detecting faces from the image and creating a ROI:
Here, the image is converted into grayscale to detect the face in it. The Haar Cascade classifier is used to detect faces.

3 - Locating eyes from the specified region:
The eyes are detected using the same procedure as that used for detecting faces. The data for the right and left eyes is fed to the CNN classifier.

4 - Categorizing closed or open eyes:
The CNN classifier then predicts whether both eyes are closed or open.

5 - Calculating the drowsiness score:
The time for which the person’s eyes are closed is detected and a count is calculated. If the count exceeds a certain number (a threshold value equivalent to 5 seconds), an alarm is played.

Fig. 1. Flow Chart of System

Figure 1 shows the detailed flow chart of our system.
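As a concrete illustration of stages 1-3, the following is a minimal OpenCV sketch (illustrative, not our exact implementation; it assumes the stock Haar cascade files bundled with OpenCV and camera index 0) that reads frames in an infinite loop, converts each to grayscale, and applies pretrained cascades to locate the face and then the eyes inside the face ROI:

import cv2

# Pretrained Haar cascades shipped with OpenCV (stages 2 and 3).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

cap = cv2.VideoCapture(0)                 # stage 1: live webcam video
while True:
    ret, frame = cap.read()               # each frame is assigned to a variable
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi = gray[y:y + h, x:x + w]      # stage 2: ROI around the detected face
        eye_boxes = eye_cascade.detectMultiScale(roi)  # stage 3: eyes in the ROI
        eye_crops = [roi[ey:ey + eh, ex:ex + ew]
                     for (ex, ey, ew, eh) in eye_boxes]
        # each crop in eye_crops is later resized and classified by the CNN
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop the loop
        break
cap.release()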

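Stages 4 and 5 then reduce to a small amount of glue code. The sketch below is likewise illustrative: model stands for the Keras classifier of Section IV, the 24x24 crop size matches the input shape assumed there, the label order (index 1 = open) and the frame-count threshold are assumptions, and play_alarm is a hypothetical helper (e.g., wrapping the playsound package).

import cv2
import numpy as np

score = 0
CLOSED_FRAMES_THRESHOLD = 75   # e.g., about 5 s at an assumed 15 frames/s

def eyes_closed(model, eye_crops):
    # Stage 4: the CNN predicts open/closed for each detected eye crop.
    if len(eye_crops) == 0:
        return False               # no eyes found; treat as inconclusive here
    for eye in eye_crops:
        eye = cv2.resize(eye, (24, 24)).astype("float32") / 255.0
        probs = model.predict(eye.reshape(1, 24, 24, 1), verbose=0)
        if np.argmax(probs) == 1:  # assumed label order: 0 = closed, 1 = open
            return False           # at least one eye is open
    return True

# Stage 5, inside the main capture loop:
#     if eyes_closed(model, eye_crops):
#         score += 1               # score grows while the eyes stay closed
#     else:
#         score = 0                # score stays zero while the eyes are open
#     if score > CLOSED_FRAMES_THRESHOLD:
#         play_alarm()             # hypothetical helper that plays the beep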
IV. CONVOLUTIONAL NEURAL NETWORK

The preprocessing required for a CNN is relatively less than for other classification algorithms, and this is the prime advantage of using a CNN: the network automatically learns to optimize its filters (or kernels), in contrast to traditional algorithms in which the filters are handcrafted. This transition from feature engineering to feature learning is a pioneering characteristic of CNNs.

Parameter sharing and dimensionality reduction are two further important features of CNNs that make them preferable to a plain feed-forward network. Parameter sharing reduces the number of parameters, which in turn reduces the number of computations; what is learned from one portion of the image is thus also useful in another portion. Dimensionality reduction lowers the required computational power. CNNs are most often applied to image processing problems mainly due to their ability to treat data as spatial. Due to better performance and high accuracy in image recognition, we chose a CNN over an ANN or an RNN.

A. Kernel:

Fig. 2. Kernel Operation

Figure 2 shows a typical kernel or convolution mask. The kernel is a central component of a CNN: a small matrix of numbers (the kernel or filter) is slid over an image, and the output is obtained by convolving the filter values with the image. The process is described by equation (1), where f is the input image, h is the kernel, and m and n are the row and column indices of the output matrix:

G[m, n] = (f * h)[m, n] = \sum_{j} \sum_{k} h[j, k] \, f[m - j, n - k]    (1)

B. Activation Function:

Fig. 3. ReLU Activation Function

The activation function of a node gives the output of that node for a given input or set of inputs. Linear activation is the simplest, where no transform is applied at all; Sigmoid, tanh, and ReLU are non-linear activations. We chose ReLU because:

- there is no vanishing-gradient problem;
- ReLU computes max(0, x), so it is more computationally efficient than sigmoid-like functions;
- no exponential computations are needed, as in the case of sigmoids;
- it improves the convergence performance of the network.

As Fig. 3 shows, the ReLU (rectified linear unit) is only half rectified (from the bottom): when x is less than zero, f(x) is zero, and when x is greater than or equal to zero, f(x) is equal to x. The ReLU activation function has a range of [0, ∞) and its equation is:

f(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \ge 0 \end{cases}

C. Haar Cascade:

The technique of detecting objects using Haar cascade classifiers was introduced by Paul Viola and Michael Jones. Many images (with and without faces) are used to train a cascade function in this machine learning technique. The trained classifier can then detect objects present in new images and extract features from them. The value of each Haar feature is computed as the difference between the sum of the pixels under the white rectangle and the sum of the pixels under the black rectangle.

Fig. 4. Haar Cascade Classifiers

D. CNN Architecture:

A CNN is a class of neural network using deep learning algorithms that can distinguish images from one another by assigning weights and biases to input image attributes in a network of layers. A CNN allows us to extract more accurate representations of image content. Unlike traditional image recognition, which requires us to define the image features, a CNN takes in the raw image pixels, trains the model on that data, and then performs automatic feature extraction. A CNN is a feed-forward neural network that filters spatial data.

A CNN operates by extracting features from the images. It consists of the following:
1. The input layer.
2. The output layer.
3. Hidden layers (the pooling layers, the convolution layers with ReLU, and the fully connected layers).

Fig. 5. CNN process from input to output

We built a CNN model using Keras. Our model’s architecture consists of two convolutional layers (32 nodes with a kernel size of 3), a third convolutional layer (64 nodes with a kernel size of 3), a fully connected layer (128 nodes), and a final fully connected layer (2 nodes). All the layers except the output layer (which uses the Sigmoid activation function) use the ReLU activation function. We observed that when the number of convolutional layers was increased from 2 to 3, accuracy increased, but adding a fourth layer brought no further gain, so we kept the number of layers at 3.
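A minimal Keras sketch of this architecture is given below (illustrative, not our exact training script). The 24x24 grayscale input is an assumption chosen to reproduce the (None, 22, 22, 32) shape in Table II, whose unchanged shapes after pooling also imply 1x1 pooling windows; the two-node output follows the description above, although Table II lists four nodes. The dropout rates and loss function are likewise assumptions.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    # Two convolutional layers, 32 filters of size 3x3 (24x24x1 input assumed).
    Conv2D(32, (3, 3), activation="relu", input_shape=(24, 24, 1)),
    MaxPooling2D(pool_size=(1, 1)),  # Table II shows unchanged shapes, i.e. 1x1 pooling
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(1, 1)),
    Conv2D(64, (3, 3), activation="relu"),  # third convolutional layer, 64 filters
    MaxPooling2D(pool_size=(1, 1)),
    Dropout(0.25),                   # dropout rate assumed
    Flatten(),                       # 18 * 18 * 64 = 20736, matching Table II
    Dense(128, activation="relu"),   # fully connected layer, 128 nodes
    Dropout(0.5),                    # dropout rate assumed
    Dense(2, activation="sigmoid"),  # two-node Sigmoid output, as described above
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer labels assumed
              metrics=["accuracy"])
model.summary()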

TABLE I. DESCRIPTION OF CNN LAYERS

1. Convolution layer: A convolution slides a filter over the image and computes the dot product of the filter and the underlying input pixel values, which enables the convolution to highlight the relevant features. This layer is made up of one or more kernels with varying weights that extract input image features. The layer functions by applying a filter to an array of image pixels and generating a convolved feature map; repeating this process yields a feature map that detects features in an image rather than inspecting every single pixel value.

2. ReLU activation function: The Rectified Linear Unit (ReLU) can be used after each convolution and max-pooling operation. The ReLU function mimics neuron activation in response to a "big enough stimulus": it introduces nonlinearity for values x > 0 and returns 0 otherwise. This has proven effective in countering vanishing gradients, and very small weights remain 0 after the ReLU activation. Acting as the activation function, ReLU ensures nonlinearity as data moves through each layer; without it, the data would lose the dimensionality we are attempting to preserve. ReLU also allows for faster training.

3. Pooling layer (Max Pooling): The pooling layer applies non-linear down-sampling to the convolved features (activation maps), reducing the complexity involved in image-processing computations. To minimize data size and processing time, the CNN uses max pooling to replace each region of the output with its maximum. This helps to identify the features that have the greatest effect while also reducing the chance of overfitting.

4. Pooling layer (Average Pooling): It takes the values covered by a particular pooling kernel and averages them.

5. Flattening layer: Following the pooling, the output must be transformed into a structure that enables a neural network to perform classification. A dropout layer is often used to prevent the algorithm from overfitting: while training, dropout ignores some activation maps, but all of them are considered while testing. Overfitting can be prevented by reducing neuron-to-neuron correlation.

6. Fully connected layer (FC): A fully connected layer consists of weights, biases, and neurons. It binds the neurons of one layer to those of the next and trains the model to classify images into the various classes.

7. Softmax/Sigmoid layer: This is the last layer of the CNN, placed at the end of the FC layers. Softmax is used for multi-class classification and Sigmoid for two-class classification.

TABLE II. SUMMARY OF OUR DL MODEL

Sr. No | CNN Layer                      | Output Shape
1      | conv2d (Conv2D)                | (None, 22, 22, 32)
2      | max_pooling2d (MaxPooling2D)   | (None, 22, 22, 32)
3      | conv2d_1 (Conv2D)              | (None, 20, 20, 32)
4      | max_pooling2d_1 (MaxPooling2D) | (None, 20, 20, 32)
5      | conv2d_2 (Conv2D)              | (None, 18, 18, 64)
6      | max_pooling2d_2 (MaxPooling2D) | (None, 18, 18, 64)
7      | dropout (Dropout)              | (None, 18, 18, 64)
8      | flatten (Flatten)              | (None, 20736)
9      | dense (Dense)                  | (None, 128)
10     | dropout_1 (Dropout)            | (None, 128)
11     | dense_1 (Dense)                | (None, 4)

V. EYE DATASETS

The experiment is performed on 3 datasets, namely CEW, MRL, and Kaggle, consisting of images of people's eyes in various lighting conditions. The images in each dataset are labeled as either ‘Open’ or ‘Closed’, with no image preprocessing (noise filtering) done on them. The training and testing images for every dataset are in the ratio of 9:1.

Fig. 6. Sample Dataset images
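For reference, a 9:1 split of this kind can be obtained directly in Keras. The sketch below is an assumption about how such data might be loaded: the data/mrl directory layout with Open/ and Closed/ subfolders is hypothetical, not the datasets' published structure, and the 24x24 image size matches the input assumed for the model in Section IV.

import tensorflow as tf

# Hypothetical layout: data/mrl/Open/*.png and data/mrl/Closed/*.png.
# validation_split=0.1 reproduces the 9:1 training:validation ratio.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/mrl", validation_split=0.1, subset="training", seed=42,
    color_mode="grayscale", image_size=(24, 24), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/mrl", validation_split=0.1, subset="validation", seed=42,
    color_mode="grayscale", image_size=(24, 24), batch_size=32)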

VI. RESULT

The CNN model developed here is applied to the 3 datasets mentioned above. Graphs of accuracy vs. epochs and loss vs. epochs are plotted for each dataset. Table III compares the accuracy values obtained for the 3 datasets.

TABLE III. ACCURACY COMPARISON FOR DIFFERENT DATASETS

Dataset Type | Training:Validation Image Ratio | Number of training images | Number of validation images | Accuracy (%)
CEW          | 9:1                             | 4380                      | 466                         | 98.14
MRL          | 9:1                             | 4527                      | 532                         | 98.62
Kaggle       | 9:1                             | 43700                     | 4855                        | 99.18

Fig. 7. Accuracy vs Epochs plots for training and validation datasets.

We observe that as the size of the dataset increases, the accuracy also increases.
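Plots like those in Figs. 7 and 8 can be generated from the History object that Keras returns from training; a sketch, assuming the model and the train_ds/val_ds datasets sketched earlier (the epoch count here is an assumption):

import matplotlib.pyplot as plt

history = model.fit(train_ds, validation_data=val_ds, epochs=20)

for metric in ("accuracy", "loss"):
    plt.figure()
    plt.plot(history.history[metric], label="training")
    plt.plot(history.history["val_" + metric], label="validation")
    plt.xlabel("Epochs")
    plt.ylabel(metric.capitalize())
    plt.legend()
    plt.show()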

Fig. 8. Loss vs Epochs plots for training and validation datasets.

Figures 9, 10 and 11 depict the real-time performance of our system; both eyes are clearly detected in real time.

Fig. 9. Detection of right and left eyes of the subjects.

Fig. 10. System output for Open eyes condition.

As shown in Fig. 11, when closed eyes are detected, the system starts calculating a score. If this score exceeds the predetermined threshold, the alarm sounds. The score value appears in the bottom left corner.

Fig. 11. System output for Closed eyes condition.

Figure 12 shows the hardware design of the standalone model for drowsiness detection.

Fig. 12. Hardware Design of standalone model

Considering the compatibility of the Raspberry Pi with our model, we have proposed the above hardware design. The Raspberry Pi would be connected to the car and to the webcam through USB. The alarm sound would be given as output through the audio output port (TV port) of the Raspberry Pi. This model allows the user to place the camera flexibly on the automobile’s dashboard so that it captures images properly. The proposed stand-alone system would be a cost-effective solution.

VII. CONCLUSION

We created a driver drowsiness detection system that is able to detect a person’s eyes from a webcam-captured video stream and alert the driver with an alarm if the eyes are closed for more than 5 seconds. We successfully simulated the results using a laptop webcam and achieved an accuracy of 98.64%.

REFERENCES

[1] L. Pauly and D. Sankar, “Detection of drowsiness based on HOG features and SVM classifiers,” Proc. 2015 IEEE Int. Conf. Res. Comput. Intell. Commun. Networks (ICRCICN 2015), pp. 181–186, 2015.

[2] A. Punitha, M. K. Geetha, and A. Sivaprakash, “Driver fatigue monitoring system based on eye state analysis,” 2014 Int. Conf. Circuits, Power Comput. Technol. (ICCPCT 2014), pp. 1405–1408, 2014.
[3] B. N. Manu, “Facial features monitoring for real time drowsiness detection,” Proc. 2016 12th Int. Conf. Innov. Inf. Technol. (IIT 2016), pp. 78–81, 2017.
[4] G. Pan, L. Sun, Z. Wu, and S. Lao, “Eyeblink-based anti-spoofing in face recognition from a generic webcamera,” 11th IEEE ICCV, Rio de Janeiro, Brazil, Oct. 2007.
[5] Y. Sun, S. Zafeiriou, and M. Pantic, “A hybrid system for on-line blink detection,” Forty-Sixth Annu. Hawaii Int. Conf. Syst. Sci., 2013.
[6] S. Mehta, S. Dadhich, S. Gumber, and A. Jadhav Bhatt, “Real-time driver drowsiness detection system using eye aspect ratio and eye closure ratio,” SSRN Electronic Journal, 2019.
[7] K. Dwivedi, K. Biswaranjan, and A. Sethi, “Drowsy driver detection using representation learning,” 2014 IEEE Int. Adv. Comput. Conf. (IACC 2014), pp. 995–999, 2014.
[8] F. Zhang, J. Su, L. Geng, and Z. Xiao, “Driver fatigue detection based on eye state recognition,” Proc. 2017 Int. Conf. Mach. Vis. Inf. Technol., pp. 105–110, 2017.
[9] B. Reddy, Y. Kim, S. Yun, C. Seo, and J. Jang, “Real-time driver drowsiness detection for embedded system using model compression of deep neural networks,” Comput. Vis. Pattern Recognit. Workshops, 2017.
[10] R. Jabbar, M. Shinoy, M. Kharbeche, K. Al-Khalifa, M. Krichen, and K. Barkaoui, “Driver drowsiness detection model using convolutional neural networks techniques for Android application,” IEEE, 2020.

