Abstract—Drowsy driving is one of the causes of automobile accidents. We propose a ‘Driver Drowsiness Detection System’ which can help to reduce automobile accidents caused by drowsy driving. We propose a Convolutional Neural Network (CNN) model that is capable of detecting drowsiness based on the closing of the driver's eyelids and, as future scope, a cost-effective and low-power standalone system that can be installed inside the vehicle, consisting of the CNN model interfaced with a Raspberry Pi single-board computer and a webcam to capture facial images of the driver. Based on the time duration for which the eyes are closed, a score is calculated. When this score crosses a predetermined threshold, it prompts the software to play a beeping alarm and alert the driver. The score remains zero for the duration when the eyes remain open. When integrated with a Raspberry Pi and powered by the vehicle’s battery, the system can easily be placed inside a vehicle and can act as a constant monitor for the driver.

Keywords— Haar Cascade, CNN, Deep Learning, Raspberry Pi

I. INTRODUCTION
Driving while drowsy is a major issue with significant implications that must be tackled. Drowsy driving is responsible for one out of every four traffic collisions, and one out of every 25 adult drivers has fallen asleep behind the wheel in the last 30 days. Because of the severity of the problem and its implications, we propose a solution which detects drowsy driving.
Here, we aim to develop a system that is capable of detecting closed and open eyes and sounding an alarm if they are closed for a certain amount of time. This will help reduce the number of accidents and increase the driver’s safety while driving.
The system will capture image frames from a continuous video stream taken by a camera as the input and then establish a Region of Interest (ROI) around the eyes within the person’s face in the image. Using this ROI, a Haar Cascade classifier will be used to detect the eyes and feed this data as input to a Convolutional Neural Network (CNN). The CNN model is trained using a dataset comprising images of the right and left eyes under varied lighting conditions. While the webcam captures a continuous video of the driver, a loop in the code ensures that successive image frames of the driver’s face are fed as input to the Haar classifier; the detected eyes are in turn classified into open or closed categories by the CNN model. If the eyes remain closed for longer than a certain threshold value, an alarm will ring, alerting the driver.

II. RELATED WORK
Drowsiness is a biological phase of the body indicating a narrow margin between a wakeful state and a sleeping state. There are behavioral signs which imply that a driver is drowsy, such as yawning often, difficulty in keeping the eyes open, and swinging the head forward and backward. The performance of behavioral methods is constrained by varying illumination, camera distance and the number of frames per second used to capture facial images of the driver. Use of infra-red (IR) cameras can solve light variation problems to some extent.
To measure levels of drowsiness, cameras mounted in the vehicle are used to monitor facial features such as the state of the eyes (open/closed), head oscillations, blink rate and yawning frequency. Facial features are extracted from the camera feed. Post-processing is then performed to establish the drowsiness level, using ML classifiers such as Support Vector Machines (SVM), Hidden Markov Models (HMM), Random Forest (RF) or sometimes Convolutional Neural Networks (CNN).
Pauly et al. [1] presented a drowsiness detection method for eye tracking. It was based on the Haar cascade classifier. They used Histogram of Oriented Gradients (HOG) features combined with an SVM classifier for detecting eye blinks. After that, the PERCLOS (percentage of eyelid closure over the pupil over time) was determined from it. If this value exceeded the threshold, the person was categorized as drowsy. The accuracy obtained was 91.6% under normal lighting conditions.
A. Punitha et al. [2] presented a real-time fatigue monitoring system for drivers. They used the Viola-Jones algorithm to detect the driver's face. An SVM classifier was trained on 2500 images to classify the face as either normal or fatigued. The overall accuracy achieved by the system was 93.5%.
B. N. Manu [3] described a method for drowsiness detection using three phases, viz. detecting facial features using the Viola-Jones algorithm, tracking the eyes and detecting yawning. The features generated from each of the three phases are fed to a binary linear SVM classifier to classify the frames into fatigue and non-fatigue states. Training was done on 100 templates. The overall accuracy obtained was 94.58%.
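The PERCLOS measure mentioned in the discussion of Pauly et al. [1] can be illustrated with a minimal sketch. It assumes that each video frame has already been labeled as eye-closed or eye-open by some classifier, and it approximates PERCLOS as the fraction of closed-eye frames in a sliding window. The names `perclos` and `PerclosMonitor`, the window size and the threshold are hypothetical choices for illustration and are not taken from [1].

```python
from collections import deque


def perclos(closed_flags):
    """Fraction of frames (0.0 to 1.0) in which the eye was judged closed."""
    flags = list(closed_flags)
    if not flags:
        return 0.0
    return sum(1 for flag in flags if flag) / len(flags)


class PerclosMonitor:
    """Sliding-window PERCLOS check (window size and threshold are assumed)."""

    def __init__(self, window_size=90, threshold=0.4):
        self.window = deque(maxlen=window_size)  # keeps only the newest frames
        self.threshold = threshold

    def update(self, eye_closed):
        """Record one frame; return True if the window looks drowsy."""
        self.window.append(bool(eye_closed))
        return perclos(self.window) > self.threshold


if __name__ == "__main__":
    monitor = PerclosMonitor(window_size=4, threshold=0.5)
    for closed in [False, True, True, True]:
        print(closed, monitor.update(closed))
```

A sliding window is used here so that an old burst of blinks stops influencing the estimate once it leaves the window; a fixed-interval estimate would be an equally reasonable reading of the definition.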
Authorized licensed use limited to: Zhejiang University. Downloaded on October 30,2023 at 07:17:32 UTC from IEEE Xplore. Restrictions apply.
ability to treat data as spatial. Due to better performance and high accuracy in image recognition, we chose CNN over ANN and RNN.
A. Kernel:
Fig. 3. ReLU Activation Function
The activation function of a node gives the output of that node when an input or set of inputs is given. Linear activation is the simplest, where no transform is applied at all. Sigmoid, tanh, and ReLU are non-linear activations. We chose ReLU because:
• It mitigates the vanishing gradient problem.
• ReLU computes max(0, x), so it is computationally more efficient than sigmoid-like functions.
• No exponential operations need to be computed, unlike with sigmoids.
• It improves the convergence performance of the network.
As you can see, the ReLU (rectified linear activation unit) is only half rectified (from the bottom). When z is less than zero, f(z) is zero, and when z is greater than or equal to zero, f(z) is equal to z.
The ReLU activation function has a range of [0, ∞) and its equation is:
$$f(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \ge 0 \end{cases}$$
C. Haar Cascade:
The technique of detecting objects using Haar cascade classifiers was introduced by Michael Jones and Paul Viola. In this machine learning technique, many images (with and without faces) are used to train a cascade function. The classifier can then detect objects present in other images and also extract features from them. The value of each feature is computed as the difference between the sum of the pixels under the white rectangle and the sum of the pixels under the black rectangle.
D. CNN Architecture:
A CNN is a class of neural network that uses deep learning algorithms and can distinguish images from one another by assigning weights and biases to input image attributes in a network of layers. A CNN allows us to extract more accurate representations of image content. Unlike traditional image recognition, which requires us to define the image features, a CNN takes the raw data from the image pixels, trains the model using that data, and then performs automatic feature extraction. A CNN is a feed-forward neural network that filters spatial data.
A CNN operates by extracting features from the images. It consists of the following:
1. The input layer.
2. The output layer.
3. Hidden layers (the pooling layers, convolution layers + ReLU and fully connected layers).
Fig. 5. CNN process from input to output
We built a CNN model using Keras. Our model’s architecture consists of two convolutional layers (32 nodes each, with a kernel size of 3), a third convolutional layer (64 nodes, with a kernel size of 3), a fully connected layer (128 nodes) and a
final fully connected layer (two nodes). All the layers except the output layer (which uses the Sigmoid activation function) make use of the ReLU activation function. We observed that when the number of layers was increased from 2 to 3, accuracy also increased. But when a fourth layer was added, there was no increase in observed accuracy. So we kept the number of layers in the model at 3.
classification. A dropout layer is often used to prevent the algorithm from overfitting. During training, dropout ignores some activation maps, but all of them are considered during testing. Overfitting can be prevented by reducing neuron-to-neuron correlation.
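To make the layer sizes described above more concrete, the following sketch traces output shapes and parameter counts through a stack of three convolutional layers (32, 32 and 64 filters, kernel size 3) followed by fully connected layers of 128 and 2 nodes. It is plain Python rather than the Keras model itself. Several details are assumptions that the text does not state: a 24×24 grayscale input, unpadded stride-1 convolutions, and a 2×2 max-pooling layer after each convolution. The helper names (`conv2d`, `max_pool`, `dense`, `trace_model`) are hypothetical.

```python
def conv2d(shape, filters, kernel=3):
    """Shape and parameter count after an unpadded, stride-1 convolution."""
    h, w, c = shape
    params = (kernel * kernel * c + 1) * filters  # weights plus one bias per filter
    return (h - kernel + 1, w - kernel + 1, filters), params


def max_pool(shape, size=2):
    """Shape after non-overlapping max pooling; pooling has no parameters."""
    h, w, c = shape
    return (h // size, w // size, c), 0


def dense(shape, units):
    """Shape and parameter count of a fully connected layer on flattened input."""
    n_inputs = 1
    for dim in shape:
        n_inputs *= dim
    return (units,), (n_inputs + 1) * units


def trace_model(input_shape=(24, 24, 1)):
    """Return (layer name, output shape, parameter count) for each layer."""
    layers = [
        ("conv 32, 3x3, ReLU", lambda s: conv2d(s, 32)),
        ("max pool 2x2 (assumed)", max_pool),
        ("conv 32, 3x3, ReLU", lambda s: conv2d(s, 32)),
        ("max pool 2x2 (assumed)", max_pool),
        ("conv 64, 3x3, ReLU", lambda s: conv2d(s, 64)),
        ("max pool 2x2 (assumed)", max_pool),
        ("dense 128, ReLU", lambda s: dense(s, 128)),
        ("dense 2, sigmoid", lambda s: dense(s, 2)),
    ]
    rows, shape = [], input_shape  # the input size is an assumption
    for name, layer in layers:
        shape, params = layer(shape)
        rows.append((name, shape, params))
    return rows


if __name__ == "__main__":
    for name, shape, params in trace_model():
        print(f"{name:24s} {str(shape):14s} {params:6d}")
```

Under these assumptions the convolutional parameter counts depend only on the kernel size and channel counts, while the size of the 128-node layer depends on the assumed input resolution and pooling, so a different input size would change only that layer's count.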
VI. RESULT
The CNN model developed here is applied to the 3 datasets mentioned above. The graphs of Accuracy vs Epochs and Loss vs Epochs are plotted for each dataset. Table III compares the accuracy values obtained for the 3 datasets.
Fig. 7. Accuracy vs Epochs plots for training and validation datasets.
TABLE III. ACCURACY COMPARISON FOR DIFFERENT DATASETS
CEW    9:1    4380    466    98.14
As shown in Fig. 11, when closed eyes are detected, the system starts calculating a score. If this score exceeds the predetermined threshold, the alarm sounds. The score value appears in the bottom left corner.
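The scoring behavior described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it assumes that the score grows by one for every frame classified as eyes-closed, returns to zero as soon as the eyes are open, and that the alarm is active whenever the score is above the threshold. The function names and the default threshold of 15 frames are hypothetical.

```python
def update_score(score, eyes_closed):
    """Grow the score while the eyes stay closed; reset it when they open."""
    return score + 1 if eyes_closed else 0


def alarm_states(frames, threshold=15):
    """Return, for each frame, whether the alarm would be sounding.

    `frames` is an iterable of booleans (True means eyes classified as closed).
    The threshold value is an assumption, not a figure from the paper.
    """
    score = 0
    states = []
    for eyes_closed in frames:
        score = update_score(score, eyes_closed)
        states.append(score > threshold)
    return states


if __name__ == "__main__":
    # Four closed frames, one open frame, then two closed frames.
    frames = [True] * 4 + [False] + [True] * 2
    print(alarm_states(frames, threshold=3))
```

Resetting to zero matches the statement that the score remains zero while the eyes are open; a gradual decrease would be an alternative design that tolerates occasional misclassified frames.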
[2] A. Punitha, M. K. Geetha, and A. Sivaprakash, “Driver fatigue monitoring system based on eye state analysis,” 2014 Int. Conf. Circuits, Power Comput. Technol. ICCPCT 2014, pp. 1405–1408, 2014.
[3] B. N. Manu, “Facial features monitoring for real time drowsiness detection,” Proc. 2016 12th Int. Conf. Innov. Inf. Technol. IIT 2016, pp. 78–81, 2017.
[4] G. Pan, L. Sun, Z. Wu, and S. Lao, “Eyeblink-based Anti-Spoofing in Face Recognition from a Generic Webcamera,” 11th IEEE ICCV, Rio Janeiro, Brazil, Oct., vol. 14, p. 20, 2007.
[5] Y. Sun, S. Zafeiriou, and M. Pantic, “A Hybrid System for On-line Blink Detection,” Forty-Sixth Annu. Hawaii Int. Conf. Syst. Sci., 2013.
[6] S. Mehta, S. Dadhich, S. Gumber, and A. Jadhav Bhatt, “Real-Time Driver Drowsiness Detection System Using Eye Aspect Ratio and Eye Closure Ratio,” SSRN Electronic Journal, 2019.
[7] K. Dwivedi, K. Biswaranjan, and A. Sethi, “Drowsy driver detection using representation learning,” Souvenir 2014 IEEE Int. Adv. Comput. Conf. IACC 2014, pp. 995–999, 2014.
[8] F. Zhang, J. Su, L. Geng, and Z. Xiao, “Driver fatigue detection based on eye state recognition,” Proc. - 2017 Int. Conf. Mach. Vis. Inf. Technol. C. 2017, pp. 105–110, 2017.
[9] B. Reddy, Y. Kim, S. Yun, C. Seo, and J. Jang, “Real-time Driver Drowsiness Detection for Embedded System Using Model Compression of Deep Neural Networks,” Comput. Vis. Pattern Recognit. Work., 2017.
[10] R. Jabbar, M. Shinoy, M. Kharbeche, K. Al-Khalifa, M. Krichen, and K. Barkaoui, “Driver Drowsiness Detection Model Using Convolutional Neural Networks Techniques for Android Application,” IEEE, 2020.