arXiv:2001.05137v1 [eess.IV] 15 Jan 2020

Abstract—This paper presents a novel approach and a new dataset for the problem of driver drowsiness and distraction detection. The lack of an available and accurate eye dataset is strongly felt in the area of eye closure detection. Therefore, a new dataset is introduced. The proposed system monitors the driver's head and eyes; if a criterion is violated as a sign of drowsiness, an alert will be sent to the driver early enough to avoid an accident.

The rest of this paper is organized as follows: the related works are reviewed in Section 2; Section 3 introduces the proposed networks and dataset; Section 4 presents experiments to verify the effectiveness and robustness of the method and dataset; the conclusion is discussed in Section 5.

II. RELATED WORK

Driver behaviors can be extracted to detect driver drowsiness [9]. Deng et al. proposed a new method for face detection using landmark points, which then tracks the face to find fatigued drivers. They checked signs like yawning, eye closure and blinking.

Eye closeness detection is a challenging task, since ambient factors affect it: lighting conditions, image resolution, driver position, the different shapes of eyes, different thresholds for the closure of an eye, etc. [17]. Kim et al. proposed a fuzzy-based method for classifying eye openness and closure. Their approach uses the information of the I and K colors from the HSI and CMYK spaces. Then, the eye region is binarized using a fuzzy logic system based on the I and K inputs [18]. Ghoddoosian et al. [19] presented a large, public, real-life dataset, which consists of 30 hours of video, with a range of content from subtle signs of drowsiness to more obvious ones. The core of their method is a Hierarchical Multi-scale Long Short-Term Memory (HM-LSTM) network, which is fed by detected blink features in sequence. To confront noise and scale changes, Song et al. [20] proposed a new feature descriptor named Multi-scale Histograms of Principal Oriented Gradients (MultiHPOG) together with a feature extraction approach; they tried this method on two different datasets and compared the accuracy and time complexity. Zhao et al. used a deep learning method to classify eye states in facial images. Their method combines two deep neural networks to construct a deep integrated neural network (DINN). Each network extracts different features, and the data are fused to make the best decision. A transfer learning strategy is also applied to overcome dataset shortage. This method was tested on three datasets and reported 97.20% accuracy on the ZJU dataset with the DINN network [21]. Tracking and detecting the driver's eyes in frames from a live camera is investigated by Sharma et al. [22]. The Viola-Jones algorithm extracts the eye using Haar filters. The extraction is efficient with and without (translucent) spectacles, and the system can estimate a Region of Interest (RoI) likely to contain the eyes if no eyes are explicitly detected. A trained CNN model using the LeNet architecture classifies the extracted eyes. The system raises an alarm if it detects any anomalies after analyzing the data points. The proposed model showed a detection accuracy rate of 97.80% versus 94.75% for the LBP algorithm. For distraction detection, [23] presented a model using a database gathered from video footage obtained from transit records. Each video is processed frame by frame to extract the features necessary for detecting distraction cues through OpenFace. Binary classification is used to assess whether the driver is distracted or not. A K-fold cross-validation approach is considered to determine the predictive power of the model. It was found that the average of the statistics based on a confusion matrix for the accuracy criterion is 91%.

The proposed work of this article includes three phases. In the first step, the authors detect head pose; if the change in head orientation is more than a specific range, it shows that the driver is not looking at the road. Such images are clustered as a risky situation, an alert is sent to the driver, and they are not considered for further processing. If the orientation of the face is within the specific range, then the detected eye regions in these images are cropped and passed into the network for drowsiness detection. In the second step, the camera mounted on the vehicle looks at the driver's face continuously. The authors consider six frames per second, which is experimentally sufficient for drowsiness detection. After finding landmark points, the Region of Interest (ROI) is cropped and sent to the networks in the last step. For this purpose, three networks are considered. The first network is a fully designed deep neural network; the second is a deep neural network with transfer learning, using the pretrained VGG16 network, which reuses the low-level features learned on the ImageNet dataset and learns the high-level features; the third network is similar to the second but uses VGG19 with the same goal. The results show high accuracy and short computational time for the proposed methodology of this article.

III. PROPOSED WORK

In this section, first, the distraction detection process is described. Then the authors describe the data collection process for driver drowsiness detection and review another dataset in the drowsiness detection field. Finally, the system framework is presented in detail.

A. Distraction Detection

In the first step, the proposed method estimates the driver's head position. The driver's view direction can change by moving the object (the head) with respect to the camera. The authors consider a reference (the looking-ahead head) and then search for variations from this reference. The final aim of pose estimation is to find the direction of the head when the locations of n three-dimensional points on the object (where n is an integer) and their corresponding two-dimensional projections in the image are known, and a calibrated camera exists.

Head pose estimation (HPE) tries to estimate the head's pose and orientation toward a reference (here, a camera). There are two kinds of movement: three rotation angles (roll, yaw, pitch) and the translations (t), which together give six degrees of freedom. HPE methods use 2D and 3D points together. Translation and rotation should be measured from a standard reference called the world coordinate system (WCS). This computation uses the 2D images from that camera and finds the equivalent 3D coordinates [24].

We consider six landmark points for calculating head changes in the WCS, using the tip of the nose, the chin, the left corner of the left eye, the right corner of the right eye, the left corner of the mouth, and the right corner of the mouth, as shown in Fig. 1. The WCS points are also presented in TABLE I.

In cases where the location (U, V, W) of a 3D point P in the WCS, the rotation R (a 3×3 matrix), and the translation t (a 3×1 vector) are known and predefined, the location (X, Y, Z) of
TABLE I. World Coordinate System (WCS)
the point P in the camera coordinate system can be calculated using Eq. 1 and Eq. 2 with respect to the camera coordinates:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = R \begin{bmatrix} U \\ V \\ W \end{bmatrix} + t \qquad (1)$$

In expanded form, the above equation can be written as Eq. 2:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} r_{00} & r_{01} & r_{02} & t_x \\ r_{10} & r_{11} & r_{12} & t_y \\ r_{20} & r_{21} & r_{22} & t_z \end{bmatrix} \begin{bmatrix} U \\ V \\ W \\ 1 \end{bmatrix} \qquad (2)$$

B. Lighting preprocesses

In the next step, after the algorithm has confirmed the driver's head position, eye finding and cropping are performed on each frame. To overcome varying light conditions, the authors use a histogram equalizer to equalize the eye contrast. This approach makes the methodology more accurate for eye closeness detection. After eye equalization and detection, the cropped eye is sent to the network. The proposed framework is shown in Fig. 2.
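The equalization step described above can be sketched in a few lines. This is a minimal, generic histogram-equalization routine in numpy, not the authors' exact implementation: each gray level is remapped through the normalized cumulative histogram so a low-contrast eye crop spreads over the full 0-255 range.

```python
import numpy as np

def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Spread the gray levels of an 8-bit image over the full 0-255 range."""
    hist = np.bincount(img.ravel(), minlength=256)  # per-level pixel counts
    cdf = hist.cumsum()                             # cumulative histogram
    cdf_min = cdf[cdf > 0][0]                       # first occupied bin
    # Classic equalization formula: rescale the CDF to [0, 255].
    lut = np.clip(np.round((cdf - cdf_min) / (img.size - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# Example: a low-contrast 24x24 "eye crop" with values squeezed into [100, 120]
rng = np.random.default_rng(0)
crop = rng.integers(100, 121, size=(24, 24), dtype=np.uint8)
eq = equalize_histogram(crop)
print(crop.min(), crop.max(), eq.min(), eq.max())
```

After equalization the darkest occurring level maps to 0 and the brightest to 255, which is the contrast stretch the text relies on before eye-closeness classification.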
recorded from 20 individuals, with four clips for each individual: one clip for a frontal view without glasses, one clip with a frontal view wearing thin-rim glasses, one clip for a frontal view with black-frame glasses, and the last clip with an upward view without glasses. Images of the left and the right eyes are also collected separately. However, since the face is symmetric, the authors use a symmetry-based approach in further processing. It is worth mentioning that a subsampled, gray-scale version of the images is sufficient [27]. In this article, only the bigger eye is used for detection, while both right and left eyes are trained through the algorithm. Some samples of the dataset are shown in Fig. 3. The dataset has 4841 images, including 2383 closed eyes and 2458 open eyes. All these images are geometrically normalized to 24×24 pixels.

Fig. 3. Illustration of some closed left eye (the first row), closed right eye (the second row), open left eye (the third row) and open right eye (the fourth row) samples from the ZJU dataset [26].

2) Proposed dataset: In this article, the ZJU dataset is extended with a dataset consisting of 4,185 images (2,106 open and 2,079 closed) from 4 different persons. Each of these images is resized to 24×24 pixels. The authors built this dataset with a 720p HD webcam, collecting the images from video at a rate of 6 frames per second. The strength of this dataset is that it considers different situations of the eyes. The data are collected from different distances, rotations and angles, which increases the driver's freedom of action and is thus a more realistic situation. The data also cover drivers with and without glasses. The data are divided into two groups. The first category contains data in which the driver's head looks ahead with different eye rotations. The second category contains data in which the driver turns his head, but within the permissible range. The second category is a novel approach for eye datasets. In this way, the network of this article is trained with both straight and oblique views simultaneously. In fact, ZJU contains only Chinese ethnicity, but the data here contain another ethnicity, which expands the variety of training data. The proposed methodology of this article involves different lighting conditions, and all of the images pass through a histogram equalizer. The equalized images are then used for further processing. Some eye image samples for looking-ahead and turned head positions are shown in Fig. 4 and Fig. 5, respectively. The displayed images are shown after equalizing the brightness. The extended dataset is combined with ZJU to create a more comprehensive database to train the model.

Fig. 4. Display of a straight head for closed eye (the first row) and open eye (the second row) from our extended dataset.

Fig. 5. Display of a turned head for closed eye (the first row) and open eye (the second row) from our extended dataset.

D. Network Architecture

Since the face is a symmetric shape, only one eye needs to be considered for drowsiness detection; this decreases the computational time for detection. For this goal, the algorithm computes the distances between the right-most and left-most points of the right and left eyes, compares them to each other, and selects the bigger distance for cropping. The intended distance is shown in Fig. 6. The algorithm finds 68 facial landmarks, measures the absolute distances between points 37 and 40 and between points 43 and 46, and then selects the bigger distance as the selected eye. As mentioned earlier, three different neural networks are proposed in this article, and the results of these networks are compared with each other on both datasets. The first neural network is a fully designed deep neural network (FD-DNN). The architecture and layers of the model are displayed in Fig. 7. There are two 2D convolution layers with a 3×3 filter size; max-pooling and dropout are also used for reducing and expanding features. The max-pooling filter size is 2×2, and the dropout ratios are 0.2 and 0.5. All activation functions except the last one are ReLU; the last activation layer uses a sigmoid function for the binary classification output. The selected hyperparameters for each network are mentioned in its section. The advantage of FD-DNN compared to the two other networks is its simplicity and speed. This network is applied to ZJU and our extended dataset.
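The eye-selection rule just described (keep the eye whose corner-to-corner distance is larger) can be sketched directly. The indices 37-40 and 43-46 follow the 68-point facial-landmark convention the text refers to; the coordinate values below are made-up numbers for illustration only.

```python
from math import dist

def select_bigger_eye(landmarks):
    """Given 68-point facial landmarks (1-based index -> (x, y)),
    return which eye has the larger corner-to-corner distance."""
    right_width = dist(landmarks[37], landmarks[40])  # right-eye corners
    left_width = dist(landmarks[43], landmarks[46])   # left-eye corners
    if right_width >= left_width:
        return "right", right_width
    return "left", left_width

# Hypothetical landmark positions (in pixels) for a slightly turned head,
# where the left eye appears wider in the image than the right one.
points = {37: (120, 90), 40: (150, 91), 43: (180, 90), 46: (218, 92)}
eye, width = select_bigger_eye(points)
print(eye, round(width, 1))  # left 38.1
```

The wider eye is then the one cropped and passed to the network, which is what lets the pipeline process a single eye per frame.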
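As a rough sanity check on the FD-DNN layer sizes described above, the feature-map shapes of a 24×24 eye crop passing through 3×3 convolutions and 2×2 max-pooling can be traced with simple arithmetic. This sketch assumes unpadded (valid) convolutions and non-overlapping pooling; the authors' exact padding choice is not stated.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Spatial size after an unpadded (valid) convolution with stride 1."""
    return size - kernel + 1

def pool_out(size: int, window: int = 2) -> int:
    """Spatial size after non-overlapping max-pooling."""
    return size // window

# Trace a 24x24 eye crop through conv3x3 -> pool2x2 -> conv3x3 -> pool2x2.
s = 24
trace = [s]
for _ in range(2):
    s = conv_out(s)   # each 3x3 convolution shrinks each side by 2
    trace.append(s)
    s = pool_out(s)   # each 2x2 max-pooling halves each side
    trace.append(s)
print(trace)  # [24, 22, 11, 9, 4]
```

Under these assumptions the two conv/pool stages reduce the 24×24 input to a 4×4 map before the dense, dropout and sigmoid layers.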
E. Making decision

As the last step of detection, images are taken at 6 fps. If the network estimates the probability of eye closeness to be more than 50% for more than 20 images, a dangerous situation is assumed. Experimental results showed that, for normal blinking, this percentage persists for fewer than 20 images. Therefore, closeness probabilities above 50% sustained over more than 20 images are a sign of eye tiredness. After this detection, an alarm is sent to the driver for awakening.
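A minimal sketch of this decision rule follows. The paper fixes only the 50% threshold, the 20-image count and the 6 fps rate; the function name and the reading of "more than 20 images" as consecutive frames are illustrative assumptions.

```python
def is_drowsy(closed_probs, threshold=0.5, max_closed_frames=20):
    """Return True if more than `max_closed_frames` consecutive frames
    have an eye-closeness probability above `threshold`.

    At 6 fps, 20 frames is just over 3 seconds -- longer than a normal blink.
    """
    run = 0
    for p in closed_probs:
        run = run + 1 if p > threshold else 0
        if run > max_closed_frames:
            return True
    return False

blink = [0.9] * 5 + [0.1] * 10    # short closure: a normal blink, no alarm
dozing = [0.2] * 3 + [0.8] * 25   # sustained closure: raise the alarm
print(is_drowsy(blink), is_drowsy(dozing))  # False True
```

A short closure resets the counter, so ordinary blinks never trigger the alarm, while a sustained closure crosses the 20-frame bound within about three and a half seconds.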
TABLE II. Results of the three proposed networks on the ZJU dataset.

TABLE V. Comparison of accuracy and AUC between the proposed method and other methods on the ZJU dataset.
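For reference, the AUC reported in such comparisons is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The rank-based sketch below is a generic illustration with toy numbers, unrelated to the authors' reported values.

```python
def auc(labels, scores):
    """Rank-based AUC: fraction of positive/negative score pairs ordered
    correctly, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = closed eye, 0 = open eye; scores are model probabilities.
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(auc(y, s))  # 8 of 9 pairs ranked correctly
```

Unlike raw accuracy, this measure is insensitive to the 50% decision threshold, which is why it is reported alongside accuracy in the table.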
… Information Hiding and Image Processing (IHIP 2018). ACM, New York, NY, USA, pp. 112-118.
[24] M. F. Karoui and T. Kuebler, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2019, pp. 182-197.
[25] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 2001.
[26] G. Pan, L. Sun, Z. Wu, and S. Lao, "Eyeblink-based anti-spoofing in face recognition from a generic webcamera," in Proceedings of the IEEE International Conference on Computer Vision, 2007, pp. 1-8.
[27] Parmar Singh Himani, Mehul Jajal, and Yadav Priyanka Brijbhan, "Drowsy Driver Warning System Using Image Processing," 2014.
[28] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," CoRR abs/1409.1556, 2014.
[29] S. Samiee et al., "Data fusion to develop a driver drowsiness detection system with robustness to signal loss," Sensors 14(9), 2014, pp. 17832-17847.