Abstract: We propose a deep neural network-based gaze sensing method in which the neural architecture is designed automatically, through a neural architecture search (NAS) algorithm called Auto-Keras. First, the neural model is generated using the Columbia Gaze Data Set. Then, the performance of the solution is evaluated in an online scenario, which demonstrates the generalization ability of our model. Compared to a geometrical approach, which uses dlib facial landmarks, filtering and morphological operators for gaze estimation, the proposed method provides superior results and certain advantages.

Keywords: eye gaze tracking, deep learning, automotive

I. INTRODUCTION

Nowadays, more and more cars are making their way onto the streets, crowding them and making driving more challenging. This, together with the increase in the top speed and acceleration of cars, makes driving more tiresome and demands much more attention and awareness. Among the main traffic problems that cause accidents are violations of traffic rules, speeding, and driving under the influence of alcohol or drugs. Still, 80% of crashes involve driver distraction, which makes Advanced Driver Assistance Systems (ADAS) an important component for alerting the driver in dangerous situations.

The aim of this work was to develop a Deep Neural Network (DNN) based gaze zone estimation method for automotive applications that monitors the driver during the trip from one location to another, ensuring a safer driving environment for the driver and for other traffic participants. The application can be used to provide drivers with assistance and warnings so that they can take appropriate action.

The paper is organized as follows: Section II gives a brief overview of previous work on eye gaze estimation; Section III describes the proposed system from an algorithmic perspective; the experimental part and the conclusions are presented in Section IV and Section V, respectively.

II. RELATED WORK

The first research in the field of gaze estimation dates back to the 1980s and was dedicated to helping paralyzed people use eye-gaze controlled computers (T. E. Hutchinson [1] and J. L. Levine [2]). However, one of the first researchers to consider employing eye gaze for general users was R. J. K. Jacob [3]. In the early 2000s, in his work [4], A. T. Duchowski points out that eye gaze tracking could be the basis for one of the most promising types of Human Machine Interface (HMI).

Initially, considerable time and effort were put into eye-gaze tracking research using various head-mounted systems in order to measure gaze more accurately. However, such systems are no longer of interest to consumers in general or to the automotive industry, because wearing bulky headwear is impractical. Recently, owing to improvements in embedded image acquisition and processing capabilities, remote monitoring of eye gaze has emerged as an attractive solution. The problems that head pose and orientation raise for eye-gaze tracking were tackled using either model-based or appearance-based methods, e.g. the work of J. G. Wang [5] or Y. Sugano [6]. Other researchers have opted to use near-infrared (NIR) illumination [7], stereo imaging [8], zoom cameras in combination with wide-angle cameras [9], or a combination of these to increase coverage and thus allow larger head movements.

In recent years, due to the remarkable performance of DNNs in visual computing tasks, deep learning-based solutions for gaze estimation have gained popularity. For example, S. Vora et al. compared the performance of several Convolutional Neural Network (CNN) architectures (AlexNet, VGG16, ResNet50 and SqueezeNet) in predicting 6 gaze zones plus an eyes-closed case [10]. A recurrent CNN architecture that combines appearance, shape and temporal information for video-based gaze estimation is introduced in [11]. To overcome the problem of head rotation, H. S. Yoon et al. propose a combination of single image and dual near-infrared cameras [12]. They use a Deep Convolutional Neural Network (DCNN) that simultaneously uses both image types. The conventional ResNet model was modified by replacing its last 7 x 7 average (AVG) pooling layer with an additional convolutional layer to address the problem of high inter-class similarity.

For a more in-depth review of CNNs for gaze estimation, see [13].

Unlike many of the above-mentioned approaches, ours does not require explicit personal calibration for each user and is able to differentiate a higher number (nine) of zones. To the best of our knowledge, our work is the first to employ a NAS algorithm for designing a gaze detection model. It also provides top results on the Columbia Gaze Data Set: 85% accuracy for a 78%-22% training-testing split. It further shows cross-driver and real-time capabilities in realistic driving scenarios.

Fig. 1. Of Dlib's 68 facial keypoints, only 12 are selected: points 36 through 47.
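
The selection described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the index range 36 to 47 comes from the caption; the function names are hypothetical, the indices are assumed to be 0-based (as returned by dlib's `shape.part(i)`), and the split into two groups of six points, one per eye, assumes the common 68-point landmark layout. Synthetic coordinates stand in for a real detector output so that the example runs without dlib.

```python
# Hypothetical helper names; only the index range 36..47 comes from the caption.
EYE_START, EYE_END = 36, 47  # inclusive, assumed 0-based indices in the 68-point layout


def select_eye_points(landmarks):
    """Return the 12 (x, y) points with indices 36..47 from a 68-point list."""
    if len(landmarks) != 68:
        raise ValueError(f"expected 68 landmarks, got {len(landmarks)}")
    return landmarks[EYE_START:EYE_END + 1]


def split_eyes(eye_points):
    """Split the 12 points into two groups of six, one per eye (assumed layout)."""
    return eye_points[:6], eye_points[6:]


def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a set of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Synthetic stand-in for a detector's output: point i sits at (i, 2 * i).
landmarks = [(i, 2 * i) for i in range(68)]
eye_points = select_eye_points(landmarks)
first_eye, second_eye = split_eyes(eye_points)
```

With a real detector, the `landmarks` list would instead be built from the predicted shape, for example `[(shape.part(i).x, shape.part(i).y) for i in range(68)]`. The per-eye bounding boxes could then be used to crop the eye regions for further processing.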