
Human Intention Estimation Using Fuzzy Inference

Based on Gaze Tracking and Saliency Map


1st Yoichiro Maeda
College of Information Science and Engineering
Ritsumeikan University
1-1-1 Noji-higashi, Kusatsu, Shiga, 525-8577 Japan
ymaeda@fc.ritsumei.ac.jp

2nd Jiaao Niu
Graduate School of Information Science and Engineering
Ritsumeikan University
1-1-1 Noji-higashi, Kusatsu, Shiga, 525-8577 Japan
is0389hr@ed.ritsumei.ac.jp

Abstract—Recently, the development of nursing and welfare robots intended for people with disabilities and the elderly has been advancing, and research on methods to transmit human intentions to a robot by using the biological information of the operator is proceeding. Gaze information in particular carries several important kinds of information, including the human psychological condition. In this research we propose a method to perform locomotive control of an omni-directional wheelchair by a gaze instruction system based on the human intention. The human intention is estimated by fuzzy inference combining the gaze map and the visual saliency map. We performed an operating experiment using a gaze tracking device and confirmed that the operator can successfully perform the moving control of the wheelchair according to the instructions.

Index Terms—Gaze Tracking, Saliency Map, Omni-directional Wheelchair, Human Intention, Fuzzy Inference

I. INTRODUCTION

The research and development of nursing and welfare robots, devices and systems intended for people with disabilities and the elderly have been advancing in recent years. Electroencephalogram (EEG), electromyogram (EMG) and eye tracking are often used in these studies as the human biological information to operate or control robots and mechatronic machines. For example, operation instruction interfaces for handicapped persons have been developed using non-contact gaze tracking devices, which place little load on the human.

For EEG and EMG, Tanaka et al. [1] proposed a pattern matching method using EEG and performed operating instruction of a wheelchair robot. Maeda et al. [2] developed a technique to convey the human instruction intention to a mobile robot from EMG information of the vehicle driver's arm by using fuzzy inference. However, noise caused by changes of the electric potential on the head or skin surface is easily carried on EEG and EMG signals, and setting the electrodes takes much time in the measurement experiment.

This study aims at estimating the human's instruction intention by analyzing the movement of his eyes while driving the omni-directional wheelchair. Ogawara et al. [3] proposed a method in which a robot estimates its next task from the eye information of the human while the robot and the human collaborate in the same space, avoiding collisions with each other.

For studies that operate a robot by eye tracking, Okamoto et al. [4] measured the electric potential of the eyes, calculated the eye position and performed an experiment to control a robot arm. Matsumoto et al. [5] developed an intelligent wheelchair with a face- and gaze-based interface. Wang et al. [6] estimated the intention of a human operator controlling a wheelchair by extracting the gaze position and its characteristics, and acquired the place where the operator wants to go next by using neural networks.

On the other hand, the visual saliency map model proposed by Itti et al. [7], based on the characteristics of the retina of the human eye, is well known as a method to estimate visually unconscious information. However, when estimating the human intention from gaze information, it becomes necessary to remove the unconscious information, the places a human merely happens to watch, from the conscious instruction intention information. Therefore, in this study, the saliency map is first extracted from the movement of the human eyes and the scenery being watched. Moreover, the intention estimation map, which indicates the place he really desires, is generated by combining the gaze map with the saliency map by fuzzy inference. In the experiment, we confirmed the efficiency of the proposed method by using the gaze instruction system to control the motion of an omni-directional wheelchair.

II. VISUAL SALIENCY MAP

Koch et al. [8] considered in 1985, based on the feature integration theory of Treisman et al., that each individual characteristic in an image has a property to guide attention. They encoded single characteristics of the image and proposed the concept of the visual saliency map model, which expresses them as a two-dimensional map. Itti et al. [7] implemented this theory as a computational model in 1998 (see Fig. 1).

In Fig. 1, processing (1) is the output process for the characteristic images of color (Colors), brightness (Intensity) and direction (Orientations) from the input image (see Fig. 2). The spatial filter outputs the weighted summation of the neighboring pixels corresponding to the input image.

For the color information, we generate two color feature maps RG(σ) and BY(σ) from the complementary color pairs of red and green, and blue and yellow, as shown in Eq. (3) and (6) (see Fig. 4). The scale number σ of the image takes values from 0 to 8.



Fig. 1. Outline of Visual Saliency Map Model (referred from [7]); the numbered blocks (1)-(6) correspond to the processing steps described in the text.
Fig. 2. Original Image
Fig. 3. Saliency Map
Fig. 4. Color Feature Map
Fig. 5. Intensity Feature Map
Fig. 6. Orientation Feature Map


R(σ) = r(σ) − (g(σ) + b(σ)) / 2                              (1)

G(σ) = g(σ) − (r(σ) + b(σ)) / 2                              (2)

RG(σ) = R(σ) − G(σ)                                          (3)

B(σ) = b(σ) − (r(σ) + g(σ)) / 2                              (4)

Y(σ) = (r(σ) + g(σ)) / 2 − |r(σ) − g(σ)| / 2 − b(σ)          (5)

BY(σ) = B(σ) − Y(σ)                                          (6)
For the brightness information, the average of the R, G and B channels is used as the intensity feature map I(σ), as shown in Eq. (7) (see Fig. 5).

I(σ) = (r(σ) + g(σ) + b(σ)) / 3                              (7)
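To make Eqs. (1)-(7) concrete, the following is a minimal NumPy sketch of the per-scale color and intensity feature maps. It is our illustration rather than the authors' implementation; the function name and the assumption that the input is a float RGB image with channels r, g, b are ours.

```python
import numpy as np

def color_intensity_features(rgb):
    """Compute RG, BY and intensity feature maps (Eqs. (1)-(7)) for one scale.

    rgb: float array of shape (H, W, 3) holding the r, g, b channels.
    Returns the (RG, BY, I) maps, each of shape (H, W).
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    R = r - (g + b) / 2.0                          # Eq. (1)
    G = g - (r + b) / 2.0                          # Eq. (2)
    RG = R - G                                     # Eq. (3): red-green opponency
    B = b - (r + g) / 2.0                          # Eq. (4)
    Y = (r + g) / 2.0 - np.abs(r - g) / 2.0 - b    # Eq. (5)
    BY = B - Y                                     # Eq. (6): blue-yellow opponency
    I = (r + g + b) / 3.0                          # Eq. (7): intensity

    return RG, BY, I
```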
For the direction information, we generate four orientation feature maps O(σ, θ) for θ = 0°, 45°, 90°, 135° (see Fig. 6), as shown in Eq. (8). Here ψ(θ) denotes the Gabor filter function, I(σ, m, n) is the brightness at coordinate (m, n) of the image at the σth scale, ψ(θ, x − m, y − n) is the filter component at coordinate (x − m, y − n), hm and hn are the widths of the filter, and θ is the rotation angle of the filter.

O(σ, θ) = I(σ) · ψ(θ)
        = Σ_{m=−hm}^{hm} Σ_{n=−hn}^{hn} I(σ, m, n) ψ(θ, x − m, y − n)    (8)
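The orientation filtering of Eq. (8) can be sketched as a convolution of the intensity map with a real Gabor kernel for each of the four angles. This is a hedged illustration: the kernel size and the sigma, lambd and gamma parameters are assumed values, not ones reported in the paper.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(theta, ksize=9, sigma=2.0, lambd=4.0, gamma=0.5):
    """Real part of a Gabor kernel oriented at angle theta (radians)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_t / lambd)
    return envelope * carrier

def orientation_features(I):
    """Orientation feature maps O(sigma, theta) for 0, 45, 90, 135 degrees (Eq. (8))."""
    maps = {}
    for deg in (0, 45, 90, 135):
        kernel = gabor_kernel(np.deg2rad(deg))
        maps[deg] = convolve2d(I, kernel, mode="same", boundary="symm")
    return maps
```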
In processing (2), the convolution with the brightness information is calculated by a Gabor filter based on the Gaussian function, a kind of linear filter, as shown in Eq. (9). The Gaussian filter generates images at nine different resolutions, scales σ = 0 to 8, from scale 0 at 1:1 to scale 8 at 1:256. Because each smoothed image is filtered to 1/2 the scale of the previous one, the result is called a Gaussian pyramid.

f(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))               (9)

In processing (3), the center-surround processing is performed: places with a large change in density level are made to stand out by taking differences between scale images while integrating the nine maps per feature into one map.

In processing (4), 12 color feature maps, 6 intensity feature maps and 24 orientation feature maps are generated in total. In processing (5), each feature map is emphasized when it has few high-saliency places (a single peak) and restrained when it has many high-saliency places (multiple peaks) by multiplying the whole map by (M − m)², where M is the global maximum value and m is the local maximum value. In processing (6), the saliency map is generated by integrating these feature maps with weights (see Fig. 3).
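A compact sketch of the remaining steps, processing (2) through (6), is given below: a Gaussian pyramid over nine scales, center-surround differences between scale images, the (M − m)² weighting, and the final integration into one saliency map. The center/surround scale pairs, the smoothing sigma and the use of the map mean in place of the average local maximum are our assumptions in the spirit of the Itti model [7], not the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(img, levels=9):
    """Processing (2): scales 0..8, each level smoothed and subsampled to half size."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyr[-1], sigma=1.0)   # Gaussian low-pass, cf. Eq. (9)
        pyr.append(smoothed[::2, ::2])
    return pyr

def upsample(a, shape):
    """Bilinear-resize a map to the given shape."""
    return zoom(a, [t / n for t, n in zip(shape, a.shape)], order=1)

def center_surround(pyr, pairs=((2, 5), (2, 6), (3, 6), (3, 7), (4, 7), (4, 8))):
    """Processing (3): across-scale differences, brought back to the original size.

    The center/surround scale pairs are an assumption borrowed from the Itti model."""
    base = pyr[0].shape
    return [np.abs(upsample(pyr[c], base) - upsample(pyr[s], base)) for c, s in pairs]

def normalize(fmap):
    """Processing (5): crude (M - m)^2 weighting; the map mean stands in for the
    average local maximum (assumption)."""
    fmap = (fmap - fmap.min()) / (np.ptp(fmap) + 1e-9)
    return fmap * (fmap.max() - fmap.mean()) ** 2

def saliency_map(feature_maps):
    """Processing (6): sum the normalized feature maps into one 0-255 saliency map."""
    acc = sum(normalize(f) for f in feature_maps)
    acc = (acc - acc.min()) / (np.ptp(acc) + 1e-9)
    return (255 * acc).astype(np.uint8)
```

In use, the color, intensity and orientation maps would each be fed through center_surround and the resulting feature maps passed together to saliency_map, yielding the brightness-coded saliency image of Fig. 3.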
III. INTENTION ESTIMATION MAP

In this research, we use the visual saliency map model as a technique to extract the unconscious movement of the eyes. The generation procedure of the intention estimation map is shown in Fig. 7. We construct the intention estimation map by fuzzy inference, combining the saliency map, which shows the places where a human unconsciously turns his eyes, with the gaze map, which shows the conscious movement of his eyes in which the human behavior intention is included. Finally, the operator moves the omni-directional wheelchair toward the place where the intention estimation degree is highest.

The saliency map expresses the unconscious movement of the human eyes. The higher the saliency degree of a pixel is, the larger the brightness (0–255) of that pixel in the input image becomes, and the place where a person most easily looks unconsciously becomes pure white. We assume a parameter s (saliency) (0 ≤ s ≤ 1) to express the unconscious degree, and Fig. 8 shows the antecedent membership functions of the saliency map, expressed with five fuzzy labels.

Fig. 7. Generation Procedure of Behavior Intention Estimation Map (the forward scenery image from the USB camera in front of the monitor display and the gaze point detected by the eye tracking device under the monitor display are converted into the saliency map (unconscious degree s) and the gaze map (conscious degree g), which are combined by fuzzy inference into the intention estimation map (intention estimation degree i) driving the omni-directional wheelchair)
Fig. 8. Membership Functions (Saliency Map)
Fig. 9. Membership Functions (Gaze Map)
Fig. 10. Singletons (Intention Estimation Map)

TABLE I
FUZZY RULE OF INTENTION ESTIMATION

gaze \ saliency   SSS   SS    SM    SL    SLL
GSS               IMM   ISM   ILS   IMS   IVS
GS                ILM   IMM   ISM   ILS   IMS
GM                ILL   ILM   IMM   ILS   IMS
GL                IML   IML   ILL   IMM   ISM
GLL               IVL   IVL   IML   ILL   ILM

SSS: Saliency minimum, SS: Saliency Small, SM: Saliency Middle, SL: Saliency Large, SLL: Saliency maximum
GSS: Gaze minimum, GS: Gaze Small, GM: Gaze Middle, GL: Gaze Large, GLL: Gaze maximum
IVS: Intention Very Small, IMS: Intention Medium Small, ILS: Intention Little Small, ISM: Intention Small Middle, IMM: Intention Medium Middle, ILM: Intention Large Middle, ILL: Intention Little Large, IML: Intention Medium Large, IVL: Intention Very Large

In a similar way, the higher the conscious degree of a pixel is, the larger the brightness (0–255) of that pixel in the gaze map becomes. The point with the highest conscious degree in the gaze map becomes pure white, since it corresponds to the human gaze point. We assume a parameter g (gaze) (0 ≤ g ≤ 1) to express the conscious degree, and Fig. 9 shows the antecedent membership functions of the gaze map, expressed with five fuzzy labels in the same way as for the saliency map.

In the intention estimation map, the higher the intention degree is, the larger the brightness (0–255) of the pixel becomes. The consequent-part singletons of the intention estimation map, with nine fuzzy labels, are shown in Fig. 10. The processing time of the program becomes very long if the conscious and unconscious degrees are evaluated for every pixel when the saliency map and the gaze map are combined. Therefore, in this study, we divided the saliency map and the gaze map into domains of 40×40 pixels and applied the fuzzy inference to all 192 domains. The first pixel of each 40×40 pixel domain is input to the fuzzy rules, and the output value of the fuzzy inference becomes the brightness of the pixels in the intention estimation map. By the fuzzy inference, the conscious degree and the unconscious degree are combined and the intention estimation degree is output according to the fuzzy rules of Table I. We assume that the brightness values of all pixels in a domain are equivalent, define the first pixel of a domain as its representative pixel, and then reflect the output brightness value over the whole domain in the same way.
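As an illustration of this inference step, the sketch below combines the saliency brightness and the gaze brightness of a representative pixel through the 5 × 5 rule table of Table I, using triangular antecedent membership functions, product inference and singleton consequents with a weighted-average output, and repeats the computation once per 40×40 domain. The interior membership peaks and seven of the singleton values follow Table II; the peaks at 0 and 255, the two extreme singleton values and the product/weighted-average operators are our assumptions, since the paper does not spell them out.

```python
import numpy as np

# Triangular membership peaks over brightness 0-255. The interior peaks (63, 127, 191)
# follow Table II (s1-s3 / g1-g3); the peaks at 0 and 255 for the "minimum" and
# "maximum" labels are assumed.
PEAKS = [0, 63, 127, 191, 255]
SAL_LABELS = ["SSS", "SS", "SM", "SL", "SLL"]
GAZE_LABELS = ["GSS", "GS", "GM", "GL", "GLL"]

# Fuzzy rule table of Table I: RULES[gaze_label][saliency_index] -> intention label.
RULES = {
    "GSS": ["IMM", "ISM", "ILS", "IMS", "IVS"],
    "GS":  ["ILM", "IMM", "ISM", "ILS", "IMS"],
    "GM":  ["ILL", "ILM", "IMM", "ILS", "IMS"],
    "GL":  ["IML", "IML", "ILL", "IMM", "ISM"],
    "GLL": ["IVL", "IVL", "IML", "ILL", "ILM"],
}

# Consequent singletons (output brightness). The seven interior values are i1-i7
# from Table II; the values 0 and 255 for IVS/IVL are assumed.
SINGLETONS = {"IVS": 0, "IMS": 31, "ILS": 63, "ISM": 95, "IMM": 127,
              "ILM": 159, "ILL": 191, "IML": 223, "IVL": 255}

def tri_memberships(x):
    """Membership degrees of brightness x (0-255) in the five triangular labels."""
    mu = np.zeros(len(PEAKS))
    for k, p in enumerate(PEAKS):
        left = PEAKS[k - 1] if k > 0 else p
        right = PEAKS[k + 1] if k < len(PEAKS) - 1 else p
        if x == p:
            mu[k] = 1.0
        elif left < x < p:
            mu[k] = (x - left) / (p - left)
        elif p < x < right:
            mu[k] = (right - x) / (right - p)
    return mu

def intention_degree(s_brightness, g_brightness):
    """Combine one saliency pixel and one gaze pixel into an intention brightness."""
    mu_s = tri_memberships(s_brightness)
    mu_g = tri_memberships(g_brightness)
    num = den = 0.0
    for gi, g_label in enumerate(GAZE_LABELS):
        for si in range(len(SAL_LABELS)):
            w = mu_g[gi] * mu_s[si]                    # product inference (assumed)
            num += w * SINGLETONS[RULES[g_label][si]]
            den += w
    return num / den if den > 0 else 0.0               # weighted average of singletons

def intention_map(saliency_img, gaze_img, domain=40):
    """Evaluate one representative pixel per 40x40 domain and fill the whole domain."""
    out = np.zeros_like(saliency_img, dtype=float)
    for y in range(0, saliency_img.shape[0], domain):
        for x in range(0, saliency_img.shape[1], domain):
            i = intention_degree(saliency_img[y, x], gaze_img[y, x])
            out[y:y + domain, x:x + domain] = i
    return out.astype(np.uint8)
```

Calling intention_map on the two 0-255 brightness maps yields the intention estimation map; the brightest domain then indicates the estimated target place.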
In the intention estimation map, the higher intention degree reflect to the whole domain in the same way.
is, the larger brightness (0 ∼ 255) of the pixel grows. The
consequent part singletons of the intention estimation map IV. I NTENTION E STIMATION E XPERIMENT U SING
with nine fuzzy labels are shown in Fig.10. The process O MNI - DIRECTIONAL W HEELCHAIR
time of the computer program becomes very late because the In this section, we report the result of the experiment using
conscious and unconscious degree are evaluated for each pixel an omni-directional wheelchair and an eye tracking device to
when the saliency map and gaze map are combined together. confirm the effectiveness of the proposed method.
Therefore, in this study, we divided the image of the saliency
map and the gaze map in the domain of 40×40 pixels and A. Experimental Method
applied the fuzzy inference for all 192 domains. The first Figure 11 shows the external appearance of experimental
pixel in each 40×40 pixel domain is input to the fuzzy rule. equipment and device which we used for the experiment.
The output value of the fuzzy inference is the brightness Operator’s instruction intention is estimated detecting the gaze
of each pixel in the intention estimation map. By the fuzzy point by the eye tracking device installed in the lower part
Fig. 11. Omni-directional Wheelchair with Eye Tracking Device (USB camera, forward image displayed on monitor, eye tracking device, omni-directional wheelchair)
Fig. 12. Experimental Environment

The operator's instruction intention is estimated by detecting the gaze point with the eye tracking device installed in the lower part of the monitor that displays the front scenery image on the omni-directional wheelchair. The eye tracking device used in this experiment is a noncontact-type viewpoint analysis device, QG-PLUS, produced by DITECT Co., Ltd. Highly precise measurement of the eye direction is realized with infrared LED light from the eye tracking device, based on the corneal light reflection principle.

At first, in this experiment, the forward scenery image from a front USB camera is displayed in real time on the monitor screen in front of the wheelchair driver (operator). The operator indicates the desired target place with his eyes while watching the forward scenery image on the screen, and the eye tracking device measures the gaze position and displays the gaze map on the screen at that time. Next, the gaze map and the saliency map are integrated to obtain an intention estimation map. Finally, the omni-directional wheelchair is moved to the desired target place according to the instruction intention of the operator.

We asked ten subjects (nine male students and one female student, average age: 22 years old) to take part in the experiment to confirm the efficiency of the proposed method, and we explained the instructions and the procedure of the experiment orally in advance. Each subject sits down on the wheelchair, controls it toward his desired target place by the instruction based on the gaze position from the start point, avoids the two obstacles, and comes back to the start point. We executed three kinds of experiments, comparing the hand-operated control by joystick, the control only by the gaze point, and the control by the proposed method. After these experiments, we asked each subject to answer a questionnaire and write impressions about the comfortability and operationability of the wheelchair control.

For the control experiment of the omni-directional wheelchair, we prepared an experimental space of 3 m × 3 m and placed one chair and one small table in front of and behind the wheelchair as obstacles (see Fig. 12). The moving course of the wheelchair is a figure-of-eight run around the two obstacles, starting from the point between them, as shown in Fig. 13. We defined five places, place 1 to place 5, as the most characteristic positions at the right or left turns in the course. The operator controls the wheelchair from the start point, turns right at the small table, turns left at the chair, runs a figure of eight around the two obstacles, and finally comes back to the start point.

Fig. 13. Wheelchair Moving Course in Experiment

B. Experimental Result

In this study, we estimated the intention of the operator from his gaze information using the fuzzy inference. The parameter values of the membership functions (see Fig. 8 and Fig. 9) and the singletons (see Fig. 10) used in this experiment are shown in Table II.

TABLE II
PARAMETER VALUES OF MEMBERSHIP FUNCTIONS AND SINGLETONS

(a) Saliency Map      (b) Gaze Map      (c) Intention Estimation Map
Saliency Value        Gaze Value        Intention Value
s1   63               g1   63           i1   31
s2   127              g2   127          i2   63
s3   191              g3   191          i3   95
                                        i4   127
                                        i5   159
                                        i6   191
                                        i7   223

The experimental data and the generated map images for place 2 to place 5 are shown in Fig. 14 to Fig. 17.
Fig. 14. Generated Maps (Place 2)
Fig. 15. Generated Maps (Place 3)
Fig. 16. Generated Maps (Place 4)
Fig. 17. Generated Maps (Place 5)

The coordinates of the gaze point and the intention estimation point, together with the unconscious degree (Saliency), the conscious degree (Gaze) and the intention estimation degree (Intention), are shown in the lower part of each figure. The four images show the original image, the saliency map, the gaze map and the intention estimation map, respectively. The human gaze point is shown with a blue circle on the original image.

In the questionnaire, the 10 subjects evaluated their impressions of the three experiments (the proposed method, gaze-only control and joystick control) with respect to comfortability and operationability on a seven-point scale (min: 1 to max: 7). We used the t-test to inspect the significant differences between the methods in the questionnaire evaluation. The calculated p-values for comfortability and operationability are shown in Tables III and IV. Because we performed two comparisons of the proposed method against the other two methods (gaze control and joystick control), each as a processing group, the level of significance is divided by 2 by the Bonferroni correction method. The significant differences for comfortability and operationability after the Bonferroni correction are shown in Fig. 18 and Fig. 19, respectively.

As a result of the three comparison experiments, it was confirmed that the proposed method obtained the highest evaluation in both comfortability and operationability. The evaluation of the hand-operated joystick control was the lowest, and many subjects felt it was unexpectedly inconvenient. In the case of gaze control, the subject simply turns his own eyes in the direction where he wants to go; in the case of joystick control by hand, however, it is thought that he must carry out the cooperative behavior of joystick operation and eye movement at the same time.
TABLE III
t-TEST RESULT WITH BONFERRONI CORRECTION (COMFORTABILITY)

Experiment               Proposed & Gaze          Proposed & Joystick
p-Value                  0.4212                   0.0006
Level of significance    p ≥ 0.025 (= 0.05/2)     p < 0.0025 (= 0.005/2)
Significant Difference   n.s.                     ***

TABLE IV
t-TEST RESULT WITH BONFERRONI CORRECTION (OPERATIONABILITY)

Experiment               Proposed & Gaze          Proposed & Joystick
p-Value                  0.0051                   0.0002
Level of significance    p < 0.025 (= 0.05/2)     p < 0.0025 (= 0.005/2)
Significant Difference   *                        ***

Fig. 18. Significant Difference for Comfortability (average evaluation value for the Proposed Method, Gaze Control and Joystick Control)
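For reference, the test procedure described above can be reproduced with a few lines of SciPy. The questionnaire score arrays below are placeholders, not the measured data, and the choice of a paired t-test is an assumption since the paper does not state which variant was used.

```python
import numpy as np
from scipy import stats

# Placeholder 7-point questionnaire scores for the ten subjects (not the real data).
proposed = np.array([6, 7, 6, 5, 7, 6, 6, 7, 5, 6])
gaze     = np.array([6, 6, 5, 5, 6, 6, 5, 7, 5, 6])
joystick = np.array([4, 3, 5, 4, 3, 4, 4, 5, 3, 4])

alpha = 0.05 / 2  # Bonferroni correction: two comparisons against the proposed method

for name, other in (("gaze", gaze), ("joystick", joystick)):
    t, p = stats.ttest_rel(proposed, other)   # paired t-test (assumed)
    verdict = "significant" if p < alpha else "n.s."
    print(f"proposed vs {name}: p = {p:.4f} ({verdict} at alpha = {alpha})")
```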

Finally, the gaze control was evaluated considerably better than the joystick control, because a human can operate intuitively by the gaze point. Furthermore, we confirmed that the operation by the proposed method is slightly superior to the gaze control in operationability. As the reason for this result, we consider that the human's instruction intention was estimated by the fuzzy inference using the saliency map and the eye information, and that the system was able to remove unconscious eye movements, as noise, from the conscious eye information. Judging from the questionnaire evaluation results, it is thought that the system was able to estimate the human intention more exactly than the gaze control.

Fig. 19. Significant Difference for Operationability (average evaluation value for the Proposed Method, Gaze Control and Joystick Control)

V. CONCLUSIONS

In this study, a control system for the omni-directional wheelchair that extracts the human instruction intention from eye information using a USB camera and an eye tracking device was developed. Here, we paid attention to the unconscious eye movement and the conscious gaze motion contained in the human eye information. We used the saliency map model, which is able to specify the unconscious movement of the eyes on a view image. In contrast, the conscious movement of the eyes was represented by the gaze map as the intentional instructions in the human viewpoint. The highest part of the human instruction intention is estimated, excluding the unconscious movement from the conscious movement of the eyes, by composing these two maps with fuzzy inference. This method is expected to be applied to a useful system that can reduce the burden on the operator, so that even a person with limb impairments can control a wheelchair only by the eyes.

The three kinds of experiments, with the proposed method, gaze control and joystick control, were performed for ten subjects. After the experiments, we carried out the evaluation by the questionnaire about the comfortability and operationability of the wheelchair control. As a result, we were able to confirm the effectiveness of the proposed method of this study. In the future, we would like to build an instruction intention estimation method with higher precision using several kinds of human bio-information other than the eye movement.

ACKNOWLEDGMENT

This work was supported by JSPS KAKENHI Grant Number JP17K00346.

REFERENCES

[1] K. Tanaka, K. Matsunaga and H. O. Wang, "Electroencephalogram-based control of an electric wheelchair," IEEE Transactions on Robotics, Vol.21, No.4 (2005)
[2] Y. Maeda and S. Ishibashi, "Operating Instruction Method Based on EMG for Omnidirectional Wheelchair Robot," Proc. of Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS 2017), Otsu, CD-ROM (2017)
[3] K. Ogawara, K. Sakita and K. Ikeuchi, "Intention Interpretation from Eye Movements and its Application to Cooperative Behavior by a Robot," IPSJ SIG Technical Report, 2005-CVIM-150(8), pp.55-62 (2005) [in Japanese]
[4] T. Okamoto, M. Sasaki, S. Ito, K. Takeda and M. I. Rusydi, "Using gaze point estimation and blink detection to control a robot arm by use of the EOG signal," Journal of the Japan Society of Applied Electromagnetics and Mechanics, Vol.22, No.2, pp.312-317 (2014) [in Japanese]
[5] Y. Matsumoto and T. Ino, "Development of Intelligent Wheelchair System with Face and Gaze Based Interface," Proc. of 10th IEEE Int. Workshop on Robot and Human Communication (ROMAN 2001), pp.262-267 (2001)
[6] M. Wang, Y. Maeda and Y. Takahashi, "Visual Attention Region Prediction Based on Eye Tracking Using Fuzzy Inference," Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.18, No.4, pp.499-510 (2014)
[7] L. Itti, C. Koch and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol.20, No.11, pp.1254-1259 (1998)
[8] C. Koch and S. Ullman, "Shifts in selective visual attention: towards the underlying neural circuitry," Human Neurobiology, Vol.4, No.4, pp.219-227 (1985)
