IEEE Transactions on Consumer Electronics, Vol. 56, No. 3, August 2010

A Realistic Game System Using Multi-Modal User Interfaces

Hwan Heo, Eui Chul Lee, Kang Ryoung Park, Chi Jung Kim, and Mincheol Whang

Abstract—This study proposes a realistic game system using a multi-modal interface, including gaze tracking, hand gesture recognition, and bio-signal analysis. Our research is novel in the following four ways, compared to previous game systems. First, a highly immersive and realistic game is implemented on a head mounted display (HMD), with a gaze tracker, a gesture recognizer, and a bio-signal analyzer. Second, since the camera module for eye tracking is attached below the HMD, a user's gaze position on the HMD display can be calculated without wearing any additional eye tracking devices. Third, an aiming cursor in the game system is controlled by gaze tracking, while the grabbing and throwing behaviors toward a target are performed by the user's hand gestures using a data glove. Finally, the level of difficulty in the game system is adaptively controlled according to the measurement and analysis of the user's bio-signals. Experimental results show that the proposed method provides a stronger experience of immersion and interest than conventional devices such as a keyboard or a mouse.

Index Terms—Multi-modal Interface, Gaze Tracking, Hand Gesture Recognition, Bio-signal Analysis.

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2010-0008099).
H. Heo is with the Division of Electronics and Electrical Engineering, Dongguk University, Seoul, Republic of Korea.
E. C. Lee is with the Division of Fusion and Convergence of Mathematical Sciences, National Institute for Mathematical Sciences, Daejeon, Republic of Korea.
K. R. Park is with the Division of Electronics and Electrical Engineering, Dongguk University, Seoul, Republic of Korea (e-mail: parkgr@dongguk.edu). Dr. K. R. Park is the corresponding author.
C. J. Kim is with the Department of Computer Science, Sangmyung University, Seoul, Republic of Korea.
M. Whang is with the Division of Digital Media Technology, Sangmyung University, Seoul, Republic of Korea.

I. INTRODUCTION

Many studies have been carried out on making games more immersive and realistic through natural user interfaces [1]. Accordingly, game interfaces have evolved from simple controllers such as a keyboard, a mouse, or a joystick, to tangible interfaces which give immersion and reality to virtual space. Following this new trend in the game market, we propose a realistic game system using a multi-modal interface based on human gaze tracking, hand gesture recognition, and bio-signal analysis, namely the photoplethysmogram (PPG), the galvanic skin response (GSR), and the skin temperature (SKT). Previous research into uni-modal interfaces such as gaze tracking, gesture recognition, and bio-signal analysis is introduced as follows.

Many studies of gaze tracking methods have been performed [2], [3]. User interfaces using gaze tracking have the following advantages. First, users can intuitively adapt to the gaze tracking method because its operational protocol is similar to that of a conventional computer mouse. Second, its throughput is very high compared to that of hand-operated conventional input devices, such as a mouse, a keyboard, or a joystick [2], [3]. Yamato et al. showed that it took 150 ms for a user to shift their gaze from the top-left to the bottom-right corner of a 21-inch monitor [3]. In addition, gaze tracking methods offer opportunities for disabled persons to use a computer. Previously studied gaze tracking methods can be classified into two groups: two-dimensional (2D) and three-dimensional (3D) approaches. The 2D approaches do not consider the 3D structure and movement of the user's eye. These methods estimate the gaze direction on the monitor plane by extracting the pupil or iris center from successive input images. The center position is then directly mapped to monitor coordinates by using a mapping function based on the pupil (or iris) movable region or on the rectangle defined by multiple corneal reflection points [4]-[6]. Therefore, the 2D approaches have the advantage of requiring simple hardware and algorithms. In the 3D approaches, the 3D relationship among the camera, monitor, and eye coordinates needs to be established through a calibration procedure. In addition, complex and expensive hardware, such as stereo cameras, multiple illuminators, or additional sensing devices, is needed to estimate the accurate 3D position of the human eye [7], [8]. The 3D approaches also require much more processing time to calculate the 3D geometry than the 2D methods. However, the 3D methods have the advantages of better gaze estimation accuracy and a simpler user-dependent calibration, compared to the 2D methods. Based on these analyses, a 2D gaze tracking method is adopted in our research, because the gaze tracking, the game play, and the other interfaces such as hand gesture recognition and bio-signal analysis need to operate concurrently on one computer.

Previous studies of hand gesture recognition are classified into two categories. The first is based on camera vision methods [9], [10]. The second is based on the data-glove [11]-[14]. Previous research into hand gesture recognition with a data-glove is as follows. Several studies of game, simulator, and virtual reality (VR) systems using the data-glove have been carried out [11]-[13]. O. Belmonte et al. implemented an under-water robot simulator using a data-glove [11]. D. Baricevic et al. developed a first-person shooter (FPS) game system using hand gesture recognition with a data-glove [12]. N. Bee et al. proposed a facial expression controlling system based on data-glove gesture recognition [13]. The data-glove based gesture recognition method can be usefully adopted in games, simulators, or VR systems because it can classify various gestures more accurately than camera vision based methods. Therefore, we adopt the data-glove based gesture recognition method in our system. Various gestures can be applied to control different kinds of user interactions, such as navigation, exploration, and selection [14]. However, since it is more natural for the navigational and explorative functions to be performed by gaze tracking, in our research only the grabbing operation of selecting a specific object is implemented by the data-glove based gesture recognition method.

Previous applications using bio-signals are described as follows. A bio-signal is an umbrella term for all kinds of signals sensed from biological responses. The best known bio-signals are the electroencephalogram (EEG), the magnetoencephalogram (MEG), the galvanic skin response (GSR), the electrocardiogram (ECG), the electromyogram (EMG), the heart rate variability (HRV), and the photoplethysmogram (PPG). Several studies have tried to estimate a person's emotion based on bio-signals [15], [16]. J. H. Kim et al. researched the automatic recognition of a user's emotional state through audio-visual emotion channels, such as facial expressions or speech, using physiological signals [15]. E. Hristova et al. aimed at classifying a user's emotional experience based on their bio-signals while interacting with embodied conversational agents [16]. C. D. Katsis et al. used a wearable system for the evaluation of the emotional states of race-car drivers using facial EMGs, ECGs, and other signals [17]. As the above-mentioned research shows, the human emotional state can be estimated by analyzing bio-signals. The level of attention, one of the emotional states, can also be estimated by analyzing these bio-signals [18]. In our proposed method, the game level is regulated based on the attention level estimated by analyzing the PPG, the GSR, and the SKT of the game player.

Research has also attempted to adopt multi-modal interfaces in game systems [1], [19]-[22]. H. I. Lee et al. developed a snowball fight game in which the user could throw a snowball at an object on the screen [1]. The user and snowball locations were tracked by using radio frequency, ultrasonic signals, and IR sensor grids attached to the screen. J. W. Yoon et al. presented a virtual fencing game in which the user performed a whole body interaction with an intelligent cyber fencer [19]. This system tracked the 3D positions of the user and the sword by using a camera vision method, combined with speech recognition. C. Magerkurth et al. developed an augmented board game consisting of a display table and personal digital assistants (PDAs) used for manipulating a player's inventory and character state (health, strength, etc.); it also included speech generation and recognition functionality [20]. E. Tse et al. developed a tabletop game combining hand gestures and speech on a digital touch table, where the users command their characters through hand gestures and speech [21]. D. S. Eom et al. developed a multi-player arcade video game in which the user could swing a game stick [22]. This platform uses an acceleration sensor attached to the game stick together with radio frequency and ultrasonic signal sensors attached to the screen. All the above game systems simply used their multi-modal interfaces to send behavioral events from the user to the game. Different from these systems, our method proposes new multi-modal user interfaces that measure not only the behavioral events of the user but also the user's emotional states. The measured emotional states are used to adjust the game level. In this way, an intelligent game system is realized through emotional context awareness of the human player.

Consequently, we implement a realistic game system through the combination of three unit interfacing methods: gaze tracking integrated into the head mounted display (HMD), hand gesture recognition, and bio-signal analysis. This realistic game system increases the degree of immersion and interest, in comparison to games using conventional interfaces.

The structure of the rest of this paper is as follows. In Section 2, the integrated game system is explained after describing each component of the multi-modal interfaces: the HMD, the gaze tracking method, the gesture recognition, and the bio-signal analysis. The experimental setup and the results of objective and subjective tests are presented and analyzed in Section 3. In Section 4, conclusions and future research plans are given.

II. THE PROPOSED SYSTEM AND METHODS

Fig. 1 shows the flow chart of the proposed system. After starting the game system (step (a) of Fig. 1), a user-dependent calibration is performed by gazing at the four corner points of the HMD screen, as shown in step (b) of Fig. 1. From this, the relationship between the monitor coordinates and the pupil movable area is established; a detailed explanation can be found in Section 2.2. Once this is accomplished, each interface module in our system operates as shown by steps (c), (d), and (e) of Fig. 1: the tracking of the gaze position, the measuring of hand gestures, and the measurement of bio-signals, respectively. Detailed explanations are given in Sections 2.2, 2.3, and 2.4. The measured gaze positions, gestures, and bio-signals are used for cursor (or character) navigation, object selection (or gun-shot), and adjusting the game level, as shown in steps (f), (g), and (h) of Fig. 1, respectively.

Fig. 1. The flow chart of the proposed realistic game system
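To make the control flow concrete, the following minimal sketch renders the steps of Fig. 1 as a single loop. The module interfaces (gaze_tracker, glove, biosensors, game) are hypothetical placeholders for illustration, not APIs from the paper.

```python
def game_loop(gaze_tracker, glove, biosensors, game):
    """A sketch of the Fig. 1 flow with assumed module interfaces."""
    gaze_tracker.calibrate_four_corners()          # step (b): user-dependent calibration
    while game.running:
        gaze = gaze_tracker.current_position()     # step (c) -> (f): cursor navigation
        game.move_cursor(gaze)
        if glove.current_gesture() == "grabbing":  # step (d) -> (g): object selection
            game.select_object_at(gaze)
        attention = biosensors.attention_level()   # step (e) -> (h): game level control
        game.set_difficulty(attention)
```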

A. HMD

In the proposed method, a user gazes at the virtual screen of an HMD, as shown in Fig. 2. The HMD has the following advantages for a game user. First, a wide visual field of view (FOV) is guaranteed without the hindrance of the visual field caused by the user's actual environment. The visual FOV is diagonally 40 degrees in the HMD used in our research, so the users see a large virtual screen (diagonally 105 inches at a virtual Z distance of 3,657.6 mm). Second, even when facial movements occur, the screen does not change position, so the geometric relationship between the user's head and the HMD's virtual screen is maintained. Consequently, the users feel more immersion and reality. However, the negative effects of an HMD were documented in previous research [23], which showed that the use of an HMD can cause cyber-sickness, including headache, nausea, dizziness, and so on. This phenomenon occurs particularly when watching the stereoscopic screen of an HMD. Nevertheless, HMDs continue to be adopted for increasing immersion and interest, since the user can reduce cyber-sickness by repeatedly adapting to HMD viewing.

In the proposed method, an HMD is used as a display for increasing the user's immersion and reality, as shown in Fig. 1. The eye gaze tracking device is attached to the HMD, so an additional tracking system for head movement is not needed. This is because the pupil image is continuously captured without the interference caused by head movement [24]. The details of the gaze tracking are explained in the next section. The specifications of the HMD are as follows:

- Spatial resolution of screen: 800 × 600 pixels
- Viewing equivalent: diagonally 105 inches at a virtual Z distance of 3,657.6 mm
- Viewing angle: diagonally 40 degrees FOV
- Number of colors: 24-bit color

Fig. 2. The HMD (head mounted display)

B. Gaze Tracking

In this section, the device and algorithms used for gaze tracking are explained.

B.1 Gaze tracking device

To unify the gaze tracking device and the HMD, a near infrared (NIR) light emitting diode (LED) illuminator and one camera are attached below the HMD, as shown in Fig. 3. The spatial resolution of the captured images is 640 × 480 pixels. The images are captured at a speed of 30 frames per second, based on the bandwidth of the universal serial bus (USB) 2.0. The infrared rejection filter inside the camera is removed and replaced by an NIR passing (> 700 nm) filter. Consequently, the USB camera obtains NIR eye images that are not affected by visible light. The NIR LED is attached to the side of the camera, and its power is supplied by the USB camera. This reduces the size of our gaze tracking device, since no additional power line for the NIR LED is needed, as shown in Fig. 3. The wavelength of the NIR LED is 880 nm. In general, in an eye image illuminated by NIR light whose wavelength is 800 to 900 nm, the boundary between the iris and the pupil is more distinctive, which makes it easier to extract the pupil region [25]. The camera module is linked by a flexible frame to the top of the HMD in order to adjust the viewing angle of the eye camera irrespective of variations in the eye position.

Additionally, our system can support augmented reality (AR) based game systems through the attachment of an additional scene camera above the HMD, as shown in Figs. 3 and 16.

Fig. 3. The proposed gaze tracking device attached on the HMD

B.2 Gaze tracking algorithms

In order to obtain a reliable gaze position, an accurate pupil center position needs to be known.

Fig. 4. The process of detecting a pupil center [25], (a) circular edge detection using the two circle templates (yellow dotted circles), (b) the detection of the coarse pupil region after CED, (c) the local binarization in the area predetermined by the detected pupil region, (d) the region filling using the morphological closing operation, (e) calculating the center of gravity of the black pixels (white cross), (f) the final result of detecting the pupil center.

Fig. 5 shows the flow chart of the proposed gaze tracking algorithms. An image is captured by the eye camera attached below the HMD. To detect the center of the pupil, we perform a circular edge detection (CED) algorithm, local binarization, and a morphological operation. The geometric center of the detected pupil area is determined as the pupil center, as shown in Fig. 4. Based on the located pupil center position, the user-dependent calibration is performed. During the calibration procedure, a user is required to gaze at the four corners of the HMD monitor to obtain the mapping relationship between the pupil movable area and the HMD virtual screen, as shown in Fig. 6. Based on the mapping relationship and a geometric transform, the gaze position of the user can be calculated.

Fig. 5. Flow chart of the proposed gaze tracking algorithms [25]

Detailed explanations are as follows. For the detection of the pupil center, the CED algorithm and local binarization are applied to the input eye images, as shown in Fig. 4. First, the CED algorithm is applied using two circular templates, as shown in Fig. 4 (a). The position at which the value of the CED is maximized is determined as the coarse pupil center, as shown in Fig. 4 (b). However, in general, the shape of the pupil is not circular but elliptical. Additionally, if a specular reflection region occurs near the pupil area, the detection accuracy of the CED can be degraded. Therefore, a local binarization method is additionally performed in a local area of 100 × 100 pixels defined by the detected pupil region, as shown in Fig. 4 (c). A morphological closing operation is then performed in order to fill the holes and concave regions caused by specular reflection, as shown in Fig. 4 (d). Finally, by calculating the center of gravity of the black area in the local region, the pupil center can be accurately extracted, as shown in Figs. 4 (e) and (f).
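As a concrete illustration of the fine stage in Figs. 4 (c)-(f), the sketch below refines a coarse pupil center from the CED step using local binarization, morphological closing, and a center-of-gravity computation. The OpenCV calls and the fixed binarization threshold are our assumptions; the paper does not specify its implementation.

```python
import cv2

def refine_pupil_center(eye_img, coarse_center, roi_size=100, thresh=60):
    """Refine a coarse pupil center within a 100x100 local area.
    eye_img: 8-bit grayscale NIR eye image; thresh is an assumed value."""
    cx, cy = coarse_center
    h, w = eye_img.shape
    half = roi_size // 2
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    roi = eye_img[y0:min(cy + half, h), x0:min(cx + half, w)]

    # Local binarization: under NIR illumination the pupil is the darkest region.
    _, binary = cv2.threshold(roi, thresh, 255, cv2.THRESH_BINARY_INV)

    # Morphological closing fills holes/concavities caused by specular reflection.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

    # Center of gravity of the pupil pixels (Figs. 4 (e), (f)).
    m = cv2.moments(closed, binaryImage=True)
    if m["m00"] == 0:
        return coarse_center  # fall back to the coarse CED estimate
    return (x0 + m["m10"] / m["m00"], y0 + m["m01"] / m["m00"])
```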
In order to track the user's gaze position on the HMD monitor, each user first needs to perform a user-dependent calibration. In the proposed method, this calibration is carried out by gazing at the four corner points (upper left, upper right, lower left, lower right) of the HMD screen in order to establish the relationship between the HMD monitor coordinates and the pupil movable area, as shown in Fig. 6.

Fig. 6. The mapping relationship between the HMD monitor coordinates and the pupil movable area

As shown in Fig. 6, a geometric transform is performed to map the pupil's movable area onto the monitor coordinates of the HMD [24]. The transform from the pupil movable area to the HMD's monitor coordinates is expressed as:

$$M_{ULx} = a \cdot C_{ULx} + b \cdot C_{ULy} + c \cdot C_{ULx} C_{ULy} + d \quad (1)$$

$$M_{ULy} = e \cdot C_{ULx} + f \cdot C_{ULy} + g \cdot C_{ULx} C_{ULy} + h \quad (2)$$

$$\begin{bmatrix} M_{ULx} & M_{URx} & M_{LLx} & M_{LRx} \\ M_{ULy} & M_{URy} & M_{LLy} & M_{LRy} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} a & b & c & d \\ e & f & g & h \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} C_{ULx} & C_{URx} & C_{LLx} & C_{LRx} \\ C_{ULy} & C_{URy} & C_{LLy} & C_{LRy} \\ C_{ULx}C_{ULy} & C_{URx}C_{URy} & C_{LLx}C_{LLy} & C_{LRx}C_{LRy} \\ 1 & 1 & 1 & 1 \end{bmatrix} \quad (3)$$

$$\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} P_x \\ P_y \\ P_x P_y \\ 1 \end{bmatrix} = \begin{bmatrix} G_x \\ G_y \\ 0 \\ 0 \end{bmatrix} \quad (4)$$

In the user-dependent calibration procedure, the four positions of the pupil centers ((C_ULx, C_ULy), (C_URx, C_URy), (C_LLx, C_LLy), (C_LRx, C_LRy)) are obtained when gazing at the four corner points of the monitor screen of the HMD: the upper left (M_ULx, M_ULy), the upper right (M_URx, M_URy), the lower left (M_LLx, M_LLy), and the lower right (M_LRx, M_LRy). From this, the matrix coefficients a, b, c, ..., h are obtained by using (3). With these coefficients, one pupil center position (P_x, P_y) is mapped into one gaze position (G_x, G_y) on the monitor coordinates of the HMD, as shown in (4). By continuously calculating (G_x, G_y) in successive eye images, the gaze tracking is performed.
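A compact NumPy sketch of this calibration follows: it solves (3) for the top two rows of the coefficient matrix (a, ..., h) and then applies (4) to map a pupil center to a gaze position. The function names are ours; the math is exactly (1)-(4).

```python
import numpy as np

def calibrate(pupil_corners, monitor_corners):
    """Solve (3) for [a b c d; e f g h].
    pupil_corners, monitor_corners: four (x, y) pairs in UL, UR, LL, LR order."""
    C = np.array([[cx for cx, cy in pupil_corners],
                  [cy for cx, cy in pupil_corners],
                  [cx * cy for cx, cy in pupil_corners],
                  [1.0, 1.0, 1.0, 1.0]])
    M = np.array([[mx for mx, my in monitor_corners],
                  [my for mx, my in monitor_corners]])
    return M @ np.linalg.inv(C)  # 2x4 coefficient matrix

def map_gaze(coeffs, px, py):
    """Apply (4): map one pupil center (Px, Py) to a gaze position (Gx, Gy)."""
    gx, gy = coeffs @ np.array([px, py, px * py, 1.0])
    return gx, gy
```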

C. Hand Gesture Recognition by using the Data Glove

The data-glove is an input device for tangible gesture interaction. In our method, a commercial data glove is adopted for gesture recognition. The set of gestures defined by the data glove is comprised of binary open (stretched) and closed (crooked) configurations of the fingers excluding the thumb. Based on the binary configurations of the four fingers, there exist 16 (2^4) possible combinations.

Fig. 7. The two gestures used with a data-glove.

Gesture number "0 (0000)" is defined as all the fingers (excluding the thumb) being crooked (a fist), and gesture number "15 (1111)" as all the fingers being stretched (a flat hand). The index finger indicates the least significant bit ("0001"). In other words, stretching only the index finger produces gesture number "1 (0001)", and stretching only the little finger produces "8 (1000)". An unrecognizable gesture is assigned the value "-1". The data acquisition speed is 75 Hz.

A scaled sensor value higher than the upper threshold setting indicates a closed finger, while a scaled sensor value lower than the lower threshold setting indicates an open finger. An in-between value is invalid and results in an invalid gesture. In our method, the case in which all the fingers excluding the thumb are stretched is regarded as the "not-grabbing" gesture. The case in which a user bends all the fingers excluding the thumb is regarded as the "grabbing" gesture.
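The sketch below illustrates this 4-bit encoding and the thresholding rule; the numeric threshold values are illustrative assumptions, not the glove's calibrated settings.

```python
def decode_gesture(finger_values, lower=0.3, upper=0.7):
    """Map four scaled sensor values (index..little finger, in [0, 1]) to a
    gesture number; the index finger is the least significant bit.
    Low value = open (stretched) finger = 1; high value = closed finger = 0."""
    code = 0
    for bit, value in enumerate(finger_values):
        if value < lower:       # open finger -> set this bit
            code |= 1 << bit
        elif value <= upper:    # in-between value -> invalid gesture
            return -1
        # value > upper: closed finger -> bit stays 0
    return code  # 0 = fist ("grabbing"), 15 = flat hand ("not-grabbing")
```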
In the proposed method, gesture recognition is used for sending a selection event into the game system. To discriminate between the selection and non-selection operations for a specific object in the game system, only two gestures, grabbing and not-grabbing, are used, as shown in Fig. 7, for an intuitive and comfortable user interface. Using the data-glove, the hand gestures of the user can be measured with greater accuracy than with a camera vision based gesture system. In the actual game system, this gesture can be applied to acquiring items, gun-shots, hitting objects, and so on.

D. The Measurement of the Bio-signals

The bio-signal sensors measure the reaction of the autonomic nervous system of the user. The immersion state of the user is estimated based on the Boucsein awakening model [26]. We measured the immersion state of the user by using three sensors, a PPG, a GSR, and an SKT, chosen among the many kinds of bio-sensors for the user's convenience of simple attachment.

Fig. 8. The sensors for measuring the bio-signals

The measured PPG, GSR, and SKT signals are used to estimate the degree of immersion through real-time signal processing. The sensors for the GSR and SKT are attached to three of the user's fingers, and the sensor for the PPG is clipped to the right earlobe of the user, as shown in Fig. 8. Each signal is measured at an acquisition rate of 200 Hz.

Fig. 9. The bio-signal analysis algorithm

The acquired bio-signals are analyzed as shown in Fig. 9. The PPG value is calculated as a power spectrum after band pass filtering (1 Hz ~ 2 Hz) in the frequency domain. Since the GSR and SKT values contain noise, they are processed with a moving average filter over 1 second in the time domain. Each value is regarded as indicating an immersive state when a significant change (greater than the mean and standard deviation range of the values collected over 4 seconds) appears. In the case of the GSR, an increased value is regarded as an immersive state, while decreased values of the PPG and SKT are regarded as an immersive state.
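A sketch of this analysis chain, under stated assumptions: a SciPy Butterworth band-pass for the PPG (the filter order is our choice), a 1-second moving average for the GSR and SKT, and the 4-second significant-change rule described above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200  # acquisition rate of each bio-signal, in Hz

def ppg_power(ppg):
    """Band-pass the PPG to 1-2 Hz, then return its power."""
    b, a = butter(2, [1.0 / (FS / 2), 2.0 / (FS / 2)], btype="band")
    return float(np.mean(filtfilt(b, a, ppg) ** 2))

def moving_average(signal, seconds=1.0):
    """1-second moving-average filter for the noisy GSR and SKT signals."""
    n = int(seconds * FS)
    return np.convolve(signal, np.ones(n) / n, mode="valid")

def is_immersive(last_4s, current, increases_when_immersed):
    """Significant-change rule: immersive when the current value departs from
    the 4-second mean by more than one standard deviation, in the direction
    given (GSR increases; PPG and SKT decrease)."""
    mean, std = np.mean(last_4s), np.std(last_4s)
    return current > mean + std if increases_when_immersed else current < mean - std
```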
III. EXPERIMENTAL RESULTS

A. Experimental Setup

A shooting genre game was selected for the experiment, as shown in Fig. 10. In this game, an aiming cursor is moved by our gaze tracking method. Grabbing and throwing an axe toward a target are performed by the user's hand gesture via the data glove. The game difficulty is controlled through the measurement and analysis of the user's bio-signals. When the bio-signals indicate a lower level of user concentration, the trembling amount of the aiming cursor is increased, which makes it more difficult to aim at the target. The three programs for bio-signal analysis, gaze tracking, and gesture recognition operate concurrently on one computer.
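For instance, the difficulty control could perturb the gaze-driven cursor as in the hypothetical sketch below, where max_jitter is an assumed tuning constant, not a value from the paper.

```python
import random

def aim_with_trembling(gaze_xy, concentration, max_jitter=25.0):
    """Lower concentration (0..1) yields a larger trembling amount (pixels)."""
    jitter = max_jitter * (1.0 - concentration)
    gx, gy = gaze_xy
    return (gx + random.uniform(-jitter, jitter),
            gy + random.uniform(-jitter, jitter))
```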

Fig. 10. The game used for the experiment

The experiment was performed on a desktop computer with a 2.3 GHz CPU and 2 GB of RAM. In order to measure the interest, immersion, and dizziness of our realistic game using the gaze, gesture, and bio-signal data, we designed two kinds of experiments: objective and subjective tests. We employed 20 subjects (10 males and 10 females, average age: 24.8 (±2.4)) to participate in the experiments. They had no experience with our proposed game system. An example of playing the game using our proposed system is shown in Fig. 11.

Fig. 11. An example of playing the game using our proposed method

B. Experimental Results

In this section, the objective and subjective tests and another application of the proposed method are explained.

B.1 Objective Test

In the objective test, we measured the gaze tracking error, the gesture recognition accuracy, and the accuracy of estimating the concentration power obtained from the bio-signals. We also comparatively measured the game score using the proposed interface and a conventional input device.

To measure the gaze tracking accuracy, we measured the average root mean square (RMS) error, which was calculated when 20 participants gazed at 9 predetermined specific points, as shown in Fig. 12. The test was iterated 5 times.

Fig. 12. Examples of the calculated gaze positions for 9 reference positions

The experimental results showed that the average RMS error was about 25 pixels (20 pixels on the X axis and 15 pixels on the Y axis). The average RMS error of 25 pixels corresponds to a gaze error of about 1°. Based on that, we designed the targeting circle or character in the game system to have a minimum radius of about 25 pixels.
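For reference, the RMS error can be computed as below; note that per-axis errors of 20 and 15 pixels combine to sqrt(20^2 + 15^2) = 25 pixels, matching the reported overall error.

```python
import numpy as np

def rms_gaze_error(estimated, targets):
    """RMS gaze error over fixations.
    estimated, targets: (n, 2) arrays of (x, y) pixel positions."""
    d = np.asarray(estimated, float) - np.asarray(targets, float)
    rms_x = np.sqrt(np.mean(d[:, 0] ** 2))
    rms_y = np.sqrt(np.mean(d[:, 1] ** 2))
    return rms_x, rms_y, np.hypot(rms_x, rms_y)
```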
Another problem, "aiming ahead," arises when aiming at a moving target with the gaze tracking method. That is, if a user looks at the moving target and throws an axe at it, the axe can land behind the target because the target continuously moves. However, the users easily became accustomed to this in our experiment: they tended to look at the predicted position of the moving target, considering its route and speed, when throwing an axe. This can be another factor that increases the level of interest and immersion of the game users.
In order to measure the gesture recognition accuracy, we performed the following test. Users were asked to grab, using the data glove, at 2-second intervals. The test was iterated 100 times. The experimental results show that the average recognition accuracy of the grabbing gesture was about 96.9%. Fig. 13 shows the results for the 20 subjects. Through the experiment, we found that the grabbing gesture using the proposed method is sufficiently accurate to be used for our game system.

Fig. 13. An example of the gesture recognition accuracies of the 20 users

The small hands of participants #8 and #9 made it difficult to fit the data glove, which degraded the recognition rate.

It is very difficult to measure the objective accuracy of the estimation of the human emotional state, because there is no accurate reference data for bio-signals. Therefore, the results of a subjective survey were used as the base data for the attentive state of the subject. During a 10-minute game play using the proposed system, we verbally asked about the attention level of the subject. At the same time, their bio-signals were measured using our method. The subjective scores of attention level and the corresponding measurements of the bio-signals were normalized to values between 0 and 1, and we then calculated the correlations between them using the following equation:

$$C_{sb} = \frac{\mathrm{Cov}(s, b)}{\sigma_s \cdot \sigma_b} \quad (5)$$

The correlation coefficient (C_sb) between a bio-signal (b) and the subjective survey score (s) is obtained by normalizing the covariance (Cov(s, b)) by the standard deviations (σ_s and σ_b). Consequently, the range of the correlation coefficient is between -1 and 1, where 1 means the most significant correlation.

As a result, the correlation coefficients between the subjective scores and the respective measurements of PPG, GSR, and SKT were -0.539, 0.124, and 0.201. Based on these, we found that the PPG is the most reliable metric for estimating the level of the user's concentration.
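Equation (5) is the Pearson correlation coefficient; a direct implementation:

```python
import numpy as np

def attention_correlation(subjective, biosignal):
    """C_sb of (5): covariance of the normalized subjective scores s and the
    bio-signal measure b, divided by the product of their standard deviations."""
    s = np.asarray(subjective, float)
    b = np.asarray(biosignal, float)
    cov = np.mean((s - s.mean()) * (b - b.mean()))
    return cov / (s.std() * b.std())
```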
For a comparative analysis of our proposed method and the conventional input device, the game scores of the subjects were measured using our system and a mouse over 60 seconds. Each subject performed this test 3 times. The experimental results are shown in Fig. 14. The results showed that the score gap between a conventional mouse and the proposed method was large at the 1st play. However, it was gradually reduced at the 2nd and 3rd plays. From that, we confirmed that the game players could accommodate the proposed system through repeated trials.

Fig. 14. The comparisons of the average game scores for the 20 users.

B.2 Subjective Test

In the subjective test, we performed a survey in order to measure the users' dizziness when they used the proposed system. We also asked about the level of the users' interest and immersion in the proposed system, compared to using a mouse. Each question was answered using a 5-point scale where 1 and 5 represented "not at all" and "yes, very much," respectively.

The results are shown in Fig. 15. The proposed method acquired significantly higher scores for interest and immersion levels than those obtained with a mouse (p < 0.01, t = 0.0072 and 0.0001, n = 20), as shown in Fig. 15 (a).

To validate the significance, a two-sample t-test was performed, which is one of the most commonly adopted hypothesis tests. The significance is measured by checking whether the average difference between two groups is sufficiently large [27]. "p < 0.01" means that the difference is significant at the confidence level of 99%. The confidence level is derived by calculating the t-value, such as 0.0072 for "interest" and 0.0001 for "immersion," using 20 samples (n = 20) per test [27]. However, the dizziness rating using the proposed method was not significantly different from that using a mouse, although the former was slightly greater, as shown in Fig. 15 (b).

Fig. 15. The experimental results of the 20 users (***: p<0.01), (a) the survey results in terms of interest and immersion, (b) the average level of dizziness between mouse and proposed system.
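A minimal sketch of the two-sample t-test used here, via SciPy (the paper does not state its statistics software):

```python
from scipy.stats import ttest_ind

def compare_conditions(scores_proposed, scores_mouse, alpha=0.01):
    """Two-sample t-test on the survey scores of the two interfaces [27].
    Returns the t statistic, the p-value, and the significance decision."""
    t, p = ttest_ind(scores_proposed, scores_mouse)
    return t, p, p < alpha  # p < 0.01: significant at the 99% confidence level
```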

B.3 Another Application: AR Based Game System

By using the additional scene camera mounted above the HMD, as shown in Fig. 3, our method can support an augmented reality (AR) based game system. The name of the implemented game is "GRABBING THE FALLING MONEY", as shown in Fig. 16. Through gaze tracking, the money is targeted and then grabbed by using a hand gesture with the data glove. In order to increase the interest and immersion of the user, the difficulty level of the game is changed according to the level of the user's concentration.

When the bio-signals indicate low concentration (when first playing the game), the background of the game is plain, as shown in Fig. 16 (a). However, when they indicate high concentration (after some playing time has passed), the background of the game system is replaced by the captured image of the scene camera. This can increase the level of difficulty due to the complicated background, as shown in Fig. 16 (b).

Fig. 16. The implemented AR based game system, (a) at a low concentration power state, (b) at a high concentration power state.

IV. CONCLUSION

This research presented a realistic game system using a multi-modal interface based on gaze tracking, gesture recognition, and the measurement of bio-signals, such as the PPG, GSR, and SKT. To calculate the gaze position, we designed a gaze tracking module which was attached below the HMD. The gaze tracking is used for the navigation interaction in a game. The gesture recognition was performed with a commercial data glove, which was used to perform the selection events in the game. The bio-signals were used for adjusting the level of difficulty of the game. According to the experimental results, we confirmed that each module performed successfully and concurrently. The proposed method provided users with a stronger experience of immersion and interest, compared to the conventional mouse.

In future works, through implementing a multi-user game system based on the proposed method, we would use the bio-signals for communicating with other game players. We would also increase the number of gestures in order to perform additional interactions in the game.

REFERENCES
[1] H. I. Lee, H. K. Jeong, and J. H. Han, "Arcade video game platform built upon multiple sensors," IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 111-113, Aug 2008.
[2] T. Ohno, "Quick menu selection task with eye mark," Transactions of Information Processing Society of Japan, vol. 40, no. 2, pp. 602-612, 1999.
[3] M. Yamato, A. Monden, K. Matsumoto, K. Inoue, and K. Torii, "Quick button selection with eye gazing for general GUI environment," International Conference on Software: Theory and Practice, 2000.
[4] C. W. Cho, J. W. Lee, E. C. Lee, and K. R. Park, "A robust gaze tracking method by using frontal viewing and eye tracking cameras," Optical Engineering, vol. 48, no. 12, pp. 127202-1 ~ 127202-15, Dec 2009.
[5] J. Zhu, and J. Yang, "Subpixel eye gaze tracking," The Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 124-129, May 2002.
[6] D. H. Yoo, J. H. Kim, B. R. Lee, and M. J. Chung, "Non-contact eye gaze tracking system by mapping of corneal reflections," The Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 101-106, May 2002.
[7] J. G. Wang, and E. Sung, "Study on eye gaze estimation," IEEE Transactions on Systems, Man and Cybernetics, part B, vol. 32, issue 3, pp. 332-350, June 2002.
[8] S. W. Shih, and J. Liu, "A novel approach to 3D gaze tracking using stereo cameras," IEEE Transactions on Systems, Man and Cybernetics, part B, vol. 34, issue 1, pp. 234-245, Feb 2004.
[9] M. C. Roh, B. Christmas, J. Kittler, and S. W. Lee, "Gesture spotting for low-resolution sports video annotation," Pattern Recognition, vol. 41, no. 3, pp. 1124-1137, 2008.
[10] H. D. Yang, A. Y. Park, and S. W. Lee, "Gesture spotting and recognition for human-robot interaction," IEEE Trans. on Robotics, vol. 23, no. 2, pp. 256-270, 2007.
[11] O. Belmonte, M. Castaneda, D. Fernández, and J. Gil, "Federate resource management in a distributed virtual environment," Future Generation Computer Systems, vol. 26, pp. 308-317, 2010.
[12] D. Baricevic, H. Dujmic, M. Saric, and I. Dapic, "Optical tracking for QAVE, a CAVE-like virtual reality system," International Conference on Software, pp. 25-27, 2008.
[13] N. Bee, B. Falk, and E. Andre, "Simplified facial animation control utilizing novel input devices: a comparative study," The 13th International Conference on Intelligent User Interfaces, pp. 197-206, 2009.
[14] D. Bowman, 3D User Interfaces: Theory and Practice, Addison Wesley, 2005.
[15] J. H. Kim, and E. Andre, "Four-channel biosignal analysis and feature extraction for automatic emotion recognition," Biomedical Engineering Systems and Technologies, vol. 25, pp. 265-277, 2008.
[16] E. Hristova, M. Grinberg, and E. Lalev, "Biosignal based emotion analysis of human-agent interactions," Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions, vol. 5641, pp. 63-75, 2009.
[17] C. D. Katsis, N. Katertsidis, G. Ganiatsas, and D. I. Foriadis, "Toward emotion recognition in car-racing drivers: a biosignal processing approach," IEEE Transactions on Systems, Man, and Cybernetics, part A, vol. 38, issue 3, May 2008.
[18] C. E. Izard, Human Emotions, Springer, 1977.
[19] J. W. Yoon, S. W. Kim, J. H. Ryu, and W. T. Woo, "Multimodal gumdo simulation: the whole body interaction with an intelligent cyber fencer," Lecture Notes in Computer Science, vol. 2532, pp. 1088-1095, 2002.
[20] C. Magerkurth, R. Stenzel, N. Streitz, and E. Neuhold, "A multimodal interaction framework for pervasive game applications," Workshop at Artificial Intelligence in Mobile System (AIMS 2003), Oct 2003.
[21] E. Tse, S. Greenberg, C. Shen, and C. Forlines, "Multimodal multiplayer tabletop gaming," Computers in Entertainment (CIE), vol. 5, issue 2, pp. 1-12, 2007.
[22] D. S. Eom, T. Y. Kim, H. H. Jee, H. I. Lee, and J. H. Han, "A multi-player arcade video game platform with a wireless tangible user interface," IEEE Transactions on Consumer Electronics, vol. 54, no. 4, pp. 1819-1824, Nov 2008.
[23] J. Hakkinen, T. Vuori, and M. Paakka, "Postural stability and sickness symptoms after HMD use," IEEE International Conference on Systems, Man and Cybernetics, vol. 1, pp. 147-152, Oct 2002.
[24] E. C. Lee, and K. R. Park, "A robust eye gaze tracking method based on virtual eyeball model," Machine Vision and Applications, vol. 20, issue 5, pp. 319-337, 2009.
[25] C. W. Cho, J. W. Lee, E. C. Lee, and K. R. Park, "A robust gaze tracking method by using frontal viewing and eye tracking cameras," Optical Engineering, vol. 48, no. 12, pp. 127202-1 ~ 127202-15, Dec 2009.
[26] W. Boucsein, Electrodermal Activity, New York, Kluwer Academic Publishers, 1992.
[27] B. K. Moser, G. R. Stevens, and C. L. Watts, "The two-sample test versus Satterthwaite's approximate f test," Communications in Statistics - Theory and Methods, vol. 18, no. 11, pp. 3963-3975, 1989.

BIOGRAPHIES

Hwan Heo received the BS degree in data processing and information from Dongyang Technical College, Seoul, South Korea, in 2009. He is currently an MS candidate in the Division of Electronics and Electrical Engineering at Dongguk University. He is also a research member of BERC. His research interests include image processing, computer vision, and HCI.

Eui Chul Lee received the BS degree in software from Sangmyung University, Seoul, South Korea, in 2005. He received the MS and Ph.D. degrees in Computer Science from Sangmyung University, in 2007 and 2010, respectively. He has been a Researcher in the Division of Fusion and Convergence of Mathematical Sciences at NIMS (National Institute for Mathematical Sciences) since March 2010. His research interests include computer vision, image processing, pattern recognition, ergonomics, BCI, and HCI.

Kang Ryoung Park received the BS and MS degrees in Electronic Engineering from Yonsei University, Seoul, Korea, in 1994 and 1996, respectively. He received the Ph.D. degree in Electrical and Computer Engineering from Yonsei University in 2000. He has been an Associate Professor in the Division of Electronics and Electrical Engineering at Dongguk University since March 2009. He is also a research member of BERC. His research interests include computer vision, image processing, and biometrics.

Chi Jung Kim received the BS degree in digital media from Sangmyung University, Seoul, South Korea, in 2009. He is currently an MS candidate in the Department of Computer Science at Sangmyung University. His research interests include psychophysiology, emotion engineering, attentive user interfaces, and HCI.

Mincheol Whang received the MS and Ph.D. degrees in Biomedical Engineering from Georgia Institute of Technology, Atlanta, Georgia, in the United States, in 1990 and 1994, respectively. He has been a Professor in the Division of Digital Media Engineering at Sangmyung University since March 1998. His research interests include human computer interaction, emotion engineering, human factors, and bioengineering.
