Visual Informatics
journal homepage: www.elsevier.com/locate/visinf

Article history: Received 12 October 2021; Received in revised form 3 December 2021; Accepted 7 December 2021; Available online 14 December 2021.

Keywords: Flow visualization; Virtual reality; Multimodal interaction; Human–computer interaction

Abstract: In immersive flow visualization based on virtual reality, how to meet the needs of complex professional flow visualization analysis through natural human–computer interaction is a pressing problem. To achieve natural and efficient human–computer interaction, we analyze the interaction requirements of flow visualization and study the characteristics of four human–computer interaction channels: hand, head, eye and voice. We give some multimodal interaction design suggestions and then propose three multimodal interaction methods: head & hand, head & hand & eye, and head & hand & eye & voice. The freedom of gestures, the stability of the head, the convenience of the eyes and the rapid retrieval of voice are used to improve the accuracy and efficiency of interaction. The interaction load is balanced across modalities to reduce fatigue. The evaluation shows that our multimodal interaction achieves higher accuracy, faster completion times and much lower fatigue than traditional joystick interaction.

https://doi.org/10.1016/j.visinf.2021.12.005
2468-502X/© 2021 The Authors. Published by Elsevier B.V. on behalf of Zhejiang University and Zhejiang University Press Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
C. Su, C. Yang, Y. Chen et al. Visual Informatics 5 (2021) 56–66
paper, the strategy of multimodal interaction is proposed, which combines different interactions in a serial or parallel way to make use of their complementarity. The multimodal interaction method is used to optimize the interaction of immersive flow visualization, improve interaction accuracy and reduce interaction fatigue. The main contributions of this paper are:

• We provide a reference for the design of immersive visualization interaction paradigms, especially for multimodal interaction in immersive flow visualization.
• We propose three multimodal interaction methods to improve interaction accuracy and efficiency, balance the workload of the interaction channels, and reduce user fatigue.

2. Related work

As an important research branch of scientific visualization, flow visualization has a long history of development, but the application of virtual reality technology in flow visualization started in the 1990s.

2.1. Immersive flow visualization

Bryson and Levit (1991) accomplished the application of scientific visualization in the virtual wind tunnel in 1991. The goal of the system is to effectively visualize the 2D flow field. It allows users to inject particles into the precalculated flow field and observe their trajectories. The system ensures interactivity by distributed computing (Bryson and Gerald-Yamasaki, 1992). Subsequently, the aerospace research centers in France, Germany, Spain, and Japan developed their own virtual wind tunnels (Li et al., 2013). LaViola (2000) developed a multimodal scientific visualization prototype system called MSVT, which allows users to observe the results of 2D scientific visualization by gesture and voice interaction. They optimized the implementation of multimodal interaction through user experiments and evaluation.

With the continuous development of head-mounted virtual reality devices, more convenient virtual reality helmets, such as the HTC Vive and Oculus Rift, have emerged. Research on immersive flow visualization based on consumer virtual reality devices has gradually developed. Wernert et al. (2012) studied the integration of the visualization toolkit VTK and virtual reality environments. They demonstrated two new methods to simplify the integration of immersive interfaces and visual rendering, and introduced functions for rapid update and efficient interaction. Paeres et al. (2021) demonstrated a virtual wind tunnel using virtual reality technology as a scientific visualization tool, which enables users to observe complex turbulence in an immersive environment.

2.2. Multimodal interaction

Multimodal interaction combines two or more input channels in a system. It makes human–computer interaction more natural and effective through the use of different human sensory channels, which has the following advantages: reducing coupling, reducing errors, increasing flexibility, conserving intellectual resources, and reducing user cognitive load (Zhang et al., 2016). The first research on multimodal interaction for graphic display is Bolt's Put-That-There (Bolt, 1980), which integrates voice input and tracker-based pointing input, so that users can create and edit 2D graphic elements in front of a rear projection screen.

The combination of gesture and voice is the most intuitive way of multimodal interaction, so it is also the most discussed. Lucente et al. (1998) utilized speech recognition and video-based hand tracking input to enable users to operate large objects in front of a 2D display, such as selecting, dragging and zooming. The QuickSet system proposed by Cohen et al. (1997) is a collaborative multimodal system using wireless handheld devices. The system analyzes voice and stylus input in real time. They cooperated with the U.S. Naval Research Laboratory to build the 2D version of QuickSet and integrated it into a virtual battlefield planning platform.

Pfeuffer et al. (2017) proposed an interactive technique combining gesture and gaze, which selects objects through gaze and manipulates them by gestures. With this technique users can interact well with both near and far objects. Wang et al. (2019) proposed a remote collaboration system based on augmented reality. Users can complete tasks by gestures and head pointing. Results show that head and hand collaborative interaction improves the interactive experience. Some researchers consider applying more input channels to virtual reality. Koons et al. (1998) implemented a map interaction system using interactive technologies such as 3D pointing gestures, speech and eye-tracking. Problems of concurrent multimodality were also discussed. Oviatt et al. (2000) studied a multimodal 3D virtual vehicle auxiliary maintenance system based on virtual reality. The input channels include an avatar based on body tracking, gesture recognition and voice input. The system implements the fusion of concurrent input channels. Kok and Van Liere (2007) developed a set of interfaces named VR-VTK, which implement 3D display and multimodal interaction in VTK. The interfaces make use of head tracking to control the camera, pedals to grasp, and voice input for programming commands and system control. Besides, they also studied in depth the problems related to 3D spatial interaction, such as complex interaction methods and depth-enhanced perception.

In general, there are two main shortcomings in the current studies: first, user interaction techniques are not well defined for exploring flow visualizations, due to hardware limitations. Second, only limited interactions are provided, which cannot satisfy the requirements of complex flow data exploration.

3. Multimodal interaction design

There are three problems to be solved in multimodal interaction: task requirements, multimodal interaction support technology, and the multimodal interaction fusion method.

The general interaction tasks in a virtual reality environment include navigation/roaming, selection/operation and system control. The interaction tasks in flow visualization are more detailed and complex. Requirement analysis is the premise of establishing the mapping relationship among interaction tasks, input channels and interaction technologies, and is also the basis of virtual reality multimodal interaction design.

The supporting technology of multimodal interaction is single-channel interaction. There are many devices available in the virtual reality environment, so these devices and their interaction channels must be selected and managed properly. The analysis of the characteristics of every single channel also provides a decision-making basis for the multimodal fusion method. Multimodal interaction is optimized by the complementarity of different channels and the adaptability of channels to tasks. Therefore, the fusion of different interaction channels is the core problem of multimodal interaction. Appropriate fusion methods will ensure the availability of multimodal interaction and even improve the efficiency of interaction.

3.1. Requirement analysis

The interaction tasks in human–computer interaction can be divided into navigation/roaming, selection/operation and system control. To analyze flow visualization better, it is necessary
Table 1. Interaction requirements of flow visualization.
Categories | Requirements | Details
2D | Data reading | Read numerical simulation calculation data files from hard disk.
2D | Algorithm management | Add, select and delete visualization algorithms reasonably.
2D | Algorithm parameter configuration | Configure or change the types and values of visualization algorithm parameters.
3D | Spatial parameter configuration | Determine the scope or boundary of the visualization algorithm.
3D | 3D geometric transformation | Move, zoom and rotate the visualization graphics.

Table 2. Characteristics of channels.
Channel | Advantages | Disadvantages
Joystick | Accurate and easy, feedback through vibration (Wang et al., 2020). | Fixed shape, destroys immersion (Wang et al., 2020); fatigue (Boring et al., 2009).
Hand | Low equipment requirements and cost, high degree of freedom, natural (Yang et al., 2019). | Affected by the environment; tracking range is limited; not accurate (Yang et al., 2019).
Head | Stable and accurate; higher attention and interest (Sidenmark and Gellersen, 2019). | Rotates frequently in large-FOV scenes (Blattgerste et al., 2018); slower than the eye (Bizzi, 1974); Midas problem (Drewes, 2010).
Eye | Faster and lower cost than head (Blattgerste et al., 2018); reduces fatigue (Drewes, 2010). | Not stable (Blattgerste et al., 2018); eye calibration required; Midas problem (Drewes, 2010).
Voice | Suitable for non-graphic commands (Billinghurst et al., 2018); efficient and precise input of text (Harris, 2005). | Commands need to be memorized; recognition delay; start and end need to be determined (Harris, 2005).

to analyze the requirements of specific interaction tasks in flow visualization. By studying the workflow of mature flow visualization software such as EnSight and ParaView, and through communication with domain experts, we define five kinds of flow visualization interaction requirements. To facilitate the study of different interaction methods, we divide the interaction requirements into two categories: 2D interaction and 3D interaction. As shown in Table 1, 2D interaction involves three interaction requirements: data reading, algorithm management and algorithm parameter configuration; 3D interaction involves two interaction requirements: spatial parameter configuration and 3D geometric transformation.

In 2D interaction, data reading needs to support a variety of formats of numerical simulation calculation files. The common visualization data file formats are Tecplot, PLOT3D, VTK, CGNS, etc. In addition, it also needs the ability to compute various vectors from the grid data attributes for further analysis. The analysis of flow visualization involves multiple visualization algorithms, which need to be managed correctly to avoid confusion and illegal operations. The commonly used visualization algorithms are streamlines, clipping, isosurfaces, and so on. Each algorithm has corresponding parameter configuration requirements. For example, in the streamline algorithm, the drawing direction, seed area and seed number need to be configured; in the scalar coloring algorithm, we need to configure the scalar type, scalar range, color distribution and color order.

In 3D interaction, widgets are interactive components created to determine some 3D spatial information of a visualization algorithm. For example, a linear widget is used to control the seed point area of the streamlines. 3D geometric transformation is used to control the perspective of the 3D scene, including navigation, zooming and rotation, which is convenient for users to observe the visualization results from multiple angles.

3.2. Channel analysis

Considering the common interaction channels and the availability of interaction devices, this paper discusses the combination of five interaction channels: joystick, hand, head, eye and voice. Referring to existing research, we summarize the characteristics of each channel in Table 2. The joystick is used as a traditional interaction mode for comparison, so we focus on the analysis of the other interaction channels.

Hand. Whatever the implementation method, there is a fatigue problem with long-time use. The reason may be that keeping the hands raised needs the support of the whole arm musculature, which consumes more physical strength. Considering cost and interaction efficiency, vision-based gesture interaction is a better interaction method for the hand channel.

Gaze. Both head gaze and eye gaze have the Midas problem. The reason may be that the visual channel can only provide directional information but cannot quickly reconfirm the interaction. Comparing the two gaze interaction methods, the head is more stable but more tiring than the eye, and the eye is faster and more convenient but more unstable. The head and eyes cooperate with each other when observing from a large angle of view.

Voice. It has the ability of accurate text input, but the input vocabulary should not be too large, otherwise it will bring great recognition delay and user memory pressure. The beginning and end of speech recognition need the cooperation of other channels.

Based on the information above, we put forward some preliminary ideas for multimodal interaction design:

(1) The introduction of gaze (head or eye) interaction can reduce the burden of hand interaction. The direction provided by visual interaction can reduce the frequency of gesture interaction, which helps to alleviate the problem of hand fatigue.

(2) Introducing other channels for confirmation is a better way to solve the Midas problem of gaze interaction, such as head gaze selection + gesture confirmation. In addition, the combination of eye gaze and head gaze can also solve the problem without the hands, which can further reduce the use of hands.

(3) Voice interaction is suitable for short and precise interaction or non-graphic commands, especially for retrieval interaction. When there are many similar commands, it can directly hit the target, reducing manual search time. To avoid the delay of voice wake-up, the start and end of voice interaction can be controlled by other interaction channels (such as gestures).

3.3. Multimodal interaction method

According to the characteristics of interaction operations and interaction channels, we propose three multimodal interaction methods.

Dual-channel interaction: head & hand. Gesture interaction is more flexible and natural than joystick interaction, so gesture interaction is used as the main interaction channel. However, gesture interaction is not accurate enough. In previous studies, it was found that the instability of gestures brings negative effects in more accurate selection interaction, which reduces the accuracy and efficiency of interaction. In addition,
Table 3. Interactive channel assignment.
Interaction event | Dual-channel | Three-channel | Four-channel | Fusion mode
UI select | Head gaze | Head gaze | Head gaze/voice | Parallel
UI click | Gesture | Eye gaze | Eye gaze/voice | Parallel
UI switch | Gesture | Gesture | Voice/gesture | Parallel
Scene interaction | Gesture | Gesture | Gesture | Serial
Widget interaction | Gesture | Gesture | Gesture | Serial

if all operations are completed through the gesture interaction channel only, it will bring a heavy interaction burden and fatigue to users. The head gaze has a relatively stable direction and a low motion burden, so the head gaze interaction channel is added to provide direction selection information. Users click and confirm with gestures while aiming with the head (see Fig. 1(a)).

Three-channel interaction: head & eye & hand. Selection-and-click is a common interaction in flow visualization. In dual-channel interaction, head gaze is used for selection, while gesture is used for the click. This modality provides a more accurate and stable choice. However, gestures still need to be held up frequently during the whole interaction process, and long-term use may lead to heavy interaction fatigue. To reduce interaction fatigue, it is necessary to reduce the rate of gesture interaction. Therefore, the eye gaze interaction channel is added to complete the selection and click interaction through the cooperation of head and eye gaze, so that gesture interaction fatigue is reduced.

Users aim with the eyes and use head gaze aiming to confirm (see Fig. 1(b)). Eye aiming and head aiming start only when the UI interface is called out by gestures. The green translucent aperture indicates the general scope of the user's eye gaze. The red cursor indicates the precise position of the user's head gaze. When the red cursor enters the green aperture, the angle between the head gaze direction and the eye gaze direction is less than 8.5 degrees. When the coincidence time exceeds 0.5 s, it is recorded as a click operation, and the cursor position of the head gaze is transmitted to the UI interface as the click position.

Four-channel interaction: head & eye & hand & voice. The functions involved in flow visualization are quite complex. Whether two-dimensional icons or three-dimensional models are used as metaphors, the number of metaphors will increase with the number of functions. When the number of metaphors is large, users will spend more time searching. At the same time, more icons or models in the virtual reality space will also cause serious visual occlusion and tedious interaction. Therefore, based on the three-channel interaction method, we add the voice interaction channel to control and manage the flow visualization algorithms, making use of the fact that voice is suitable for fast retrieval. Compared with a wake-up word, gesture interaction can wake up and stop the voice interaction more quickly.

Users make a voice gesture to turn on the voice recognition function (see Fig. 1(c)). For the duration of the voice gesture, the microphone is turned on to record the user's voice. When the voice gesture ends, the voice recording ends too. The recorded voice is sent to the recognizer for language recognition. After a short time, the corresponding interaction events are triggered according to the recognized voice. Besides, the system sound can also be used as an output channel to give users interactive feedback. When the graphic feedback is not obvious, the headset sound can tell the user whether the operation is correct or wrong.

3.4. Interaction event

In light of the interactive information from the channels, we can trigger the corresponding interaction events. An interaction event contains a series of related operations to finish a function. In the requirement analysis section, we summarized the interaction requirements of flow field visualization, but that is a user-oriented classification rather than a function-oriented one, so we reclassify all interaction operations into three categories: UI interaction, scene interaction and widget interaction (see Table 3). In the immersive environment, the 2D interaction needs can be completed through a 2D interface, which we call UI interaction. It consists mainly of interface control, selection and click operations. The spatial parameter configuration requirements are completed by interaction with widgets, which we call widget interaction. The 3D geometric transformation among the 3D interaction needs is completed by transforming the scene; we call it scene interaction, and it involves the translation, scaling and rotation of the scene.

4. Implementation

To establish the mapping relationship between interaction tasks and interaction channels, we propose a multimodal immersive flow visualization system framework. The overall framework is shown in Fig. 2.

4.1. Hardware

The hardware devices of the interaction channels are as follows. VR headset: HTC Vive Pro Eye. Eye tracker: HTC Vive Pro Eye tracker. Hand tracker: Leap Motion. Voice: microphone in the VR headset. Joystick: HTC VR joystick. The specific layout of each input device is shown in Fig. 3.

4.2. Single interactive channel

4.2.1. Head

In head gaze interaction, the VR headset provides the head orientation. We get the orientation of the headset, which is also the head direction vector, through the OpenVR SDK. The vector is used for collision detection with the visualization results or intersection calculation with the user interface (UI) to obtain the head gaze object or UI coordinates.

The user's head direction vector is R(r1, r2, r3) and the starting point is A(a1, a2, a3). The origin of the UV coordinate system of the UI interface is O(o1, o2, o3). The normal vector of the UI interface is W(w1, w2, w3). The two axes of the UI interface are U(u1, u2, u3) and V(v1, v2, v3). The parametric equation of the user gaze ray R is:

x = a1 + r1 * t;  y = a2 + r2 * t;  z = a3 + r3 * t    (1)

The equation of the UI interface plane is:

w1 * (x − o1) + w2 * (y − o2) + w3 * (z − o3) = 0    (2)
Fig. 1. (a) head & hand: aim with the head and confirm by gesture. (b) head & eye: aim with the head (red spot) and confirm by head–eye coincidence (the green circle is the eye gaze area). (c) hand & voice: select by voice command; start and end with gestures.

According to (1) and (2), the intermediate variable t can be solved as:

t = (Σ_{i=1}^{3} (o_i − a_i) * w_i) / (Σ_{i=1}^{3} w_i * r_i)    (3)

Substituting (3) into (1), we get the intersection coordinates B(x, y, z). By a transformation of the 3D coordinate system, the intersection B is changed from world coordinates to UI coordinates, giving C(u, v, 0). This coordinate C can be used to trigger interface interaction.

4.2.2. Eye

First, the eye gaze vector output by the eye tracker is transformed into world coordinates; then the calibration procedure is carried out to ensure the accuracy of the gaze direction, and the dithering procedure is carried out to ensure gaze stability. Finally, eye gaze is combined with head gaze to complete the interaction.

Before using eye gaze interaction, users need to conduct an initial eye calibration to ensure that the measured gaze direction is the same as the actual gaze direction. Because of the dual-channel interaction between the head and the eyes, we propose a combination of fast initial calibration and dynamic implicit calibration. Compared with the common 5-point or 9-point calibration, 1-point calibration can be applied for initial calibration, even though the calibration accuracy of this method is low. With the use of eye gaze interaction, the calibration results can be continuously corrected through dynamic implicit calibration, as shown in Fig. 4. In this way, even if there is a deviation in eye tracking during the interaction, the usability of eye-tracking interaction can be guaranteed through continuous correction, without a special and complex recalibration.
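Equations (1)–(3) above amount to a standard ray–plane intersection followed by a change of basis into the interface's UV frame. The following is a minimal sketch, not the system's actual implementation; it assumes the UI axes U and V are orthonormal and lie in the plane:

```python
def head_gaze_to_ui(a, r, o, w, u, v):
    """Intersect the head-gaze ray x = A + t*R (Eq. (1)) with the UI
    plane through O with normal W (Eq. (2)), then express the hit
    point in the interface's U/V axes to obtain C(u, v, 0)."""
    dot = lambda p, q: sum(pi * qi for pi, qi in zip(p, q))
    denom = dot(w, r)                      # sum of w_i * r_i
    if abs(denom) < 1e-9:
        return None                        # ray parallel to the UI plane
    # Eq. (3): t = sum((o_i - a_i) * w_i) / sum(w_i * r_i)
    t = dot(w, [oi - ai for oi, ai in zip(o, a)]) / denom
    if t < 0:
        return None                        # UI plane is behind the user
    b = [ai + ri * t for ai, ri in zip(a, r)]   # Eq. (1): hit point B
    d = [bi - oi for bi, oi in zip(b, o)]
    return dot(d, u), dot(d, v)            # UV coordinates of C
```

A ray from (1, 2, 5) looking down the −z axis onto the z = 0 plane, for instance, lands at UV coordinates (1, 2).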
60
C. Su, C. Yang, Y. Chen et al. Visual Informatics 5 (2021) 56–66
4.2.5. Joystick

The interactive information of the joystick is provided by the joystick controller, including its position, direction and button states. We define the function of each key according to the operational requirements. The information of all joystick controllers can be obtained through the OpenVR SDK.
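The gesture-gated voice capture described in Section 3.3 (record only while the voice gesture is held, then hand the clip to a recognizer) can be sketched as a small per-frame state machine. The `recorder` and `recognizer` interfaces here are hypothetical placeholders, not a real SDK API:

```python
class PushToTalk:
    """Gate voice recording on the 'voice gesture' of Section 3.3:
    start recording when the gesture appears, stop and recognize
    when it ends. `recorder` needs start()/stop() methods; the
    recognizer is any callable mapping a clip to command text."""

    def __init__(self, recorder, recognizer):
        self.recorder = recorder
        self.recognizer = recognizer
        self.recording = False

    def update(self, voice_gesture_active):
        """Call once per frame with the current gesture state.
        Returns recognized command text when a recording ends."""
        if voice_gesture_active and not self.recording:
            self.recording = True
            self.recorder.start()          # gesture began: open the mic
        elif not voice_gesture_active and self.recording:
            self.recording = False
            clip = self.recorder.stop()    # gesture ended: close the mic
            return self.recognizer(clip)   # map the clip to a command
        return None
```

Gating the microphone on a held gesture sidesteps both wake-word latency and the end-of-utterance detection problem noted in Section 3.2.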
Fig. 5. The gesture recognition algorithm is developed from finger and joint characteristics, such as the number of extended fingers, the degree of grasp and the normal direction of the palm.
Fig. 6. The visualization algorithms are managed by a tree structure. All visualization results are mapped into the same VR environment to realize composite visual analysis, and each node can independently control the corresponding visualization results.
Fig. 8. a: Digital input. b: Data reading. c: Streamline adding. d: Streamline generating. e: Streamline coloring. f: Streamline recycling.
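The head–eye coincidence click of Section 3.3 (head and eye gaze directions within 8.5 degrees of each other, sustained for 0.5 s) reduces to an angle test plus a dwell timer. A sketch using those published thresholds; the class and its API are illustrative, not the system's code:

```python
import math

ANGLE_THRESHOLD_DEG = 8.5   # allowed head/eye divergence (Section 3.3)
DWELL_SECONDS = 0.5         # coincidence time required for a click

class CoincidenceClick:
    """Accumulate the time the head-gaze and eye-gaze directions
    agree; report a click once the dwell threshold is reached."""

    def __init__(self):
        self.coincidence = 0.0

    def update(self, head_dir, eye_dir, dt):
        """head_dir/eye_dir: unit direction vectors; dt: frame time
        in seconds. Returns True on the frame the click fires."""
        cos_angle = sum(h * e for h, e in zip(head_dir, eye_dir))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle < ANGLE_THRESHOLD_DEG:
            self.coincidence += dt         # cursor inside the aperture
        else:
            self.coincidence = 0.0         # rays diverged: restart dwell
        if self.coincidence >= DWELL_SECONDS:
            self.coincidence = 0.0         # fire once, then re-arm
            return True
        return False
```

Resetting the timer whenever the rays diverge is what suppresses the accidental "Midas" clicks discussed in Section 3.2.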
5.2. Procedure
6. Discussion
accurate feedback. Besides, the questionnaire of Experiment 1 shows that the learning difficulty of dual-channel interaction is the second highest. Participants think the learning difficulty lies in maintaining a relatively standard gesture. Because of the small recognition range of the gesture controller and the occlusion problem in recognition, the range and accuracy of gesture recognition are not good enough. Nevertheless, in Experiment 2, the total completion time of the dual-channel interaction method is still the second fastest. It can be seen from Fig. 10 that its efficiency in other tasks is almost the same as that of joystick interaction, but its flexible gestures make its efficiency in the streamline generating subtask better than that of the joystick.

In the questionnaire survey, the overall evaluation of the dual-channel interaction method is middling. The participants do not have many negative comments, except for dissatisfaction with the tracking quality of gestures and the design of some gestures. In general, dual-channel interaction is a good interaction method. Its performance is comparable to traditional joystick interaction. It has great advantages in three-dimensional interaction, but also needs to be optimized for interaction fatigue.

Three-channel interaction: head & hand & eye. We can see from Experiment 1 that the error rate of the three-channel interaction is in the middle, which mainly comes from the Midas problem. Although restrictions are added, there is still a small probability that the user triggers the click operation unconsciously. By analyzing the questionnaire data of Experiment 1, we find that the proportion of participants who think the three-channel interaction is the most difficult to learn is the highest. The reason may be that head–eye collaborative interaction is rare and eye-tracking is not accurate and stable enough. Nevertheless, a significant benefit of this method is reduced fatigue: the proportion of users who think this method is the easiest is also the highest. It should be emphasized that the users who do not think the method is easy may mind the fatigue caused by the weight of the VR headset rather than the method itself. The restrictions imposed to avoid false touches and improve accuracy reduce the interaction efficiency, so the total task completion time of three-channel interaction is the longest, especially in subtask 2, which involves many interface interactions. However, there is an interesting phenomenon: although the interface interaction efficiency of this method is low, the completion time in subtasks 4 and 5 is actually less than that of dual-channel interaction. We find that these two tasks require fewer parameters, so the interaction interface is relatively small, with few, scattered buttons, which is very conducive to head–eye collaborative interaction. It can be seen from Fig. 11 that after the introduction of the eye interaction channel, the occupation time of the hand channel decreased significantly, but at the cost of a slight decrease in interaction efficiency. From the experiment and analysis, we find that three-channel interaction performs well in interactive load balancing, which alleviates interaction fatigue. However, the interaction efficiency decreased a little and the learning cost increased slightly. Therefore, three-channel interaction needs to be improved in efficiency. It is still an efficient interaction method in appropriate application scenarios.

Four-channel interaction: head & hand & eye & voice. On the basis of three-channel interaction, voice interaction is added to form four-channel interaction. In Experiment 1, we tried to input numbers by voice interaction, but found it difficult to complete the whole experiment: the pronunciation of similar numbers and the accents of participants have a great impact on speech recognition. The participants in Experiment 1 also expressed the same view, namely that the experience is good when recognition succeeds, but poor when it fails. It can be seen from Fig. 10 that a large amount of interaction time is saved by voice interaction in subtask 5, which shows that we achieved the design goal of saving interface interaction and function search time through voice interaction. As can be seen from Fig. 11, the total interaction completion time of four-channel interaction is the least, and the occupation time of each interaction channel is more balanced. Therefore, the application of voice interaction optimizes multimodal interaction. Overall, four-channel interaction is the best interaction method at present. It has certain advantages in interaction efficiency and interaction fatigue. Although the advantages are not obvious, there is large room for optimization. The more interaction channels, the higher the requirements for every single interaction channel. It should be noted that pronunciation and accent should be fully considered in the design of voice instructions.

7. Conclusion

In this paper, we study the application of multimodal interaction in flow visualization. We analyze the interaction requirements in flow visualization. Then the advantages and disadvantages of the gesture, head, eye and voice channels are summarized from the literature. Based on the principle of multimodal complementarity, we propose three multimodal interactions: head & hand, head & eye & hand, and head & eye & hand & voice. The parallel cooperative interaction methods of head & hand, head & eye and hand & voice are described in detail. We also designed an immersive flow visualization system with multimodal interaction. The evaluation shows that natural multimodal interaction can improve the user's interaction experience by improving interaction accuracy, accelerating the interaction, and dispersing and reducing interaction fatigue. Our future research includes: (1) Optimization of each sub-channel: improving the range and accuracy of gesture and eye-tracking. (2) Application expansion of each sub-channel: we plan to expand the tools and widgets based on gesture interaction and study more eye interaction application methods.

Ethical Approval

All procedures followed were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. All participants provided written informed consent prior to enrolment in the user study.

CRediT authorship contribution statement

Chengyu Su: Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft. Chao Yang: Conceptualization, Writing – review & editing. Yonghui Chen: Conceptualization, Writing – review & editing. Fupan Wang: Resources, Project administration. Fang Wang: Resources, Supervision, Funding acquisition. Yadong Wu: Conceptualization, Funding acquisition. Xiaorong Zhang: Resources, Supervision, Project administration.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (No. 61872304, No. 61802320), the State Key Laboratory of Aerodynamics (SKLA20200203) and the National Numerical Windtunnel Project (NNW2019ZT6-A17).
Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.visinf.2021.12.005.