
Displays 82 (2024) 102655

Contents lists available at ScienceDirect

Displays
journal homepage: www.elsevier.com/locate/displa

Investigating visual determinants of visuomotor performance in virtual reality☆

Ken McAnally a,*, Guy Wallis a, Philip Grove b

a School of Human Movement and Nutrition Sciences, The University of Queensland, Brisbane, Queensland, Australia
b School of Psychology, The University of Queensland, Brisbane, Queensland, Australia

ABSTRACT

We report the relative efficiency of visually guided movement in virtual reality (VR) compared to that in the real world using a standardised visuomotor task based on Fitts' tapping. Haptic cues were veridical across both displays to ensure that any differences in performance could be attributed to characteristics of the visual display. The presence of binocular cues, and of monocular surface texture and motion parallax cues, was manipulated across four ordinal visual conditions: binocular + texture + motion, monocular + texture + motion, monocular + motion, monocular. Binocular cues were found to have a similar benefit for VR and real-world performance. Surface texture cues were found to have no benefit for either display. Motion parallax cues were found to have a similar benefit for VR and real-world performance. Overall, task performance in VR was lower than in the real world due to both a slowing of movements and a decrease in their precision. Taken together, these results did not provide evidence for deficiencies in the presentation of binocular, texture or motion parallax cues in VR and suggest a non-visual locus for the deficit in visuomotor performance in VR. Limitations in the accuracy and precision of hand tracking are sufficient to explain the observed decrement in performance in VR.

1. Introduction

Virtual reality (VR) is increasingly being employed for training in aviation and medicine [1], remote operation [2], rehabilitation [3], and in psychophysics research [4]. While these displays engender a high degree of realism and presence, their utility in any particular environment will depend on the fidelity required for the task at hand [5]. Previous studies have examined the effects of visual variables such as texture and visual structure on visual perception in VR, for example of distance and size constancy [6–8]. However, an important function of vision is the guidance of goal-directed action [9,10]. Many tasks that may be conducted or trained in VR require visually guided manual interaction with objects in the world. This study examined the effect of visual display variables on the performance of visually guided action. Any identified deficiencies of visual rendering would be targets for further technological development.

We used a standardised reaching task [11,12] to assess the efficiency of visually guided movement. Throughput (in bits/s) reflects the efficiency of movement in this task and is robust to variations in speed/accuracy criteria [13]. We have previously shown that throughput is significantly higher for interactions with a real touchscreen (around 8 bits/s) than for those on a virtual touchscreen simulated in VR (around 6 bits/s [14,15]). Interaction in VR was improved with the introduction of haptic cues [14], with most improvement being seen for a veridical, passive-haptic stimulus [16] in which the participants felt the touch of their fingertips on the real touchscreen while they viewed their virtual fingertip touching the virtual screen. Despite veridical haptic feedback, throughput in VR remained around 1 bit/s worse than for the real touchscreen, which was attributed to deficiencies in the VR visual display.

There are several ways in which viewing VR differs from viewing the real world [4]. First, images are graphically rendered by a computer to simulate spatial relationships in the world, as well as object textures and lighting. These images are dynamically revised in accordance with movements of the observers' heads in the virtual world. Second, these images are presented to each eye via displays with limited field-of-view, resolution, and focal range. These differences can lead to distortions in visual perception. For example, the perception of distance normally arises from the combination of monocular cues, including pictorial cues (e.g., occlusion, perspective, texture gradient, shading), motion parallax and accommodation, with binocular cues (e.g., disparity, vergence; see [17] for a review). These cues provide complementary information and are reported to be similarly effective in VR at intermediate distance (3 m), with pictorial cues dominating at far distance [18]. Conversely,


This paper was recommended for publication by Prof Guangtao Zhai.
* Corresponding author at: School of Human Movement and Nutrition Sciences, The University of Queensland, St. Lucia 4067, Queensland, Australia.
E-mail address: kenneth.mcanally@uq.edu.au (K. McAnally).

https://doi.org/10.1016/j.displa.2024.102655
Received 26 November 2023; Received in revised form 3 January 2024; Accepted 17 January 2024
Available online 22 January 2024
0141-9382/© 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

errors of binocular cues, in particular the matching of disparities, lead to moderate errors in reaching to targets in peri-personal space [19]. Thus, both the visual perception of depth [18,20,21] and visual-motor interaction in depth [15,19] are impaired in VR.

We examined the interaction between displays (real world, VR) and an ordinal set of visual cues to position and distance including binocular cues, monocular texture cues, and motion parallax. The contribution of binocular cues was examined in a planned comparison between binocular presentation and monocular presentation to the dominant eye. The contribution of monocular texture cues was examined in a planned comparison of a chequerboard texture with a plain grey background to the display. Finally, the contribution of motion parallax cues was examined in a planned comparison between conditions with free head movement and with the head stabilised using a bite bar. We examined movements on a horizontal display because they comprise both lateral movements and movements in depth. A previous study [15] has shown that movements on a horizontal display are particularly impaired in VR compared to a real, physical touchscreen. Importantly, we kept haptic (tactile and proprioceptive) conditions veridical and constant across real and VR displays so that differences in performance could be attributed to differences in visual display alone.

2. Methods

2.1. Participants

Thirty adults (19 female) with normal or corrected-to-normal vision participated in the study. Their average age was 25.5 years. All were right-handed and 16 were right eye dominant, as determined with the Dolman hole-in-the-card test. Their average interpupillary distance (IPD) was 63 mm (sd 3 mm). Twenty-four had prior experience in VR (median 1 h). The experiment was approved by the University of Queensland Health and Behavioural Sciences Low and Negligible Risk Ethics Committee (2020/HE001750). All participants gave informed consent and were paid AUD20 for their time.

2.2. Task

Participants sat at a desk within comfortable reach of a horizontally oriented touchscreen (Fig. 1a). Participants performed a Fitts' task that required them to use the index finger of their dominant hand to touch a sequence of targets on the touchscreen. Twenty-one disks (diameter 20 mm) were presented around a circle of diameter 250 mm. One disk was designated as the current target and rendered in red. When touched, it was rendered white and a new target on the opposite side of the circle was designated as the next target. The sequence of targets progressed around the circle twice, such that each trial comprised 42 movements (Fig. 1b). Participants were instructed to move as quickly and accurately as possible. Audio feedback (500-Hz 100-ms sine wave tone for hits, 250-Hz 100-ms square wave tone for misses) was given to encourage participants to maintain this criterion. Misses were allowed in each trial and all movements were included in the analyses.

The Index of Difficulty (ID) of the task is a function of the size and distance between targets [22] and was equal to 3.75 bits. The effective distance of movement (De) along the axis of specified movement was calculated, and the effective width of the target (We) was calculated as 4.133 times the standard deviation of touch positions along the same axis [22]. We represents the target width for which 96% of touches would have been hits. Throughput (TP) is the ratio of task difficulty and movement time (MT):

TP = log2(De/We + 1)/MT

2.3. Design and apparatus

Two display conditions (touchscreen, VR) were presented in four ordinal viewing conditions (binocular + texture + motion [BTM], monocular + texture + motion [MTM], monocular + motion [MM] and monocular [M]) in counterbalanced order. The monocular conditions were presented to the dominant eye. In the conditions with a texture gradient, a 1-cm square chequerboard (fundamental frequency approximately 0.5 degrees of visual angle) was presented as a background to the Fitts targets. In the conditions with head motion, participants were free to make natural head movements. In the conditions without head motion, participants bit on a bite bar to stabilise their head. Two trials of 42 movements were performed in each condition.

In all conditions, including with the touchscreen, participants held the VR controller (Vive, HTC Corporation) backward with their hand in a pointing gesture (Fig. 1c). In all conditions, the progression of targets was triggered by the touch of participants' index fingertips with the touchscreen. This ensured that both tactile and proprioceptive feedback and the interaction of the touchscreen with the task were the same across all conditions.

Fig. 1. (a) Photograph of a participant using the bite bar and performing the task in VR with passive haptic feedback from the touchscreen. (b) Screenshot of the Fitts' task without the texture grid. The first four movements are indicated by arrows. (c) Participants' view of the touchscreen condition. (d) Participants' view of the VR condition.

2.3.1. Touchscreen condition

The Fitts' task was programmed in a game engine (Unity 2D, v2019.4.0f1) with the Fullscreen editor plugin (v2.2.1) and run on a Windows 10 PC with a dedicated graphics card (GeForce RTX 3060). The resolution of the touchscreen (Dell P2418HT) was set to 800 × 600 pixels to more closely match the resolution of the virtual touchscreen in the VR display. Participants wore safety goggles (3M) which had been masked with electrical tape to closely match the field-of-view of the headset in the VR conditions (116 × 97 degrees).
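The effective-width and throughput computations defined in Section 2.2 can be sketched as follows (a minimal Python sketch; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def effective_width(touch_positions_mm):
    """We = 4.133 x the standard deviation of touch positions along the
    axis of specified movement; the width that would contain ~96% of
    touches."""
    return 4.133 * np.std(touch_positions_mm)

def throughput(de_mm, we_mm, mt_s):
    """TP = log2(De/We + 1) / MT, in bits/s."""
    return np.log2(de_mm / we_mm + 1.0) / mt_s

# The nominal Index of Difficulty has the same form:
# ID = log2(D/W + 1) = log2(250/20 + 1) ~ 3.75 bits, matching Section 2.2.
```

For illustration, De = 253 mm, We = 24 mm and MT = 0.44 s give a throughput of about 8 bits/s, of the order reported for the real touchscreen (these particular input values are invented for the example).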

2.3.2. VR condition

A virtual desk and touchscreen of identical dimensions to the real desk and touchscreen were rendered in an empty room (Fig. 1d). To enable passive haptic feedback, the virtual desk and touchscreen were spatially aligned with their real-world counterparts using a VR tracker (Vive tracker, HTC Corporation). Passive haptic feedback resulted in simultaneous seen and felt touch. The real and virtual hands (Oculus hand models v1.0) were spatially aligned by ensuring the index finger extended 25 mm beyond the controller handle (Fig. 1b). The finger was taped to the controller to maintain their relative position. The stimuli were programmed in a game engine (Unity 3D 2021.3.4f1) with the Fullscreen editor plugin (v2.2.1) and presented via a VR headset (Vive Pro 2, HTC Corporation). Head position and orientation and the position of the virtual fingertip were sampled at 90 Hz and recorded for kinematic analysis. Hand movement data were low-pass filtered (cut-off 10 Hz) before analysis.

2.4. Data analysis

A total of 20,160 movements (30 participants × 2 displays × 4 visual conditions × 2 trials × 42 movements) were recorded. On rare occasions, the VR system momentarily lost track of the controller, which resulted in uncommanded rapid movements of the virtual hand. Ninety-seven movements in VR (0.96%) were excluded from the analysis because the recorded virtual fingertip velocity exceeded 1 m/s. Data were analysed (JASP 0.17.2.1, [23]) with 2-way repeated-measures ANOVAs with the variables Display (touchscreen, VR) and Condition (binocular + texture + motion [BTM], monocular + texture + motion [MTM], monocular + motion [MM] and monocular [M]). Greenhouse-Geisser corrections were applied where required for violations of sphericity. Post-hoc tests incorporated the Holm correction for multiple comparisons. A sensitivity analysis confirmed the experiment had sufficient power to observe a small-to-medium interaction effect of η2p = .046 between displays and visual conditions (GPower 3.1, [24]).

The role of different visual cues was examined with planned 2-way ANOVAs of Display and Condition. The role of binocular cues was examined by comparing BTM and MTM conditions. The role of monocular texture cues was examined by comparing MTM and MM conditions. The role of motion parallax was examined by comparing MM and M conditions.

3. Results

3.1. Movement efficiency

Throughput reflects the efficiency of movement. Throughput decreased as visual cues were removed and was uniformly lower in VR than with the real touchscreen (Fig. 2). The ANOVA revealed a significant effect of Display, F(1,29) = 40.8, p < .001, η2p = .58, and Condition, F(3,87) = 42.4, p < .001, η2p = .59, but no interaction, F(3,87) = 1.31, p = .28, η2p = .04. The planned comparison of binocular cues (BTM-MTM) revealed a significant effect of Display, F(1,29) = 49.2, p < .001, η2p = .63, and Condition, F(1,29) = 10.8, p = .003, η2p = .27, but no interaction, F(1,29) = 0.02, p = .88, η2p < .001. The planned comparison of monocular texture cues (MTM-MM) revealed a significant effect of Display, F(1,29) = 32.4, p < .001, η2p = .53, but not of Condition, F(1,29) = 0.12, p = .73, η2p = .004, and no interaction, F(1,29) = 1.74, p = .20, η2p = .06. The planned comparison of monocular motion cues (MM-M) revealed a significant effect of Display, F(1,29) = 20.8, p < .001, η2p = .42, and Condition, F(1,29) = 63.6, p < .001, η2p = .69, but no interaction, F(1,29) = 0.08, p = .78, η2p = .003.

Fig. 2. Throughput for each condition while viewing the real touchscreen (TS) and virtual reality (VR) displays. Error bars are standard errors. Note. BTM = binocular + texture + motion, MTM = monocular + texture + motion, MM = monocular + motion, M = monocular. ** p < .01, *** p < .001.

3.2. Movement time

As throughput is a compound variable, comprising components of movement time, precision and distance, we conducted separate ANOVAs for each of these variables. Movements slowed as visual cues were reduced (Fig. 3). Average movement times for VR were 25 ms longer than for the touchscreen. The ANOVA revealed a significant effect of Display, F(1,29) = 11.6, p = .002, η2p = .29, and Condition, F(2.3,67.4) = 32.1, p < .001, η2p = .52, but no interaction, F(2.0,58.8) = 2.41, p = .10, η2p = .08. The planned comparison of binocular cues (BTM-MTM) revealed a significant effect of Display, F(1,29) = 7.29, p = .011, η2p = .20, and Condition, F(1,29) = 8.68, p = .006, η2p = .23, but no interaction, F(1,29) = 3.73, p = .06, η2p = .11. The planned comparison of monocular texture cues (MTM-MM) revealed a significant effect of Display, F(1,29) = 7.63, p = .01, η2p = .21, but not of Condition, F(1,29) = 1.98, p = .17, η2p = .06, and no interaction, F(1,29) = 2.24, p = .15, η2p = .07. The planned comparison of monocular motion cues (MM-M) revealed a significant effect of Display, F(1,29) = 7.99, p = .008, η2p = .22, and Condition, F(1,29) = 86.4, p < .001, η2p = .75, and a significant interaction, F(1,29) = 8.44, p = .007, η2p = .22, where movements in the M condition were slower for VR, t(29) = 3.40, p = .002.

Fig. 3. Movement times for each condition while viewing the real touchscreen (TS) and virtual reality (VR) displays. Error bars are standard errors. Note. BTM = binocular + texture + motion, MTM = monocular + texture + motion, MM = monocular + motion, M = monocular. ** p < .01, *** p < .001.

3.3. Movement precision

The effective width of the targets (We) is inversely related to the precision of movements. The precision of movements decreased, leading to an increase in the effective target width, as visual cues were removed (Fig. 4). Effective target widths were uniformly larger for VR than for the real touchscreen. For binocular viewing of the touchscreen, the effective target width (24 mm) was slightly larger than the 20-mm target. When viewing binocularly in VR, the effective width increased to 31 mm. The ANOVA revealed a significant effect of Display, F(1,29) = 25.2, p < .001, η2p = .46, and Condition, F(2.2,65.0) = 21.1, p < .001, η2p = .42, but no interaction, F(2.3,66.1) = 0.73, p = .50, η2p = .02. The planned comparison of binocular cues (BTM-MTM) revealed a significant effect of Display, F(1,29) = 26.2, p < .001, η2p = .47, and Condition, F(1,29) = 4.93, p = .034, η2p = .16, but no interaction, F(1,29) = 0.06, p = .82, η2p = .002. The planned comparison of monocular texture cues (MTM-MM) revealed a significant effect of Display, F(1,29) = 26.0, p < .001, η2p = .47, but not of Condition, F(1,29) = 0.88, p = .36, η2p = .03, and no interaction, F(1,29) = 0.40, p = .53, η2p = .01. The planned comparison of monocular motion cues (MM-M) revealed a significant effect of Display, F(1,29) = 14.4, p < .001, η2p = .33, and Condition, F(1,29) = 22.1, p < .001, η2p = .43, and no interaction, F(1,29) = 0.47, p = .50, η2p = .02.

Fig. 4. Effective target widths (We) for each condition while viewing the real touchscreen (TS) and virtual reality (VR) displays. Error bars are standard errors. Note. BTM = binocular + texture + motion, MTM = monocular + texture + motion, MM = monocular + motion, M = monocular. * p < .05, *** p < .001.

3.4. Movement distance

The task specified movements of 250 mm. The Fitts' analysis assesses movements with respect to the magnitudes of the actual movements made in the direction of specified movement - the effective distance (De). Average effective movement distance was 3 mm larger than specified for the touchscreen and 5 mm smaller than specified in VR (Fig. 5). The ANOVA revealed a significant effect of Display, F(1,29) = 80.2, p < .001, η2p = 0.73, and Condition, F(1.9,54.1) = 3.87, p = .03, η2p = .12, but no interaction, F(2.0,59.0) = 1.85, p = .17, η2p = .06. The effect of Condition did not reach significance in any of the planned comparisons, but a post-hoc test revealed that effective distance in condition M was significantly larger than in conditions BTM, t = 3.06, p = .018, and MTM, t = 2.82, p = .03.

Fig. 5. Effective movement distance (De) for each condition while viewing the real touchscreen (TS) and virtual reality (VR) displays. Error bars are standard errors. Note. BTM = binocular + texture + motion, MTM = monocular + texture + motion, MM = monocular + motion, M = monocular. * p < .05, *** p < .001.

3.5. Pointing accuracy

Fig. 6. Unsigned pointing errors for each condition while viewing the real touchscreen (TS) and virtual reality (VR) displays. Error bars are standard errors. Note. BTM = binocular + texture + motion, MTM = monocular + texture + motion, MM = monocular + motion, M = monocular. * p < .05, *** p < .001.

Average unsigned pointing errors with respect to the target centres were larger for VR than for the touchscreen (Fig. 6). The ANOVA revealed significant effects of Display, F(1,29) = 30.6, p < .001, η2p = .51, and Condition, F(2.4,68.5) = 15.9, p < .001, η2p = .35, but no interaction, F(3,87) = 2.12, p = .10, η2p = .07. The planned comparison of binocular cues (BTM-MTM) revealed a significant effect of Display, F(1,29) = 22.1, p < .001, η2p = .43, and Condition, F(1,29) = 6.27, p = .02,


η2p = .18, and a significant interaction, F(1,29) = 5.29, p = .03, η2p = .15.
In VR, errors in the monocular condition (MTM) were larger than those
in the binocular condition (BTM), z = 2.46, p = .014. The planned
comparison of monocular texture cues (MTM-MM) revealed a significant
effect of Display, F(1,29) = 38.7, p < .001, η2p = .57, but not of Condition,
F(1,29) = 0.03, p = .87, η2p < .001, and no interaction, F(1,29) = 0.02, p
= .88, η2p < .001. The planned comparison of monocular motion cues
(MM-M) revealed a significant effect of Display, F(1,29) = 25.4, p <
.001, η2p = .47, and Condition, F(1,29) = 16.6, p < .001, η2p = .36, and no
interaction, F(1,29) = 0.006, p = .94, η2p < .001. On average, unsigned
pointing errors were 7.2 mm for the real touchscreen and 9.3 mm in VR.
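Two of the per-movement measures used above can be computed directly from the recorded data: the unsigned pointing error of this section and the 1 m/s speed criterion used in Section 2.4 to exclude tracking glitches. A minimal Python sketch (the names are illustrative, not from the paper):

```python
import numpy as np

FS = 90.0  # tracker sampling rate (Hz), per Section 2.3.2

def mean_unsigned_error(touches_mm, target_centre_mm):
    """Mean absolute (unsigned) distance between touch positions
    (N x 2 array, mm) and the target centre."""
    return float(np.mean(np.linalg.norm(touches_mm - target_centre_mm, axis=1)))

def fingertip_speed(xyz_m):
    """Instantaneous speed (m/s) from consecutive 90-Hz position
    samples (N x 3 array, metres)."""
    return np.linalg.norm(np.diff(xyz_m, axis=0), axis=1) * FS

def is_valid_movement(xyz_m, v_max=1.0):
    """A movement is excluded when its recorded speed exceeds 1 m/s,
    indicating momentary loss of controller tracking (Section 2.4)."""
    return bool(np.all(fingertip_speed(xyz_m) <= v_max))
```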

3.6. Head movement

We examined the variability of head position and orientation in condition MM where natural head movements were allowed and in condition M where the head was stabilised with the bite bar. RMS variations in head position were of the order of 5 to 10 mm when head movements were allowed (Table 1). As expected, use of the bite bar significantly reduced the variation of head position and orientation across all 3 axes.

Table 1
RMS head movement of trials in the MM and M conditions in VR.

                  MM     M      t(29)/Z   p
X (mm)            8.7    0.7    13.3      <.001
Y (mm)            5.5    0.9    4.78      <.001
Z (mm)            8.8    1.2    4.78      <.001
Az (degrees)      2.6    0.7    11.18     <.001
El (degrees)      1.5    1.0    4.04      <.001
Roll (degrees)    1.7    0.5    4.78      <.001

Note. MM = monocular + motion, M = monocular.

3.7. Hand movement

It is generally accepted that the acceleration phase of hand movements reflects processes of propulsion toward the target and the deceleration phase reflects processes of homing-in to the target [25]. Although open- and closed-loop control overlap, closed-loop control is higher in the deceleration phase. We hypothesised that the incremental removal of visual cues across our conditions would affect both the localisation of the target (affecting the acceleration phase), and the visual feedback of the relative positions of target and fingertip (affecting the deceleration phase). Acceleration times were shorter than deceleration times (Fig. 7). An ANOVA of acceleration times showed a significant effect of Condition, F(1.8,52.9) = 30.6, p < .001, η2p = .51. Planned comparisons between successive conditions showed that the removal of binocular cues resulted in a significant lengthening of acceleration time, t(29) = 5.02, p < .001, d = .92, the removal of the chequerboard texture significantly shortened acceleration time, t(29) = 2.59, p = .015, d = .47, and the removal of motion parallax cues significantly lengthened acceleration time, t(29) = 9.36, p < .001, d = 1.71. An ANOVA of deceleration times showed a significant effect of Condition, F(1.9,54.1) = 6.66, p = .003, η2p = .19. Planned comparisons between successive conditions showed that the removal of motion parallax cues resulted in a significant lengthening of deceleration time, t(29) = 5.03, p < .001, d = .92.

Fig. 7. Acceleration and deceleration times of movement for each condition while viewing the virtual reality display. Error bars are standard errors. Note. BTM = binocular + texture + motion, MTM = monocular + texture + motion, MM = monocular + motion, M = monocular. * p < .05, *** p < .001.

4. Discussion

The results for throughput, movement time and precision were consistent: performance was lower for VR than for the real touchscreen, and performance reduced similarly for both displays with the removal of binocular cues and monocular motion cues. Throughput for the VR display was consistently around 1 bit/s lower than for the touchscreen. Throughput for binocular viewing was around 8 bits/s for the touchscreen and close to 7 bits/s for VR, which is consistent with previous reports [14,15,26,27] but somewhat higher than for other VR studies with passive haptics [28,29]. While reduced throughput in VR was due to both a slowing and a decrease in the precision of movements, comparison of the effect sizes indicates that the reduction in precision was the larger of these two effects. Movements in VR were also smaller, which for distal movements is consistent with reports of slight compression of peripersonal depth in VR [8].

Binocular vision provides vergence cues to distance and disparity cues to relative distance around fixation (see [17] for a review). Eye vergence also provides input to the accommodation system to bring fixated objects into sharp focus [30,31]. Although the eyes continue to verge during monocular viewing, the vergence error signal of binocular fusion is eliminated. Monocular viewing may therefore be expected to reduce the accuracy of reaching movements in depth. Monocular viewing will also eliminate disparity cues to the relative depths of the target and the fingertip when either of these is fixated. In contrast to these negative effects, monocular viewing is expected to reduce the vergence-accommodation conflict (VAC) which is present in the VR display. In normal viewing, vergence and accommodation of the eye lens work together to bring fixated objects into single clear vision [30], but in VR this coupling is disrupted because the image plane of the display is at a fixed focal length [32]. The vergence distance to the Fitts targets varied from around 0.4 to 0.6 m but the focal distance of the VR headset was fixed at an estimated 1.35 m¹. In monocular viewing, the viewing eye is free to fixate and focus on the target and to drive vergence and accommodation in the other eye. The absence of a vergence error signal reduces the conflict between accommodation and vergence, which could be expected to lead to an increase in visuomotor performance. The removal of binocular cues in this study resulted in a slowing and a reduction in accuracy and precision of movements, indicating the benefits of binocular cues outweighed the costs of the VAC. The lack of an interaction between Display and Condition (BTM-MTM) for all dependent variables except pointing accuracy indicates that the cost of monocular viewing was largely the same for the touchscreen (where

¹ The focal length of the Vive Pro 2 headset was estimated by viewing a high-contrast boundary with a camera (Canon AT-1) incorporating a split-prism viewfinder. The focussed distance was then calibrated against a real high-contrast boundary. The average of 5 estimates was 1.35 m (sd = 0.08 m).


there is no VAC) and VR and suggests that reduction of the VAC did not affect visuomotor performance.

Monocular cues to distance include occlusion, linear perspective, height in the scene, texture gradients, shadowing, and aerial perspective [17]. VR provides a unique opportunity to render objects with no texture. While the touchscreen bezel and Fitts' targets would have provided some perspective and texture cues, the texture of the surface upon which the targets were rendered was manipulated. We examined the effect of introducing a salient chequerboard texture to the surface on visuomotor performance. We expected the removal of that texture to reduce performance, and especially so for VR where surface texture cues could be eliminated. However, the results did not provide evidence for a reduction in throughput, speed, accuracy or precision of movements for either the touchscreen or for VR. We conclude that the salient surface texture in this study did not add significant additional information to other monocular cues present (e.g. perspective, target texture, height in the visual field, motion parallax) for performance of the task.

Motion parallax cues that arise from self-motion are powerful cues to distance [33]. We investigated the effect of removing motion cues by comparing conditions allowing free head movement with those that stabilised the head with a bite bar. As expected, the removal of motion parallax cues resulted in a reduction of performance by slowing and reducing the accuracy and precision of movements. These effects were similar for the touchscreen and for VR as indicated by the lack of an interaction. This lack of interaction suggests that motion parallax cues arising from head motion were conveyed satisfactorily in VR.

Bootsma et al. [25] found the acceleration phase of movement in a reciprocal Fitts' task to be around half of the movement duration when the difficulty of the task was low (ID = 3 bits) and somewhat less than half as the difficulty of the task increased. An average acceleration phase of around .43 of the movement duration in the current study (ID = 3.75 bits) is consistent with that observation. We expected reductions in visual cues to have effects on both the initial localisation of the target in 3D space (affecting the acceleration phase of the movement) and on judgments of the relative positions of the target and fingertip (affecting the deceleration phase). The removal of binocular cues lengthened the acceleration phase, but not the deceleration phase of the movement, consistent with the role of binocular cues in judgements of egocentric distance to the target. The removal of the chequerboard texture shortened the acceleration phase but had no effect on the deceleration phase, suggesting that initial localisation of the target was easier with the plain background. Although the chequerboard provided a strong texture gradient, it also provided clutter of similar spatial frequency within which the targets had to be localised. The fundamental frequency of the targets was around 0.5 cycles/degree and that of the chequerboard was around 0.25 cycles/degree and both had high harmonic content given their square wave profiles. These data suggest that the interfering effect of localising the target in the background was stronger than any benefit of the texture gradient to depth perception. The removal of motion parallax cues resulted in a lengthening of acceleration and deceleration times of the movements, suggesting that motion parallax played an important role in both the initial localisation of the target and in the

reliability [39]. The dynamic weighting of visual cues has further been demonstrated in the control of movement. For example, binocular cues to hand-target distance are reliable for hand movements along the line-of-sight, but are down-weighted for movements in the fronto-parallel plane where they are not reliable [40]. The relative contributions of binocular, texture, and motion cues are expected to vary across different experimental conditions such as viewing angle. Differences in cue reliability have also been shown to affect visually-guided movements in VR, where the efficiency of movement was lower for horizontal than for vertical movements [15]. The viewing angle in the current study was around 45 degrees (see Fig. 1a), where both binocular and monocular cues are expected to have been available and moderately reliable. While the magnitude of the decrement observed in the present study with removal of each visual cue will be dependent on the relative weighting of that cue and remaining cues, the equivalence of decrements observed for the touchscreen and for VR supports the conclusion that cues were similarly reliable and weighted for each display.

In conclusion, this study examined the effects of binocular cues, and monocular texture and motion cues on the performance of a speeded visuomotor task in real space and in VR. The results consistently showed lower performance in VR, and significant effects of binocular cues and monocular motion cues for both displays on throughput, movement time and precision. For all dependent variables except pointing accuracy, there was no evidence of a differential effect of removing binocular, texture, or motion parallax visual cues on the real touchscreen or VR, which suggests that either (i) any deficiencies of the VR visual display were also present in the presentation of a static world scene (condition M), or (ii) the deficiencies of VR are not of visual origin. Considering the first possibility, static monocular cues include accommodation and pictorial cues such as perspective, occlusion, and shadowing. Accommodation is an ordinal cue to depth [41] and erroneous for VR where the focal plane is fixed. However, it might be expected that any deleterious effects of erroneous accommodation or pictorial cues would be mitigated by more consistent and reliable cues when motion parallax and vergence are introduced. The lack of significant interaction between visual condition and Display indicates that the performance deficit in VR was essentially constant and suggests the deficit may not have been due to static monocular cues. Considering the second possibility above, inaccuracies in tracking of the hand may have contributed to the reduced visuomotor performance. Close to 1% of VR trials in this study were excluded due to instability of tracking of the controller, but other tracking errors remained. VR controllers integrate optical and inertial sensors in a compromise between spatial accuracy and temporal responsiveness. While the precision of static position estimates is less than one millimeter, errors of position estimates can be as high as 43 mm when the tracker is in motion [42]. This may be partly attributable to time delays between position sampling and visual display (the motion-to-photon latency (MPL) [43,44]). Prediction of movements by the Vive controller reduces the MPL from around 30 ms to less than 10 ms [44]. In both the real touchscreen and VR conditions, hand movements were abruptly terminated by contact with the physical screen. Such rapid changes in acceleration cannot be accurately predicted and will
closed-loop visual guidance of the fingertip onto the target. This inter­ result in spatiotemporal errors of tracking. The general absence of
pretation is consistent with the known utility of motion parallax as a interaction between visual conditions and Display in this study is
powerful cue to both absolute and relative distance [33,34]. The roles of consistent with errors of hand tracking that were constant across VR
vergence and motion parallax in the acceleration phase of movement are conditions. Tracking errors of the order of a few millimetres are suffi­
consistent with their roles in guiding blind (open-loop) reaching cient to account for the observed reductions in movement accuracy and
[35–37]. The present study found self-motion, but not binocular cues, to precision. While a systematic evaluation of VR tracking was beyond the
be effective in the closed-loop control of movement endpoint. In scope of the current study, further research could compare visuomotor
contrast, Hu et al. [38] found vergence to be important in the closed- performance between VR trackers and other technologies such as optical
loop control of object proximity in VR. Also in contrast, Watt and motion capture.
Bradshaw [37] found motion parallax to be ineffective in changing the
closed-loop control of grip aperture for end-point grasp. Declarations
While binocular, texture, and motion cues may be independently
manipulated, they are not processed independently in vision. These vi­ This research was supported by ARC Discovery project grant
sual cues are integrated with their relative weighting dependent on their DP190100533 (to GW) and ARC Linkage grant LP180100377 (Industry

Partner: Boeing) (to GW).

Compliance with ethical standards

The authors declare no conflicts of interest. The experiment was approved by the University of Queensland Health and Behavioural Sciences Low and Negligible Risk Ethics Sub-Committee. Participants gave informed consent to participate and for their deidentified results to be published. Data are available at https://doi.org/10.6084/m9.figshare.24933945.v1.

CRediT authorship contribution statement

Ken McAnally: Conceptualization, Investigation, Writing – original draft. Guy Wallis: Funding acquisition, Conceptualization, Writing – review & editing. Philip Grove: Conceptualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data are available at https://doi.org/10.6084/m9.figshare.24933945.v1.

References

[1] A.D. Kaplan, J. Cruit, M. Endsley, S.M. Beers, B.D. Sawyer, P.A. Hancock, The effects of virtual reality, augmented reality, and mixed reality as training enhancement methods: a meta-analysis, Hum. Factors 63 (4) (2021) 706–726.
[2] Z. Makhataeva, H.A. Varol, Augmented reality for robotics: a review, Robotics 9 (2020) 21.
[3] D.E. Levac, M.E. Huber, D. Sternad, Learning and transfer of complex motor skills in virtual reality: a perspective review, J. Neuroeng. Rehabil. 16 (2019) 1–15.
[4] P. Scarfe, A. Glennerster, The science behind virtual reality displays, Annu. Rev. Vis. Sci. 5 (1) (2019) 529–547.
[5] N. Dahlstrom, S. Dekker, R. van Winsen, J. Nyce, Fidelity and validity of simulator training, Theor. Issues Ergon. Sci. 10 (4) (2009) 305–314.
[6] I.T. Feldstein, F.M. Kölsch, R. Konrad, Egocentric distance perception: a comparative study investigating differences between real and virtual environments, Perception 49 (2020) 940–967.
[7] C. Vienne, S. Masfrand, C. Bourdin, J.L. Vercher, Depth perception in virtual reality systems: effect of screen distance, environment richness and display factors, IEEE Access 8 (2020) 29099–29110.
[8] K. Kohm, S.V. Babu, C. Pagano, A. Robb, Objects may be farther than they appear: depth compression diminishes over time with repeated calibration in virtual reality, IEEE Trans. Vis. Comput. Graph. 28 (11) (2022) 3907–3916.
[9] M.A. Goodale, Transforming vision into action, Vision Res. 51 (13) (2011) 1567–1587.
[10] M.M. Hayhoe, Vision and action, Annu. Rev. Vis. Sci. 3 (1) (2017) 389–413.
[11] International Organization for Standardization, Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs)—Part 9: Requirements for Non-keyboard Input Devices (ISO 9241-9), 2002.
[12] P.M. Fitts, The information capacity of the human motor system in controlling the amplitude of movement, J. Exp. Psychol. 47 (1954) 381–391.
[13] I.S. MacKenzie, P. Isokoski, Fitts' throughput and the speed-accuracy tradeoff, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2008, pp. 1633–1636.
[14] K. McAnally, G. Wallis, Visual–haptic integration, action and embodiment in virtual reality, Psychol. Res. 86 (6) (2022) 1847–1857.
[15] K. McAnally, K. Wallwork, G. Wallis, The efficiency of visually guided movement in real and virtual space, Virtual Real. 27 (2) (2023) 1187–1197.
[16] K. Hinckley, R. Pausch, J.C. Goble, N.F. Kassell, Passive real-world interface props for neurosurgical visualization, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1994, pp. 452–458.
[17] I.P. Howard, B.J. Rogers, Seeing in Depth, Oxford University Press, 2008.
[18] R.L. Hornsey, P.B. Hibbard, Contributions of pictorial and binocular cues to the perception of distance in virtual reality, Virtual Real. 25 (4) (2021) 1087–1103.
[19] G.P. Bingham, A. Bradley, M. Bailey, R. Vinner, Accommodation, occlusion, and disparity matching are used to guide reaching: a comparison of actual versus virtual environments, J. Exp. Psychol. Hum. Percept. Perform. 27 (2001) 1314.
[20] R.S. Renner, B.M. Velichkovsky, J.R. Helmert, The perception of egocentric distances in virtual environments – a review, ACM Comput. Surv. (CSUR) 46 (2) (2013) 1–40.
[21] J.W. Kelly, L.A. Cherep, Z.D. Siegel, Perceived space in the HTC Vive, ACM Trans. Appl. Percept. (TAP) 15 (2017) 1–16.
[22] R.W. Soukoreff, I.S. MacKenzie, Towards a standard for pointing device evaluation, perspectives on 27 years of Fitts' law research in HCI, Int. J. Hum Comput Stud. 61 (6) (2004) 751–789.
[23] JASP Team, JASP Version 0.17.2, 2023.
[24] F. Faul, E. Erdfelder, A.G. Lang, A. Buchner, G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences, Behav. Res. Methods 39 (2007) 175–191.
[25] R.J. Bootsma, L. Fernandez, D. Mottet, Behind Fitts' law: kinematic patterns in goal-directed movements, Int. J. Hum Comput Stud. 61 (2004) 811–821.
[26] R.J. Teather, W. Stuerzlinger, Target pointing in 3D user interfaces, Poster at Graphics Interface, 2010.
[27] V. Schwind, J. Leusmann, N. Henze, Understanding visual-haptic integration of avatar hands using a Fitts' law task in virtual reality, in: Proceedings of Mensch und Computer, 2019, pp. 211–222.
[28] R.J. Teather, W. Stuerzlinger, Pointing at 3D targets in a stereo head-tracked virtual environment, in: 2011 IEEE Symposium on 3D User Interfaces (3DUI), 2011, pp. 87–94.
[29] A.U. Batmaz, A.K. Mutasim, M. Malekmakan, E. Sadr, W. Stuerzlinger, Touch the wall: comparison of virtual and augmented reality with conventional 2D screen eye-hand coordination training systems, in: 2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 2020, pp. 184–193.
[30] C.M. Schor, A dynamic model of cross-coupling between accommodation and convergence: simulations of step and frequency responses, Optom. Vis. Sci. 69 (4) (1992) 258–269.
[31] R.J. Leigh, D.S. Zee, Vergence eye movements, in: The Neurology of Eye Movements, fifth ed., Oxford University Press, 2015.
[32] D.M. Hoffman, A.R. Girshick, K. Akeley, M.S. Banks, Vergence–accommodation conflicts hinder visual performance and cause visual fatigue, J. Vis. 8 (2008) 33.
[33] B. Rogers, M. Graham, Motion parallax as an independent cue for depth perception, Perception 8 (2) (1979) 125–134.
[34] S.H. Ferris, Motion parallax and absolute distance, J. Exp. Psychol. 95 (1972) 258–263.
[35] M. Mon-Williams, J.R. Tresilian, Some recent studies on the extraretinal contribution to distance perception, Perception 28 (2) (1999) 167–181.
[36] J.W. McCandless, S.R. Ellis, B.D. Adelstein, Localization of a time-delayed, monocular virtual object superimposed on a real environment, Presence 9 (1) (2000) 15–24.
[37] S.J. Watt, M.F. Bradshaw, The visual control of reaching and grasping: binocular disparity and motion parallax, J. Exp. Psychol. Hum. Percept. Perform. 29 (2003) 404–415.
[38] H.H. Hu, A.A. Gooch, W.B. Thompson, B.E. Smits, J.J. Rieser, P. Shirley, Visual cues for imminent object contact in realistic virtual environments, in: Proceedings Visualization 2000, VIS 2000 (Cat. No. 00CH37145), 2000, pp. 179–185.
[39] M.S. Landy, L.T. Maloney, E.B. Johnston, M. Young, Measurement and modeling of depth cue combination: in defense of weak fusion, Vision Res. 35 (1995) 389–412.
[40] B.D. Keefe, S.J. Watt, Viewing geometry determines the contribution of binocular vision to the online control of grasping, Exp. Brain Res. 235 (12) (2017) 3631–3643.
[41] M. Mon-Williams, J.R. Tresilian, Ordinal depth information from accommodation? Ergonomics 43 (2000) 391–404.
[42] M. Borges, A. Symington, B. Coltin, T. Smith, R. Ventura, HTC Vive: analysis and accuracy improvement, in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 2610–2615.
[43] J.P. Stauffert, F. Niebling, M.E. Latoschik, Simultaneous run-time measurement of motion-to-photon latency and latency jitter, in: 2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 2020, pp. 636–644.
[44] M. Warburton, M. Mon-Williams, F. Mushtaq, J.R. Morehead, Measuring motion-to-photon latency for sensorimotor experiments with virtual reality systems, Behav. Res. Methods 55 (2023) 3658–3678.
