
White paper
by Immersion

Remote Assistance
in Mixed Reality
Introduction .............................................................. 4

What is Mixed Reality

1. A historical perspective on Mixed Reality ..................... 7
1.1 Reality-Virtuality Continuum and Virtual Reality ............. 7
1.2 Hybrid displays: Augmented Reality and Augmented Virtuality .. 8
1.3 A taxonomy for Mixed Reality displays ........................ 9

2. What is (really) Mixed Reality? ............................... 11
2.1 MR beyond visual perception .................................. 11
2.2 Blurred borders between AR and MR ............................ 12
2.3 Different definitions for different aspects of MR ............ 13
2.4 A framework for MR systems ................................... 14
The zoom: Cooperation vs Collaboration ........................... 15

Application domains for remote assistance

3. TeleAdvisor: an example of remote assistance in AR for the industry .. 17
3.1 Remote assistance challenges ................................. 18
3.2 AR for remote assistance ..................................... 18
3.3 Design and implementation of TeleAdvisor ..................... 19
3.4 Evaluation and limitations of the system ..................... 20

4. Remote assistance in augmented surgery ........................ 21
4.1 Challenges of MR for surgery ................................. 22
4.2 Remotely guiding a surgeon in AR ............................. 23
The zoom: Groupware .............................................. 25

Visually representing users and their activity

5. Visual cues for social presence in MR ......................... 27
5.1 Different aspects of presence ................................ 27
5.2 Improving collaboration using visual cues .................... 29

6. Avatar and telepresence of the remote tutor ................... 31
6.1 Industry 4.0 and machine tasks ............................... 32
6.2 Visually representing the remote user ........................ 33

7. Mini-Me: adding a miniature adaptive avatar ................... 35
7.1 Design of the Mini-Me system ................................. 36
7.2 Experimental results for cooperative and collaborative tasks . 37
The zoom: Full-immersion avatars ................................. 39

Out-of-the-box concepts

8. Using light fields for hand-held mobile MR .................... 41
8.1 MR light fields and system calibration ....................... 41
8.2 Adding annotations into the shared workspace ................. 43
8.3 Evaluating the usability of the system ....................... 41

9. Facilitating spatial referencing in MR ........................ 45
9.1 Letting the MR system handle the referencing process ......... 45
9.2 Evaluating the prototype ..................................... 47

10. Using virtual replicas for object-positioning tasks .......... 49
10.1 Design of the two interaction techniques .................... 50
10.2 Comparing virtual replicas to a 2D baseline technique ....... 50

About us ......................................................... 53
Acronyms and definitions ......................................... 54
References ....................................................... 55
Introduction

Working with others has always raised multiple questions. What is the best process for making good decisions together? Which solutions can facilitate communication between participants? How should conflicts and contradictory opinions be handled?

Answering such questions is already complex when users are co-located, and it becomes even trickier when they are not.

Remote assistance scenarios involve two main characteristics: 1) the users do not share the same physical space, and 2) they do not have the same knowledge and capabilities. On the one hand, local users can physically act on their surroundings, but need help because they do not know how to proceed with the task they have in mind. On the other hand, remote helpers have the expertise to perform this task, but cannot achieve it because they are not physically present at the corresponding location. Remote assistance is thus closely linked to remote guidance.

ACM copyright for selected papers: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full
citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit
is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from permissions@acm.org.

Figure 1: Example of a remote assistance scenario in Mixed Reality, with a local worker and a remote helper.

The recent Covid-19 pandemic and technological progress have further increased the already growing interest in remote assistance. In particular, Mixed Reality (MR) is currently being explored as a promising tool for many application domains such as industry [43] and surgery [18].

The goal of this white paper is to give an overview of current research about remote assistance in MR. To do so, we present 10 selected research articles on this topic: 9 recent articles (from 2015 or later) and 1 legacy article (from 1994). These articles are grouped into four main sections. After discussing the notion of MR (Section 1), we present two key application domains for remote assistance: industry and surgery (Section 2). Then, we focus on visual activity cues and methods to represent remote users in order to facilitate guidance and remote cooperation (Section 3). Finally, we go over a selection of out-of-the-box papers with unique concepts or approaches (Section 4).

By adopting a Human-Computer Interaction (HCI) point of view, we hope to inspire developers, designers and researchers interested in remote assistance to take Mixed Reality applications further.

SECTION 1

What is Mixed Reality?

A historical perspective .... 07
Current definitions of MR .... 12

What is Mixed Reality

A historical perspective on Mixed Reality
Article: Milgram, P., & Kishino, F. (1994). A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, 77(12), 1321-1329. PDF freely accessible here.

DID YOU KNOW?

The intricate relations between concepts like Augmented Reality, Augmented Virtuality, Virtual Reality and Mixed Reality are a common source of mistakes, even for professionals. One example among many: the Usine Digitale magazine published an article about the training of surgeons in VR. The article was illustrated with an AR headset, the Hololens… which is considered an MR headset by its manufacturer, Microsoft. Confusing indeed.

Before focusing on remote assistance, it is necessary to clarify what lies behind the term Mixed Reality (MR). Many technologies like Augmented Reality (AR) and Virtual Reality (VR) are interconnected with MR, to the point that it may be difficult to differentiate these approaches merging real and virtual environments.

To address this confusion, we chose to start from the historical point of view on the notion of MR. In the early 90s, Milgram and Kishino proposed a first definition of MR based on the Reality-Virtuality Continuum (Figure 2). This vision had a ground-breaking impact on different research communities and, to this date, is still one of the most used definitions of MR. This is particularly true for the Human-Computer Interaction (HCI) community [53]. In this section, we start by presenting this Continuum to define the main existing technologies related to MR. Then, we detail the definition of MR based on it and the taxonomy proposed by the authors to classify MR displays.

1.1 Reality-Virtuality Continuum and Virtual Reality

While the democratization of cheap Head-Mounted Displays (HMD) only started a few years ago, VR is far from being a new technology [10]. Immersing the user inside a synthetic environment was concretized as early as the mid-1950s with the Sensorama device. In 1960, the first VR HMD was created. And only five years later, Sutherland proposed the concept of the Ultimate Display, a fictional technology allowing to simulate a virtual world so realistically that it could not be distinguished from actual reality [56].

Figure 2: The Reality-Virtuality Continuum proposed by Milgram and Kishino, spanning from the real environment to the virtual environment, with Augmented Reality (AR) and Augmented Virtuality (AV) forming Mixed Reality (MR) in between.

Figure 3: Main technologies mixing real and virtual environments. a) Augmented Reality: a real environment augmented with virtual elements. b) Augmented Virtuality (and Remixed Reality): a virtual environment augmented with real elements. c) Virtual Reality: a fully virtual environment.

Nearly 30 years later, Milgram and Kishino start their work with this notion of Virtual Reality (VR), where the user is fully immersed in a computer-generated environment and can interact with virtual objects.

The authors observe that, beyond the technological progress, other paradigms have started to appear. Some systems do not provide total user immersion, but rather merge real and virtual elements up to a certain degree. To classify these systems, they propose a continuous scale: the Reality-Virtuality Continuum. This continuum can be divided into three sections: 1) the real environment on one side, 2) the fully virtual environment on the other side, and 3) everything in between (Figure 2).

The extremities of the continuum are straightforward. On the one hand, the real environment corresponds to the world we are used to, fully perceived by our bare senses and without any computer-based medium. On the other hand, the virtual environment refers to a totally synthetic world and is directly linked to VR. According to Milgram and Kishino, everything in between these fully real and fully virtual extremes belongs to Mixed Reality [29]. In other words, they do not envision MR as a specific technology but rather as a superset of technologies mixing the real and virtual environment(s) (Figure 3).

A fascinating phenomenon is that reading the Reality-Virtuality Continuum from left to right does not match at all the historical development of these technologies. As mentioned at the beginning, VR systems appeared first, for technical reasons. AR systems appeared second. Approaches in the middle of the spectrum like Remixed Reality [32] only became possible recently. Moreover, it is also interesting to note that the case of VR is not fully clear. Milgram and Kishino placed VR at the right extremity of the continuum, which leaves some confusion about whether it can be considered as part of MR or not.

1.2 Hybrid displays: Augmented Reality and Augmented Virtuality

To complete this definition of MR, Milgram and Kishino identified six classes of displays that they consider as MR interfaces [29]. As shown in Table 1, these classes cover a large range of technologies, from augmenting videos on a monitor to partially immersive large displays allowing tangible interaction.

The authors then link these classes of displays to existing technologies. For instance, they explain that the emerging terminology of Augmented Reality mainly corresponds to class 3 displays. This observation would need to be nuanced nowadays. Over the last two decades, mobile devices have evolved tremendously, allowing the rise of AR on devices such as smartphones and tablets instead of HMDs only. Interestingly, Milgram and Kishino also report that they started to consider displays from classes 1, 2 and 4 as AR displays in their lab. They argue that the core principle is the same for all these displays: augmenting real scenes with virtual content. While this still holds for class 4 displays nowadays, it may not be the case for classes 1 and 2.
On the contrary, the term Augmented Virtuality did not exist in the 90s' literature and was proposed by the authors. The concept of augmenting a virtual world with real elements had just started to be explored in early studies [34]. Much technological progress has been made since, and current video see-through HMDs like the Varjo 3 [38] have started to blur the limits between AR and AV, as predicted by Milgram and Kishino. Besides, other studies have started to explore new concepts based on video see-through, such as Remixed Reality [32].

# | Description | Current equivalent nowadays
1 | Monitor displays where a video of an environment is augmented with virtual images overlaid on it. | Using video-editing software and seeing the result on a monitor.
2 | Same as #1, but using an HMD. | Watching an edited video with an HMD.
3 | See-through HMD. The user directly perceives the current real environment, which is augmented with virtual objects. | Optical see-through AR.
4 | Same as #3, but with a video see-through HMD. The user cannot see the real world directly but watches a real-time video reconstruction of it based on camera input. | Video see-through AR.
5 | Completely graphic displays, on which videos of real elements are overlaid. | Augmented Virtuality.
6 | Completely graphic, partially immersive displays (for instance, large screens) where the user can use real-world objects to interact. | Tangible interaction on a tabletop, tangible AR.

Table 1: The 6 classes of MR displays identified by Milgram and Kishino.

DID YOU KNOW?

The opposite of Augmented Reality is also based on a video see-through approach! Called Diminished Reality, it involves masking real-world elements by filtering them before displaying the scene on a video see-through device. This makes it possible to remove or replace objects, or to see through obstacles [16].

1.3 A taxonomy for Mixed Reality displays

In the rest of their paper, Milgram and Kishino refine the classes of displays into a complete taxonomy. This taxonomy is based on three axes: the Extent of World Knowledge, the Reproduction Fidelity and the Extent of Presence Metaphor.

The Extent of World Knowledge axis refers to the amount of knowledge possessed by the system about the environment. In some basic cases, the system does not need to know anything about the environment. For instance, a basic video-editing software can consider the video frames as black-box images, leaving the user free to superimpose virtual elements on them and handle visual cues like occlusion and shadows. On the contrary, VR systems fully know the virtual world they generated. Similarly to the Reality-Virtuality Continuum, many AR systems can be placed somewhere in between, since they need to "understand" and model the real environment to be able to correctly display virtual objects within it. As shown in Figure 4, the authors refer to the intermediary states with the Where and What keywords, which correspond to the knowledge of locations and of objects/elements respectively.
Figure 4: Extent of World Knowledge axis in the taxonomy of MR displays by Milgram and Kishino, ranging from a world unmodelled, to a world partially modelled (Where/What, then Where + What), to a world fully modelled.

The two other axes are more straightforward. Milgram and Kishino present them as two distinct ways to convey realism: image quality for Reproduction Fidelity and immersion for the Extent of Presence Metaphor.

One could argue that the Reproduction Fidelity axis (Figure 5a) may need to be updated to better match current technologies. Nowadays, even cheap hardware can handle stereoscopy or high-quality rendering. However, the principle behind this axis still holds, since we have not yet reached the "ultimate display" where virtual elements would be too realistic to be distinguished from real ones. In fact, current rendering techniques often involve clever tricks such as foveated rendering to maximize image quality only where it is strictly necessary (i.e. where the user is currently looking).

Similarly, the idea behind the Extent of Presence Metaphor (Figure 5b) is still perfectly relevant to this day. The feeling of presence is still an active research topic in MR [33]. However, researchers have also started to explore other approaches to increase this feeling of immersion that go beyond visual perception, as discussed in the next chapter.

Figure 5: The two other axes of the taxonomy. a) Reproduction Fidelity axis: from monoscopic, color, stereoscopic, HD and 3D HDTV videos, and from simple wireframes to shading, texture, transparency and real-time, high-fidelity 3D animations. b) Extent of Presence Metaphor axis: from monitor-based to large-screen and HMD displays, and from monoscopic imaging to panoramic imaging, surrogate travel and real-time imaging.

KEY TAKE-AWAYS
The historic definition of MR: everything in the middle of the Reality-Virtuality Continuum. In other words, a
set of technologies mixing the real and a virtual environment, including Augmented Reality and Augmented
Virtuality.

This definition and the taxonomy proposed by the authors are focused on visual displays, and thus consider
only visual perception.

What is (really) Mixed Reality?

Article: Maximilian Speicher, Brian D. Hall, and Michael Nebeling. 2019. What is Mixed Reality? In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland, UK. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3290605.3300767.

Did we not just explain what Mixed Reality is? Yes… and no. As mentioned in the previous section, defining MR through the Reality-Virtuality Continuum is probably the most classical approach. However, that does not mean that it is the best or the only existing one. In fact, defining precisely and completely what MR is turns out to be so complex that there is to this date no consensus on this notion. This is true both in industry and academia [53]. Despite recent technological progress and an increasing popularity, the exact boundaries of MR remain a subject of intense discussion.

The problem is far from being a simple rhetorical argument between experts around a given terminology. Defining the limits of MR implies considering major aspects of mixing real and virtual environments, such as the possible levels of immersion and the user interactions. Therefore… what is (really) Mixed Reality? This question is at the heart of the second paper presented in this book, a work by Speicher et al. presented at the CHI 2019 conference. By conducting interviews with experts and a literature survey, the authors identified the currently co-existing definitions of MR and proposed a conceptual framework to classify the different aspects of MR systems. In the following, we present this work and use it to specify the definition of MR that we will use hereafter in this book.

2.1 MR beyond visual perception

Speicher et al. start their work by highlighting this absence of agreement around the notion of MR and the limitations of Milgram and Kishino's definition. Its main weakness is that it relies only on visual perception. As presented in the previous section, the authors considered how realistic the displayed environment is and to which extent the user is visually immersed in it.

This approach can be explained by the dominance of visual perception for humans. Nonetheless, this observation should not obscure the fact that Mixed Reality could also imply mixing real and virtual environments using our other senses. This is the case for haptics, which has been extensively studied, especially in VR [9]. For instance, having realistic haptic feedback is crucial for the training of surgeons in AR and VR [49]: interns must both develop their dexterity and learn to recognize the haptic feedback of different kinds of surfaces and organic tissues. A few studies also considered other senses such as audio [11] and smell [47] in the context of MR. Mixing virtual and real environments thus goes much further than inserting virtual objects into the field of vision of users.

DID YOU KNOW?

Some studies even explored augmenting the sense of taste. For instance, Niijima and Ogawa proposed a method to simulate different virtual food textures [40]. Ranasinghe and Do were also able to virtually simulate the sensation of sweetness using thermal stimulation [48]. Will a complete virtual dinner be possible in a few years?
2.2 Blurred borders between AR and MR

Speicher et al. also report the results of interviews conducted with 10 experts from academia and industry. These interviews lead to a clear conclusion: the difference between AR and MR is far from being straightforward. Some of the interviewed experts argued that MR is a « stronger version » of AR, in the sense that the blending between virtual and real is seamless. They explained that in the case of MR, users can interact with both real and virtual content. On the contrary, other experts argued that this is also the case in AR. They even declared that MR is mainly a « marketing term ». This vision may come from the efforts of companies like Microsoft promoting the usage of the term MR to describe their own products like the Hololens HMD [35].

Is one version more relevant than the other? Previous definitions of AR do not necessarily help to decide. For instance, Azuma defined 3 criteria for AR systems [3]:
• The combination of virtual and real,
• The possibility to interact in real time,
• The registration in three dimensions.

Nonetheless, if it is possible to interact in AR… when does an AR system become an MR system? Is there a specific threshold in terms of interaction techniques? There is no clear answer to this question for now. However, Speicher et al. found that most experts agreed at least on something: the spatial nature of MR. More precisely, they referred to the notion of spatial registration (Figure 6).

A virtual object is spatially registered when its spatial position takes into account all the features of the 3D environment. Visual cues like occlusion with other physical objects are respected. In other words, the virtual object is positioned within the reference frame of the 3D world (Figure 6b). On the contrary, a virtual object defined according to the reference frame of its display (Figure 6a) is not spatially registered. A minimal code sketch of this distinction is given after the figure.

Figure 6: The notion of spatial registration. a) The virtual panel in blue is displayed according to the tablet screen only. b) Virtual elements have a coherent spatial position within the 3D environment: they are spatially registered.
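To make the distinction concrete, here is a minimal sketch in Python with NumPy, contrasting a display-anchored object with a spatially registered one. It is an illustration under simple assumptions (4x4 homogeneous pose matrices), not code from the paper, and all names are ours.

import numpy as np

def screen_anchored_pose(display_pose_in_world, offset_in_display):
    # Pose defined relative to the display (Figure 6a): the virtual panel
    # follows the tablet, so its world pose changes whenever the tablet moves.
    return display_pose_in_world @ offset_in_display

def world_anchored_pose(fixed_pose_in_world):
    # Spatially registered pose (Figure 6b): expressed directly in the world
    # frame, so it stays coherent with the physical scene (occlusions, shadows).
    return fixed_pose_in_world

# The display moves, but only the screen-anchored object follows it.
display_pose = np.eye(4); display_pose[:3, 3] = [0.0, 1.5, 0.3]   # tablet pose in the world
panel_offset = np.eye(4); panel_offset[:3, 3] = [0.0, 0.0, 0.5]   # 50 cm in front of the screen
table_anchor = np.eye(4); table_anchor[:3, 3] = [1.0, 0.8, 2.0]   # fixed point on a real table

print(screen_anchored_pose(display_pose, panel_offset)[:3, 3])    # moves with the display
print(world_anchored_pose(table_anchor)[:3, 3])                   # stays on the table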

2.3 Different definitions for different aspects of MR

To further explore these questions, Speicher et al. conducted a literature review to identify the different usages of the term Mixed Reality. The authors analyzed 68 papers from well-known Human-Computer Interaction (HCI) conferences such as CHI, ISMAR and UIST. Overall, Speicher et al. identified 6 co-existing definitions of Mixed Reality (Table 2).

As mentioned by the authors, the goal of this study was not to determine which of these definitions is the best. On the contrary, they aimed at highlighting the complexity of wrapping all the aspects of MR into a single vision shared by all actors working with MR. The authors explain that the priority for MR actors is to clearly communicate their own understanding of what MR is.

# | Name | Description
1 | Continuum-based MR | The most common definition, from the Reality-Virtuality Continuum by Milgram and Kishino [29]. MR is seen as the superset regrouping every technology in between the real and the virtual environment.
2 | MR = AR | MR as a synonym for AR. Sometimes also noted "MR/AR".
3 | MR = AR + VR | MR as the combination of AR and VR parts inside a system.
4 | Collaboration | The emphasis is put on the collaboration between AR and VR users, potentially in different physical locations.
5 | Alignment of virtual and real environments (Augmented Virtuality) | The synchronization between two different environments, one physical and the other virtual. For instance, Yannier et al. proposed an MR system where a Kinect observes physical block towers on a table during an earthquake and reflects their state in real time on digital towers [61].
6 | MR as "stronger" AR | MR defined as a spatially registered and interactive AR.

Table 2: The 6 co-existing definitions of MR identified by Speicher et al.

Name: EarthShake MR game by Yannier et al.
Number of environments: Many (physical block towers and virtual copies)
Number of users: One to many
Level of immersion: No immersion (physical towers), partial immersion (virtual towers)
Level of virtuality: Not immersive (physical towers), partially immersive (virtual towers)
Degree of interaction: Implicit and explicit interaction
Input: Motion (shaking the towers)
Output: Visual (seeing which tower falls first after the earthquake)

Figure 7: The framework of Speicher et al. Left: framework dimensions with respect to the work of Yannier et al. mentioned in Table 2. Right: setup of the EarthShake MR game [61], picture courtesy of the authors.
2.4 A framework for MR systems

To help classify existing MR systems independently of a global definition, Speicher et al. proposed a conceptual framework based on 7 criteria, as shown in Figure 7.

The 5 initial criteria include dimensions like the number of environments and users, the level of immersion and of virtuality, and the degree of interaction. Two other criteria were then added to account for the input and output of the system. Such a framework aims at being general enough to be usable on as many MR systems as possible, and not only on specific use cases.
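As an illustration only, the following sketch encodes the seven dimensions of the framework as a small Python data structure and instantiates it for the EarthShake game characterized in Figure 7. The field names and values are our own reading of the figure, not an API from the paper.

from dataclasses import dataclass

@dataclass
class MRSystemProfile:
    # The 7 dimensions of the framework by Speicher et al. (cf. Figure 7)
    name: str
    number_of_environments: str
    number_of_users: str
    level_of_immersion: str
    level_of_virtuality: str
    degree_of_interaction: str
    inputs: tuple
    outputs: tuple

earthshake = MRSystemProfile(
    name="EarthShake MR game (Yannier et al.)",
    number_of_environments="many (physical block towers and virtual copies)",
    number_of_users="one to many",
    level_of_immersion="none (physical towers), partial (virtual towers)",
    level_of_virtuality="not immersive (physical towers), partially immersive (virtual towers)",
    degree_of_interaction="implicit and explicit",
    inputs=("motion (shaking the towers)",),
    outputs=("visual (seeing which tower falls first after the earthquake)",),
)
print(earthshake)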
Can such a framework solve our initial question about what MR is? Probably not, but that is not its objective. Speicher et al. end their paper by insisting on the importance of building a common, unambiguous vocabulary to characterize MR systems. Their framework is a step in this direction.

KEY TAKE-AWAYS
So, what is Mixed Reality? It depends: multiple definitions co-exist. But MR systems go beyond visual perception only.

In the following, we will use MR as a mix of definitions 3 and 4. Collaboration is of course a crucial aspect
since we focus on remote assistance. Besides, we will mainly consider AR/VR technologies and interactions.

The zoom: Cooperation vs Collaboration

In specialized conferences such as CSCW, a distinction is sometimes made between cooperating and collaborating.
While this distinction varies between different academic fields [15], it is interesting to consider within the scope of
remote assistance.

Cooperation implies that participants have predefined roles: technical expert, presenter, spectator, guest… These roles directly impact interactions between group members, defining the responsibilities and privileges of each type of user. For instance, only the organizer may have access to screen-sharing at the beginning of a brainstorming session. On the contrary, collaboration implies a group working without a hierarchy defined in advance. The role distribution can evolve freely between participants during the meeting. This may encourage less formal exchanges with a dynamic evolution of tasks.

Of course, it is possible to be somewhere in between cooperation and collaboration, or to switch from one to the other. However, remote assistance often involves predefined roles. Therefore, we will prioritize the notion of cooperation hereafter. When both cooperation and collaboration could be involved, we will use the term group work instead.

SECTION 2

Application domains for remote assistance

Industry: example of TeleAdvisor .... 17
Surgery and mixed reality .... 21
Application domains for remote assistance

TeleAdvisor: an example of remote assistance in AR for the industry

Article: Gurevich, P. et al. 2015. Design and Implementation of TeleAdvisor: a Projection-Based Augmented Reality System for Remote Collaboration. Computer Supported Cooperative Work (CSCW). 24, 6 (Dec. 2015), 527–562. DOI: https://doi.org/10.1007/s10606-015-9232-7.

Now that we can better distinguish the multiple aspects of Mixed Reality, it is time to explore the second key notion of this book: remote assistance. Once again, this is a wide notion with many potential meanings and applications. And while narrating the tribulations of calling customer service because the cat confused the Internet box with a mouse is tempting, well… we will focus on professional cases of remote assistance instead.

Many studies have considered surgery [1] and industry [12, 21] as key application domains for remote assistance scenarios. The complexity of these environments and tasks plays a major role here. Surgeons often do not have extended knowledge of a given procedure for a complex case and need advice from colleagues who are specialized in it. Technicians cannot know every aspect of each machine in the factory. Instead of relying on cumbersome paper documentation, MR is in itself a promising solution: it can be used to support training and guidance [18, 43]. However, when no pre-existing guidance solution is available (which is currently often the case), remote assistance given by an experienced colleague is a powerful tool to save time while reducing errors and accidents.

This chapter presents a study about an AR system for remote assistance: TeleAdvisor [21], illustrated in Figure 8. In their work, Gurevich et al. detail the benefits and challenges of AR for remote assistance in industrial scenarios. The system they propose is an interesting entry point to these questions.

Figure 8: TeleAdvisor, an AR remote assistance system by Gurevich et al. [21]. Picture courtesy of the authors.

3.1 Remote assistance challenges

The first characteristic of remote assistance noted by the authors is its asymmetry. The remote helper has knowledge of the task to be performed but cannot access the physical environment, while the local worker is inside this environment but does not know how to proceed. This difference places major constraints on the cooperation. Only the local worker can concretely act to achieve the physical task that must be performed. Besides, the two users are not in the same location, and thus cannot see each other.

Studies have shown that beyond having audio communications, sharing a common visual view is crucial for cooperative tasks [31]. Being able to see the gestures of the other user significantly facilitates communication. This is typically the case for deictic gestures, i.e. gestures made to designate a specific location or point at a given object (see Figure 9). The "Put-that-there" metaphor [8] is a well-known HCI example of a voice command linked with a deictic gesture. This kind of multimodal interaction is very common in everyday life scenarios, especially when working within a group.

Figure 9: Example of a deictic gesture: a user in AR pointing at a component on a virtual machine.

DID YOU KNOW?

Most of the time, yes, the local worker is the only one able to interact with the physical environment to perform the task. However, this may change in the coming years thanks to Digital Twins [25]. This technology recreates an exact virtual replica of a given physical system (for instance, a building). Many sets of sensors can be used to make sure that the virtual replica reflects the state of its physical twin in real time.
What is the link with remote assistance? In fact, the data connection between the twins goes in both directions. This means that interacting with the virtual version of a production line could also impact its physical version by sending the corresponding commands to the real machines. Digital twins may thus allow a remote expert in MR to directly influence the physical environment!

The question is now to determine how to make these gestures perceivable by both users. Imitating common videoconference tools by adding a video screen in each workspace could seem a suitable solution. Nonetheless, it would force the local worker to visually focus on both the task to be performed and the distant screen. Such a divided-attention task can heavily impact performance. Displaying the video on a mobile device may solve this issue, but at the cost of mobilizing one of the local worker's hands.

3.2 AR for remote assistance

Augmented Reality is a promising technology for remote assistance because it addresses many of these issues. In particular, AR with an HMD leaves both of the user's hands free and allows virtual content to be displayed in the current Field of View (FoV) of the user. This approach may limit divided-attention side effects compared to a distant monitor approach [54]. However, virtual objects can still impact attentiveness because they can distract users and prevent them from noticing real anomalies in the workspace [16].

Besides, Gurevich et al. highlight in their state of the art that with HMDs, the cameras are directly linked to the head position [11]. This allows the local worker to share a real-time, movable view of the workspace, sure. But it also means that the remote helper has no control over this view and is constrained to look at the same location. Head jittering ("shaky-cam" effect) and sudden head movements can also disturb the other user.

Approach | Benefits | Limitations
Mobile AR | Mobility | Requires at least 1 hand; hand jittering; local worker dependent
AR with HMD | Hands free; mobility | Head jittering; local worker dependent
Projection-based AR | No jittering; no equipment on user | No mobility

Table 3: Comparison of the classical benefits and limitations of the three main approaches for AR.

In their work, Gurevich et al. focus on projector-based AR [21]. Early studies on this approach mainly used pointers projected into the local worker's environment, while later studies explored sharing hand gestures and annotations from the remote helper [28]. With TeleAdvisor, the authors aim at going one step further by overcoming the lack of mobility of fixed-projector AR solutions. Of course, mobility is required when the local user moves between different locations, for instance if a technician needs to inspect machines in different rooms. However, mobility is also a key feature to allow a remote helper to get a different point of view inside a given workspace without disturbing the local worker.

3.3 Design and implementation of TeleAdvisor

The TeleAdvisor system is designed to achieve independent view navigation. In other words, it allows the remote helper to freely explore the workspace in real time, independently of the local worker. Gurevich et al. wanted to reproduce the metaphor of someone looking over the shoulder of a user to see what he or she is doing, providing visual guidance by pointing at workspace objects and drawing virtual annotations while orally giving explanations and details [21].

To achieve such a result, the authors conceived a device regrouping two cameras and a projector fixed on the same 5-DOF articulated arm (Figure 10). This arm is itself placed on a robotic wheeled workstation with a laptop handling computations and communications. The remote helper can control both the wheeled workstation and the robotic arm to view the workspace and project AR content from many different viewpoints.

A significant challenge with this kind of approach is to correctly synchronize the observed view (from the cameras) and the projection view. If the mapping between the two is erroneous, the local worker will see projected virtual objects with a significant spatial offset… while the remote helper will be convinced that they are perfectly aligned with real-world objects! This has of course a non-negligible impact on performance, causing confusion and creating errors.

Figure 10: The second TeleAdvisor prototype conceived by [21]. Picture courtesy of the authors.

With a non-movable system and a fixed workspace, this issue may be addressed thanks to a careful calibration [28]. However, TeleAdvisor is a mobile solution. It thus requires a dynamic camera-projector view mapping which takes into account, in real time, the distance between the projector and the surface that virtual objects are projected on. The authors propose an approach based on an offline calibration (for the stereo cameras and the projector) followed by a real-time correction based on homography. Discussing the technical implementation of this procedure is out of the scope of this book, but all details can be found in the paper [21]; a generic sketch of such a homography-based correction is shown below.
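As an illustration only (and not the authors' actual implementation), the sketch below uses OpenCV to estimate a homography between points detected in the camera image and their known positions in the projector image, then warps an annotation overlay accordingly. Function names and point values are placeholders.

import cv2
import numpy as np

def camera_to_projector_homography(pts_camera, pts_projector):
    # Estimate the 3x3 homography mapping camera-image points to projector pixels,
    # e.g. from corners of a projected calibration pattern detected in the camera view.
    H, _mask = cv2.findHomography(pts_camera, pts_projector, cv2.RANSAC, 3.0)
    return H

def warp_annotation_to_projector(annotation_img, H, projector_size):
    # Warp an annotation drawn in camera coordinates into projector coordinates,
    # so the projected drawing lands on the intended real-world location.
    width, height = projector_size
    return cv2.warpPerspective(annotation_img, H, (width, height))

pts_cam  = np.float32([[100, 120], [520, 110], [530, 400], [ 90, 410]])  # detected in camera
pts_proj = np.float32([[  0,   0], [800,   0], [800, 600], [  0, 600]])  # known projector pixels
H = camera_to_projector_homography(pts_cam, pts_proj)

annotation = np.zeros((480, 640, 3), dtype=np.uint8)
cv2.circle(annotation, (300, 250), 20, (0, 0, 255), 3)                   # the helper's "pointer"
projector_frame = warp_annotation_to_projector(annotation, H, (800, 600))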
To control the robotic arm and change the point of view, the remote helper has access to a 2D Graphical User Interface (GUI) on a traditional computer. This approach has the benefit of being straightforward to use. The remote helper can point at real objects by sharing a virtual cursor, draw annotations, and insert text and predefined shapes into the workspace (Figure 11). This can be done either with mouse and keyboard or by using a touch screen.

The authors made the choice of having a single exocentric view of the workspace instead of multiple ones. Multiple views are a common paradigm in Information Visualization [59], but the authors argue that they may also be confusing. Gurevich et al. propose positional bookmarks instead: the ability to save different camera locations in order to automatically go back to these positions later on [21]. This concept is close to the discrete viewpoint switching technique in AR conceived by Sukan et al. [55].

3.4 Evaluation and limitations of the system

Experimental evaluations of TeleAdvisor suggest two main results. First, the system seems to be a promising tool for remote assistance. Participants were able to use the system effectively and mainly focused on the free-hand tool for drawing annotations. Qualitative results indicate that TeleAdvisor was judged intuitive and very useful. Secondly, the authors also compared the classical system (remote helper controlling the view) with an alternate one where the local worker is in charge of physically moving the arm. Results suggest that it may be better to let the remote helper manage the view. Such a phenomenon may seem intuitive, but it needed to be confirmed experimentally and quantified.

Nonetheless, TeleAdvisor still comes with a few limitations. While using a 2D GUI on a computer makes the learning phase straightforward, it also drastically limits the remote helper's immersion. This limitation may impact performance and usability in complex, large-sized workspaces or when the task to be performed involves a lot of 3D interaction. Besides, the feeling of telepresence is limited for the local worker, who cannot see hand gestures or facial expressions made by the remote helper.

KEY TAKE-AWAYS
TeleAdvisor is a great example of a remote assistance system based on projected AR. It involves many important cooperation features such as free-hand drawing and independent view navigation for the remote helper.
The immersion feeling is however limited for the remote helper.

Figure 11: The Graphical User Interface of TeleAdvisor for the remote helper. Picture courtesy of the authors.
Application domains for remote assistance

Remote assistance in augmented surgery

Article: Andersen, D. et al. 2016. Virtual annotations of the surgical field through an augmented reality transparent display. The Visual Computer. 32, 11 (Nov. 2016), 1481–1498. DOI: https://doi.org/10.1007/s00371-015-1135-6.

Mixed Reality has impacted many industrial sectors, and the TeleAdvisor system [21] presented in the previous chapter is one example of remote assistance system among many. However, there is another large application domain where MR is more and more investigated and used: the medical field, and especially surgery. Adding medical information into the operating room or directly superimposing it on the patient's body is a very interesting feature to facilitate the work of surgeons. Nonetheless, MR goes beyond this addition of virtual content: it also brings further remote assistance features. In some cases, surgeons need to use a specific surgical procedure they are not fully familiar with. The assistance of expert colleagues then becomes valuable support.

In this chapter, we present a paper focusing on remote assistance for surgical operations, also called telesurgery. Andersen et al. proposed a collaborative system where a remote expert surgeon can create AR content like annotations and virtual instructions to guide a local trainee [1]. This local trainee is inside the operating room and visualizes the AR content thanks to a tablet fixed above the patient's body. An overview of the system is available in Figure 12.

Before entering into the details of the system proposed by Andersen et al., we will start by reviewing the challenges of surgery in MR.

Figure 12: The envisioned system proposed by Andersen et al.: a tablet above the patient's body acting as a «transparent» AR display [1]. Images courtesy of the authors.

4.1 Challenges of MR for surgery

Surgeries are long, complex and stressful procedures. In addition to the complex technical gestures to perform, surgeons must adapt their work to the differences of each patient's body and sometimes take life-or-death decisions on the fly [13]. Surgeons thus have a significant cognitive load during an operation. Anything breaking their concentration or their feeling of being in control should be avoided or removed from the operating room (OR).

DID YOU KNOW?

The different technologies of MR can be useful for surgery, but in different contexts [18]. For instance, VR can be useful for training, teaching and patient re-education purposes. However, during an operation, surgeons need to focus on the patient's body. That is why, hereafter, we mostly discuss AR.

AR has a lot of potential to support the work of surgeons because it facilitates access to medical information. It allows virtual content such as patient data and radiographs to be visualized within the patient area. Virtual instructions and medical information can also be directly superimposed on the patient's body to guide surgical gestures. Instead of going back and forth between the patient and a distant monitor, the surgeon can thus visually focus only on the patient [5]. Nonetheless, the OR context imposes several strict constraints on surgeons which directly impact MR usage, as detailed in Table 4.

Constraint | Origin | Description
Asepsis | OR environment | No contact with non-sterile objects. No hand-held device (tablet, controllers…). Cannot reposition or clean the HMD with sterile gloves. No body-touch interaction techniques like [2].
High luminosity | OR environment | Holograms may be harder to see. Gestures may be more difficult to detect.
Ambient noise | OR environment | Harder to use voice commands: noisy medical machines, medical team communications, surgical masks…
High stress and cognitive load | Surgical task | Surgeons need to focus on the patient, not on MR content. Must not disturb the surgical workflow. Must be able to turn off MR at any time.
Need of precision | Surgical task | Requires accurate real-time tracking and positioning of virtual content (order of magnitude: a few mm, sometimes less).

Table 4: Overview of the main constraints in the Operating Room (OR) and their consequences on MR usage.

The main constraint is asepsis: every object in contact with the patient must have been sterilized beforehand by following the appropriate procedure. To reduce as much as possible the risks of infection for the patient, all medical team members also go through a specific sterilization phase before entering the OR. For instance, surgeons wear sterile gloves and cannot touch non-sterilized objects. HMD-based AR is compatible with OR requirements but is far from being a perfect solution. Since the HMD cannot be fully sterilized (electronic components would be damaged in the process), surgeons cannot touch it after putting it on. This can be an issue if the HMD needs to be repositioned on the head or if some projections (blood, for instance) reach the HMD glass or sensors. Besides, wearing an HMD for extended periods of time (up to several hours) can increase the physical tiredness of surgeons.

DID YOU KNOW?

These constraints did not stop Microsoft from promoting the usage of the Hololens 2 for augmented surgery. After a first operation in AR at the end of 2017, the company organized in February 2021 a 24-hour marathon of augmented surgeries. Surgeons wearing the HMD could see holograms in the OR and exchange in real time with remote colleagues. Followed by 15,000 viewers from 130 countries, the event is a clear sign of the current interest in MR for surgery.

What about remote assistance? As mentioned before, surgeons may need to seek the help of colleagues for an operation. It can be because they are facing a specific patient profile or because they need to perform a state-of-the-art procedure they are not fully familiar with. This can for instance happen in rural hospitals where surgeons perform fewer operations. Training surgeons is difficult, costly and time-consuming, while surgical techniques are evolving quickly. Real-time guidance is thus a valuable tool, especially compared to transferring patients to another hospital with specialists.

4.2 Remotely guiding a surgeon in AR

The paper by Andersen et al. focuses on this need for remote cooperation in the operating room [1]. The authors envision an AR system based on tablets, as illustrated in Figure 12. To respect the asepsis constraints, the tablet is not hand-held by the surgeon but fixed on a mechanical arm above the patient. Thanks to its camera, the tablet acts as a "transparent" device through which the patient's body can be seen. In addition, virtual AR content created by the remote expert is displayed to guide the surgeon. The surgeon does not need to hold the tablet, which is suitable in the OR (hands free, no contact with sterile gloves). However, if really needed, the position and orientation of the tablet can still be adjusted.

The remote expert receives the real-time video stream from the local surgeon's tablet and can see the patient's body. This remote expert is not in the OR and is thus not affected by its constraints. The authors proposed a touch-based interface on a tablet to create virtual annotations. They implemented three main hand gestures to draw different types of annotations, representing different surgical gestures and tools (incision, stitch and palpation) [1]. An overview of the corresponding GUI is available in Figure 13.

As mentioned in Table 4, operations require precision from surgeons and high manual dexterity. This crucial need for precision is already hard to meet in a static context. However, the respiration cycle creates movements within the patient's body, and soft tissues may be particularly difficult to track in real time because they are easily deformed. To address this issue, the authors proposed an annotation anchoring approach based on reference video frames with OpenCV (for more details, please refer to the paper); a simplified, generic sketch of this idea is given below.
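The sketch below illustrates the general idea of anchoring annotations to a reference video frame with OpenCV feature matching; it is a generic approach given for illustration, not the authors' implementation, and all names are ours.

import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def anchor_annotation(reference_frame, current_frame, annotation_points):
    # Re-project annotation points (Nx2, drawn on the reference frame)
    # onto the current frame so they stay attached to the surgical field.
    # (A single homography assumes a mostly rigid scene; deformable tissues
    # are much harder to handle, as noted above.)
    kp_ref, des_ref = orb.detectAndCompute(reference_frame, None)
    kp_cur, des_cur = orb.detectAndCompute(current_frame, None)
    matches = sorted(matcher.match(des_ref, des_cur), key=lambda m: m.distance)[:100]
    if len(matches) < 4:
        return None                      # not enough texture to anchor reliably
    src = np.float32([kp_ref[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_cur[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    pts = np.float32(annotation_points).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)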

Andersen et al. conducted three evaluations of their prototype [1]. First, they ran a performance test to check the robustness of their annotation anchoring. Then, they collected qualitative feedback during a usability study with two surgeons. Finally, the authors compared their AR system to a classical monitor-based one during a pilot study (participants did not have a medical background). An overview of experimental results is available in Table 5.

Figure 13: The system interface for the remote expert. Images courtesy of the authors.

Test | Observed results
Performance test | System fairly robust to tablet movements and occlusions. Deformations of the patient's body cause much more issues.
Usability study with surgeons | The surgical field area needs the lowest latency and the highest framerate. The GUI for the remote expert was perceived as complex.
Pilot study | Fewer visual focus shifts with AR. Slightly slower with AR, but much better accuracy.

Table 5: Overview of experimental results from the three evaluations conducted by Andersen et al. [1].

Experimental results suggest that the proposed system has potential for remote assistance in augmented surgery. Preserving the visual attention of users (fewer visual shifts) and allowing them to perform more precise gestures are key desired outcomes of MR. The ability for remote experts to add virtual annotations anchored on the surgical field is also a great step forward compared to classical oral guidance only.

The system is fully implemented on the two tablets: no computation is done by an external device. This is a valuable design choice for surgery since the OR is a resource-limited environment. Nonetheless, some participants of the pilot study reported that the lack of depth perception on the tablet screen increased the task difficulty. It would be interesting to compare an improved version of this prototype with an HMD-based approach.

Moreover, qualitative feedback from the usability study highlights the strong need to include surgeons in the design of remote assistance systems. Surgeons are particular users facing unique challenges and specific constraints in the OR: generic designs, interfaces and MR interaction techniques may not be adapted to the surgical context.

KEY TAKE-AWAYS
Surgery is a key application domain for MR and remote assistance, but raises unique challenges related to
the operating room environment and the complexity of surgical procedures.
Instead of using an HMD, a tablet above the patient’s body is an interesting approach to visualize anchored
virtual content. This approach respects the constraints of surgeons and can facilitate remote guidance.

The zoom: Groupware

Groupware is a specific type of software designed for cooperative and collaborative tasks. It is built upon a well-known statement: groups are complex social entities which are difficult to study. Many social and psychological factors can influence the activity of a group: the location and personality of its members, the number of participants, the chosen method to take decisions and handle conflicts…

It is thus difficult to design a piece of software adapted to a task with concurrent users. Yes, a Google Doc can do the trick for a short school report, but have you tried to use it to write a complete European project proposal with many partners? You may soon realize that many key features are missing to work together efficiently…

Many conceptual tools have been proposed in the literature to analyze groupware [23, 52]. For instance, ergonomic criteria regroup a set of properties like group awareness and identification. The list below gives an overview of a few of them:

• Group awareness: being conscious of the activity of others.
• Observability of resources and actions: observing, making public or filtering elements.
• Level of coupling: having the same unique view ("What You See Is What I See") or different views for each user.
• And many others…

Remote assistance solutions can be considered as a subset of groupware. It may thus be valuable to have a look at the design guidelines and past CSCW studies about groupware. They can simply give ideas or inform the conception of the whole system!
SECTION 3

Visually representing users and their activity

Visual cues and social presence .... 27
Avatars and telepresence .... 31
Mini-Me: miniature avatars .... 35
Visually representing users and their activity

Visual cues for social presence in MR

Article: Teo, T. et al. 2019. Investigating the use of Different Visual Cues to Improve Social Presence within a 360 Mixed Reality Remote Collaboration. The 17th International Conference on Virtual-Reality Continuum and its Applications in Industry (Brisbane QLD Australia, Nov. 2019), 1–9.

Giving the feeling that local and remote users are working next to each other in the same environment can have a significant impact on user experience and performance. In fact, this goes further than remote assistance scenarios: remote communications in general can benefit from "adding the human into the loop", from getting closer to face-to-face physical exchanges. This is particularly true in the current Covid-19 pandemic context, where technological tools must be used to stay connected to others. However, simple videos on a 2D screen are far from giving the feeling of really being together. How can we achieve such a result? MR is a powerful tool for sure, but we are not quite able to project perfect representations of ourselves like in many SF books and movies.

In this chapter, we present a paper from Teo et al. about this feeling of telepresence [57]. The authors focused on a 360° panorama system in MR and investigated different cues to increase this feeling and to facilitate the collaboration between remote users. While visual cues like shared pointers and drawings may seem straightforward, it is interesting to see their benefits and challenges in the case of a 3D environment in MR. Besides, this paper regroups three different experimental evaluations and extracts valuable design guidelines from them.

DID YOU KNOW?

Physicists are currently struggling to teleport even a single molecule in a controlled environment. Nonetheless, while the laws of physics may be stubborn, virtual teleportation seems much more feasible in a not-so-far future. Holoportation (holographic teleportation) consists in capturing a volumetric video of users in real time and displaying the corresponding hologram in a shared environment. For an overview of what it currently looks like, we recommend this video (https://www.youtube.com/watch?v=Yy8XoPsbAk4) from the i2CAT foundation about holoconferences.
Star Wars had better watch out!

5.1 Different aspects of presence

Let's start with a bit of terminology. The notion of remote presence is complex and, similarly to MR, multiple definitions have been used over time to characterize it. In their work, the authors focus on two different aspects:

• Spatial presence refers to the feeling of self-location in a given environment and the perception of possible actions within it. For instance, the MEC questionnaire about spatial presence includes questions about the feeling of being part of the environment, of being physically present there and of being part of the action [58]. To some extent, it is similar to the concept of immersion.

• Social presence (also called co-presence), on the contrary, is focused on others. It refers to the feeling that other users are "there" with us [41]. Social presence is linked with the realism of the representation of other users and the feeling of being connected to them through the medium. This aspect is particularly important for collaborative tasks and remote assistance.

What about telepresence then? This term refers to the notion of presence through a technological medium. It is thus a broader concept encompassing both spatial and social presence, as shown in Figure 14.

Figure 14: Comparison of the notions of presence and telepresence.

In their work, Teo et al. used a 360° panorama system to study both spatial and social presence in the context of remote collaboration in MR [57]. The local worker wears a 360° camera mounted on top of an AR HMD (the Hololens). This camera records a live 360° video of the scene, allowing the remote helper in VR to be completely immersed in the environment of the local worker (cf. Figure 15). Even better, it becomes possible to have independent viewpoints between users: the remote helper is not restricted to the current field of view of the local worker but has access to the whole panorama.

Of course, this solution still has some limitations. The remote user can freely execute rotations ("turning the head") but is still restricted to the physical position of the local worker. It is thus not possible for the remote helper to translate to other positions in the workspace to get a different point of view. Besides, it may be hard to be aware of where the other user is currently looking, and the 360° camera can also be affected by head tremors, which might impact user comfort.


Figure 15: Overview of the 360° panorama MR system used in [57]. The local user wears an AR HMD topped with a 360° camera streaming a live 360° video; the remote user wears a VR HMD and uses a hand tracker and a VR controller.

5.2 Improving collaboration using visual cues

Many studies have proposed visual cues to facilitate the collaboration between remote users in MR. Here, Teo et al. implemented several kinds of visual feedback for hand gestures [36]. The remote user's hand is represented in AR by a virtual model in the FOV of the local user. Pointing gestures are supported by drawing the corresponding virtual ray. This line is drawn 1) from the extremity of the local user's finger or 2) from the head of the VR controller of the remote helper. A dot at the end of this ray can be used as a precise cursor. Moreover, the remote helper can also draw annotations to guide the local worker. These annotations are spatially registered in the real environment and thus stay at the same fixed position independently of user movements.

The authors also added two additional visual cues related to the users' fields of view. The View Frame is a colored rectangle indicating the FOV of the other user, while the View Arrow always points at the View Frame. This arrow becomes visible as soon as the View Frame is out of sight, to help users know where the other is looking. Figure 16 gives an overview of the visual cues implemented by the authors.
Figure 16: Visual cues considered by Teo et al. [57]. Image courtesy from the authors.
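As an illustration of how such a cue can be driven, here is a short Python sketch of the View Frame/View Arrow visibility logic. The function and parameter names are hypothetical and this is not the authors' implementation: the idea is simply that the arrow is displayed only when the partner's focus point leaves the local augmented FOV, and that it is oriented towards this point in screen space.

```python
import numpy as np

def view_cue(local_cam_pos, local_cam_forward, partner_gaze_point,
             fov_half_angle_deg=20.0):
    """Hypothetical helper: decide whether to show the partner's View Frame
    directly, or a View Arrow pointing towards it when it leaves the FOV."""
    to_target = partner_gaze_point - local_cam_pos
    to_target = to_target / np.linalg.norm(to_target)
    forward = local_cam_forward / np.linalg.norm(local_cam_forward)

    # Angle between the local gaze direction and the partner's focus point
    angle = np.degrees(np.arccos(np.clip(np.dot(forward, to_target), -1.0, 1.0)))
    if angle <= fov_half_angle_deg:
        return {"show": "view_frame"}            # frame is visible inside the FOV

    # Project the target direction onto the screen plane to orient the arrow
    right = np.cross(forward, np.array([0.0, 1.0, 0.0]))
    up = np.cross(right, forward)
    screen_dir = np.array([np.dot(to_target, right), np.dot(to_target, up)])
    screen_dir = screen_dir / np.linalg.norm(screen_dir)
    return {"show": "view_arrow", "direction_2d": screen_dir}
```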

Teo et al. conducted three experimental studies to investigate the effect of these visual cues on social and spatial presence and on user experience:

• Study A compared individual visual cues. Four conditions were considered: no visual cues, virtual hand only, pointing ray only, and virtual hand + pointing ray. In each condition, verbal communication was allowed and View Frames/Arrows were available.

• Study B focused on two conditions: virtual hand only vs virtual hand + annotations. Contrary to Study A, users had to perform an asymmetric task this time: instead of having the same role, users were acting either as worker or as helper.

• Study C explored users' preferences about the different visual cues, allowing them to switch at will between conditions.

Each time, users performed a collaborative task based on decorating or filling a bookshelf with different objects.

Experimental results suggest that more than any single visual cue, it is the combination of several cues that matters. Such a combination can increase social presence, partially improve spatial presence and reduce subjective cognitive load [57]. The number of visual cues thus plays a significant role. However, there may be a threshold on this number, as too many cues would create visual occlusion. This is particularly true for AR HMDs like the Hololens: many users reported that their experience was negatively impacted by the limited size of the augmented FOV.

DID YOU KNOW?
The field of vision of humans is close to 180° horizontally and 125° vertically. Even if our gaze converges on only one precise point at a time, the different sectors of peripheral vision still allow us to perceive colors and movements. Therefore, having an augmented FOV of 30-40° horizontally (and even less vertically) with current AR headsets represents a strong limitation. The challenge is to address optical issues (distortions, luminance of virtual objects, and so on) while preserving user comfort (eye tiredness, bulkiness of head equipment…).
Some studies have nonetheless investigated the effects of having a large AR FOV [30], with sometimes surprising results. For instance, it seems that having a bigger FOV does not necessarily lead to better performance in visual search tasks [30].

Another interesting result is the effect of user roles. Teo et al. observed that the combination of visual cues had an effect mostly on the local user and not on the remote user [57]. This role asymmetry was also observed for subjective preferences: local users were more interested in easily-noticeable cues while remote users preferred cues that were easy to use. The nature of the task may also influence results: hand gestures may be more useful for object manipulation tasks while a shared pointer may be more suitable for time-critical tasks.

The authors highlighted the fact that participants mainly used verbal communication to achieve the task. Visual cues were only supporting oral exchanges and were mostly used when verbal communication was not efficient. Finally, Teo et al. proposed three design guidelines based on their experimental results, as shown in Table 6.

Design guideline | Description
DG1 | The size and number of visual cues must match the size of the FoV. The benefits of combining several cues may be lost if they create too much visual occlusion.
DG2 | Ensure that hand trackers can handle a suitable range of different angles. The goal is to convey natural gestures.
DG3 | The relevance of the different visual cues depends on the task to be performed. However, a shared pointer can be a primary cue for many tasks, followed by drawings and then hand gestures.

Table 6: Design guidelines about MR systems for remote collaboration [57].

It is interesting to notice that a similar study by Bai et al. [4], which also included a visual feedback for user eye-gaze, led to similar results. This ray-cast gaze cue even gave better results than the combination of other cues for spatial layout and self-location awareness. Eye-gaze thus seems a promising candidate modality to support remote collaboration with visual cues.

KEY TAKE-AWAYS
Combining several visual cues can support remote collaboration by increasing social and spatial presence.
These visual cues must be chosen with respect to the nature of the task and the roles of users to be
efficient. In particular, the local worker should have access in priority to visual cues as the remote helper
may not benefit as much from them.

VISUALLY REPRESENTING USERS AND THEIR ACTIVITY

AVATAR AND TELEPRESENCE OF REMOTE TUTOR

Article: Cao, Y. et al. 2020. An Exploratory Study of Augmented Reality Presence for Tutoring Machine Tasks. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu HI USA, Apr. 2020), 1–13.

In the previous chapter, we focused on visual feedback related to the current activity of users. However, instead of simple activity cues, can we not directly represent remote users? Would it not be better to display the whole user body in MR in order to convey the whole non-verbal communication? While this option seems reasonable, it also raises many questions. What would be the best approach to represent the user's full body? Is a realistic representation always better than a "cartoonish" one? What about visual occlusion with a human-sized avatar?

To start exploring this topic of user representation, Cao et al. recently proposed an experimental study of AR presence for a remote tutoring task [12]. The authors investigate different representations for the remote tutor, from a simple location cue to a full-body avatar in AR. An overview of the considered representations is available in Figure 17. In addition to the valuable experimental results based on different types of interaction, choosing this paper also allows us to explore the domain of remote tutoring, which shares many similarities with remote assistance.

Figure 17: The different remote user representations explored by Cao et al. [12]: a) Video, b) Non-avatar-AR, c) Full-Body+AR, d) Half-Body+AR. Image courtesy from the authors.

6.1 Industry 4.0 and machine tasks

The authors start their work by highlighting the need for adapted training for Industry 4.0 workers. Industry 4.0 aims at transforming traditional factories into "smart factories" thanks to the Internet of Things (IoT) and autonomous systems including AI and Machine Learning technologies (often called cyber-physical systems). In other words, new processes and equipment are quickly emerging and workers need to adapt to these changes. In particular, they need to master new machines and systems.

AR has been and is still being explored as a promising tool to facilitate learning phases and tutoring sessions in industrial scenarios. Such scenarios include maintenance training on machines and vehicles, facility monitoring and mechanical part assembly [12]. AR also allows to share complex 3D virtual objects such as tools and machine components [39]. For asynchronous scenarios like the tutoring sessions motivating this paper, experts can create guidance content in advance by recording videos of the procedure to be performed (see Figure 17a) and by adding virtual instructions into the workspace. For synchronous remote group work, sharing visual cues allows to guide the attention of other users and to convey their current activity.

DID YOU KNOW?
Despite its potential, MR is still far from being deployed in a majority of factories. There are of course technical limitations related to the technology itself (size of FoV, limited computational power on current HMDs, and so on). But beyond these, an appropriate network infrastructure is also required to support MR usages for remote assistance, especially in terms of latency and bandwidth.
European projects like Evolved-5G aim at overcoming this gap using 5G network capabilities. Meanwhile, researchers have already started working on 6G [51]. See you in a few years!

Nonetheless, Cao et al. argue that many previous studies and tutoring systems only consider local tasks, i.e. steps that can be performed within arm's reach [12]. In that case, simply adding virtual content on the machine may be enough to guide the local worker. However, machine tasks may require larger spatial movements like moving inside the workspace or physically turning around a large machine. Could adding an avatar to represent this kind of movement help users? To investigate this question, the authors explored three types of steps illustrated in Figure 18:

1. Local steps can be performed with one hand and without body-scale movement. For instance, pressing a switch button within arm's reach.

2. Body-coordinated steps imply a two-handed action requiring body, hand and eye coordination to be achieved. Turning two knobs at the same time while monitoring their effect on a temperature gauge would fall into this category.

3. Spatial steps require a significant navigation phase before the machine interaction. An example of a spatial step could be to look for a specific tool a few meters away in the workspace before using it on the machine.

What is the link between these three types of steps and our chapter topic about avatars? In one sentence: the authors explore different visual representations of the remote tutor with respect to these different steps.


Figure 18: The different types of steps identified by Cao et al. [12]. Image courtesy from the authors.

6.2 Visually representing the remote user

Cao et al. explored four different representations of the remote user, as illustrated in Figure 17. The Video condition represents the classical non-AR baseline. The Non-avatar-AR condition represents the standard AR approach, with virtual content superimposed on the machine and a circle representing the location of the remote user. The Half-Body+AR condition builds on the previous one by adding a partial avatar with only a head, a torso and hands. Finally, the Full-Body+AR condition completes this avatar by adding arms and legs. Overall, there is thus a progressive increase of visual information about the remote tutor.

To evaluate these representations and their impact on social presence, the authors conducted an experimental study based on the mockup machine illustrated in Figure 19. This testbed machine was conceived with two goals in mind: 1) reproducing interaction metaphors found on real machines (with physical widgets like knobs and levers) and 2) allowing to test local, body-coordinated and spatial steps. Participants were invited to perform 4 sessions of machine tasks, where each session included a mix of the different types of steps.

Overall, the two avatar conditions (Half-Body+AR and Full-Body+AR) were preferred over the two baselines (Video and Non-avatar-AR) [12]. However, most participants preferred the Half-Body+AR condition because it created less visual occlusion. Quantitative results support this preference: participants were quicker when using this representation while keeping the same level of accuracy. The authors suggest that by masking a larger section of the machine, the Full-Body+AR representation may have increased user cognitive load and attention distraction.

Nonetheless, experimental results also highlight the importance of the type of task. The Full-Body+AR condition was perceived as the most useful representation for body-coordinated tasks and gave a better feeling of social presence. Using a representation closer to a real human made the tutor more "friendly and believable". Meanwhile, the Non-avatar-AR was the favorite and quickest condition for local tasks. In that case, avatar representations provided little to no benefit, or were even judged cumbersome by some participants.

Figure 19: Experimental setup used by Cao et al. [12]. Edited pictures courtesy of the authors.

DID YOU KNOW?
Getting closer to a realistic human representation may provide benefits, but be careful about the uncanny valley [62]! This famous effect was proposed as early as 1970. Mori theorized that, at a given point, humans would feel repulsion in front of robots that are too close to humans in terms of appearance and motions. Instead of trying to be as realistic as possible, other studies focus on other approaches to trigger a positive emotional response about machines. For instance, Herdel et al. explored giving cartoonish (and cute) expressions to drones [22].

Overall, it thus seems that the type of task should be considered by interaction designers as a major factor, as summarized in Table 7.

Another observation made by the authors concerns the tutor following paradigm. On the one hand, some participants preferred staying "inside" the tutor avatar and reproducing its gestures synchronously. This allowed them to have a first-person view of the gestures to be performed. On the other hand, other participants preferred to stay apart from the tutor avatar. They explained that they preferred this third-person view because they felt uneasy colliding with a virtual humanoid. This effect was already observed in a previous study [27]. Two guidelines may be extracted from these observations:

• It is important to let users choose between a first-person and a third-person view of the remote user's avatar.

• Spatially aware avatars avoiding "collisions" with humans may increase the comfort of some users.

More generally, Cao et al. suggest following a user-responsive design for tutoring [12]. Beyond having an avatar aware of the movements of users, this also means adapting the AR content to the activity of users. For instance, a recorded tutor avatar could be active only when workers are looking at it, to avoid disturbing their attention.

Of course, the classical case of remote assistance is a bit different since users from both sides are often working synchronously. Nonetheless, most of the findings of this paper can be generalized to other MR remote group work contexts.

KEY TAKE-AWAYS
Avatars are useful to represent a remote user in MR. They can increase performance and social presence while reducing subjective cognitive load.
However, their size and level of visual detail should be considered carefully to limit visual occlusion. The responsiveness to the activity of users is also important.
Both first-person and third-person views of the remote user's avatar can be useful, depending on users.

Type of task | Representation to consider | Reasons
Local | Only local AR content | Avatars provide limited to no benefit; better performance and comfort without them
Body-coordinated | Full-body avatar | Increased social presence
Spatial | Half-body avatar | Limited visual occlusion
Overall | Half-body avatar | Preferred overall

Table 7: Visual representations to consider depending on the nature of the task.

VISUALLY REPRESENTING USERS AND THEIR ACTIVITY

MINI-ME: ADDING A MINIATURE ADAPTIVE AVATAR

Article: Piumsomboon, T. et al. 2018. Mini-Me: An Adaptive Avatar for Mixed Reality Remote Collaboration. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC Canada, Apr. 2018), 1–13.

Mixed Reality allows to explore many dimensions and new concepts for group work and remote assistance, including scaling. Scaling virtual objects allows to make them bigger to see specific details or to shrink them down to prevent them from occupying too much space. In both VR and AR, scaling virtual objects is now straightforward and natively available in existing frameworks. In some cases, a whole environment can be scaled down, for instance to obtain a World-In-Miniature (WIM) [14]. More exotically, it is also possible to change the scale of users. This can give the same impression as having a WIM if the user becomes gigantic compared to the environment. Or, on the contrary, MR can be used to transform the user into the equivalent of Ant-Man, lost in a world much bigger than usual.

Thammathip Piumsomboon, an HCI researcher working on immersive technologies and AI, proposed several studies built around these concepts of different scales in MR. The paper presented in this chapter, Mini-Me, explores an innovative concept: adding a second, miniature avatar to complete the traditional human-sized one. The remote user in VR is thus represented by two avatars with different scale, location and orientation. The Mini-Me avatar reflects the eye-gaze direction and the gestures of the remote user and stays within the local worker's FoV. An overview of the system is available in Figure 20.

Instead of playing on the amount of visible details of the avatar like in the previous chapter [12], Piumsomboon et al. thus play on its duplication and its size [46]. This approach is an interesting compromise to increase the feeling of social presence without creating too much visual occlusion.

Figure 20: Overview of the Mini-Me system [46]. The local user in AR can see two avatars conveying the activity of the remote
user: a human-sized avatar and a miniature one. Image courtesy from the authors.

7.1 Design of the Mini-Me system

The authors start by motivating their work: they state that MR group work between AR and VR users may become commonplace in the future. Their system is thus designed for a local user in AR with the Hololens HMD and a remote user in VR. Both users share the workspace as the system targets room-scale scenarios. Mini-Me builds on previous work about remote embodiment and user miniaturization.

DID YOU KNOW?
Remote embodiment is a type of activity cue based on representations of physical states. Such cues convey body information like location, pose and kinematics. Avatars are one of the most common approaches for remote embodiment, but they do not necessarily need to represent the full body. For instance, Eckhoff et al. proposed a pipeline to extract the tutor's hand gestures from a first-aid video and to display the corresponding hands in AR over a training dummy [17].

Piumsomboon et al. identified several issues and needs related to group work in MR and used them to guide the design of their system. We already mentioned some of these problems: the limited size of the augmented FoV in current HMDs, the need to share non-verbal communication cues or to know the location of remote users… However, the authors also identified other requirements like the need for transitions when an avatar becomes visible to users or when it disappears. The goal is to respect social conventions to avoid disturbing users. Gracefully entering or exiting the user's FoV? Yes. Jumpscares? No thank you. Therefore, the authors added a blue aura around the miniature avatar (see Figure 20). This aura indicates in advance the proximity of the avatar. While the authors' first intention was to use this halo only when the avatar enters or exits the user's FoV, they realized that such a temporary visual effect was mostly disturbing for participants. Therefore, they transformed the transient initial aura into a permanent visual cue.

Another specific requirement concerns the ability to easily differentiate the two avatars of the remote user. In a classical scenario, the difference of size between the two would be a reliable indicator for the local user. Nonetheless, the system allows the remote VR user to scale up or down to explore the environment from a different perspective, as shown in Figure 21. The authors thus applied a specific shader (a toon shader) to the Mini-Me avatar to make it more distinguishable from the main avatar. A ring indicator is also displayed around the feet of the Mini-Me to indicate the direction of the VR user. This additional feedback seems particularly useful in this kind of setup because the VR user can move using teleportation.

The authors also considered the size and the positioning of the Mini-Me avatar in the local user's FoV. Always placing the miniature avatar in front of the gaze of the user was too distracting. Placing it on one side of the HMD screen was better but still created significant visual occlusion. Therefore, Piumsomboon et al. made a third design iteration. In this final version, the scale of the Mini-Me avatar is dynamically adapted by taking into account its distance to the AR user. Besides, the surface where the user's gaze is projected also influences the miniature avatar. For instance, if the user is looking at a table, the Mini-Me will appear as if it was standing on it.

Figure 21: Different scales of the remote user in VR. a) VR user shrunk down, seeing the AR user (woman) as a giant. b) How
the AR user sees the miniaturized remote user (small avatar inside the dome). c) VR user (man) scaled up as a giant. Image
courtesy from the authors [46].

To convey the activity of the remote user, the Mini-Me reflects both gaze and pointing gestures. Its head is controlled to always face the point the remote VR user is currently looking at. Similarly, the arms of the miniature avatar are linked to the real-time tracking data from the VR controllers and the HMD. A visual ray emerging from the finger of the avatar also appears to reflect pointing actions, as shown in Figure 22a. Inverse kinematics are then applied to obtain a coherent body pose.
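The following Python sketch summarizes, under simplifying assumptions, a few of the behaviors described above: scaling the Mini-Me with its distance to the AR user, anchoring it on the surface hit by the local user's gaze, and orienting its head towards the remote user's focus point. All names and constants are hypothetical; the actual Mini-Me implementation is not published in this form.

```python
import numpy as np

def update_mini_me(ar_user_pos, ar_gaze_hit_point, vr_focus_point,
                   base_scale=0.25, scale_per_meter=0.05):
    """Hypothetical update step for an adaptive miniature avatar."""
    # Stand the miniature on the surface currently hit by the local user's gaze
    position = np.asarray(ar_gaze_hit_point, dtype=float)

    # Grow the avatar slightly with distance so it stays readable
    distance = np.linalg.norm(position - np.asarray(ar_user_pos, dtype=float))
    scale = base_scale + scale_per_meter * distance

    # Head orientation: yaw/pitch towards the remote VR user's focus point
    to_focus = np.asarray(vr_focus_point, dtype=float) - position
    norm = np.linalg.norm(to_focus) + 1e-9
    yaw = np.arctan2(to_focus[0], to_focus[2])
    pitch = np.arcsin(np.clip(to_focus[1] / norm, -1.0, 1.0))

    return {"position": position, "scale": scale,
            "head_yaw": yaw, "head_pitch": pitch}
```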
Overall, the authors thus present a complete system with several interesting features built around this combination of avatars. The rest of their paper focuses on its evaluation through an experimental study.

Figure 22: Mini-Me features. a) Reflecting eye gaze and pointing gestures of the remote VR user. b) Merging with the human-sized avatar when the local user is looking at it. c) Pinning the Mini-Me at a fixed location to prevent it from following the gaze of the local user. Image courtesy from the authors.

7.2 Experimental results for cooperative and collaborative tasks


To evaluate the benefits and usability of their system, Piumsomboon et al. conducted an experimental study [46].
In the baseline, the Mini-Me avatar is absent: only the human-sized avatar is visible and reflects the location
and actions of the remote user. Independently of the condition, the remote user was played by an experimenter:
participants always had the role of the local user in AR.

The study was divided into two tasks: a cooperative task (called an asymmetric collaboration task by the authors) and a collaborative task. In the cooperative task, the remote helper was guiding the participant to organize a retail shelf following a precise configuration. Only the remote helper knew this configuration and only the participant could place the AR objects on the shelf. This task thus perfectly reflected a remote assistance scenario. In the collaborative task, participants had to solve an AR puzzle game together and had equal roles.

DID YOU KNOW?
This puzzle game was inspired by previous studies on group work based on AR [6]. More precisely, they were using Tangible AR [7]: users were manipulating squared tiles, each with a unique visual pattern to be identified by the system. Such a configuration is often not required anymore to interact with virtual objects in AR. However, using physical proxies to manipulate virtual objects still has valuable benefits! Haptic feedback, proprioception and object affordance are valuable tools which are often absent from mid-air interaction techniques.

Experimental results suggest that overall, the Mini-Me avatar increases the social presence of the remote user. Most participants found that Mini-Me was very useful and 12/16 participants preferred it over the baseline. For instance, they reported that the miniature avatar required "less looking back and forth between the partner and the task space". The adaptive positioning of Mini-Me may thus help to limit divided attention. Objective data also suggest that participants achieved the cooperative task faster with Mini-Me than without it. No quantitative result is reported for the collaborative task as no time constraint was given to participants. Nonetheless, the realism of the human-sized avatar was also appreciated and judged positively with respect to social presence. Moreover, participants paid similar levels of attention to the remote user regardless of the presence/absence of Mini-Me.

Interestingly, the authors also observed differences between the two types of tasks, as reported in Table 8.

The authors then draw a few implications for the design of groupware systems in MR. They encourage using Mini-Me or an equivalent to reduce the need to look at the other user. This may be especially relevant for cooperative tasks around a spatial layout, which is the case of many remote assistance scenarios. Having an adaptive avatar 1) conveying eye-gaze and pointing gestures and 2) visible at salient locations at any time may facilitate the task to be performed.

Encouraged by these promising results, Piumsomboon et al. envision bringing their work further by adding facial expressions to the Mini-Me avatar [46]. They also mention going further than visual feedback by adding spatial sound to the system.

KEY TAKE-AWAYS
Adding a secondary, miniature avatar reflecting the activity of the remote user is a promising approach to support group work in MR (and especially remote cooperation).
It is important to consider the positioning and the scale of this secondary avatar. It should be visible enough to increase user awareness without disturbing the task.

Criteria | Cooperative task | Collaborative task
Perceived task difficulty | Lower with Mini-Me | No observed effect
Subjective cognitive load | Lower with Mini-Me | No observed effect
Level of task focus | No observed effect | Higher with Mini-Me…
Task completion time | Lower with Mini-Me | No time constraint

Table 8: Summary of observed differences between the cooperative task and the collaborative task.

THE ZOOM: FULL-IMMERSION AVATARS

In several Science Fiction works, the notion of avatar is pushed to an extreme: the complete immersion into another body or mind. All it takes is a genius scientist, a complex machine with many strange lights and a bit of scriptwriting to temporarily become someone else. This concept shares some similarities with the idea of the Ultimate Display by Sutherland [56]: a system so perfect that it cannot be distinguished from the "classical", unmediated reality. It may not sound that visionary nowadays, as many authors and artists have explored similar concepts, but Sutherland wrote this report in 1965!

The concept of full-immersion is for instance present in Avatar. No, not the animated series about a flying arrow-head monk (which is cool too, but that is not the point). Here, we are referring to James Cameron's movie, where the main character's consciousness is transferred into the body of a blue humanoid alien. With a bit of training, Jake is soon able to control this body and has access to all its five senses.

Another recent example of full-immersion can be found in the Cyberpunk 2077 video game. Braindance is a bit different as it involves a full immersion into someone's memory. Once again, the five senses are involved as the subject experiences the sensations and emotions felt by the target at the selected moment. A concept maybe inspired by the Strange Days movie from 1995, whose story was originally written a few years before by… James Cameron.

What is the link with remote assistance in MR? The fact that Science Fiction and other anticipation works have always influenced technological progress (and the other way around is also true). MR already raises questions about the notions of reality and immersion. Of course, we are very far from full-immersion avatars, but the progress made in domains like Brain-Computer Interfaces (BCI) is impressive. Maybe future remote assistance systems will be a mix of both?

SECTION 4

OUT-OF-THE-BOX CONCEPTS
Light fields for mobile MR
Spatial referencing
Using virtual replicas
OUT-OF-THE-BOX CONCEPTS

USING LIGHT FIELDS FOR HAND-HELD MOBILE MR

Article: Mohr, P. et al. 2020. Mixed Reality Light Fields for Interactive Remote Assistance. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu HI USA, Apr. 2020), 1–12.

So far, we mostly presented studies based on MR with HMDs. Depending on the setup, the mobility of users may be strictly limited to a predefined environment: for instance, Cao et al. used external cameras in the room to track user gestures [10]. In other studies where only HMDs are involved, we could imagine letting users freely move between different workspaces. The TeleAdvisor system and its wheeled camera+projector robotic arm [21] was even built with this purpose in mind. Nonetheless, the significant mobility offered to users by these systems comes at the cost of a non-negligible amount of hardware and calibration. This limitation may be a barrier for on-the-fly remote assistance in unprepared environments.

On the contrary, Mohr et al. proposed a remote assistance system based only on smartphones [36]. This approach offers great mobility with commonly found hardware. Problem solved? Yes and no. MR with hand-held mobile devices has its own limitations: it blocks at least one user hand to hold the device, can suffer from hand tremor and arm fatigue, offers a small augmented FoV… And it is far from being new [19].

The novelty of the work of Mohr et al. comes from their innovative approach: the exploitation of unstructured light fields. To learn more about this intriguing concept, please follow the guide…

DID YOU KNOW?

AR with hand-held mobile devices is often compared to peephole pointing. This concept refers to cases where the workspace is much greater than the screen. Thus, a window of the virtual space (or the augmented FoV) is moved to reveal the targeted content [26].
Many studies proposed new interaction techniques to guide users towards this offscreen hidden content [44]. It is far from simply adding virtual arrows pointing at every object!

8.1 MR light fields and system calibration

The authors identified several requirements in their review of previous work. They begin by stating that adding visual remote instructions is an interesting and well-known tool for remote assistance. However, using 2D overlays may only work with a static view. In a realistic, 3D environment, both the local and remote users often need to have dynamic and independent viewpoints. In that case, 3D spatially-registered annotations become necessary.

Two main approaches to obtain such annotations are discussed: scanning the environment in advance to know its 3D features, or doing it in real time. On the one hand, the former conflicts with the spontaneous, on-the-fly remote assistance aimed for by the authors, and was thus not considered [36]. On the other hand, real-time scanning approaches like Simultaneous Localization And Mapping (SLAM) require high-quality sensors and high computational power to obtain a good visual quality. These conditions are often not matched

with current mobile devices. Besides, geometric reconstructions may be hard to achieve in some specific conditions,
for instance with shiny or textureless objects.

To overcome the limitations of these approaches, Mohr et al. proposed an alternative based on a database of
images registered in 3D space. These images represent a sampling of the light rays emitted from the local user’s
workspace (hence the light field terminology). The images are pictures taken by the local user under the guidance
of the remote user, as illustrated in Figure 23.


Figure 23: Scene capture. a) The local worker takes pictures of the targeted workspace. b) The remote helper, using
an Augmented Virtuality interface, can explore the coarse scene and guide the local users to complete the scene
capture. c) The local user sees the virtual annotations in AR. Images courtesy from the authors [36].

Light fields require dense image spaces. Therefore, after the initial recording of a few reference images, the local
user must focus on specific positions proposed by the remote helper. To support this recording of dense local light
fields, a virtual sphere in AR is displayed. Its color indicates the level of sampling for each direction, as illustrated
in Figure 24. With enough image density, it is possible to obtain a photometric appearance of the workspace. In
addition, this high-quality view supports a large variety of textures and materials like shiny, metallic or transparent
objects [36].
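To give an idea of the kind of bookkeeping behind this guidance sphere, here is a minimal Python sketch (hypothetical, not the method from [36]) that bins the viewing directions of the captured snapshots and returns a per-sector coverage value that could drive the red-to-green coloring of Figure 24.

```python
import numpy as np

def sampling_coverage(view_dirs, n_azimuth=18, n_elevation=9, target=5):
    """Hypothetical helper: estimate how densely each viewing direction of the
    light field has been sampled (0.0 = missing, 1.0 = dense enough).
    view_dirs: (N, 3) unit vectors from the object of interest to each snapshot."""
    d = np.asarray(view_dirs, dtype=float)
    az = np.arctan2(d[:, 1], d[:, 0])                    # azimuth in [-pi, pi]
    el = np.arcsin(np.clip(d[:, 2], -1.0, 1.0))          # elevation in [-pi/2, pi/2]

    # Count snapshots falling into each angular bin of the sphere
    counts, _, _ = np.histogram2d(
        az, el, bins=[n_azimuth, n_elevation],
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])

    # Normalized coverage per bin, used to color the corresponding sphere sector
    return np.clip(counts / target, 0.0, 1.0)
```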


Figure 24: Visual feedback for dense light field recording. The triangles become greener and greener as they gain in sample density from the corresponding angle. Snapshots are taken automatically: the local user only needs to move the mobile device. Images courtesy from the authors [36].

After the cooperative scene recording phase, the local user has thus recorded a set of local light fields. Using only salient fragments of the environment allows to reduce the required network and computational resources: it is thus adapted to the resources of current mobile devices. The downside of this approach is its limited depth knowledge about the environment. Reconstructing all 3D surface information would be computation-heavy and time-expensive, which does not match the mobile device context. Fortunately, there are still ways to share spatially-registered annotations.

8.2 Adding annotations into the shared workspace

Mohr et al. included a 3D scene annotation feature in their system [36]. Thanks to the scene recording phase, the remote user can navigate within the workspace until finding a suitable viewpoint. A tap gesture on the tactile phone screen indicates to the system the main object of interest. Based on the depth of the corresponding area, the system then places a 2D plane in the scene. The remote helper can use this plane as a canvas to draw annotations and share them with the local worker.

Entering into the details of the automatic canvas placement is beyond the scope of this chapter, but the whole method is described in the paper [36]. It is worth mentioning that the remote helper can still translate and rotate (yaw and pitch rotations) the canvas afterwards if needed, using the provided GUI.

Overall, this approach allows to make a good trade-off between the limited 3D information and the amount of computation required to create annotations in the 3D workspace. Moreover, both users share the same system coordinates (the ones from the local user during the scene recording phase). Displaying the correct spatially registered annotations on the local user side is thus straightforward. An overview of the resulting 3D annotations is available in Figure 25.
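A minimal sketch of the canvas idea is given below, assuming the depth of the tapped region is already known. The function and parameter names are hypothetical and the actual placement method of [36] is more elaborate: the point is simply that a 2D drawing plane can be anchored in the shared coordinate system at the estimated depth, facing the remote helper's viewpoint.

```python
import numpy as np

def place_annotation_canvas(cam_pos, cam_forward, tap_depth, canvas_size=0.3):
    """Hypothetical helper: anchor a 2D annotation canvas in world space."""
    forward = cam_forward / np.linalg.norm(cam_forward)
    center = cam_pos + tap_depth * forward        # anchor point in the shared frame

    # Build an orthonormal basis so the canvas faces the viewer
    up_hint = np.array([0.0, 1.0, 0.0])
    right = np.cross(up_hint, forward)
    right = right / np.linalg.norm(right)
    up = np.cross(forward, right)

    half = canvas_size / 2.0
    corners = [center + sx * half * right + sy * half * up
               for sx in (-1, 1) for sy in (-1, 1)]
    # 2D strokes drawn by the helper are expressed in the (right, up) basis
    return {"center": center, "right": right, "up": up, "corners": corners}
```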

Figure 25: Overview of 3D annotations. a) The remote user draws an arrow on a 2D canvas. b) The local user sees
the corresponding virtual arrow in AR. Images courtesy from the authors [36].

8.3 Evaluating the usability of the system

The authors conducted three experimental studies, mainly to evaluate the usability of their system:

• Study 1 focuses on the authoring of annotations. To do so, the proposed system is compared with an alternative 3D interaction technique based on multiple views.

• Study 2 evaluates the effectiveness of annotations for local users. It also considers the impact of erroneous registration (offsets between the annotation and the targeted real object) on users.

• Study 3 focuses on the scene recording phase.

The main experimental results from these three studies are reported in Table 9.

The current prototype is limited to annotations made on 2D planes and the scene recording phase may need a few improvements. Nonetheless, the usability of the system proposed by the authors already seems good overall in its current state. This approach thus seems promising for on-the-fly remote assistance with smartphone devices only. Besides, it introduces interesting and innovative aspects for remote assistance such as a cooperative setup phase and the usage of local light fields in MR.

Study | Experimental results
Study 1 | Initial results: visual feedback required to support the canvas placement phase. After corresponding changes: faster and fewer errors with the proposed system; also preferred by participants.
Study 2 | Positive qualitative feedback about usability. No observed impact from registration errors.
Study 3 | Overall, positive qualitative feedback about usability. Several improvement suggestions: adding a live-stream view of the local worker's activity for the remote user; facilitating the estimation of object scales for the remote user; facilitating the local light field recording by adding performance feedback and sharing the virtual sphere with the remote helper.

Table 9: Overview of the main experimental results from [36].

Future work may focus on spatial and social presence of this kind of system, two aspects absent from the current
paper.

KEY TAKE-AWAYS
The notion of light fields refers to a set of images registered in a 3D environment. They provide a high visual
quality of the workspace, but may be difficult to capture and interact with.
• Mohr et al. proposed a MR system based on local light fields [36]. This system allows to share spatially
registered annotations after a cooperative setup phase.
• The proposed system seems promising for on-the-fly remote assistance in terms of usability.

OUT-OF-THE-BOX CONCEPTS

FACILITATING SPATIAL REFERENCING IN MR

Article: Johnson, J.G. et al. 2021. Do You Really Need to Know Where "That" Is? Enhancing Support for Referencing in Collaborative Mixed Reality Environments. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama Japan, May 2021), 1–14.

In the context of group work, referencing corresponds to the ability of referring to an object in a way that is understood by others [24]. This ability is at the heart of many remote assistance tasks: remote helpers often need to indicate target objects of interest while local users may reference objects to ask for confirmations or more precise details about them. By nature, referencing is thus a spatial ability linked to pointing gestures. For co-located users, being in the same environment significantly facilitates this process. However, in remote assistance scenarios, users need to share enough context to make referencing possible.

This penultimate chapter focuses on spatial referencing in MR by presenting a recent study conducted by Johnson et al. [24]. The authors investigate the impact of providing spatial information and system-generated guidance to users. For local users, it could seem straightforward that having visual guidance in MR would help to perform the task compared to having no guidance. However, what about remote helpers? Can a partially automated guidance facilitate referencing on their side? Do they need as much spatial information as possible (for instance, being completely immersed in a shared VR environment) or is a more abstract representation of the environment enough? These are the kind of questions explored in this paper.

9.1 Letting the MR system handle the referencing process

Johnson et al. start by reviewing previous studies and highlight the fact that remote helpers often prefer a least-effort approach [24]. Instead of lengthy verbal descriptions, we naturally tend to use short phrases complemented by deictic gestures. The good news is that MR remote assistance systems allow precisely to do that: sharing a common virtual or mixed environment, making activity cues visible and highlighting user gestures and body language…

More precisely, the authors distinguish two methods to share contextual elements in MR:

1. Passive sharing uses features already present in the environment. For instance, sharing a view of the workspace allows users to visualize virtual and physical elements present within it. One way to achieve this is to use 3D reconstructions of the local user's environment. However, it is still complex to overcome the technical limitations of current hardware and to achieve a high level of performance. Using a 2D video feed like [1, 21] is simpler, but also raises questions like viewpoint independence (see Chapter 3) and depth perception (Chapter 4).

2. Explicit sharing is based on features added on purpose into the environment. It includes visual feedback, audio cues… Interestingly, such added features often rely on passive ones (for instance, remote helpers may need to view the local environment to correctly register a guidance feedback into it). Explicit features independent from any workspace view are quite rare.

In this study, the authors explore this uncommon type of explicit features. They consider the influence of a 2D external representation between the objects of interest and the local user. Figure 26 illustrates the setup of their prototype and the corresponding GUI for the remote helper.

In the prototype proposed by Johnson et al., the local user wears an AR HMD while the remote helper uses a traditional desktop computer. A simple MR guidance based on a virtual arrow (an explicitly shared visual cue) can draw the attention of this local user towards a targeted object. On the remote helper side, a live view of the local workspace is displayed on the monitor of this computer.

To study how to offload the referencing process to the system, the authors selected two design factors:

1. The presence/absence of the visual guidance system (the virtual arrow mentioned above).

2. The presence/absence of spatial information about the workspace on the guidance authoring interface (remote helper side).

For this second design factor, Johnson et al. designed two GUIs for the remote helper [24]. The Map (Figure 26c) is a graphical bird's-eye view of the local workspace. The real-time position and orientation of the local user is displayed while the main objects are represented with icons at the corresponding positions. The Map thus includes spatial information about the workspace.

DID YOU KNOW?
Other studies focused on different aspects of referencing. For instance, Müller et al. investigated adding virtual objects in the workspace to obtain shared landmarks [39]. They observed that this approach improved user experience. The landmarks were used as anchors and allowed participants to use less ambiguous spatial expressions.
Adding too many virtual objects into the workspace can create visual occlusion, but temporarily displaying them can facilitate the work of both users!

Figure 26: Prototype setup. a) Local user workspace. b) Remote helper workspace. c) Spatial interface seen by the remote helper. d) List interface seen by the remote helper. Images courtesy from the authors [24].

On the contrary, the List interface (Figure 26d) simply indicates the existing objects in a random order, without giving any spatial information. Overall, the authors obtained 4 conditions, as illustrated in Figure 27.
In the two conditions where MR guidance is active, the remote helper can select an element from either the Map or the List (displayed on a nearby tablet). This selection automatically triggers the corresponding visual guidance feedback on the local user side. Therefore, the remote helper does not need to position the visual arrow in the scene: the system handles most of the referencing process.
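The Python sketch below illustrates, with hypothetical data structures (not taken from [24]), how such system-handled referencing can stay very simple: the helper's selection on the Map or List is mapped to the tracked position of the object, and the local AR client only has to render a guidance arrow at that position.

```python
# Hypothetical object registry: object id -> 3D position in the shared frame
tracked_objects = {
    "red_mug": (0.4, 0.0, 1.2),
    "toolbox": (-0.8, 0.0, 2.1),
}

def on_helper_selection(object_id, arrow_height=0.25):
    """Triggered by a click on the Map or List GUI (remote helper side)."""
    if object_id not in tracked_objects:
        return None
    x, y, z = tracked_objects[object_id]
    # The local AR client renders an arrow hovering above the target,
    # so the helper never has to position the cue manually.
    return {"type": "guidance_arrow", "target": object_id,
            "position": (x, y + arrow_height, z)}

print(on_helper_selection("red_mug"))
```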

Figure 27: The four conditions evaluated by Johnson et al. [24], crossing the MR guidance factor (with guidance: Gyes / without: Gno) with the helper interface (Map: Smap / List: Sno). Image recreated from the original paper and courtesy from the authors.

9.2 Evaluating the prototype

Johnson et al. conducted an experimental study with 40 participants playing either the role of the local user or the remote helper [24]. Each of the 20 resulting dyads had to perform two types of task: a pair-matching task and an object gathering task. In the pair-matching task, participants had to find objects in the workspace with the same picture on them and regroup them together. In the object gathering task, a grid was present in the workspace. The local user had to place objects into the appropriate grid cell but only the remote helper had the grid solution.

The authors analyzed many elements during the experiment, including completion times for both tasks, the communication and referencing behavior of users, qualitative feedback about the two interfaces… Many experimental results are reported in detail in the paper. Table 10 presents an overview of the main observed results.

Overall, experimental results suggest that offloading the referencing process to the system improved task performance and communication efficiency. It seems that it also reduced subjective cognitive load for users.

Category | Observed results
MR guidance | Supports effective referencing, even without spatial information. Increased confidence: fewer acknowledgments, better parallelization.
Spatial information | Useful for guiding users. Slightly more verbal communication, but simplifies the work of helpers. Can also divide the helper's attention between several interfaces.
Combining MR guidance with spatial information | MR guidance makes spatial information superfluous, but both can be useful in unfamiliar environments. Spatial information allows helpers to prepare local users for the incoming guidance.

Table 10: Overview of the main experimental results.

Giving explicit task object information to helpers (List and Map interfaces) and visual guidance cues to local users had non-negligible benefits [24]:
• It facilitated referring to occluded or hidden objects
• It helped to remove ambiguities
• Finally, it simplified communications by reducing the need for acknowledgments

Interestingly, remote helpers perceived their partner as more successful when using the Map interface, but local users had the feeling of being better when the remote helper used the List interface. One possible cause is that it is not "the more features, the better". Having both MR guidance and spatial information (Guidance + Map condition) was perceived as more difficult by both users. In this case, helpers had a lot of information at their disposal and often communicated too much of it to local users, which was confusing. Nonetheless, helpers perceived the Map as more sophisticated and useful, even if it required more effort than the List interface.

One of the key lessons from this work is that offloading the referencing process to the system and providing guidance are promising alternatives for remote assistance. When it is not possible to share a fully immersive environment, these approaches still offer valuable benefits to achieve successful group work. Until 3D reconstructions and more complex technologies are fully available to everyone, using simpler 2D interfaces and automated visual cue positioning can be accessible approaches with satisfying results.

KEY TAKE-AWAYS
Referencing objects is a key element of remote group work.
Facilitating the referencing process (here, partially automating it) can improve performance and communication while reducing cognitive load.
Giving spatial information to helpers about the local workspace is useful, but giving visual guidance cues to local users may be more beneficial overall.

OUT-OF-THE-BOX CONCEPTS

USING VIRTUAL REPLICAS FOR OBJECT-POSITIONING TASKS

Article: Oda, O. et al. 2015. Virtual Replicas for Remote Assistance in Virtual and Augmented Reality. Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (New York, NY, USA, Nov. 2015), 405–415.

Have you ever tried to assemble an IKEA furniture or an equivalent? If so, you probably realized the difficulty of using a 2D paper manual to help somebody, somewhere, at a more or less distant point in time, to perform a sequence of tasks in 3D. Many instructions probably consist in assembling pieces together or placing them at a precise 3D position, which is far from being as easy as the manual suggests. Only one error and you will end up with an unexpected result, such as the infamous last extra piece that probably should have been used somewhere…

Manufacturing is a common application domain for MR [43]. In this chapter, we present a study from Oda et al. about 3D object assembly tasks [42]. The authors proposed two MR interaction techniques based on virtual replicas. Instead of a piece of furniture, they focus on an industrial manufacturing scenario with aircraft combustion chambers (see Figure 28), but the initial need is the same: using MR to avoid paper manuals, saving time and limiting errors.

We selected this paper because, beyond the two interaction techniques, the global approach of using virtual replicas is promising and generalizable to other domains. For instance, it can be adapted to operating rooms where a remote surgeon could guide trainees trying to position a prosthesis!

Figure 28: Overview of the interaction techniques by Oda et al. A virtual replica augmented with colored reference points guides the local user in an AR assembly task. Images courtesy from the authors.

10.1 Design of the two interaction techniques

Oda et al. start their work by recalling the difficulties linked to remote guidance for 3D tasks [42]. One of the main issues is that language describing spatial locations and actions is often ambiguous (where exactly is the "there" where you're supposed to place "that"?). Gestures and virtual annotations are great means to complement oral instructions, but they can be limited for fully 3D tasks. For instance, Goto et al. proposed a system based on pre-recorded AR videos of a remote helper performing a manual assembly task [20]. This approach may work well on flat surfaces with small objects, but may suffer when applied to bigger 3D objects creating visual occlusion.

DID YOU KNOW?
The concept of VooDoo dolls was first explored by Pierce et al. in 1999 [45]. The key idea is to create a doll, i.e. a virtual copy of an existing object in the environment. The virtual copy can then be manipulated using a bimanual interaction technique, for instance to scale it up and down compared to the original object. It is also possible to work with two dolls at the same time, where manipulations on one copy can also affect the other doll (hence the VooDoo denomination).

A common method to explain a procedure to a co-located user is to point at relevant places and demonstrate the concrete actions to be performed while orally describing them. In their work, Oda et al. took inspiration from this intuitive method by allowing the expert to manipulate virtual copies of existing objects [42]. This principle was already used in some previous work like the VooDoo dolls [45].

The two interaction techniques proposed by Oda et al. allow to share the relevant objects of the workspace between users. The remote helper only sees a few virtual objects (proxies), but cannot manipulate them directly (after all, it is not possible to alter the corresponding physical objects remotely). However, the remote helper can create a virtual replica of an object from its proxy, and manipulate this replica as necessary to illustrate how to achieve the procedure. On the other side, the local user sees the physical objects plus the virtual replica(s) in AR. This principle is illustrated in Figure 28a.

In the paper, the authors focus on tasks where an object A must be placed relative to an object B. In the Point3D technique, the remote helper can point at any location on either a replica or a proxy to create a colored reference point. Adding another reference point on the other virtual object allows to create a correspondence between the two. The local user will then see guidance lines in AR between the physical object and the virtual replica, as shown in Figure 29a. The overall idea of this technique is to make contact points visible to help the placement of object B relative to object A. Creating three pairs of contact points allows to define a non-ambiguous 3D position (6 DoF). To do so, the remote helper has a hand-held tracked device and can use ray-casting to position the points. In the Demo3D technique, the idea is to directly demonstrate the placement of object B by executing the corresponding movement. The remote helper thus creates a virtual replica of object B and places it in its target position with respect to object A. However, it may be difficult to achieve a perfect alignment when using virtual objects, as haptic feedback and physical collisions are absent. The virtual replica of B can thus be placed "inside" object A, leading to unrealistic interactions, misunderstandings and errors. To address this issue, the authors added the possibility for the remote helper to define constraints when manipulating the virtual replica [42]. For instance, it is possible to temporarily reduce the DoF of the replica to translate or rotate it around one axis only.

Moreover, the authors added a second support mechanism to help the local user place the physical object B correctly. Similarly to the Point3D technique, contact points can indicate how to position the object. When the local user moves the physical object close, the virtual replica progressively fades out to reduce visual clutter, but the contact points remain visible. The goal of this visual feedback is to reduce the cognitive load of the user due to mental rotations.

Overall, one of the benefits of working with the relevant objects only (VR view for the remote helper, see Figure 29a) is that it requires a lighter network bandwidth. Besides, the remote helper does not depend on the local user's viewpoint and can navigate at will in the scene.

10.2 Comparing virtual replicas to a 2D baseline technique

Oda et al. conducted several experimental studies to evaluate Point3D and Demo3D. In the initial pilot studies, the authors tested different approaches, like adding an animation for Demo3D showing the movement between the physical object and the virtual replica. The remote helper could also switch between the VR view presented above and an AR view from the local user's perspective. Interestingly, both of these approaches were discarded afterwards. The authors observed that the animation did not really help local users, while the AR view was not needed for remote helpers since virtual proxies were already reflecting the position of physical objects. In the complete study reported in the paper, Oda et al. compared Point3D and Demo3D to a third technique called Sketch2D. As the name suggests, the Sketch2D technique was based on virtual annotations in AR created thanks to a multitouch tablet. The remote helper can thus draw on proxies and replicas and navigate in the virtual scene. This commonly found technique was chosen as the baseline in the study.

After a demonstration of the system, participants played the role of either the local user or the remote helper and had to perform or to guide the 3D positioning of a target object. Task completion time was the main quantitative measure. Table 11 presents an overview of the experimental results.

Supporting H1 and H3, Demo3D was the fastest and preferred interaction technique. Participants felt that it was a more direct, "human" approach compared to placing contact points or drawing annotations. It is worth noticing that, overall, the difference of performance between the three techniques was mostly due to the remote helpers. For local users, task completion times were similar between the three conditions.

Nonetheless, the authors did not expect Point3D to be as slow as Sketch2D. Further inspection revealed that the bimanual interaction with Point3D was time-consuming and required a significant effort. Besides, the authors observed collisions between the hand-held tracked devices. They hypothesize that letting users see their hands directly in AR, rather than seeing only a representation of the tracking device in VR, could limit this issue. With Sketch2D, remote helpers spent less time manipulating replicas than with Point3D. However, they dedicated more time to the GUI (for instance, to change colors and validate steps) and to planning scene navigation.

DID YOU KNOW?
There is an extensive literature about mental transformations and cognitive load [37]. Experimental results suggest that performing physical rotations helps to achieve mental rotations [60]. Moreover, as mentioned by Oda et al., the mirror neuron system may facilitate the reproduction of a perceived action [50].
Do not underestimate the cognitive aspects when interacting in MR!

The concept behind Demo3D and Point3D is therefore promising overall. Of course, one of the initial assumptions is that the system has the exact 3D models of each target object of the scene. This may be achievable in some industrial workspaces, but is definitely not possible in all remote assistance scenarios. In some cases, real-time analysis of the scene may be necessary, which raises several questions about performance. It may also affect the creation of constraints in the Demo3D technique.

Id | Description | Observed result
H1 | Demo3D should be faster than Point3D (single motion vs 6 contact point creations) | Confirmed for both types of users
H2 | Point3D should be faster than Sketch2D (faster scene navigation, presence of guidance lines) | No difference found
H3 | Demo3D should be the preferred technique for both the local and the remote user (quicker, lower cognitive load) | 7/11 remote helpers, 8/11 local users

Table 11: Overview of the main experimental results compared to the initial hypotheses.

Figure 29: The two interaction techniques proposed by Oda et al. [42]. a) The Point3D technique with contact points. b) The Demo3D technique (remote helper view). c) The Demo3D technique for the local user, who is placing the physical object. Images courtesy of the authors.

KEY TAKE-AWAYS
Letting a remote helper in VR demonstrate a procedure by manipulating a virtual replica gave promising results. It is a fast and intuitive approach.
Adding movement constraints when positioning a virtual replica can help to position it precisely despite the lack of haptic feedback or collisions in VR.
Establishing visual links between a virtual copy and the original object can provide helpful feedback for assembly tasks in AR.

ABOUT US

THE COMPANY
Immersion: Technologies that place people at the core of decision making.
Created in 1994, Immersion is a European expert in virtual reality, augmented reality and collaborative solutions for industry and research. Its ambition? To develop industrial projects that are both a human and a technological success.
As a pioneer, Immersion has built its know-how around customized virtual and augmented reality solutions by integrating all the necessary skills - human factors, design, development - to meet customer needs and industrial processes.
It also develops its own innovative products, including Shariiing, presentation and collaboration software. Its activity is at the crossroads of immersive 3D, collaborative tools and decision support. Immersion supports companies in their digital transformation and helps them adopt new working methods.

Some examples:
Industrial uses at Alstom:
https://www.youtube.com/watch?v=cHKn8km1o0Y
Renault Trucks on the way to Industry 4.0:
https://www.youtube.com/watch?v=RVafWc9Jqa8
Augmented Reality and Factory of the Future for Sunna Design:
https://www.youtube.com/watch?v=NIyyb5Jj7tk

THE AUTHOR

Fascinated by the potential of Augmented Reality, Charles Bailly did a PhD thesis on Human-Computer Interaction between the Computer Science lab of Grenoble (LIG, EHCI team) and the Aesculap company. During this CIFRE thesis, he studied new MR interaction techniques adapted to augmented surgery. He joined the R&D team of Immersion in April 2021 and now works on several projects related to MR, including the European Evolved-5G project.
This project has received funding from the European Union’s Horizon 2020
research and innovation program under grant agreement n°101016608
(Evolved5G project).

ACRONYMS AND DEFINITIONS

AR Augmented Reality. A technology where virtual objects augment the real 3D environment by being superimposed onto
it.

CAVE Cave Automatic Virtual Environment

DOF Degrees of Freedom

FOV Field of View

GUI Graphical User Interface

HCI Human-Computer Interaction

HMD Head-Mounted Display

MR Mixed Reality

VR Virtual Reality

REFERENCES

[1] Andersen, D. et al. 2016. Virtual annotations of the surgical field through an augmented reality
transparent display. The Visual Computer. 32, 11 (Nov. 2016), 1481–1498. DOI:https://doi.org/10.1007/
s00371-015-1135-6.

[2] Azai, T. et al. 2017. Selection and Manipulation Methods for a Menu Widget on the Human Forearm.
Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (
Denver Colorado USA, May 2017), 357–360.

[3] Azuma, R.T. 1997. A Survey of Augmented Reality. Presence: Teleoperators and Virtual Environments. 6,
4 (Aug. 1997), 355–385. DOI:https://doi.org/10.1162/pres.1997.6.4.355.

[4] Bai, H. et al. 2020. A User Study on Mixed Reality Remote Collaboration with Eye Gaze and Hand Gesture
Sharing. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu HI
USA, Apr. 2020), 1–13.

[5] Bailly, C. et al. 2020. Bring2Me: Bringing Virtual Widgets Back to the User’s Field of View in Mixed Reality.
Proceedings of the International Conference on Advanced Visual Interfaces (New York, NY, USA, Sep.
2020), 1–9.

[6] Billinghurst, M. et al. 2002. Experiments with Face-To-Face Collaborative AR Interfaces. Virtual Reality. 6,
3 (Oct. 2002), 107–121. DOI:https://doi.org/10.1007/s100550200012.

[7] Billinghurst, M. et al. 2008. Tangible augmented reality. ACM SIGGRAPH ASIA 2008 courses on -
SIGGRAPH Asia ’08 (Singapore, 2008), 1–10.

[8] Bolt, R.A. 1980. Put-that-there: Voice and gesture at the graphics interface. Proceedings of the 7th
annual conference on Computer graphics and interactive techniques (New York, NY, USA, Jul. 1980),
262–270.

[9] Bouzbib, E. et al. 2021. “Can I Touch This?”: Survey of Virtual Reality Interactions via Haptic Solutions.
(2021), 16.

[10] Bown, J. et al. 2017. Looking for the Ultimate Display. Boundaries of Self and Reality Online. Elsevier.
239–259.

[11] Çamcı, A. et al. 2017. INVISO: A Cross-platform User Interface for Creating Virtual Sonic Environments.
Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (New York,
NY, USA, Oct. 2017), 507–518.

[12] Cao, Y. et al. 2020. An Exploratory Study of Augmented Reality Presence for Tutoring Machine Tasks.
Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu HI USA,
Apr. 2020), 1–13.

[13] Cassell, J. 1987. On control, certitude, and the paranoia of surgeons. Culture, Medicine and Psychiatry. 11,
2 (Jun. 1987), 229–249. DOI:https://doi.org/10.1007/BF00122565.

[14] Danyluk, K. et al. 2021. A Design Space Exploration of Worlds in Miniature. Proceedings of the 2021 CHI
Conference on Human Factors in Computing Systems (New York, NY, USA, May 2021), 1–15.

[15] Dillenbourg, P. et al. The evolution of research on collaborative learning. 27.

[16] Dixon, B.J. et al. 2013. Surgeons blinded by enhanced navigation: the effect of augmented reality on attention. Surgical Endoscopy. 27, 2 (Feb. 2013), 454–461. DOI:https://doi.org/10.1007/s00464-012-2457-3.

[17] Eckhoff, D. et al. 2018. TutAR: augmented reality tutorials for hands-only procedures. Proceedings of the
16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in
Industry (Tokyo Japan, Dec. 2018), 1–3.

[18] Gasques Rodrigues, D. et al. 2017. Exploring Mixed Reality in Specialized Surgical Environments.
Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems
(Denver Colorado USA, May 2017), 2591–2598.

[19] Gauglitz, S. et al. 2012. Integrating the physical environment into mobile remote collaboration.
Proceedings of the 14th international conference on Human-computer interaction with mobile devices
and services - MobileHCI ’12 (San Francisco, California, USA, 2012), 241.

[20] Goto, M. et al. 2010. Task support system by displaying instructional video onto AR workspace. 2010 IEEE
International Symposium on Mixed and Augmented Reality (Seoul, Korea (South), Oct. 2010), 83–90.

[21] Gurevich, P. et al. 2015. Design and Implementation of TeleAdvisor: a Projection-Based Augmented
Reality System for Remote Collaboration. Computer Supported Cooperative Work (CSCW). 24, 6 (Dec.
2015), 527–562. DOI:https://doi.org/10.1007/s10606-015-9232-7.

[22] Herdel, V. et al. 2021. Drone in Love: Emotional Perception of Facial Expressions on Flying Robots.
Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama Japan,
May 2021), 1–20.

[23] Herrmann, T. et al. 1996. Requirements for a Human Centred Design of Groupware. Human Factors in
Information Technology. 12, (1996), 77–99.

[24] Johnson, J.G. et al. 2021. Do You Really Need to Know Where “That” Is? Enhancing Support for
Referencing in Collaborative Mixed Reality Environments. Proceedings of the 2021 CHI Conference on
Human Factors in Computing Systems (Yokohama Japan, May 2021), 1–14.

[25] Jones, D. et al. 2020. Characterising the Digital Twin: A systematic literature review. CIRP Journal of
Manufacturing Science and Technology. 29, (May 2020), 36–52. DOI:https://doi.org/10.1016/j.
cirpj.2020.02.002.

[26] Kaufmann, B. and Ahlström, D. 2012. Revisiting peephole pointing: a study of target acquisition with a
handheld projector. Proceedings of the 14th international conference on Human-computer interaction
with mobile devices and services - MobileHCI ’12 (San Francisco, California, USA, 2012), 211.

[27] Kim, K. et al. 2017. Exploring the effects of observed physicality conflicts on real-virtual human interaction
in augmented reality. Proceedings of the 23rd ACM Symposium on Virtual Reality Software and
Technology (New York, NY, USA, Nov. 2017), 1–7.

[28] Kirk, D. et al. 2007. Turn it this way: grounding collaborative action with remote gestures. Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems (New York, NY, USA, Apr. 2007), 1039–
1048.

[29] Kishino, F. and Milgram, P. 1994. A taxonomy of mixed reality displays. IEICE Transactions on Information and Systems. 77, 12, 16.

[30] Kishishita, N. et al. 2014. Analysing the effects of a wide field of view augmented reality display on search
performance in divided attention tasks. 2014 IEEE International Symposium on Mixed and Augmented
Reality (ISMAR) (Munich, Germany, Sep. 2014), 177–186.

[31] Kraut, R.E. et al. 2003. Visual Information as a Conversational Resource in Collaborative Physical Tasks.
Human–Computer Interaction. 18, 1–2 (Jun. 2003), 13–49. DOI:https://doi.org/10.1207/
S15327051HCI1812_2.

[32] Lindlbauer, D. and Wilson, A.D. 2018. Remixed Reality: Manipulating Space and Time in Augmented
Reality. (2018), 13.

[33] Louis, T. et al. 2019. Is it Real? Measuring the Effect of Resolution, Latency, Frame rate and Jitter on the
Presence of Virtual Entities. Proceedings of the 2019 ACM International Conference on Interactive
Surfaces and Spaces (New York, NY, USA, Nov. 2019), 5–16.

[34] Metzger, P.J. 1993. Adding reality to the virtual. Proceedings of IEEE Virtual Reality Annual International
Symposium (Sep. 1993), 7–13.

[35] Microsoft HoloLens | Technologie de réalité mixte pour les entreprises: https://www.microsoft.com/fr-fr/
hololens. Accessed: 2021-06-18.

[36] Mohr, P. et al. 2020. Mixed Reality Light Fields for Interactive Remote Assistance. Proceedings of the 2020
CHI Conference on Human Factors in Computing Systems (Honolulu HI USA, Apr. 2020), 1–12.

[37] Moreau, D. 2012. The role of motor processes in three-dimensional mental rotation: Shaping cognitive
processing via sensorimotor experience. Learning and Individual Differences. 22, 3 (Jun. 2012), 354–359.
DOI:https://doi.org/10.1016/j.lindif.2012.02.003.

[38] Most advanced virtual and mixed reality headsets for professionals - Varjo: https://varjo.com/. Accessed:
2021-06-28.

[39] Müller, J. et al. 2017. Remote Collaboration With Mixed Reality Displays: How Shared Virtual Landmarks
Facilitate Spatial Referencing. Proceedings of the 2017 CHI Conference on Human Factors in Computing
Systems (Denver Colorado USA, May 2017), 6481–6486.

[40] Niijima, A. and Ogawa, T. 2016. Study on Control Method of Virtual Food Texture by Electrical Muscle
Stimulation. Proceedings of the 29th Annual Symposium on User Interface Software and Technology
(New York, NY, USA, Oct. 2016), 199–200.

[41] Nowak, K. 2001. Defining and Differentiating Copresence, Social Presence and Presence as
Transportation. Presence 2001 Conference. (2001), 24.

[42] Oda, O. et al. 2015. Virtual Replicas for Remote Assistance in Virtual and Augmented Reality. Proceedings
of the 28th Annual ACM Symposium on User Interface Software & Technology (New York, NY, USA, Nov.
2015), 405–415.

[43] Ong, S.K. et al. 2008. Augmented reality applications in manufacturing: a survey. International Journal of
Production Research. 46, 10 (May 2008), 2707–2742. DOI:https://doi.org/10.1080/00207540601064773.

[44] Perea, P. et al. 2019. Spotlight on Off-Screen Points of Interest in Handheld Augmented Reality: Halo-
based techniques. Proceedings of the 2019 ACM International Conference on Interactive Surfaces and
Spaces (Daejeon Republic of Korea, Nov. 2019), 43–54.

[45] Pierce, J.S. et al. 1999. Voodoo dolls: seamless interaction at multiple scales in virtual environments.
Proceedings of the 1999 symposium on Interactive 3D graphics - SI3D ’99 (Atlanta, Georgia, United
States, 1999), 141–145.

[46] Piumsomboon, T. et al. 2018. Mini-Me: An Adaptive Avatar for Mixed Reality Remote Collaboration.
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC Canada,
Apr. 2018), 1–13.

[47] Ramic-Brkic, B. and Chalmers, A. 2010. Virtual smell: authentic smell diffusion in virtual environments.
Proceedings of the 7th International Conference on Computer Graphics, Virtual Reality, Visualisation and
Interaction in Africa (New York, NY, USA, Jun. 2010), 45–52.

[48] Ranasinghe, N. and Do, E.Y.-L. 2016. Virtual Sweet: Simulating Sweet Sensation Using Thermal
Stimulation on the Tip of the Tongue. Proceedings of the 29th Annual Symposium on User Interface
Software and Technology (New York, NY, USA, Oct. 2016), 127–128.

[49] Rhienmora, P. et al. 2010. Augmented reality haptics system for dental surgical skills training. Proceedings
of the 17th ACM Symposium on Virtual Reality Software and Technology (New York, NY, USA, Nov. 2010),
97–98.

[50] Rizzolatti, G. and Craighero, L. 2004. The mirror-neuron system. Annual Review of Neuroscience.
27, 1 (Jul. 2004), 169–192. DOI:https://doi.org/10.1146/annurev.neuro.27.070203.144230.

[51] Saad, W. et al. 2019. A Vision of 6G Wireless Systems: Applications, Trends, Technologies, and Open
Research Problems. arXiv:1902.10265 [cs, math]. (Jul. 2019).

[52] Salvador, T. et al. 1996. The Denver model for groupware design. ACM SIGCHI Bulletin. 28, 1 (Jan. 1996),
52–58. DOI:https://doi.org/10.1145/249170.249185.

[53] Speicher, M. et al. 2019. What is Mixed Reality? Proceedings of the 2019 CHI Conference on Human
Factors in Computing Systems (Glasgow Scotland UK, May 2019), 1–15.

[54] Stewart, J. and Billinghurst, M. 2016. A wearable navigation display can improve attentiveness to the
surgical field. International Journal of Computer Assisted Radiology and Surgery. 11, 6 (Jun. 2016),
1193–1200. DOI:https://doi.org/10.1007/s11548-016-1372-9.

[55] Sukan, M. et al. 2012. Quick viewpoint switching for manipulating virtual objects in hand-held augmented
reality using stored snapshots. 2012 IEEE International Symposium on Mixed and Augmented Reality
(ISMAR) (Atlanta, GA, USA, Nov. 2012), 217–226.

[56] Sutherland, I. 1965. The Ultimate Display. Proceedings of the IFIP Congress. 2, 506–508.

[57] Teo, T. et al. 2019. Investigating the use of Different Visual Cues to Improve Social Presence within a 360
Mixed Reality Remote Collaboration. The 17th International Conference on Virtual-Reality Continuum
and its Applications in Industry (Brisbane QLD Australia, Nov. 2019), 1–9.

[58] Vorderer, P. et al. MEC Spatial Presence Questionnaire (MEC-SPQ). 15.

[59] Wang Baldonado, M.Q. et al. 2000. Guidelines for using multiple views in information visualization.
Proceedings of the working conference on Advanced visual interfaces - AVI ’00 (Palermo, Italy, 2000),
110–119.

[60] Wexler, M. et al. 1998. Motor processes in mental rotation. Cognition. 68, 1 (Aug. 1998), 77–94.
DOI:https://doi.org/10.1016/S0010-0277(98)00032-8.

[61] Yannier, N. et al. 2015. Learning from Mixed-Reality Games: Is Shaking a Tablet as Effective as Physical
Observation? Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems
(New York, NY, USA, Apr. 2015), 1045–1054.

[62] Zhang, J. et al. 2020. A Literature Review of the Research on the Uncanny Valley. Cross-Cultural Design.
User Experience of Products, Services, and Intelligent Environments. P.-L.P. Rau, ed. Springer International
Publishing. 255–268.

