
Article

HyMoTrack: A Mobile AR Navigation System for Complex Indoor Environments
Georg Gerstweiler *,† , Emanuel Vonach † and Hannes Kaufmann †
Received: 30 September 2015; Accepted: 17 December 2015; Published: 24 December 2015
Academic Editor: Kourosh Khoshelham

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstrasse
9-11-188/2, Vienna 1040, Austria; vonach@ims.tuwien.ac.at (E.V.); kaufmann@ims.tuwien.ac.at (H.K.)
* Correspondence: gerstweiler@ims.tuwien.ac.at; Tel./Fax: +43-158-8011-8823
† These authors contributed equally to this work.

Abstract: Navigating in unknown big indoor environments with static 2D maps is a challenge,
especially when time is a critical factor. In order to provide a mobile assistant, capable of supporting
people while navigating in indoor locations, an accurate and reliable localization system is required
in almost every corner of the building. We present a solution to this problem through a hybrid
tracking system specifically designed for complex indoor spaces, which runs on mobile devices
like smartphones or tablets. The developed algorithm only uses the available sensors built into
standard mobile devices, especially the inertial sensors and the RGB camera. The combination of
multiple optical tracking technologies, such as 2D natural features and features of more complex
three-dimensional structures guarantees the robustness of the system. All processing is done locally
and no network connection is needed. State-of-the-art indoor tracking approaches use mainly
radio-frequency signals like Wi-Fi or Bluetooth for localizing a user. In contrast to these approaches,
the main advantage of the developed system is the capability of delivering a continuous 3D position
and orientation of the mobile device with centimeter accuracy. This makes it usable for localization
and 3D augmentation purposes, e.g., navigation tasks or location-based information visualization.

Keywords: indoor tracking; navigation; localization; augmented reality; mobile

1. Introduction
Especially in big indoor environments such as airports, shopping malls, hospitals or exhibition
halls, people are usually not familiar with the floor plan of the building and therefore find it difficult
to get to a desired destination in time. For a mobile assistant, capable of supporting people while
navigating in indoor locations, an accurate and reliable position of the user is required in almost every
corner of the building. To provide a natural navigation experience, a virtual navigation application
that augments the real-world video stream with location-based visual information would be very
helpful. For displaying location-based information with a correct overlay on the mobile screen,
an exact pose (position and orientation) must be computed in real time on the mobile device. This
enables the system to support position-dependent augmented reality (AR) applications not only for
navigation but also for games, 3D inspection or education.
Global Positioning System (GPS) is frequently used in outdoor scenarios, but does not work
reliably indoors. Competing state-of-the-art indoor localization approaches use so-called beacons,
which are devices distributed all over the building that broadcast a unique identifier. This technology,
usually based on Bluetooth or other radio frequency (RF) technologies, only operates at a short
distance and needs a complex infrastructure. Localization via WiFi requires a dense network of access
points, which leads to high maintenance costs, but only provides positional accuracy of 1–3 m with
only three degrees of freedom.

Sensors 2016, 16, 17; doi:10.3390/s16010017 www.mdpi.com/journal/sensors



Indoor positioning on mobile devices is an actively researched topic [1,2]. Existing algorithms
and technologies in this field do not provide the required centimeter tracking accuracy of the mobile
device. The aforementioned indoor use cases pose specific requirements. Big closed tracking
volumes have changing lighting conditions (daylight and artificial light), seasonal decorations (e.g., at
Christmas) and occlusions caused by moving objects or persons. A solution handling these challenges
has not been published yet.
In order to precisely track the user for AR tasks, only computer vision methods (optical, i.e.,
visual feature tracking) in combination with other localization methods (usually inertial tracking)
can be used. This has the advantage that no additional infrastructure is required due to the use of
existing landmarks, thereby keeping the installation costs low. However, there are basic and applied
research questions that hinder the use of existing solutions in the real world, e.g., how to increase
the robustness of existing algorithms for large-scale tracking, and how to reliably create a feature map
of the whole indoor environment in a reasonable time, since public places like airports in particular
cannot be shut down during standard operation times.
This paper presents the design and the development of a full mapping, tracking and
visualization pipeline for large indoor environments. For this purpose a novel method of combining
two different visual feature tracking algorithms was developed. In a first stage, clearly recognizable
areas like advertisements, company logos or posters are used as visual descriptors for detecting the
initial position and for continuous tracking. The second level of the hybrid solution uses an
adapted visual simultaneous localization and mapping (SLAM) algorithm to achieve continuous
tracking in almost all areas of the environment if none of the aforementioned descriptors is available.
As a fallback solution the data of the inertial sensor is used to bridge short distances where visual
tracking does not work. To achieve this goal new concepts for all the individual components
mentioned before had to be designed.
Mapping

- Capturing with an advanced computer vision (CV) camera system, considering that the captured
3D maps will be used with a different camera setup on a mobile device;
- Fusion of 2D markers and 3D features under special conditions with respect to different scale
values, and storing the data into a global map structure for localization purposes;

Tracking

- Handling of multiple visual tracking solutions simultaneously on a mobile device, while still
keeping the computational effort low;
- Design of an intelligent fallback heuristic, which decides on the best alternative during tracking;
- Design of a predictive reinitialization procedure;

User Interaction

- Interface for exact placement of 3D location-based content;
- Development of an AR navigation scenario;
- Testing the navigation scenario under real-world conditions in a subsection of an airport.

The main contribution of the work at hand is a complete system, which enables user studies
about user behavior, user interaction possibilities with AR content and other scientific research
questions targeting large indoor AR enabled environments. Within this paper, we have developed
a first scenario, which shows the potential for an end user AR application outside a desktop
environment, the usual environment for visual SLAM approaches.
Section 2 of this article describes the differences to related work, followed by a description of
the methodology in Section 3. Section 4 gives an overview of all developed software and hardware
components needed for preparing a large indoor environment for an AR indoor navigation scenario.
Technical details about the mapping and tracking process are described afterwards. We also present
an evaluation scenario, which was specially developed for the Vienna International Airport in order
to prove the applicability within a real world scenario. Finally, we summarize all collected results
and experiences from all necessary steps, comprising capturing, mapping, fusion and tracking.

2. Related Work
Indoor positioning systems capable of three degrees of freedom (DOF) and six DOF pose tracking
are an actively researched topic in different scientific communities, e.g., robotics, autonomous flight,
automobiles, navigation, etc. A common research question is the estimation of an object’s or a user’s
position within a large environment, with varying requirements on the tracking quality depending on
the application domain. The proposed approach is based on two different visual tracking algorithms.
The following sections will first describe approaches based on point features such as SLAM with
different base technologies. In addition to that, related work dealing with 2D markers in big
environments is discussed. In the final section of this chapter different approaches are mentioned,
which cannot be used for indoor positioning on their own, but can extend the capabilities of a
hybrid tracking system. Tracking technologies, designed for large areas like GPS, magnetic tracking
or acoustic tracking are not included in the discussion, because of their low inherent accuracy and
possible interferences, especially in big indoor environments.

2.1. Simultaneous Localization and Mapping


A large number of six DOF pose tracking approaches are based on the simultaneous localization
and mapping (SLAM) concept. Originally indoor localization with SLAM emerged out of the robotics
community [3] with the aim of driving robots autonomously through unknown environments
without colliding with objects. With robots, different high-quality hardware such as laser scanners [3]
or high-performance cameras can be used. In addition, high-performance computers may be
used which surpass the capacities of mobile devices considerably. For SLAM, different sensors and
different kinds of signals, e.g., RF signals and vision-based methods (laser scanners, point features),
may be used. The SLAM mapping process tries to extract spatial information (signal strength, 3D
points, etc.) of the real-world environment in order to store it in a global reference map while at the
same time keeping track of the agent’s position. When using smartphones as a base platform [1] low
computational power, limited battery life and low quality sensors pose additional requirements on a
SLAM algorithm. There are already different approaches and a broad number of concepts based on
different hardware such as Bluetooth, WiFi or visual features. All of these technologies can be used
for implementing a SLAM system for localization.

2.1.1. WiFi SLAM


WiFi SLAM is frequently used for localization since many buildings are already equipped with
numerous access points. For the mapping process the mobile device has to be moved through the
whole environment in order to periodically measure the signal strength of all surrounding access
points. The accuracy that can be achieved with WiFi SLAM is below one meter [4] in the best case.
Some researchers have combined WiFi with Bluetooth beacons [5] in order to improve the accuracy in
interesting areas. In any of these cases, the result of the tracking is a 2D position in a 2D map. Another
topic, which is usually not discussed in these papers, is the update rate of the tracked position. Mobile
hardware is usually restricted to scan the signal strength just a few times per second, which results
in a very low refresh rate for new measurements. In addition to that, signals tend to be extremely
noisy and signal strength highly depends on surrounding building structures and materials. For these
reasons, WiFi SLAM on its own is not suitable for accurate interactive augmented reality applications.
The main drawbacks of the systems are poor tracking quality, reliability and the missing degrees
of freedom.

2.1.2. Vision-Based SLAM


In order to reach the desired accuracy for AR applications, it is necessary to utilize vision based
methods, which can provide real six degrees of freedom poses of a user’s device. Vision-based SLAM
algorithms relying on point features are able to estimate a position as well as an orientation of a
stereo setup. Spatial information in terms of point features like the Scale Invariant Feature Transform
(SIFT) [6], the Features from Accelerated Segment Test (FAST) [7], the Oriented FAST and Rotated
BRIEF (ORB) [8], etc. is extracted from subsequent frames in order to simulate stereo vision. The baseline
between two frames has to be high enough in order to estimate the distance of the features [9]. This
data is stored in a global map representation. Over time, a certain area can be mapped and used
for tracking.
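The baseline requirement above follows from the classic pinhole stereo relation: the farther apart the two views, the larger the disparity of a matched feature and the better its depth can be estimated. A minimal sketch (the focal length, baseline and disparity values are purely illustrative, not taken from any system discussed here):

```python
def depth_from_disparity(baseline_m, focal_px, disparity_px):
    """Pinhole stereo relation: depth = f * B / d.

    baseline_m: distance between the two camera positions (metres)
    focal_px: focal length expressed in pixels
    disparity_px: pixel shift of the matched feature between the two frames
    """
    return focal_px * baseline_m / disparity_px

# A feature shifting 40 px between two views 0.2 m apart (f = 800 px) lies 4 m away.
print(depth_from_disparity(0.2, 800.0, 40.0))  # 4.0
```

With a baseline close to zero, the disparity vanishes and the depth estimate becomes numerically unstable, which is why monocular SLAM systems must accumulate sufficient camera motion before triangulating new features.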
In recent years, algorithms were developed by utilizing a monocular camera in desktop
environments for localization and displaying augmented reality content. Parallel Tracking and
Mapping (PTAM) [10,11] by Klein et al. allows one to map a complex 3D environment while
at the same time estimating the camera pose. The limitation to small desktop environments is
similar to the issue with other algorithms like the extended Kalman filter SLAM (EKF-SLAM) [12]
or Fast-SLAM [13], mainly due to missing optimization algorithms dealing with loop closure, map
size optimization and re-initialization. Furthermore, desktop grade computing power is required.
These algorithms mostly suffer from a small drift over time and usually do not implement a method
of detecting already visited places (loop closure), which often results in adding the same features
multiple times. Other successful implementations were done by [14,15]. Since these algorithms were
developed for monocular setups, several publications already tried to adapt them to mobile devices.
Some approaches use panoramas [16,17] for matching the current view with the previously saved
map, which works very well but only in the vicinity of the mapped panorama. Our approach aims
for a continuous tracking over a distance of 100 meters and more.
The field-of-view of smartphone cameras is usually very narrow (around 60°–70°) and is therefore not
ideal for narrow indoor environments. In addition, when dealing with wide spaces the reference
maps are getting very large and acquiring a first position without prior knowledge can take several
seconds by brute force search through a recorded map. To release the mobile device from intensive
processing, Ventura et al. [18] published a client-server structure. The server is managing the global
reference map, whereas the client unit is just building a local map, which is regularly exchanged for
global localization. The advantage of this approach is the possibility of handling big environments,
but a complex client-server structure including a continuous network connection is needed.

2.2. Marker-Based Tracking


Another possibility of creating a visual indoor localization system was developed by Wagner
et al. [19]. They presented a solution based on fiducial markers, which were manually placed in
an indoor environment. They achieved an overlay registration with augmented reality elements by
using an early PDA like device with ARToolKit [20]. The advantage of using well-placed markers
is unique identification within a very short time. However, equipping a public environment with
fiducial markers is not reasonable, but using existing 2D textured surfaces with the approach of [21]
could enhance the capabilities of a standard SLAM algorithm.

2.3. Complementary Tracking Technologies


Other technologies for extending visual tracking solutions can be integrated into a hybrid
tracking system to either increase robustness or to serve as a backup solution. Beacons are small
devices, which are distributed in the tracking area with high density. These beacons can use different
signals in order to be detected by mobile devices. Usually Bluetooth [5], WiFi or even a combination
of multiple signals [22] is used in order to estimate the approximate position of the user’s device. The
accuracy depends on the amount of devices spread over the environment. As in other approaches, an
accurate map of the beacons has to be generated. Usually algorithms estimating the position use the
concept of Received Signal Strength Indicator (RSSI) [23], which delivers a more accurate position and
does not just estimate the distance to the next beacon. It is not feasible to use this kind of technology
within environments like airports, because of the high hardware and maintenance costs. In addition
to that, it is not possible to detect the orientation of the device, which is essential for augmenting
location-based information.
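RSSI-based ranging, as referenced above, is commonly implemented with the log-distance path loss model. The following sketch illustrates the idea; the calibration power and path loss exponent are illustrative assumptions, not values from [23]:

```python
def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """Estimate distance (m) from a beacon via the log-distance path loss model.

    tx_power_dbm is the calibrated RSSI measured at 1 m; path_loss_exponent is
    ~2.0 in free space and higher indoors. Both are deployment-specific
    assumptions that must be calibrated per environment.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))

# A beacon heard at exactly its 1 m calibration power is ~1 m away.
print(rssi_to_distance(-59.0))  # 1.0
# A 20 dB weaker signal implies a tenfold distance (with exponent 2).
print(rssi_to_distance(-79.0))  # 10.0
```

The strong dependence of the exponent on building materials is exactly the noise source criticized above: small RSSI fluctuations translate into large distance errors.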

3. Experimental Section

3.1. Methodology
Our Hybrid Motion Tracking (HyMoTrack) framework contains multiple individual modules
with novel concepts. It is capable of mapping a big indoor environment to place location-based AR
content. Using a pre-recorded map, a mobile device is able to estimate a six DOF pose in real time
while at the same time showing 3D content to the user. In order to lower computational power
requirements while tracking and to focus on individual steps of the mapping process, the concept is
divided into two parts. The first task is the mapping of the environment and the second task is about
performing a real-time tracking approach. The single steps of mapping and tracking are listed in
Figure 1. They are briefly described in Sections 3.1.1 and 3.1.2. Details are provided in Section 3.2.

[Figure 1 shows the two parts of the framework: offline map generation (feature acquisition,
feature selection, registration, fusion, AR content registration), which produces the global
environment map, and online mobile tracking (image capture, feature matching, IMU tracking,
fallback heuristic, content visualization).]

Figure 1. Developed packages for the HyMoTrack framework.

The concept of the HyMoTrack framework is based on a hybrid tracking solution, which tries to
exploit the capabilities of two different visual tracking technologies. Our approach targets locations
such as airports, shopping malls, hospitals, etc., which typically have controlled lighting situations
during day and night times. Therefore a visual SLAM algorithm is very well suited for this
application scenario. SLAM is used as the main tracking solution. We used the implementation by
Metaio (http://www.metaio.com/), which is based on point features. It is capable of tracking
constructional characteristics such as ledges, pillars or edges. The SLAM implementation of the
Metaio SDK is similar to PTAM and runs on mobile devices. Since it is designed for small
environments, we implemented a novel algorithm for handling multiple small reference maps in
order to extend the capabilities of the restricted Metaio framework to be used in big environments.
The indoor environments which the presented framework is designed for are already fitted
with visual content like signs or company brands. These characteristics rarely change over time due
to temporal decorations and can therefore be seen as permanent features in a constantly changing
environment. Therefore, we use 2D feature markers as a secondary tracking solution to ensure that
the most attractive areas to humans, like signs and posters, always result in a valid tracking position.
For identifying and tracking these natural feature markers [21], the Metaio SDK is also used. By using
only visual features for localization purposes, the presented concept does not require any additional
hardware within the environment such as beacons, manually placed visual markers or any kind of
Wi-Fi infrastructure.
3.1.1. Mapping and Content Registration


In order to build a unique fingerprint of the environment multiple steps have to be performed
(Figure 1, left part). The advantage of splitting the mapping of the environment from tracking is that
a high performance notebook and professional computer vision camera can be used for mapping,
which also results in a faster mapping process. In a first step, a map generation process has to
be performed. This contains the feature acquisition, by capturing spatial information with the
SLAM algorithm and distinctive 2D feature markers. This data is registered and fused into a global
coordinate system. In addition to that, a 2D map of the environment is integrated into this reference
map. This makes it possible to exactly place location based AR content within the environment. All
the information, which is processed in these mapping and capturing steps is saved and transferred to
a mobile device running the proposed tracking algorithm.
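Registering and fusing the captured data into one global coordinate system amounts to composing rigid transforms: a marker pose known in a sub-map's local frame is carried into the global frame by the sub-map's offset. A minimal numpy sketch of this chaining (all offsets are hypothetical, chosen only for illustration):

```python
import numpy as np

def pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical offsets: the sub-map origin lies 10 m along x in the global frame,
# and a 2D marker sits 2 m along y inside that sub-map.
T_global_submap = pose(np.eye(3), [10.0, 0.0, 0.0])
T_submap_marker = pose(np.eye(3), [0.0, 2.0, 0.0])

# Chaining the transforms yields the marker's pose in the global reference map.
T_global_marker = T_global_submap @ T_submap_marker
print(T_global_marker[:3, 3])  # [10.  2.  0.]
```

Storing every marker and sub-map as such an offset to a single global origin is what later lets the mobile tracker convert any local pose estimate into a globally consistent position.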

3.1.2. Mobile Tracking Concept


Unlike the mapping process, tracking was developed for an Android smartphone, whereby all
processing is done locally and no network connection is needed. The generated information of the
environment is loaded on the mobile device. The tracking concept is responsible for managing the
camera stream and feature matching with both visual tracking solutions. In order to achieve a robust
estimation of the position and orientation, we developed a fallback heuristic. It is responsible
for switching between the different tracking solutions. Therefore, it is possible to use the most
reliable method in a given situation, thereby minimizing the risk of losing tracking while at the same
time keeping the computational effort low.
If all tracking approaches fail for any reason, we utilize the inertial sensors of the mobile
device. This makes it possible to bypass short losses in the visual tracking system, thereby providing
a seamless tracking experience. In addition to the tracking solution, we also present a concept for
guiding people in indoor environments with AR content. The user is able to localize herself/himself
by using an Android-based smartphone within a building with centimeter accuracy.
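The fallback behavior described above can be sketched as a simple per-frame priority rule. This is an illustrative simplification, not the framework's actual heuristic; the priority order and the IMU frame limit are assumptions consistent with the text (markers preferred, SLAM as main coverage, IMU bridging only short visual dropouts):

```python
from dataclasses import dataclass

@dataclass
class TrackerState:
    marker_pose_valid: bool  # a 2D natural-feature marker is currently matched
    slam_pose_valid: bool    # visual SLAM is delivering a pose
    imu_frames_used: int     # consecutive frames already bridged by the IMU

MAX_IMU_FRAMES = 30  # illustrative limit: bridge only short visual dropouts

def select_tracker(state: TrackerState) -> str:
    """Pick the most reliable pose source for the current frame."""
    if state.marker_pose_valid:
        return "2d_marker"
    if state.slam_pose_valid:
        return "slam"
    if state.imu_frames_used < MAX_IMU_FRAMES:
        return "imu"
    return "reinitialize"  # all sources exhausted: fall back to relocalization
```

For example, `select_tracker(TrackerState(False, True, 0))` returns `"slam"`, while a long visual outage eventually yields `"reinitialize"`.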

3.2. System Overview


The HyMoTrack framework consists of multiple interfaced subsystems responsible for capturing
and mapping the environment, fusion of the used tracking technologies, placing 3D content and
finally tracking a mobile device with the camera input. These modules run on different platforms
and devices for performance reasons. Since several steps are needed until the global reference map
is generated, the whole system is designed to be as automated as possible, but still needs some
manual steps, mainly for copying metadata between the modules.
Figure 2 shows the designed workflow in detail. It contains all steps between capturing the
environment and using a smartphone for tracking and visualizing AR content. In the project we
utilize the functionality of the Metaio SDK since the main innovation is not the implementation of
another SLAM algorithm, but showing the capabilities of the proposed hybrid mapping and tracking
approach and how it can improve the usability of AR applications. We take advantage of parts of the
Metaio SDK in six steps within the workflow (marked in blue in Figure 2). The SLAM approach is
used for mapping only small chunks of the environment in a 3D point cloud to guarantee a good
mapping quality. Furthermore, the tracking capabilities of the Metaio SLAM and Metaio image
marker implementations are part of the fusion and the tracking process on the mobile device. In addition
to that, the SDK is accessing the inertial sensor of the phone and therefore provides the functionality
to predict the pose and the orientation for a specific number of frames when the visual tracking fails.
Within the workflow the modules one to three are responsible for creating a global reference
map. The first component (module one) consists of an application, which is responsible for capturing
3D point clouds by using SLAM. In order to capture an environment in a reasonable time an external
computer vision camera equipped with a wide-angle lens is connected to a notebook. For the
mapping procedure, overlapping SLAM maps of the location of interest have to be captured. The
maps including the video footage are saved on the notebook for registration purposes. In parallel,
it is necessary to capture good textured 2D surfaces like signs or posters of the environment, which
could have a good potential to be integrated as 2D feature markers into the global map. In addition
to that, digital copies of posters or billboards can also be integrated into the list of feature markers.
The second module of the pipeline fuses the individual recorded information into a global
coordinate system. In a first step the 3D maps have to be registered to each other. In a second step
the image markers have to be registered with the previously recorded video footage. The result of
the mapping procedure is an XML-structured file describing the offset of the individual markers to the
global origin.
After successful registration another application (module three) takes over. For this purpose, the Unity
3D (https://unity3d.com/) editor was adapted to our needs. Unity 3D is a game engine with a flexible
editor that we utilized as an interface for map editing. Thereby it is possible to load all markers and
point clouds within one graphical user interface for evaluation and editing purposes. By overlaying
a 2D map of the location, any 3D model can be exactly placed within the 3D environment for
augmentation purposes.
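Registering successive sub-maps with ICP (Figure 2, Module 2) repeatedly solves a closed-form rigid alignment between matched point sets. The following numpy sketch shows that inner alignment step (the Kabsch solution) on a toy point set; a full ICP would re-estimate correspondences and iterate, and this is a generic textbook formulation rather than the Metaio implementation:

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst (Kabsch).

    Assumes src[i] corresponds to dst[i]; ICP alternates this solve with a
    nearest-neighbor correspondence search.
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    # Guard against reflections so R stays a proper rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Toy sub-map overlap: three points shifted by (1, 0, 0) between two recordings.
a = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
b = a + np.array([1., 0., 0.])
R, t = rigid_align(a, b)
print(np.allclose(R, np.eye(3)), np.allclose(t, [1., 0., 0.]))  # True True
```

In the mapping pipeline, the estimated (R, t) of each overlapping sub-map would be chained to express all captured features in the single global coordinate system described above.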

[Figure 2 details the workflow. Module 1, SLAM mapping of the environment: 80 Hz video input
feeds the Metaio 3D SLAM mapping; when the maximum sub-map size is reached, the latest features
are kept and mapping restarts, filling a sub-map database. Module 2, mapping fusion of 2D image
markers and 3D point clouds: successive sub-maps are registered with ICP into a global 3D reference
map; potential 2D image markers, tracked alongside Metaio 3D SLAM tracking on the 80 Hz video
input, are registered in the global map once two valid image poses are available; together with the
final 2D image markers, an optimization step yields the global 2D/3D reference map. Module 3,
HyMoTrack visualizer for content placement: 3D content is added to the global reference map, the
mapping data is visualized, and a global information package is extracted. Module 4, online Android
tracking application: 30 Hz video input is processed by Metaio 3D SLAM tracking and Metaio 2D
image tracking, fused with the IMU, and rendered.]

Figure 2. System workflow of the presented indoor tracking solution.
The tracking application is running one thread for each tracking solution in parallel. Detecting
the initial pose without any prior knowledge is implemented, as well as a fallback heuristic, which
decides which of the tracking technologies is suitable in the current situation. The mobile application
is also handling a prediction model for loading suitable sub-maps and image markers at the right
time in order to keep the time for re-initialization low. The following sections describe the designed
workflow of each module presented in Figure 2.
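The prediction model for loading sub-maps can be illustrated as a dead-reckoned lookahead over a spatial index of sub-map anchors. All names, positions, the preload radius and the prediction horizon below are hypothetical, chosen only to show the mechanism:

```python
import math

# Hypothetical sub-map index: anchor position (x, y in metres) per sub-map ID.
SUBMAP_ANCHORS = {
    "gate_a": (0.0, 0.0),
    "gate_b": (40.0, 0.0),
    "security": (80.0, 15.0),
}

def predict_position(pos, velocity, horizon_s=2.0):
    """Dead-reckon the user a short horizon ahead (simplified sketch)."""
    return (pos[0] + velocity[0] * horizon_s, pos[1] + velocity[1] * horizon_s)

def submaps_to_load(pos, velocity, radius_m=20.0):
    """Return the sub-maps near the predicted position, so matching data is
    already in memory when visual tracking has to re-initialize."""
    px, py = predict_position(pos, velocity)
    return sorted(
        name for name, (x, y) in SUBMAP_ANCHORS.items()
        if math.hypot(x - px, y - py) <= radius_m
    )

# Walking in +x at 1.5 m/s near gate A: only gate A's map is within the radius.
print(submaps_to_load((10.0, 0.0), (1.5, 0.0)))  # ['gate_a']
```

Keeping only nearby sub-maps and image markers resident bounds both memory use and the brute-force search space during re-initialization, which is the stated goal of the prediction model.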
3.3. Module 1: Mapping Large Environments

This section describes the workflow and implementation of the first module (Figure 2), which is
responsible for mapping the environment with the chosen SLAM approach and capturing 2D textured
surfaces for potential image markers.
3.3.1. 3D Feature Mapping
In order to capture 3D spatial information of the environment, a Windows-based application was developed. A high-performance notebook equipped with an Intel i7 processor was used in a mobile setup to capture the information. For capturing the environment, multiple cameras with different specifications regarding resolution, frame rate and field of view were tested. For the final mapping process a computer vision USB 3.0 camera (IDS-UI337, IDS Imaging Development Systems GmbH, Obersulm, Germany) with a resolution of 1024 by 1024 pixels and a maximum frame rate of 100 Hz was chosen. The square sensor format had the advantage of detecting more important features on signs mounted on the ceiling. The high frame rate makes it possible to move faster through the environment without generating blurred images; a reasonable frame rate of 80 Hz was chosen. In addition, a global shutter ensures that the captured feature points are correctly aligned while moving the camera. The lens used for the camera had a focal length of 8 mm and therefore a wider field of view than usual smartphone cameras. With this setup we were able to capture images with a field of view of about 85 degrees in the horizontal and the same in the vertical angle of view. This has the advantage of capturing more features in one frame compared to usual lenses.
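The relation between focal length and field of view in the pinhole model can be checked with a few lines. Note that the sensor width used below is back-computed by us from the reported focal length and angle; it is an inference, not a value stated in the text.

```python
import math

def horizontal_fov_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Pinhole-camera field of view: 2 * atan(w / (2 f))."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

# An 8 mm lens yields the stated ~85 degree field of view only for an
# active sensor area roughly 14.7 mm wide -- our inferred value.
fov = horizontal_fov_deg(14.7, 8.0)
```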

Figure 3. IDS camera UI337, resolution: 1024 by 1024 pixels at 100 Hz.

Figure 4. Low-cost IDS camera UI-3251LE-M-GL, resolution: 1600 by 1200 pixels at 60 Hz.
The intrinsic calibration of the camera was done with the Matlab Calibration Toolbox based on [24] in order to undistort the image according to the lens distortion parameters. This makes it possible to use the captured map with other calibrated hardware setups like smartphones. Figure 3 shows the camera used for the setup. Since this camera setup has a high price of EUR 5000, later on in the project we discovered and tested a cheaper alternative camera-lens combination with similar specifications, available for EUR 400. We were able to achieve comparable map quality with the IDS-UI-3251LE-M-GL camera shown in Figure 4.

As a base for our work the SLAM implementation of the Metaio SDK was used. To achieve fast loading times on the mobile devices, only small manageable sub-maps were recorded. Since monocular SLAM algorithms are not able to measure the baseline between two keyframes, the scale of a recorded map is only estimated. The proposed approach, however, requires not only the same scale for every recorded map, but also the real-world scale: two features in the point cloud have to have the same distance as the corresponding points in the real world. This is necessary because the 2D image markers used in the project have fixed measurements (see next chapter) and will be registered within the point cloud. To ensure a correct registration we had to define the scale of the sub-maps with an artificially placed fiducial marker. Before recording the first map, a 2D fiducial marker of fixed size (usually with an edge length of 25 cm) was placed in the environment.

The Metaio SLAM algorithm is able to start with the tracking of the 2D marker. It then expands the tracking with the SLAM approach, using the image marker features within the SLAM map and thereby defining the real-world scale. This approach guarantees the scale for the first map. Another advantage of this method is that the usual initial stereo matching step becomes obsolete, which can be challenging in certain conditions. In this way the mapping process can start as soon as the marker is detected. The developed mapping application occasionally saves the map in order to keep the sub-maps small. Several tests showed that point clouds with more than 2000 features tend to drift dramatically in big environments. For that reason, the maximum number of features in one captured sub-map was limited to 2000. Another advantage of small maps is their size: one map amounts to about 500 KB, which has a significant effect on loading times. The mapping process is a continuous process in our approach. Every time a map is saved, all features of the previously saved map are deleted and the currently saved sub-map is loaded again into the mapping process. This ensures that the sub-maps are next to each other and share only a small part of the features. Furthermore, by using preceding features in new sub-maps it is guaranteed that the scale of the map stays the same over time. During this whole process the video footage is captured for later use in order to register natural feature markers in the global reference map.
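The sub-map policy described above (cap each map at 2000 features, save it, and seed the next map with the previous one so that adjacent maps overlap and share scale) can be condensed into a short sketch. All names are illustrative, and the overlap size below is an arbitrary choice of ours; in the real system the mapping itself runs inside the Metaio SDK.

```python
# Sketch of the sub-map recording policy: save at 2000 features and
# reseed the next sub-map from the last one so neighbouring maps share
# features (and therefore a consistent scale).
MAX_FEATURES = 2000

class SubMapRecorder:
    def __init__(self):
        self.saved_maps = []
        self.current = []          # features of the active sub-map

    def add_features(self, features):
        for f in features:
            self.current.append(f)
            if len(self.current) >= MAX_FEATURES:
                self._save()

    def _save(self):
        self.saved_maps.append(list(self.current))
        # keep a tail of the just-saved map as the seed of the next one;
        # the overlap size (100) is our illustrative choice
        self.current = list(self.current[-100:])

recorder = SubMapRecorder()
recorder.add_features(range(4100))   # a synthetic feature stream
```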

Figure 5. Visualization of mapped point features of a region covering a distance of 150 meters. The red point correlates to the starting position of the mapping process, whereas the blue dashed line shows the path to a target area.
The result of a mapping process can be seen in Figure 5. Within the 150 m range 10 maps were saved while walking along the path in one direction. It can be clearly seen that the density of the features varies. As a consequence, the size of the area covered by a sub-map varies according to
the scenery. Long hallways like in the lower left corner of the image are represented through sparse
point clouds in contrast to environments like the shopping area (middle of the image), where many
visual features are available and one map only covers a few meters of the path.

3.3.2. 2D Image Markers



Public places are usually fitted with 2D textured surfaces like signs, advertisements or paintings. These areas contain “good” textures that provide many features for tracking. SLAM algorithms usually try to capture features that are equally distributed over the captured frame, in order not to add too many features to the map. Because of this, the SLAM mapping process adds too few features in such important textured surfaces to allow tracking to start from a close distance. For that reason, we integrated tracking of 2D image markers with the Metaio implementation of markerless image tracking to increase the robustness of our solution. This allows the framework to add almost any rectangular textured surface as a marker, although not every texture is suitable, due to low contrast or missing corners in the image. The image markers also have the function of detecting the initial position of the device if no prior knowledge is available. In contrast to 3D point features, the image markers have to be pre-processed manually. In order to get a digital representation of the desired areas it is necessary to take a picture at a certain quality. In order to use the right scale, the size of the markers must be measured. These steps are only relevant if it is not possible to get access to digital originals. In the future the authors plan to automate these steps by scanning the environment with a combination of an RGB and a depth camera in order to detect well-textured planar surfaces. With the depth camera it would also be possible to measure the size of the detected area.

Figure 6 shows some examples of selected textures, which were used as feature markers in the testing environment. Especially in big indoor environments, similar or even the same advertisements usually appear multiple times. Therefore, for every representation the number of occurrences is noted and we use only one digital representation repeatedly.
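The manual pre-processing step above (pair each marker image with its measured physical size and its number of occurrences) can be pictured with a small record type. The field names and the metres-per-pixel helper are our own illustration of how the measured size yields the scale needed to register the image in the metric map.

```python
from dataclasses import dataclass

# Illustrative record for a pre-processed 2D image marker; field names
# are ours, not the paper's.
@dataclass
class ImageMarker:
    marker_id: int
    image_path: str
    width_m: float            # physical width, measured by hand
    height_m: float
    occurrences: int = 1      # the same poster may appear several times

    def metres_per_pixel(self, image_width_px: int) -> float:
        """Scale factor that relates the digital image to the metric map."""
        return self.width_m / image_width_px

m = ImageMarker(3, "ads/poster_03.png", 1.2, 0.8, occurrences=4)
scale = m.metres_per_pixel(600)
```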

Figure 6. Examples of textured surfaces used as 2D image markers. (a) Advertisement; (b) Destination board; (c) General map.
3.4. Module 2: Fusion

The mapping fusion process, described as module 2 in the framework overview in Figure 2, is an important step within the whole process. The fusion of the multiple tracking technologies has to be done very carefully in order to guarantee a usable tracking performance.

For this process a Windows application was developed. It accesses the SLAM maps and the markerless image tracking technology of the Metaio framework. In a first step, all captured SLAM maps are registered within one global coordinate system. Two alternatives have been implemented to connect two successive maps. The first uses an iterative closest point (ICP) [25] algorithm to get the relation between two successive point clouds. The ICP from the point cloud library (PCL) (http://docs.pointclouds.org/trunk/classpcl_1_1_iterative_closest_point.html) is used to minimize the distance between two point clouds. The result of this process is converted into translation
and rotation vectors, describing the offset between the two local coordinate systems. If this is not
successful, because of a gap in the mapping process or having an insufficient number of matching
features, the saved video footage is used to extend the first map in order to add more features to
increase the probability of finding a reliable match with the ICP algorithm. The sub-maps are not
fused together into one single map file, but the information about the offsets between each individual
map is saved in order to be able to visualize the point cloud. Figure 5 shows the result of a globally
fused reference map.
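The ICP step can be illustrated in two dimensions with plain Python: alternately match each source point to its nearest target point, then solve for the rigid transform in closed form. This is only a toy sketch of the principle; the actual system uses PCL's 3D `IterativeClosestPoint` as cited above.

```python
import math

def icp_2d(source, target, iterations=20):
    """Toy point-to-point ICP in 2D: nearest-neighbour matching plus a
    closed-form rigid alignment per iteration (2D Kabsch via atan2)."""
    src = [list(p) for p in source]
    for _ in range(iterations):
        # 1. match every source point to its nearest target point
        pairs = [(p, min(target, key=lambda q: (q[0]-p[0])**2 + (q[1]-p[1])**2))
                 for p in src]
        # 2. centroids of the matched sets
        csx = sum(p[0] for p, _ in pairs) / len(pairs)
        csy = sum(p[1] for p, _ in pairs) / len(pairs)
        ctx = sum(q[0] for _, q in pairs) / len(pairs)
        cty = sum(q[1] for _, q in pairs) / len(pairs)
        # 3. optimal rotation angle for centered correspondences
        s = sum((p[0]-csx)*(q[1]-cty) - (p[1]-csy)*(q[0]-ctx) for p, q in pairs)
        c = sum((p[0]-csx)*(q[0]-ctx) + (p[1]-csy)*(q[1]-cty) for p, q in pairs)
        angle = math.atan2(s, c)
        # 4. rotate about the source centroid, translate to target centroid
        for p in src:
            x, y = p[0]-csx, p[1]-csy
            p[0] = math.cos(angle)*x - math.sin(angle)*y + ctx
            p[1] = math.sin(angle)*x + math.cos(angle)*y + cty
    return src

# toy check: recover a 5 degree rotation plus a small translation
rot = math.radians(5.0)
target = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0), (1.0, 3.0)]
source = [(math.cos(rot)*x - math.sin(rot)*y + 0.1,
           math.sin(rot)*x + math.cos(rot)*y - 0.1) for x, y in target]
aligned = icp_2d(source, target)
```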
For adding image markers to the created global reference map structure, it is necessary to build a
database of all captured images. In an XML file all necessary information is collected containing size,
image path, ID and number of repetitions. The developed algorithm to register the markers within
the global map uses the captured video footage of the mapping process. Every frame of the captured
video passes through two instances. In the first instance, the 3D SLAM is tracking the position of the
camera according to the fused sub-maps. In order to guarantee a continuous tracking the currently
active SLAM map is extendable with a maximum of 500 features, after that the adjacent map is loaded
for further tracking. The current image is then handed over to the 2D image marker tracking instance,
which is scanning the image for any relevant image markers of the database. The used hardware
allowed the scanning of 100 markers per second without dropping below 30 fps during the fusion
process. If an image marker is found, its pose as well as the tracking quality is monitored over time. The final position of the marker is calculated as the average of its observed poses, weighted by the monitored tracking quality, in order to estimate a feasible global pose. In case a poster is changed in the mapped environment, it is possible to simply exchange the image while keeping the metadata like size and position, thus keeping the map up-to-date.
As a result, a well-structured XML file is generated. It contains the pose of each registered marker and each SLAM map in relation to the global coordinate system. The package containing the information for the tracking device includes the XML file, the raw 2D markers and the point clouds.
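The generated manifest can be pictured like the XML built below. The element and attribute names are our guesses for illustration; the text above only states which fields are stored (size, image path, ID, number of repetitions, and the pose of each marker and SLAM map).

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest for the tracking package; element and attribute
# names are illustrative, not taken from the actual file format.
root = ET.Element("reference_map")
marker = ET.SubElement(root, "marker", {
    "id": "3",
    "image": "markers/poster_03.png",
    "width_m": "1.2", "height_m": "0.8",
    "repetitions": "4",
})
ET.SubElement(marker, "pose", {"x": "12.4", "y": "0.0", "z": "3.1", "yaw_deg": "90"})
submap = ET.SubElement(root, "slam_map", {"file": "maps/sub_007.3dmap"})
ET.SubElement(submap, "pose", {"x": "10.0", "y": "0.0", "z": "2.0", "yaw_deg": "0"})

manifest = ET.tostring(root, encoding="unicode")
```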

3.5. Module 3: Viewer


Before loading the reference map on a mobile device for tracking, the data is visualized in
the content viewer application. For this purpose, we developed an interface with the editor of the
Unity (https://unity3d.com/) game engine. It has three main purposes: visualization of mapped data, editing of mapped data, and placing 3D objects for interactive AR content.
In a first step, the metadata of the mapping process is loaded and all content containing the
SLAM point clouds as well as the 2D image markers are visualized within a global coordinate
framework. An initial evaluation of the mapping procedure can already be done in this step. Figure 7
shows a screenshot of the application. Furthermore, in case the location has changed it is possible
to edit or replace different markers within the environment, for example due to an exchanged
advertisement or because a marker was removed.
In order to demonstrate the usability of such a system the content viewer was extended for
adding AR content. Because it is hard to relate a position within a point cloud to a real-world position, a simplified 2D map of the environment can be registered within the mapped
data. For the testing location a starting point and an endpoint can be defined. We integrated a path
planning algorithm [26,27], which calculates a path between these two points. For the AR navigation
scenario, arrows were placed along the path. The path will be shown to the user within the camera
stream. Since this information is overlaid, it is also necessary to handle occlusions to present a
realistic and understandable visualization of the path. The virtual scene contains all arrows. As soon
as the tracking starts, the whole path will be visible to the user. Due to walls this can lead to a wrong
registration in the real world. Because of that, simplified walls are placed according to the 2D map
representation, which are blocking arrows that are not in the direct line of sight of the user.
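The path-planning step over the simplified 2D map can be sketched as a shortest-path search on an occupancy grid. The actual planner follows [26,27]; the breadth-first search below is only a minimal stand-in that shows the interface (walls marked as blocked cells, a start and an end point, and a cell path as output).

```python
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first shortest path on a 4-connected occupancy grid.
    grid[r][c] == 1 marks a wall. Returns the list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    q = deque([start])
    while q:
        cell = q.popleft()
        if cell == goal:                 # reconstruct the path backwards
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                q.append((nr, nc))
    return None

# a tiny hallway with one wall segment forcing a detour
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = plan_path(grid, (0, 0), (2, 0))
```

In the AR scenario, arrows would then be placed along the returned cells, and the simplified walls used both for planning and for occluding arrows outside the line of sight.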
Figure 7. Screenshot of the content viewer.
3.6. Module 4: Tracking in Large Environments
All the information about the global reference map and the 3D models is transferred to the mobile tracking application, which is implemented for an Android device. As a testing platform a Samsung Note 2014 Edition equipped with an Exynos 5 Octa processor (4 × 1.90 GHz) was used. The main advantage of the presented solution is the fact that two major tracking concepts are available to estimate the position of the user, or rather of the smartphone. To achieve this, two threads are running in parallel, each one responsible for one tracking method. The tracking procedure is based on a multi-level approach, which starts with the detection of the initial position.

First the video stream is searched for a match with the 2D image marker database. For this purpose, the complete set of 2D markers is loaded into the tracking thread. Each frame is only scanned for a package of ten image markers in order to prevent a drop of the frame rate below 20 fps. A search for an initial position with the available SLAM maps would take too long, because of a loading time between 0.1 and 0.5 s per map. As soon as enough markers have been identified to pinpoint the location, both tracking threads are updated with a set of surrounding markers and maps in order to keep loading times at a minimum. The image marker thread is provided with 20 images, whereas the SLAM thread holds five maps of the surrounding environment.

As long as the detected image marker delivers a pose, the SLAM thread is running in parallel searching for a match. In general SLAM tracking is preferred, which means that as soon as a valid position can be estimated, the 2D image tracking is set on pause. The currently active map can be extended by up to 500 features in order to guarantee a more solid tracking. This extended map is not saved to the database, but used as long as the device is in the near vicinity of that map. This ensures that the reference map is not degenerating over time. While the user is walking through the environment, the feature sets loaded in both threads are updated according to the current position. Since loading the features takes some time, this procedure is done alternately, in one thread at a time, while the other one is still able to deliver a pose. During the tracking process a fallback heuristic (see Figure 8) handles the switching between the two visual approaches and the estimation of the inertial sensor. It is also responsible for activating and pausing the tracking technologies at the right time, since every running instance is also consuming additional computing power. For estimating the first position, the SLAM thread is deactivated and is waiting for the 2D marker thread to find a first position. Due to the multiple occurrence of the same 2D markers in several places in the environment, the SLAM approach is started as soon as the 2D tracking thread limits the number of possible areas down to two. This can be done because during the capturing process the occurrence of each marker type is saved in the 2D
marker database. As soon as this happens, the SLAM algorithm is activated and loaded with maps from both possible areas. Both threads are then searching for a correct match. Depending on which thread is successful in finding the correct position, different choices are made. If the 2D marker thread is successful, the SLAM approach is loaded with maps that correspond to the found position. If the SLAM approach is successful first, the 2D thread will be deactivated.

Figure 8. The waterfall model shows the combination of the different tracking methods.
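The waterfall in Figure 8 can be condensed into a single selection function: prefer SLAM, fall back to the 2D marker tracker, and bridge short gaps with IMU prediction for at most 30 frames. The sketch below is ours; the 0.5 quality threshold in particular is an assumed value, since the text only states that quality is reported in the interval [0,1].

```python
# Condensed form of the fallback waterfall: SLAM first, then 2D markers,
# then IMU dead reckoning for at most 30 frames (~1 s at 30 Hz).
IMU_MAX_FRAMES = 30
QUALITY_MIN = 0.5            # assumed threshold, not from the paper

def select_pose(slam_pose, slam_quality, marker_pose, imu_predict, state):
    if slam_pose is not None and slam_quality >= QUALITY_MIN:
        state["imu_frames"] = 0
        return ("slam", slam_pose)
    if marker_pose is not None:
        state["imu_frames"] = 0
        return ("marker2d", marker_pose)
    if state["imu_frames"] < IMU_MAX_FRAMES:
        state["imu_frames"] += 1        # bridge a short gap with the IMU
        return ("imu", imu_predict())
    return ("lost", None)               # re-initialization required

state = {"imu_frames": 0}
first = select_pose((1, 2, 3), 0.9, None, lambda: (1, 2, 3), state)
```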
In an ideal case, the SLAM thread is now able to provide a tracking position. Since we know the
camera pose, the 2D thread will be activated in two situations. First, if the fallback heuristic detects
that the camera is pointed towards a virtual 2D marker. The second reason for activation is if the
tracking quality gets low, which is reported by the Metaio SDK in an interval [0,1]. The fallback
heuristic is also able to set specific values, like the number of frames, to control the inertial sensor.
The IMU data is used to predict the pose and the orientation for a specific number of frames. Since a
high drift is expected, the tracking data from the IMU is only used for a duration of 30 frames, which
usually corresponds to one second. This is enough time to bridge the gap when switching between
two maps.

Figure 9 and Figure 10 show two screenshots of the developed application. In Figure 9 no
tracking is available at the moment and both threads are searching for a match. For this reason, the
computational effort is high and the frame rate drops to 15 fps. Figure 10 on the other hand shows
valid tracking with the SLAM approach, including the point features of the map and the visualization
of the path.
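The IMU bridging described above can be sketched as follows (a simplified, scalar-pose illustration; the real system predicts full 6 DOF poses via the Metaio SDK, whose API is not shown here):

```python
IMU_BRIDGE_FRAMES = 30  # ~1 s at 30 fps, as described above

def fuse_pose(visual_pose, imu_delta, state):
    """Return a pose for the current frame, bridging visual dropouts.

    visual_pose: pose from the optical trackers, or None if tracking is lost.
    imu_delta:   relative motion predicted from the inertial data.
    state:       dict carrying the last pose and the dropout frame count.
    """
    if visual_pose is not None:
        # Optical tracking available: trust it and reset the dropout counter.
        state["pose"] = visual_pose
        state["lost_frames"] = 0
        return visual_pose
    state["lost_frames"] += 1
    if state["lost_frames"] <= IMU_BRIDGE_FRAMES:
        # Dead-reckon from the last known pose; drift grows quickly,
        # so this is only trusted for about one second.
        state["pose"] = state["pose"] + imu_delta
        return state["pose"]
    return None  # drift too large: report tracking as lost
```

After 30 consecutive frames without an optical pose, the function stops extrapolating, which matches the one-second bridging window mentioned in the text.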

Figure 9. Screenshot of the tracking application.


Figure 10. Tracking application including point features and path visualization.
3.7. Evaluation Scenario
Since the interest in an accurate indoor navigation system is high, the Vienna International
Airport gave us access to a security area for our evaluation. It was possible to gain a lot of experience
by testing the system under authentic conditions with real-world challenges. Together with staff
members, an especially critical location was chosen, where passengers usually have orientation
problems. The selected area contains a path of about 150 meters, which is equivalent to a region of
about 2000 m², where the passengers have to choose the right direction twice.

Figure 9 shows the starting point right after the security check, where people have to choose
between two paths. The left way is just a small passage, in contrast to the bright and visually attractive
direction through the shopping area. This environment poses several tracking challenges. It starts at
a long, wide hallway with partial daylight, few features and some signs. Next, the path leads through
the previously mentioned shopping area, which is full of natural features and potential 2D image
markers. This area is of special interest because of a constantly changing environment due to a
dynamic product range. Finally, a long hallway leads to 33 different gates that look very similar. The
path ends at a specific gate along this corridor.

The interface of the tracking application is reduced to the camera stream and the AR content.
The path that is visualized for the user consists of arrows, which are placed approximately five meters
apart. In addition, a sign tells the user when the desired destination is reached.
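Placing arrows at a fixed spacing along a computed route amounts to resampling the path polyline. A minimal sketch (a hypothetical helper, not the authors' code):

```python
import math

def place_arrows(path, spacing=5.0):
    """Resample a 2D polyline so an arrow falls every `spacing` meters.

    path: list of (x, y) waypoints in meters.
    Returns the list of arrow positions along the path.
    """
    arrows, carry = [], 0.0  # carry = meters walked since the last arrow
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = spacing - carry           # distance to the next arrow on this segment
        while d <= seg:
            t = d / seg               # interpolate along the segment
            arrows.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        carry = (carry + seg) % spacing
    return arrows
```

For a 150 m route with 5 m spacing this yields about 30 arrow anchors, which can then be augmented into the camera view.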
4. Results and Discussion

This section discusses the evaluation of the system, which was done at the airport. In addition,
the experience of setting up a tracking system within a public environment is integrated into the
discussion. At the beginning, the experiences of the mapping step are described, followed by the
tracking step. Furthermore, the technical details are listed. The paper concludes with a report about
the experience of the people who used the system for navigation through the airport.

4.1. Mapping

There are special challenges when mapping a public building like an airport. The mapping of
point features should be done under lighting conditions similar to the tracking conditions. Because
the airport cannot close off the environment for a longer time, the mapping process has to be done
during operation. Since the reference map has to be as perfect as possible, adding non-permanent
features has to be avoided during the mapping procedure. At this point, the concept of mapping an
environment with multiple small maps helps, because it makes it easy to react to sudden changes
like the appearance of a crowd. Dynamic features which are not occluding important areas of
the environment are usually not a problem when using a SLAM algorithm, because RANSAC [28,29]
or a similar algorithm removes these features as outliers. For the airport scenario, especially features
on the ceiling, like signs or special structures, were very important in order to keep the tracking alive.
For the whole path in one direction, only 4.5 MB of point cloud data was
captured within ten maps. This makes it possible to save all relevant data for tracking directly on the
mobile device.
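The outlier-rejection idea can be illustrated with a toy RANSAC on 2D points (the SDK fits camera poses rather than lines; this sketch only mirrors the inlier/outlier principle by which non-permanent features such as passing people are discarded):

```python
import random

def ransac_inliers(points, iters=200, tol=0.1, seed=0):
    """Toy RANSAC: keep the points consistent with the dominant 2D line,
    discarding everything else as outliers (illustrative analogy only)."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        # Minimal sample: two points define a candidate line ax + by + c = 0.
        (x0, y0), (x1, y1) = rng.sample(points, 2)
        a, b, c = y0 - y1, x1 - x0, x0 * y1 - x1 * y0
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue  # degenerate sample
        # Consensus set: points within `tol` of the candidate line.
        inl = [p for p in points
               if abs(a * p[0] + b * p[1] + c) / norm <= tol]
        if len(inl) > len(best):
            best = inl
    return best
```

Points far from the dominant structure (here, the stray samples off the line) never enter the consensus set, just as features on moving people rarely agree with the camera-pose model fitted over the static map.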
Capturing and preparing the 2D markers is currently a highly manual step. For the testing
environment, 23 image markers could be found which were able to be tracked with sufficient quality.
It took approximately four hours to collect all necessary map data (SLAM maps as well as 2D image
markers) of the target area. However, a large part of the time on site was spent waiting for big groups
of passengers to pass.

4.2. Tracking
Tracking within the airport is a big challenge, due to very different circumstances like daylight,
artificial light and long corridors with similar structures. Our solution could handle all of these
situations quite well, except for a large window environment where outside features were registered
in the map but could not be detected during the tracking process. The initialization pose without any
previous knowledge of the position could be found usually within three seconds, depending on the
viewing direction of the camera, which usually contained at least one 2D image marker. We also
tested getting the initial pose using only the SLAM thread. In this case, the time for initialization
highly depends on the number of maps in the database, since the loading time is much higher. For
small indoor environments SLAM maps can be preferred for detecting an arbitrary position since
any viewing direction should contain sparse point features. During our tests a reasonable set of five
small SLAM maps for initialization of a position was always found within five seconds, whereas the
loading consumes most of the time. The fast matching of SLAM maps is also demonstrated by the
re-initialization when tracking is lost, e.g., after looking at the floor or covering the lens. In this situation
mainly the SLAM algorithm finds the position within a couple of frames after the camera is pointed
up again. When searching with a SLAM map it turned out that at least five frames are needed in
order to find a position within a mapped environment. During tracking tests it was discovered that
reoccurring structures are only a problem if they look exactly the same and are close to each other.
Two facts keep the possibility of a mismatch low: first, the combination of two tracking solutions
follows the path of the user; second, only the surrounding maps are kept active during the
tracking process.
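Keeping only the surrounding maps active can be sketched as follows (assuming sub-maps are stored in the order they occur along the walked path; names are illustrative):

```python
def active_submaps(map_ids, current, radius=1):
    """Given sub-maps ordered along the mapped path, keep only the
    current one and its immediate neighbours active.

    map_ids: sub-map identifiers in path order.
    current: identifier of the sub-map the user is currently in.
    radius:  how many neighbours on each side stay loaded.
    """
    i = map_ids.index(current)
    lo, hi = max(0, i - radius), min(len(map_ids), i + radius + 1)
    return map_ids[lo:hi]
```

Because look-alike areas elsewhere in the building are simply never loaded, they cannot be matched by mistake.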
2D feature markers showed a good performance especially in situations where the camera was
very close to certain objects, since the SLAM map usually does not contain enough point features to
initialize at short distances. The small field of view of smartphone cameras did not help the tracking,
but within the airport scenario the hallways were large enough to overcome this problem.
An important part in tracking applications is the accuracy of the method used. Since the
proposed approach is using different technologies during the tracking process, no global quality
factor can be determined. The main advantage of using small SLAM maps is that the accuracy is
independent of the size of the mapped environment. Every new loaded sub-map for tracking starts
with an optimal pose. Therefore, the accuracy of our approach is dependent on the quality of the
SLAM algorithm, which is usually on the order of millimeters. The exact error could not be
measured within the tested environment. Within such big environments it is almost impossible to
set up a reference tracking system in the mapped area in order to generate ground truth values, due
to costs and size. Unfortunately, in this case we had no way to confirm the absolute accuracy of the
system over the whole mapped area. However, since the mapping process is split up into multiple
sub-maps, the tracking quality is equal to the quality of the Metaio SLAM implementation, which is
generally below centimetre accuracy.
Defining the quality of the 2D markers is even harder, since the 2D textured areas are of different
quality concerning the detected features, size and position in the room. However, if the viewing angle
is below 30 degrees from the center, tests showed a jitter-free visualization of the visual 3D content.
In general, both visual approaches are qualified to augment 3D content at almost any position within
the mapped environment.
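The viewing-angle condition can be checked from the camera pose and the marker plane. A minimal sketch (an illustrative check, not part of the published system; the marker normal is assumed to point toward the viewer):

```python
import math

def viewing_angle_ok(view_dir, marker_normal, max_deg=30.0):
    """True if the camera views the marker within `max_deg` of head-on.

    view_dir:      3D viewing direction of the camera.
    marker_normal: 3D normal of the 2D marker, pointing toward the viewer.
    """
    # Angle between the reversed viewing ray and the marker normal.
    dot = -sum(v * n for v, n in zip(view_dir, marker_normal))
    nv = math.sqrt(sum(v * v for v in view_dir))
    nn = math.sqrt(sum(n * n for n in marker_normal))
    cos_a = max(-1.0, min(1.0, dot / (nv * nn)))  # clamp for acos
    return math.degrees(math.acos(cos_a)) <= max_deg
```

A renderer could use such a test to decide when the 2D-marker overlay can be trusted to be jitter-free.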

4.3. Testing
An application for users, especially when dealing with interactive AR content, has to be highly
responsive with real-time performance. In the worst case, when both tracking technologies were
searching for a match, the frame rate drops to 15 fps, but as soon as a tracking pose is available,
between 20 and 25 fps can be reached, which is high enough for real-time interaction.
With the presented tracking solution it was possible to walk along the described path with
continuous tracking. Figure 11 shows part of the map with the walked path, whereas the color of
each point represents the ownership of the corresponding recorded sub-map. Since the maps have to
be switched from time to time, a noticeable but not disturbing jitter occurs.

Figure 11. Point cloud of the mapped area on the airport. Lines show the distance traveled. Red (stop
sign) and blue (path) rectangles are the AR content visualized for the user during a tracking session.

A direct comparison between two tracking runs is hard to make, since every walkthrough of a
person has different conditions, like a different path and a different number of people passing by,
and, as already mentioned, no ground truth is available at the airport. In order to evaluate the
robustness of the solution, we performed a walkthrough along the mapped environment four weeks
after the mapping process was done. Within this timeframe, several parts of the airport changed their
appearance. A wall directly facing the camera path was redesigned, the shops partly changed the
product line and some smaller construction work was in progress at the side of the hallway. Due to
the high number of features within the map and the 2D image markers, the tracking was still working
all along the path. Similar experiences were made when bigger groups of people were entering the
field-of-view of the camera. In our experience, almost 50% of the image could be covered without
losing tracking, due to the features placed on the ceiling of the airport, like signs or brand logos above
the shops. Considering that 100 feature matches are enough in order to estimate a position in a
feature-rich environment like shopping areas, where there could be thousands of trackable features,
extensive changes in the environment would have to be performed to prevent the tracking from
working. In addition, as soon as the mobile application has produced a reliable position, the map is
extended and the changed environment is also used for positional tracking in this situation.
The tracking tests were done with a smartphone and a tablet. Figure 12a shows a situation where
a route is blocked by a virtual sign for the user. Figure 12b on the other hand shows a path visualized
through the shopping area with a number of arrows, while considering the occlusion through walls.
During the tests at the airport, ten people were using the system. They were led over the whole
distance of 150 m to the gate. Within these observed user tests, two ways of using the system could
be identified, whereas the majority was using both approaches. First, the device was used all the time
to explore the AR environment and therefore continuous tracking was needed. Later on the path,
people tended to look away from the screen, pointing the camera away from the walking direction,
for example towards the floor, thereby losing tracking. Only after a while did they ask for tracking
again in order to get new information on the path.

Figure 12. (a) The user was looking at the wrong passage indicated by the red sign; (b) Example of
visualizing a path calculated especially for the user to find the way to the gate.

5. Conclusions and Future Work
In this paper, we have proposed a complete framework specially designed for an augmented
reality indoor application. The mobile real-time implementation of the localization and visualization
modules enables non-professional users to walk through an unknown environment while being
navigated with a mobile device. The framework builds upon multiple novel concepts, which were
developed and tested:
1. A mapping pipeline based on small SLAM maps with a wide angle camera
2. A high quality offline fusion module for SLAM maps and 2D feature markers
3. An interface for generating paths and placing 3D content for AR visualization
4. A mobile hybrid tracking concept with a fallback heuristic
5. An independent mobile tracking solution at real-time frame rates
6. Low hardware costs for big indoor environments
Compared to previous methods for tracking, the HyMoTrack framework is a complete tool for
setting up an environment for indoor localization and AR visualization like path planning or
location-based information. We have demonstrated the usability of the tracking solution for
non-professionals within a realistic airport scenario. The system showed a robust performance
despite the challenging influences of a real environment, like occlusions from passing people,
different light situations or different moving speeds of users. The user tests also showed the intuitive
usability of an AR-based navigation system. Future work will address a higher degree of automation,
especially for automated capturing and evaluation of 2D feature markers. The main contribution of
the work at hand is a complete system, which enables user studies about user behavior, user
interaction possibilities with AR content and other scientific research questions targeting large indoor
AR enabled environments.

Acknowledgments: This study was supported by research funds from the Austrian Research Promotion Agency
(FFG—Bridge 838548). The authors acknowledge the support provided by project partners Card eMotion
and Insider Navigation Systems. In addition to that, we would like to mention the support by the Vienna
International Airport for providing time and access. Moreover, the authors acknowledge the Vienna University
of Technology Library for financial support through its Open Access Funding Program.
Author Contributions: Georg Gerstweiler performed the manuscript preparation and implemented the
methodology and the main algorithm. Emanuel Vonach helped in the refinement of the developed solution
and contributed to the evaluation. Hannes Kaufmann supervised the work. All authors discussed the basic
structure of the manuscript and approved the final version.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Perey, C. Indoor positioning and navigation for mobile AR. In Proceedings of the 2011 IEEE International
Symposium on Mixed and Augmented Reality—Arts, Media, and Humanities, Basel, Switzerland, 26–29
October 2011.
2. Schöps, T.; Engel, J.; Cremers, D. Semi-dense visual odometry for AR on a smartphone. IEEE Int. Symp.
Mixed Augment. Real. 2014. [CrossRef]
3. Bailey, T.; Durrant-Whyte, H. Simultaneous localization and mapping (SLAM): Part II. IEEE Robot. Autom.
Mag. 2006, 13, 108–117. [CrossRef]
4. Hong, F.; Zhang, Y.; Zhang, Z.; Wei, M.; Feng, Y.; Guo, Z. WaP: Indoor localization and tracking using
WiFi-Assisted Particle filter. In Proceedings of the 2014 IEEE 39th Conference on Local Computer Networks
(LCN), Edmonton, AB, Canada, 8–11 September 2014; pp. 210–217.
5. Cruz, O.; Ramos, E.; Ramirez, M. 3D indoor location and navigation system based on Bluetooth. In
Proceedings of the CONIELECOMP 2011—21st International Conference on Electronics Communications
and Computers, San Andres Cholula, Mexico, 28 February–2 March 2011; pp. 271–277.
6. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
[CrossRef]
7. Rosten, E.; Drummond, T. Machine learning for high-speed corner detection. Lect. Notes Comput. Sci. 2006,
3951, 430–443.
8. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In
Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13
November 2011; pp. 2564–2571.
9. Mulloni, A.; Ramachandran, M.; Reitmayr, G.; Wagner, D.; Grasset, R.; Diaz, S. User friendly SLAM
initialization. In Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality
(ISMAR), Adelaide, Australia, 2013; pp. 153–162.
10. Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the ISMAR
2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16
November 2007.
11. Klein, G.; Murray, D.W. Improving the Agility of Keyframe-Based SLAM. Lect. Note Comput. Sci. 2008,
5303, 802–815.
12. Smith, R.; Self, M.; Cheeseman, P. Estimating uncertain spatial relationships in robotics. Autonom. Robot
Veh. 1990, 4, 167–193.
13. Montemerlo, M.; Thrun, S.; Koller, D.; Wegbreit, B. FastSLAM: A Factored Solution to the Simultaneous
Localization and Mapping Problem. AAAI-02 Proc. 2002, 593–598. [CrossRef]
14. Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast Semi-Direct Monocular Visual Odometry. In Proceedings
of the IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7
June 2014; pp. 15–22.
15. Mur-Artal, R.; Tardós, J.D. Fast Relocalisation and Loop Closing in Keyframe-Based SLAM.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Hong Kong,
China, 31 May–7 June 2014; pp. 846–853.
Sensors 2016, 16, 17 19 of 19

16. Wagner, D.; Mulloni, A.; Langlotz, T.; Schmalstieg, D. Real-time panoramic mapping and tracking on
mobile phones. In Proceedings of the IEEE Virtual Reality Conference, Waltham, MA, USA, 20–24 March
2010; pp. 211–218.
17. Arth, C.; Klopschitz, M.; Reitmayr, G.; Schmalstieg, D. Real-time self-localization from panoramic images
on mobile devices. In Proceedings of the 2011 10th IEEE International Symposium on Mixed and
Augmented Reality, ISMAR 2011, Basel, Switzerland, 26–29 October 2011; pp. 37–46.
18. Ventura, J.; Arth, C.; Reitmayr, G.; Schmalstieg, D. Global Localization from Monocular SLAM on a Mobile
Phone. IEEE Trans. Vis. Comput. Graph. 2014, 20, 531–539. [CrossRef] [PubMed]
19. Wagner, D.; Schmalstieg, D. First steps towards handheld augmented reality. In Proceedings of the Seventh
IEEE International Symposium on Wearable Computers, Vienna, Austria, 18–21 October 2003; pp. 127–135.
20. Wagner, D.; Schmalstieg, D. ARToolKit on the PocketPC platform. In Proceedings of the 2003 IEEE
International Conference on Augmented Reality Toolkit Workshop, St. Lambrecht, Austria, 7 October 2004;
pp. 14–15.
21. Wagner, D.; Reitmayr, G.; Mulloni, A.; Drummond, T.; Schmalstieg, D. Pose tracking from natural
features on mobile phones. In Proceedings of the 7th IEEE/ACM International Symposium on Mixed and
Augmented Reality, 2008, ISMAR 2008. Cambridge, UK, 15–18 September 2008; pp. 125–134.
22. Mirowski, P.; Ho, T.K.; Yi, S.; MacDonald, M. SignalSLAM: Simultaneous localization and mapping with
mixed WiFi, Bluetooth, LTE and magnetic signals. In Proceedings of the 2013 International Conference
on Indoor Positioning and Indoor Navigation (IPIN), Montbeliard-Belfort, France, 28–31 October 2013;
pp. 1–10.
23. Luo, X.; O'Brien, W.J.; Julien, C.L. Comparative evaluation of Received Signal-Strength Index (RSSI) based
indoor localization techniques for construction jobsites. Adv. Eng. Inf. 2011, 25, 355–363. [CrossRef]
24. Zhang, Z. Flexible camera calibration by viewing a plane from unknown orientations. In Proceedings
of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September
1999; pp. 666–673.
25. Rusinkiewicz, S.; Levoy, M. Efficient variants of the ICP algorithm. In Proceedings of the Third International
Conference on 3-D Digital Imaging and Modeling, Quebec City, QC, Canada, 28 May–1 June 2001;
pp. 145–152.
26. Seo, W.J.; Ok, S.H.; Ahn, J.H.; Kang, S.; Moon, B. An efficient hardware architecture of the A-star algorithm
for the shortest path search engine. In Proceedings of the Fifth International Joint Conference on INC, IMS
and IDC, 25–27 August 2009; pp. 1499–1502.
27. Granberg, A. A* Pathfinding Project. Available online: http://arongranberg.com/astar/ (accessed on 15
September 2015).
28. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to
image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [CrossRef]
29. Derpanis, K.G. Overview of the RANSAC Algorithm. Image Rochester NY 2010, 4, 2–3.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons by
Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
