

Integrating 3D reconstruction and virtual
reality: a new approach for immersive
teleoperation

Francisco Navarro, Javier Fdez, Mario Garzón, Juan Jesús Roldán, and
Antonio Barrientos

Centro de Automática y Robótica, UPM-CSIC,
Calle José Gutiérrez Abascal, 2. 28006 Madrid, Spain
francisco.navarro.merino@alumnos.upm.es, javier.ferfernandez@alumnos.upm.es,
ma.garzon@upm.es, jj.roldan@upm.es, antonio.barrientos@upm.es

Abstract. The current state of technology permits very accurate 3D reconstructions of real scenes, acquiring information through quite different sensors. High-precision modelling that allows simulating any element of the environment on virtual interfaces has also been achieved. This paper illustrates a methodology to correctly model a 3D-reconstructed scene, with either an RGB-D camera or a laser, and to integrate and display it in virtual reality environments based on Unity, as well as a comparison between both results. The main interest of this line of research is the automation of the whole process, from map generation to its visualisation with VR glasses, although this first approach only obtained results using several programs manually. The long-term objective is a real-time immersion in Unity, interacting with the scene seen by the camera.

Keywords: Robotics, 3D Reconstruction, Virtual Reality, Immersive Teleoperation

1 Introduction
The promise of Virtual Reality (VR) lies mainly in its ability to transport a
user into a totally different world and make them feel immersed in it. This
immersion generally implies a total separation from the real world, meaning
that the virtual environment rarely, if ever, represents any real-world location.
In contrast, several 3D reconstruction techniques are being developed in order
to create models of the real world, and, in general, those reconstructions rarely
include fictional or virtual elements.
The motivation of this work is to combine the benefits of both techniques, and
therefore, create a VR world from a reconstruction of a real location. With this
solution, it will be possible to immerse a user in a virtual representation of a real
location. For example, it would be possible to see and walk around a remote location
and, if this technology is combined with the teleoperation of mobile manipulators,
actually interact with an inaccessible, dangerous or remote location.

1.1 Objectives

In order to achieve the goal of integrating 3D reconstructions into virtual worlds, the following partial objectives have been defined:

– Reconstruct 3D maps of a location using two different devices and techniques.
– Create a mesh of the reconstruction.
– Contrast the real environment with the virtual maps.
– Integrate the reconstruction in a VR interface.

1.2 Related works

Nowadays, there are three different immersive technologies. First of all, augmented
reality (AR), which overlays virtual elements on videos or images of real scenes.
Secondly, virtual reality (VR), which is based on virtual models of real or imaginary
scenarios. The third technology is a combination of the first two, known as mixed
reality (MR). Virtual reality is the option on which this paper focuses.
On the other hand, in robotics and other fields like medicine [1], 3D reconstruction
is used as a process of capturing the shape and appearance of real objects. There
are several devices, such as lidars, RGB-D cameras or sonars, that can perform this
function. In addition, there are also several programs and libraries that filter and
reconstruct the point clouds that those devices generate.
Both 3D reconstruction and VR have been subjects of study for many years and
there are hundreds of papers published on these topics separately. Nevertheless, the
number of publications that connect both subjects is very low. Some of the existing
research works are listed next.
A successful integration of 3D reconstructions of antique objects in a virtual
museum, using a 3D scanner and a high-resolution camera, was accomplished by the
University of Calabria [2]. Additionally, Stanford managed to import a mesh model
of a reconstructed scene into Unity using the Kinect One [3]. Lastly, York University
succeeded at modelling a 2D mapped trajectory, using a laser along with wheel
odometry, and displaying it in Unity through a ROS connection based on Rosbridge [4].

1.3 Overview of the system

Figure 1 illustrates a general view of the process and the main devices and
programs used throughout this research. The environment information is acquired
by two different devices and is processed using diverse software tools in order
to obtain a mesh of the scene. Then, these meshes are imported into Unity and
visualised with the virtual reality glasses.

Fig. 1: Overview. General view of the process and the main devices and programs
used throughout this research.

1.4 Main contributions


This research encompasses different ideas from previous works, but it also
proposes several novel contributions, listed next, that distinguish this research
from previous ones.

– It includes an interface and recreates a real-world environment.
– It compares two different 3D reconstructions and their results.
– It uses ROS, which would allow applying this same method in real time simply
by establishing a socket between ROS and Unity.

2 3D Map generation
This section presents the necessary steps to obtain a 3D representation of a
real-world scenario using two different methods. The first one processes the
information of an RGB-D camera in real time, while the second one creates the
reconstruction off-line after the data acquisition with a 2D laser scanner.

2.1 Real-time 3D reconstruction


This technique uses Microsoft's Kinect One (v2) RGB-D camera in order to
acquire enough data from the environment. This device features an RGB camera
to obtain colour information, as well as a depth sensor, with an effective range
of 40-450 cm, to estimate distances in the scene.
Once the environment data is available, colour and depth information are
processed using the RTAB-Map software [5], which relies on the OpenCV, QT and
PCL libraries. One of the main advantages of this method is the creation of a
real-time point cloud, mapping the scene as the camera moves through the scenario.
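A minimal sketch (not the authors' code) of how the assembled cloud could be captured for later processing is shown below; it assumes the rtabmap_ros default topic name /rtabmap/cloud_map and simply saves the latest map to a PCD file so it can feed the filtering and meshing steps described later.

```cpp
// Hypothetical ROS node: saves the point cloud assembled by rtabmap_ros to disk.
// The topic name /rtabmap/cloud_map is an assumption based on rtabmap_ros defaults.
#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>
#include <pcl_conversions/pcl_conversions.h>
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>

void cloudCallback(const sensor_msgs::PointCloud2ConstPtr& msg)
{
  pcl::PointCloud<pcl::PointXYZRGB> cloud;
  pcl::fromROSMsg(*msg, cloud);                          // ROS message -> PCL cloud
  pcl::io::savePCDFileBinary("kinect_map.pcd", cloud);   // overwrite with latest map
  ROS_INFO("Saved %lu points", (unsigned long)cloud.size());
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "cloud_map_saver");
  ros::NodeHandle nh;
  ros::Subscriber sub = nh.subscribe("/rtabmap/cloud_map", 1, cloudCallback);
  ros::spin();
  return 0;
}
```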
Figure 2 shows the final point cloud after the desired scene is completely
mapped using this tool and the camera. Additionally, the odometry of the camera
has been overlaid in the figure for the purpose of illustrating the mapping
trajectory.

Fig. 2: Kinect point cloud. Final point cloud after the desired scene is completely
mapped with RTAB-Map and the camera; the camera odometry is overlaid to illustrate
the mapping trajectory.

2.2 Off-line 3D reconstruction

The second technique is based on the use of a Hokuyo 2D laser scanner.
This sensor measures the distance to a target by illuminating it with pulsed
laser light and measuring the reflected pulses with a sensor. Differences in laser
return phases and wavelengths can be used to make digital 3D representations
of the target.
Unlike the RGB-D camera, the map reconstruction is not done in real time.
Since the Lidar can only provide distance measurements in a plane, a rotating
mechanism was designed; it rotates the scanner about its x axis in order to obtain
3D information. The data is recorded and stored in a rosbag. Then, it is transformed
into point clouds and stored in PCD files. Those point clouds can be grouped into
one large 3D reconstruction using the libpointmatcher library [6], which uses an
Iterative Closest Point (ICP) algorithm to estimate the correct transformation
between one point cloud and the next. Finally, the result is visualized, filtered
and post-processed with the Paraview software [7], as shown in Figure 3.
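As an illustration of the ICP step, the following sketch aligns one pair of scans with libpointmatcher's default ICP chain; the file names are placeholders, and a complete pipeline would chain this over every consecutive scan extracted from the rosbag.

```cpp
// Hedged sketch of pairwise registration with libpointmatcher (file names illustrative).
#include <pointmatcher/PointMatcher.h>
#include <iostream>

typedef PointMatcher<float> PM;
typedef PM::DataPoints DP;

int main()
{
  // Two consecutive scans previously converted from the rosbag.
  const DP reference(DP::load("scan_000.vtk"));
  const DP reading(DP::load("scan_001.vtk"));

  // Default ICP chain: data filters, KD-tree matching, outlier rejection, error minimization.
  PM::ICP icp;
  icp.setDefault();

  // Estimate the rigid transformation that aligns the reading onto the reference.
  PM::TransformationParameters T = icp(reading, reference);

  // Apply it and save the aligned scan; concatenating aligned scans yields the map.
  DP aligned(reading);
  icp.transformations.apply(aligned, T);
  aligned.save("scan_001_aligned.vtk");

  std::cout << "Estimated transform:\n" << T << std::endl;
  return 0;
}
```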

Fig. 3: Lidar point cloud. A view of the scene reconstructed with the Lidar.

3 Mesh creation

Since the Unity VR engine does not support point clouds or PCD formats,
it is necessary to use a format that allows a correct integration. The mesh format
is the option selected in this work: a mesh is a collection of vertices, edges and
faces that defines a 3D model. In this research, a polygonal mesh, whose faces
are triangles, will be used. The mesh format was chosen not only because it is
compatible with Unity, but also due to the fact that a solid 3D mesh contains
more useful information than a point cloud. In short, a discrete 3D reconstruction
is converted into a continuous 3D model.
Before creating the mesh, a post-processing phase is applied to the map. It
consists of applying noise reduction and smoothing filters in order to obtain a
more realistic reconstruction. The filtering can be applied to both on-line and
off-line maps, and it is based on the use of the Point Cloud Library [8].
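As an example of this PCL-based filtering, the sketch below applies a statistical outlier removal pass to a saved map; the specific filter and parameter values are illustrative assumptions rather than the exact configuration used in the experiments.

```cpp
// Hedged sketch: statistical outlier removal over a saved map (parameters illustrative).
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>
#include <pcl/filters/statistical_outlier_removal.h>

int main()
{
  pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZRGB>);
  pcl::PointCloud<pcl::PointXYZRGB>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZRGB>);
  pcl::io::loadPCDFile("kinect_map.pcd", *cloud);

  // Points whose mean distance to their 50 nearest neighbours deviates by more than
  // one standard deviation from the global mean are discarded as noise.
  pcl::StatisticalOutlierRemoval<pcl::PointXYZRGB> sor;
  sor.setInputCloud(cloud);
  sor.setMeanK(50);
  sor.setStddevMulThresh(1.0);
  sor.filter(*filtered);

  pcl::io::savePCDFileBinary("kinect_map_filtered.pcd", *filtered);
  return 0;
}
```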
Regarding the camera reconstruction, the obtained point cloud presents several
local and global flaws, which result in an inconsistent 3D map. Firstly, to reduce
these defects, the RANSAC algorithm is applied in order to reject outliers. Once
the noisy points are removed, more loop closures are detected and refined using
the Iterative Closest Point (ICP) algorithm. This step increases the correct
identification of features, avoiding mapping an already visited area twice, as
well as improving the alignment of object edges. Figure 4 presents the improved
point cloud with fairly corrected edges and objects.

Fig. 4: Post-processed map. Point cloud with corrected edges and objects.
The main parameters used for the organised meshing are the size and the angle
tolerance of the faces, which are, in this case, triangles. The angle tolerance
was set to 15° for checking whether or not an edge is occluded, whereas the face
size was 2 pixels. It should be pointed out that smaller triangles mean a higher
resolution of the resulting mesh.
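These parameters correspond to the ones exposed by PCL's OrganizedFastMesh, which is sketched below under the assumption that the input cloud is organized (i.e. it preserves the 2D structure of the depth image); this is a plausible reading of the step, not necessarily the exact call sequence used inside RTAB-Map.

```cpp
// Hedged sketch of organized meshing with the parameters discussed above.
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>
#include <pcl/io/ply_io.h>
#include <pcl/common/angles.h>
#include <pcl/surface/organized_fast_mesh.h>
#include <pcl/PolygonMesh.h>

int main()
{
  pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZRGB>);
  pcl::io::loadPCDFile("kinect_frame_organized.pcd", *cloud);   // organized cloud assumed

  pcl::OrganizedFastMesh<pcl::PointXYZRGB> ofm;
  ofm.setInputCloud(cloud);
  ofm.setTrianglePixelSize(2);                    // face size of 2 pixels
  ofm.setAngleTolerance(pcl::deg2rad(15.0f));     // occluded-edge check at 15 degrees
  ofm.setTriangulationType(
      pcl::OrganizedFastMesh<pcl::PointXYZRGB>::TRIANGLE_ADAPTIVE_CUT);

  pcl::PolygonMesh mesh;
  ofm.reconstruct(mesh);
  pcl::io::savePLYFile("kinect_mesh.ply", mesh);
  return 0;
}
```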
One of the main concerns during the meshing is the inclusion of textures, this
step being essential for a correct colour display in Unity. In order to accomplish
a complete integration, the triangulated point cloud, along with the cameras used,
is exported from RTAB-Map to MeshLab [9] for further post-processing, adding the
corresponding textures. The final camera mesh is shown in Figure 5.

Fig. 5: Resulting mesh. Final mesh from the RGB-D camera map.
On the other hand, the Lidar mesh creation consists of two steps: first, a filter
applied to the point cloud, which reduces the number of points; then, the 3D
triangulation, which is sped up by the previous filtering.
In the first step, the libpointmatcher library [6] was used, a modular library
that implements the ICP algorithm for aligning point clouds. Additionally, a voxel
grid filter was applied, which down-samples the data by taking a spatial average of
the points in the cloud. The sub-sampling rate is adjusted by setting the voxel size
along each dimension. The points that lie within the bounds of a voxel are assigned
to that voxel and are combined into one output point.
There are two options to represent the distribution of points in a voxel by a
single point: in the first, the centroid or spatial average of the point distribution
is chosen; in the second, the geometrical centre of the voxel is taken. For this
point cloud, the first option was chosen for its higher accuracy. Figure 6a shows
the model after the triangulation.
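The centroid-based down-sampling can be sketched as follows; the example uses PCL's VoxelGrid filter, which behaves like the libpointmatcher filter employed in this work, and the 5 cm leaf size is an illustrative assumption.

```cpp
// Hedged sketch of centroid-based voxel down-sampling (PCL equivalent, leaf size illustrative).
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>
#include <pcl/filters/voxel_grid.h>

int main()
{
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::PointCloud<pcl::PointXYZ>::Ptr downsampled(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::io::loadPCDFile("lidar_map.pcd", *cloud);

  // Every point falling inside a 5 x 5 x 5 cm voxel is replaced by the centroid of
  // the points in that voxel, which keeps the overall shape while thinning the cloud.
  pcl::VoxelGrid<pcl::PointXYZ> voxel;
  voxel.setInputCloud(cloud);
  voxel.setLeafSize(0.05f, 0.05f, 0.05f);
  voxel.filter(*downsampled);

  pcl::io::savePCDFileBinary("lidar_map_downsampled.pcd", *downsampled);
  return 0;
}
```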
The triangulation has been done with the Point Cloud Library (PCL). This
library includes a function called Greedy Projection Triangulation that creates
a Polygon File Format (PLY) document. After the triangulation, the files are
converted with MeshLab into DAE or OBJ format so that the model can be imported
into Unity (as it only accepts certain formats). Figure 6 presents the
triangulation and the mesh that has been finally imported into Unity.

(a) Triangulation

(b) Mesh

Fig. 6: Lidar mesh creation. 6a shows the model after the triangulation; 6b
presents the mesh that has been finally imported into Unity.
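A sketch of this triangulation step is shown below: normals are estimated first, since Greedy Projection Triangulation requires them, and the resulting mesh is written as a Polygon File Format (PLY) document; all numeric parameters are illustrative assumptions.

```cpp
// Hedged sketch of Greedy Projection Triangulation (parameters illustrative).
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>
#include <pcl/io/ply_io.h>
#include <pcl/common/io.h>
#include <pcl/features/normal_3d.h>
#include <pcl/search/kdtree.h>
#include <pcl/surface/gp3.h>

int main()
{
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::io::loadPCDFile("lidar_map_downsampled.pcd", *cloud);

  // Estimate per-point normals from the 20 nearest neighbours.
  pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
  pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
  pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
  ne.setInputCloud(cloud);
  ne.setSearchMethod(tree);
  ne.setKSearch(20);
  ne.compute(*normals);

  // Concatenate points and normals, as required by the triangulation.
  pcl::PointCloud<pcl::PointNormal>::Ptr cloud_normals(new pcl::PointCloud<pcl::PointNormal>);
  pcl::concatenateFields(*cloud, *normals, *cloud_normals);

  pcl::search::KdTree<pcl::PointNormal>::Ptr tree2(new pcl::search::KdTree<pcl::PointNormal>);
  tree2->setInputCloud(cloud_normals);

  pcl::GreedyProjectionTriangulation<pcl::PointNormal> gp3;
  gp3.setSearchRadius(0.5);               // maximum edge length between connected points
  gp3.setMu(2.5);                         // neighbour distance multiplier
  gp3.setMaximumNearestNeighbors(100);
  gp3.setInputCloud(cloud_normals);
  gp3.setSearchMethod(tree2);

  pcl::PolygonMesh mesh;
  gp3.reconstruct(mesh);
  pcl::io::savePLYFile("lidar_mesh.ply", mesh);  // convert to OBJ/DAE in MeshLab for Unity
  return 0;
}
```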

3.1 Integration with Unity

For the development of the virtual reality interface, the program selected was
Unity and, in order to visualise the 3D reconstruction, the HTC Vive glasses and
the SteamVR program were used. Unity is a cross-platform game engine which is
primarily used to develop games and simulations for computers, consoles and mobile
devices. The HTC Vive package includes a headset, two controllers with buttons and
two base stations that can locate objects inside the work area. Lastly, SteamVR is
an add-on for Unity required to compile and run programs in virtual reality.
In order to view the meshes in Unity, it was necessary to have a mesh and its
textures; these textures are needed to visualise the mesh with colours. Once the
model is imported into Unity, it is possible to create scripts and interact with
it as can be done with other objects.
The scale and localization of the 3D map can be modified in order to better
reflect the real world or to allow a better immersion. Also, new elements, real
or virtual, can be added to the scene, and it is possible to interact with them
using the SteamVR capabilities.

4 Experiments and Results

This section presents the experiments carried out in order to test the proposed
system. Two different scenarios, one for each of the mapping techniques, have been
used. Their descriptions and the results of the experiments are presented next.

4.1 Test scenarios

The first scenario, where the RGB-D camera was used, is an indoor laboratory
of the Department of Automatics at the Technical University of Madrid (UPM).
This room is equipped with sensors for VR, which makes it interesting for an
immersion at the same place as the reconstruction. In addition, this immersion
allows an accurate validation of the mapped dimensions and distances of the room,
in order to assess its fidelity and quality.
The second scenario, used with the laser reconstruction, took place in an olive
field located in the Spanish province of Toledo. Unlike the previous case, this
experiment was done in an outdoor environment, so that it was possible to obtain
other results and compare them. It is important to mention that the test was done
by rotating a base connected to the Lidar in order to be able to reconstruct the
full 360°.

4.2 Analysis and Results

One of the drawbacks of the Kinect RGB-D mapping is its inability to operate
reliably outdoors; the reason is that the sensor cannot deal with direct and
intense lighting. This factor limited the research and the comparison between
both methods, forcing the camera reconstruction to take place indoors.
Focusing on the final mesh obtained, which is shown in Figure 7b, compared to
the real world, shown in Figure 7a, the global outcome is considered satisfactory.
The vast majority of the objects are easily identified, and they have adequate
shapes and sizes. However, although the overall result of this RTAB-Map
reconstruction is very good and usable, it presents some local defects that occurred
during the mapping stage. For example, most objects on the table are not correctly
reconstructed because they were not in the camera's field of view. Moving on to
the Unity integration and immersion, as a consequence of reconstructing the same
room containing the VR equipment, the process of calibrating and validating the
mesh was straightforward.
(a) Real World

(b) Virtual World

Fig. 7: Kinect 3D reconstruction in Unity. 7a presents the real room during one
of the experiments; 7b illustrates the view of each eye through the virtual glasses.

In the second scenario, different results and conclusions were found. First of
all, as shown in Figure 8, there is an area in the middle of the point cloud that
does not have any points. That is caused by the structure that rotates the lidar,
which cannot reach the full range and consequently leaves an area without
information. Furthermore, even though this is not relevant from a visual point of
view, it causes many problems when creating the mesh reconstruction, because of
the discontinuities found.
Another important result of the use of the lidar is that the quality of the
virtual map is worse than those created with other tools. This results from the
fact that the point cloud generated by the lidar does not have colour; therefore,
it is not possible to import any textures into the virtual interface. However, its
main advantage is that it is able to reconstruct outdoor worlds, which makes this
tool really useful. Figure 8 displays the visualization of the mesh reconstructed
from the Lidar.

Fig. 8: Lidar 3D reconstruction in Unity. Visualization of the mesh reconstructed
from the Lidar.

Finally, it is important to mention that there are two main difficulties in the
development of the complete process. Firstly, the generation of the textures in
MeshLab and the point cloud triangulation are very time-consuming. Secondly, a
large part of the process is manual, which could hamper the future automation of
the method.

4.3 Comparison

The main differences observed between both methods depend heavily on two items:
the sensor used and the scene mapped.
Regarding the sensor, the results with the camera are of higher quality thanks
to the acquisition of colour information along with the infrared depth sensor.
However, the laser is limited in this kind of reconstruction, as it has to rotate
in order to correctly map the environment. Lastly, the range and field of view of
the laser are much larger, with a range of 30 m and a field of view of more than 180°.
Referring to the reconstructed scenes, the real-time mapping was done indoors
while the off-line reconstruction was achieved outdoors. Here, the longer range of
the laser scanner and its ability to handle sunlight and other exterior conditions
make it the better solution for the reconstruction of the scene, since the RGB-D
camera suffers in adverse outdoor conditions.

5 Conclusions
This work successfully integrated 3D reconstructions into virtual worlds. For
this purpose, two maps were reconstructed using two different devices: an RGB-D
camera and a Lidar. They performed differently, since the first one managed to
reconstruct an indoor environment with colour, whereas the second could not;
however, the laser worked satisfactorily in outdoor scenes.
Regarding the mesh creation, it turned out to be an adequate approach, not only
because it provides a valid format for the integration into Unity, but also because
it improves the overall map, resulting in a more accurate, continuous and faithful
3D representation.
Another necessary task was to contrast the virtual and the real world, because
it was essential to know whether it was possible to identify the elements from the
virtual view. Fortunately, even if some parts were not positioned exactly in their
intended place, the virtual map allowed us to distinguish and recognise the objects.
It is important to mention that this occurred in both maps.
Finally, future works will continue this line of research, trying to improve the
reconstruction of the map and the performance of the process. Furthermore, the next
efforts will be focused on real-time reconstruction and integration in Unity, since
the long-term objective is to reconstruct an environment while the robot is following
a path, making it possible to see the elements of the scene and interact with them
from a distance thanks to virtual reality.

6 Acknowledgments
This work was partially supported by the Robotics and Cybernetics Group at
Universidad Politécnica de Madrid (Spain), and it was funded under the projects:
PRIC (Protección Robotizada de Infraestructuras Críticas; DPI2014-56985-R),
sponsored by the Spanish Ministry of Economy and Competitiveness, and
RoboCity2030-III-CM (Robótica aplicada a la mejora de la calidad de vida de los
ciudadanos. Fase III; S2013/MIT-2748), funded by Programas de Actividades I+D en
la Comunidad de Madrid and co-funded by Structural Funds of the EU.

References
1. Vu Cong and Hq Linh. 3D medical image reconstruction. Biomedical Engineering
Department, Faculty of Applied Science, HCMC University of Technology, pages 1-5, 2002.
2. F. Bruno, S. Bruno, G. De Sensi, M. Luchi, S. Mancuso, and M. Muzzupappa. From 3D
reconstruction to virtual reality: A complete methodology for digital archaeological
exhibition. Journal of Cultural Heritage, 11:42-49, 2010.
3. J. Cazamias and A. Raj. Virtualized reality using depth camera point clouds. Stanford
EE 267: Virtual Reality, Course Report, 2016.
4. R. Codd-Downey, P. Forooshani, A. Speers, H. Wang, and M. Jenkin. From ROS to Unity:
Leveraging robot and virtual environment middleware for immersive teleoperation.
2014 IEEE International Conference on Information and Automation (ICIA 2014),
pages 932-936, 2014.
5. M. Labbé and F. Michaud. Online global loop closure detection for large-scale
multi-session graph-based SLAM. IEEE International Conference on Intelligent Robots
and Systems, pages 2661-2666, 2014.
6. F. Pomerleau, F. Colas, R. Siegwart, and S. Magnenat. Comparing ICP variants on
real-world data sets. Autonomous Robots, 34(3):133-148, February 2013.
7. J. Ahrens, B. Geveci, and C. Law. ParaView: An end-user tool for large-data
visualization. In C. D. Hansen and C. R. Johnson, editors, Visualization Handbook,
pages 717-731. Butterworth-Heinemann, Burlington, 2005.
8. R. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). IEEE International
Conference on Robotics and Automation, pages 1-4, 2011.
9. P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli, and G. Ranzuglia.
MeshLab: an open-source mesh processing tool. Sixth Eurographics Italian Chapter
Conference, pages 129-136, 2008.
