Master Thesis
Electrical Engineering with
emphasis on Signal Processing
September 2016
3D Object Reconstruction
Using XBOX Kinect v2.0
Srikanth Varanasi
Vinay Kanth Devu
Contact Information:
Author(s):
Srikanth Varanasi
E-mail: srva15@student.bth.se
Supervisor:
Irina Gertsovich
University Examiner:
Dr. Sven Johansson
Acknowledgements

Firstly, we would like to thank our thesis advisor Irina Gertsovich for her valuable guidance, feedback and support throughout our thesis. We are indebted to our parents, professors, colleagues and friends for their immense support and the help they have offered during various phases of our thesis.
Srikanth Varanasi
Vinay Kanth Devu
Contents

Abstract
Acknowledgements
1 Introduction
  1.1 Aims & Objectives
  1.2 Thesis Organisation
2 Literature Review
3 Theoretical Background
  3.1 Intrinsic Parameters of the Kinect Sensor
  3.2 Depth Image Enhancement
  3.3 Point Cloud Generation
  3.4 Point Cloud Alignment & Merging
4 Implementation
  4.1 Experimental Setup
  4.2 Experiment
    4.2.1 Data Acquisition & Depth Image Enhancement
    4.2.2 Point Cloud Generation, Alignment and Merging
5 Results
  5.1 Intrinsic Parameters
  5.2 Data Acquisition & Depth Image Enhancement
  5.3 Point Cloud Generation
  5.4 Point Cloud Alignment & Merging
6 Discussion
  6.1 Validation of Results
  6.2 Sources of Error in 3D Model Creation
  6.3 Advantages and Limitations
References
List of Figures

3.1 Kinect for Windows V2.0 sensor showing the orientation of the coordinate system. Image courtesy: Microsoft Corporation.
5.1 A set of depth maps of the scene without the objects of interest.
5.2 A set of depth maps of the scene with the objects of interest in the initial (1st) position.
5.3 A set of depth maps of the scene with the objects of interest in the final (21st) position.
5.4 Averaged depth map of the scene without the objects of interest.
5.5 Enhanced depth map of the scene without the objects of interest.
5.6 Averaged depth map of the scene with the objects of interest in the initial (1st) position.
5.7 Enhanced depth map of the scene with the objects of interest in the initial (1st) position.
5.8 Subtracted depth map of the scene with the objects of interest in the 1st position.
5.9 Depth map of the objects of interest in the 1st position.
5.10 Depth map of the objects of interest in the 21st position.
5.11 Point cloud of the scene with the objects of interest in the 1st position.
5.12 Point cloud of the objects of interest in the 1st position.
5.13 Point cloud of the objects of interest in the 21st position.
5.14 Point clouds of the objects of interest in the first and second positions.
5.15 Point cloud of the objects of interest in the second position with the registered point cloud of the first position with respect to the second position.
5.16 Point clouds of the objects of interest in the 21st and the 20th positions.
5.17 Point cloud of the objects of interest in the 20th position with the registered point cloud of the 21st position with respect to the 20th position.
5.18 Merged point cloud of the objects of interest in the second position with the registered point cloud of the first position with respect to the second position.
5.19 Merged point cloud of the objects of interest in the 20th position with the registered point cloud of the 21st position with respect to the 20th position.
5.20 Merged point cloud of the objects of interest in the 11th position obtained from positions 1 to 10.
5.21 Merged point cloud of the objects of interest in the 11th position obtained from positions 12 to 21.
5.22 Final point cloud of the objects of interest in the 11th position.
5.23 Final point cloud of the objects of interest in the 11th position.
5.24 Final point cloud of the objects of interest in the 11th position.
5.25 Final point cloud of the objects of interest in the 11th position.
5.26 Depth map of the objects of interest generated based on the final point cloud.
6.5 Front view of the object on a graph paper.
6.6 Front view of the point cloud of the object.
6.7 Colour image of the scene showing the placement of the object of interest and the Kinect sensor on different levels.
6.8 Depth map of the object of interest showing the IR reflection in front of the object.
6.9 Two depth maps of the same scene showing the depth variations over a period of time due to the drift in the temperature.
6.10 Two depth maps of the same scene showing how the holes at the edges of the objects vary over time.
6.11 Colour image and depth map of a scene depicting the intensity-related issues in depth maps.
List of Tables

1.1 Comparison between XBOX 360 Kinect & XBOX One Kinect
4.1 The positions of the object of interest and their corresponding angle of rotation with respect to the reference pointer.
Chapter 1
Introduction
Single view approaches, on the other hand, depend upon only one viewpoint
for inferring the 3D shape of an object in an environment. They mainly depend
upon information cues such as shading, texture and focus for the 3D modelling of the
object [2].
In contrast to these, active 3D imaging systems make use of controlled artificial illumination or other forms of electromagnetic radiation for acquiring dense range maps of the environment with minimum ambiguity [2, 3]. The use of artificial illumination makes it easy to acquire dense and accurate depth maps of texture-less objects. Active 3D imaging systems make use of a large variety of methods for creating an accurate depth map of an environment. Based on the technique being used, the range of operation and the accuracy of the system vary. Structured Light (SL) and Time of Flight (ToF) are some of the techniques used in active 3D imaging systems.
In systems using structured light for measuring depth, a sequence of known patterns of electromagnetic radiation is projected onto the environment. The patterns in the electromagnetic radiation get deformed due to the geometry of the objects in the environment. The distorted patterns are observed using a camera and are analysed, based on the disparity from the originally projected pattern and the intrinsic parameters of the camera, to generate the depth maps of the environment. This system is similar to the binocular stereo vision system in passive 3D imaging, where one of the cameras is replaced by a projector [2, 3, 5]. Hence this technique is also called active stereo vision. The XBOX 360 Kinect is an active 3D imaging system that works based on this principle [6]. Figure 1.1 shows an XBOX 360 Kinect with the different sensors present in it and their internal locations.
ToF technology is mainly based on measuring the time taken by the light emitted from a source to travel to an object and back to a sensor [1, 2, 5]. The illumination in most cases is a continuous wave, since this helps in the delay estimation. The source of illumination and the sensor are assumed to be at the same location. Since the distance between the object and the sensor is constant and the speed of light c is finite, the time shift of the emitted signal is equivalent to a phase shift in the received signal. Based on the phase shift (Δϕ) between the received signal and the generated signal, the ToF is calculated, which in turn is used for generating the depth maps. The XBOX One Kinect is an active 3D imaging system that works based on this principle [6]. Figure 1.2 shows the operating principle of a ToF sensor, where Δϕ is the phase difference between the transmitted and received signals. Figure 1.3 shows a teardown of the XBOX One Kinect sensor with the locations of the different sensors present in it.
Table 1.1: Comparison between XBOX 360 Kinect & XBOX One Kinect
Microsoft Corporation has released two Kinect sensors, namely the XBOX 360 Kinect and the XBOX One Kinect. The XBOX 360 Kinect is also called the Kinect for Windows V1.0 sensor, while the XBOX One Kinect is also known as the Kinect for Windows V2.0 sensor. Table 1.1 gives a comparison of the changes between both versions of the Kinect sensor. It should also be noted that the IR sensor in the XBOX One Kinect is lighting independent, i.e. the sensor used for capturing the IR data and the depth information is not affected by the amount of lighting in the environment. The IR sensor of the XBOX One Kinect has a three times higher data fidelity compared to that of the XBOX 360 Kinect, i.e. due to the change in depth sensing technology, the depth maps acquired from the XBOX One Kinect are clearer, less noisy and more reliable than those from the XBOX 360 Kinect.
1.1 Aims & Objectives

The main objectives of this thesis are to:

• Acquire sets of depth maps of the environment with the objects of interest.
• Enhance the acquired depth maps and separate the depths of the object of interest.
• Create an accurate 3D point cloud for the object of interest from the enhanced depth maps.
Chapter 2

Literature Review

The Kinect for Windows sensor is one of the most popular sensors used for the 3D reconstruction of objects in an environment at a low cost. It could also be used for the 3D reconstruction of a scene in real time, creating virtual reality applications, motion sensing, making a display touch-enabled, etc. Most of these applications make use of the Microsoft Kinect SDK. The main issue with using the Kinect SDK for real-time reconstruction of a scene is that the hardware requirements for processing such large amounts of data are quite high. Due to these hardware requirements, one must depend upon other programming tools such as MATLAB.
In [1], the authors give a thorough account of Time of Flight (ToF) cameras, their operating principle, the calibration and alignment of ToF cameras, and different ToF and structured light cameras such as the Kinect for Windows V1.0 and V2.0 sensors.
3D imaging techniques, the creation of 3D objects, their representations, the registration of 3D point clouds in different positions, and applications of 3D imaging and analysis in different areas of science are discussed by the authors in [2].
In [3, 4], the authors give an account of different computer vision techniques such as image segmentation, feature detection, matching and alignment, structure from motion (SfM), 3D reconstruction, etc.
In [5], the authors give an objective comparison between the structured light and ToF technologies for range sensing using a Kinect for Windows sensor. They also discuss the different error sources faced while using a Kinect sensor and give a constructive framework for evaluating the performance of Kinect sensors.
In [6], the authors give a brief account of the Kinect for Windows sensors, 3D reconstruction, segmentation, matching and recognition, and the different algorithms used for these purposes.
Chapter 3

Theoretical Background
The criterion for detecting outliers (equation 3.2) is based on a threshold value set from the value of the MAD (equation 3.1). If a given sample xi of the data set X satisfies this criterion, then the sample is considered to belong to the data set; otherwise it is discarded as an outlier. If more than 50% of the data has the same value, then the MAD becomes zero, and in such scenarios this detection technique does not work.
After the outliers in the depth pixel intensities are detected and discarded for
each set X, the remaining samples of depth intensities are averaged to acquire a
value for pixel location (j, k) in the averaged depth map.
A pixel in a depth map is said to be invalid if it does not hold any depth information, i.e. if the intensity value of that pixel is undefined or zero. The invalid pixels in a depth map are called holes in this work. These holes need to be filled with valid depth values in order to avoid holes in the resulting point clouds. The holes in the depth data are filled using the eight-nearest-neighbour principle, which uses the set of intensities of the 8 nearest neighbours of a hole to calculate its depth value as the mean of those neighbours. The holes in the averaged depth maps are filled and the results are stored.
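To make the enhancement procedure concrete, a minimal MATLAB sketch is given below. It is illustrative only: the threshold factor tau, the variable names and the vectorised neighbour averaging are assumptions, not the exact implementation used in this work.

```matlab
% Sketch: per-pixel MAD-based outlier removal, averaging of the surviving
% samples, and 8-nearest-neighbour hole filling (assumed names and tau).
% depthStack is H x W x N, holding N depth maps of the same static scene.
function enhanced = enhanceDepthMaps(depthStack, tau)
    [H, W, ~] = size(depthStack);
    averaged = zeros(H, W);
    for r = 1:H
        for c = 1:W
            X = double(squeeze(depthStack(r, c, :)));
            X = X(X > 0);                          % drop invalid (zero) samples
            if isempty(X), continue; end
            med = median(X);
            MAD = median(abs(X - med));            % median absolute deviation
            if MAD > 0                             % MAD = 0: criterion unusable
                X = X(abs(X - med) <= tau * MAD);  % keep only the inliers
            end
            averaged(r, c) = mean(X);              % averaged depth value
        end
    end
    % Fill each hole with the mean of its valid 8 nearest neighbours.
    kernel = [1 1 1; 1 0 1; 1 1 1];
    nbrSum = conv2(averaged, kernel, 'same');             % sum of neighbours
    nbrCnt = conv2(double(averaged > 0), kernel, 'same'); % valid neighbours
    enhanced = averaged;
    holes = (averaged == 0) & (nbrCnt > 0);
    enhanced(holes) = nbrSum(holes) ./ nbrCnt(holes);
end
```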
Figure 3.1: Kinect for Windows V2.0 sensor showing the orientation of the coordinate system. Image courtesy: Microsoft Corporation.
The depth maps of the environment are then converted into 3D point clouds using the intrinsic parameters of the depth camera of the Kinect sensor, the acquired depth data and the perspective projection relationship [8]. Each pixel p(j, k) in the depth map with a valid depth value Z is converted into a 3D point (X, Y, Z) using

$$X = \frac{j - C_x}{F_x} \cdot Z \qquad (3.3)$$

$$Y = \frac{k - C_y}{F_y} \cdot Z \qquad (3.4)$$
Here (j, k) is the location of the pixel p in the depth map, and (Cx, Cy) and (Fx, Fy) are the intrinsic parameters of the depth camera, namely the location of the focal centre and the focal length respectively. Z is the depth value of the pixel p(j, k) in the depth map.
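As an illustration, equations (3.3) and (3.4) can be applied to a whole depth map at once in MATLAB. The sketch below assumes that j indexes columns and k indexes rows, and that the Kinect reports depth in millimetres; both are assumptions made for the example.

```matlab
% Sketch: convert an enhanced depth map into a point cloud using equations
% (3.3) and (3.4). Fx, Fy, Cx, Cy are the intrinsics of the depth camera.
function pc = depthToPointCloud(depthMap, Fx, Fy, Cx, Cy)
    [H, W] = size(depthMap);
    [j, k] = meshgrid(1:W, 1:H);          % j: column index, k: row index
    Z = double(depthMap) / 1000;          % assumed: millimetres to metres
    valid = Z(:) > 0;                     % skip holes
    X = (j(:) - Cx) ./ Fx .* Z(:);        % equation (3.3)
    Y = (k(:) - Cy) ./ Fy .* Z(:);        % equation (3.4)
    pc = pointCloud([X(valid), Y(valid), Z(valid)]); % Computer Vision Toolbox
end
```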
Here Nd and Nm are the number of points in the point clouds D and M, respectively.
$$E_{ICP}(a, D, M) = E(T(a, D), M) = \sum_{u=1}^{N_d} \|(R d_u + t) - m_v\|^2 \qquad (3.6)$$
where a is the transformation that best aligns the point cloud D to M, a = (R, t), where R is the rotation matrix, t is the translation vector and (du, mv) are the corresponding points. Fixing du ∈ D, the corresponding point mv ∈ M is computed as the closest point, such that

$$m_v = \arg\min_{m \in M} \|(R d_u + t) - m\|^2 \qquad (3.7)$$
$$C = \frac{1}{N_d} \sum_{u=1}^{N_d} [d_u - \bar{d}][m_v - \bar{m}]^T \qquad (3.8)$$
where $\bar{d}$ and $\bar{m}$ are the means formed over the $N_d$ correspondences. Performing the SVD of C, we get

$$U S V^T = C \qquad (3.9)$$

where U, V are two orthogonal matrices and S is a diagonal matrix of singular values. The rotation matrix R can be estimated from the pair of orthogonal matrices using the equation [13]
$$R = V S' U^T, \qquad (3.10)$$

where

$$S' = \begin{cases} I & \text{if } \det(U)\det(V) = 1 \\ \mathrm{diag}(1, 1, \ldots, 1, -1) & \text{if } \det(U)\det(V) = -1. \end{cases} \qquad (3.11)$$

Here diag() denotes a diagonal matrix, I denotes the identity matrix and det() denotes the determinant of a given matrix. The translation vector t can be estimated as
$$t = \bar{m} - R\bar{d}. \qquad (3.12)$$
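The estimation of (R, t) from a fixed set of correspondences, i.e. equations (3.8) to (3.12), can be sketched as follows. The function name and the assumption that the correspondences are given as matching rows of D and M are illustrative.

```matlab
% Sketch: estimate the rigid transform (R, t) that best aligns the
% corresponding point sets D and M (both Nd x 3, matching rows).
function [R, t] = estimateRigidTransform(D, M)
    dBar = mean(D, 1);                    % centroid of D
    mBar = mean(M, 1);                    % centroid of M
    Dc = bsxfun(@minus, D, dBar);         % centred points (R2016a-compatible)
    Mc = bsxfun(@minus, M, mBar);
    C = (Dc' * Mc) / size(D, 1);          % covariance matrix, equation (3.8)
    [U, ~, V] = svd(C);                   % equation (3.9)
    Sp = eye(3);
    if det(U) * det(V) < 0                % equation (3.11)
        Sp(3, 3) = -1;                    % correct an improper rotation
    end
    R = V * Sp * U';                      % equation (3.10)
    t = mBar' - R * dBar';                % equation (3.12)
end
```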
Finally, a depth map can be generated from a point cloud by the inverse projection: the pixel location (j, k) of a 3D point (X, Y, Z) is obtained using

$$j = \frac{X \cdot F_x}{Z} + C_x \qquad (3.13)$$

and

$$k = \frac{Y \cdot F_y}{Z} + C_y \qquad (3.14)$$

and the depth value of the pixel p is given by

$$p(j, k) = Z \qquad (3.15)$$
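A sketch of this back-projection is shown below, assuming the 512 x 424 pixel resolution of the Kinect V2 depth camera and the same row/column convention as before.

```matlab
% Sketch: project a point cloud (M x 3, camera coordinates) back into a
% depth map using equations (3.13)-(3.15). H, W are the image dimensions.
function depthMap = pointCloudToDepthMap(pts, Fx, Fy, Cx, Cy, H, W)
    depthMap = zeros(H, W);
    X = pts(:, 1); Y = pts(:, 2); Z = pts(:, 3);
    j = round(X .* Fx ./ Z + Cx);          % equation (3.13), column index
    k = round(Y .* Fy ./ Z + Cy);          % equation (3.14), row index
    ok = Z > 0 & j >= 1 & j <= W & k >= 1 & k <= H;
    % Equation (3.15): the pixel value at (j, k) is the depth Z. Where two
    % points fall on the same pixel, the last point written is kept.
    depthMap(sub2ind([H, W], k(ok), j(ok))) = Z(ok);
end
```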
Chapter 4
Implementation
4.1 Experimental Setup

The hardware requirements of this research study are a Microsoft Kinect for Windows V2.0 sensor, a PC that satisfies the requirements for using the Microsoft Kinect for Windows V2.0 sensor, and a turntable. The software requirements for acquiring and processing data from a Kinect sensor are the Windows 8.1 operating system, MATLAB 2016a with the Image Processing Toolbox, the Kinect for Windows hardware support package for MATLAB, the Kin2 toolbox for MATLAB [9], the Microsoft Kinect SDK V2.0 and Microsoft Visual Studio 2013.
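For illustration, a minimal acquisition loop using the Kin2 toolbox [9] might look as follows. The calls shown follow the toolbox's published examples and are assumptions; the exact acquisition code used in this work is not reproduced here.

```matlab
% Sketch: acquire a stack of 50 depth frames with the Kin2 toolbox [9].
% The method names follow the toolbox demos and are assumptions.
k2 = Kin2('depth');                  % open the sensor with the depth stream
depthStack = zeros(424, 512, 50);    % Kinect V2 depth frames are 512 x 424
n = 0;
while n < 50
    if k2.updateData                 % true when a new frame is available
        n = n + 1;
        depthStack(:, :, n) = double(k2.getDepth);  % depth in millimetres
    end
end
k2.delete;                           % release the sensor
```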
Figure 4.1: Experimental setup without the objects of interest being present.
4.2 Experiment
In this research study, depth maps of the scene with and without the objects of interest (OOI) are acquired to create a good-quality 3D model of the OOI. This includes data acquisition followed by various pre- and post-processing techniques for obtaining the final point cloud of the OOI to an appreciable accuracy.
Table 4.1: The positions of the object of interest and their corresponding angle
of rotation with respect to the reference pointer.
Chapter 5
Results
5.1 Intrinsic Parameters

Table 5.1: Intrinsic parameters of the depth camera reported by the Kinect SDK.

Intrinsic Parameters            Values (pixels)
Focal Length        Fx          365.2946
                    Fy          365.2946
Focal Centre        Cx          259.7606
                    Cy          205.8992
Radial Distortion   2nd Order    0.0923
                    4th Order   -0.2701
                    6th Order    0.0927
Figure 5.1: A set of depth maps of the scene without the objects of interest.
The OOI along with the two non-identical objects is then placed on the turntable, and the distance between the Kinect sensor and imaginary plane-1 is noted down as D1. Similarly, the distance between the wall and imaginary plane-2 is noted down as D2. A set of depth maps and colour maps of the scene with the OOI is acquired for each position of the scene according to Table 4.1 and stored.
Figures 5.2 and 5.3 show the depth maps and colour maps of the scene in the 1st and 21st positions respectively.
Figure 5.2: A set of depth maps of the scene with the objects of interest in the initial (1st) position.

Figure 5.3: A set of depth maps of the scene with the objects of interest in the final (21st) position.
It can be observed that the orientation of the sole changes slightly from fig. 5.3 to fig. 5.2.
The outliers in the acquired depth maps of the scene with and without the OOI are detected based on the MAD using equations 3.1 and 3.2 and are then removed. The sets of depth maps of the scene in each position are then averaged, and the holes in the averaged depth maps are filled using the MATLAB function "imfill". Figures 5.4 and 5.5 show the averaged depth maps and enhanced depth maps of the scene without the OOI respectively. When compared to the originally captured depth maps, it can be observed that the averaged depth maps have considerably fewer holes, due to the averaging of the set of depth maps over a period of time. From figure 5.5, we can observe that the holes present on the surface of the turntable in figure 5.4 are filled. The black areas inside the circle in figure 5.4 show the holes.
Figure 5.4: Averaged depth map of the scene without the objects of interest.
Figure 5.5: Enhanced depth map of the scene without the objects of interest.
Figures 5.6 and 5.7 show the averaged and enhanced depth maps of the scene with the OOI in the 1st position respectively.

Figure 5.6: Averaged depth map of the scene with the objects of interest in the initial (1st) position.

Figure 5.7: Enhanced depth map of the scene with the objects of interest in the initial (1st) position.
The enhanced depth maps of the scene with the OOI in the different positions considered are subtracted from that of the scene without the OOI. Based on the distance from the wall D2, the pixels with depth values less than D2 are discarded. For the remaining valid pixels, the corresponding depth values from the enhanced depth maps of the scene with the OOI are copied into another empty depth map of the same dimensions. Figure 5.8 shows the subtracted depth map of the scene with the objects of interest in the initial position. Figures 5.9 and 5.10 show the retrieved depth maps of the objects of interest in the 1st and 21st positions.
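This subtraction step can be written as a short MATLAB sketch. The direction of the subtraction (background minus scene) and the use of D2 as the discard threshold are assumptions based on the description above.

```matlab
% Sketch: isolate the OOI depths. background and scene are the enhanced
% depth maps without and with the OOI; D2 is the wall-to-plane distance.
subtracted = background - scene;     % pixels on the OOI appear closer
mask = subtracted >= D2;             % discard pixels with values less than D2
ooiDepth = zeros(size(scene));       % empty depth map of the same dimensions
ooiDepth(mask) = scene(mask);        % copy the OOI depth values
```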
Figure 5.8: Subtracted depth map of the scene with the objects of interest in the
1st position.
Figure 5.9: Depth map of the objects of interest in the 1st position.
Figure 5.10: Depth map of the objects of interest in the 21st position.
Figure 5.11: Point cloud of the scene with the objects of interest in the 1st position.
Figure 5.12: Point cloud of the objects of interest in the 1st position.
Figure 5.13: Point cloud of the objects of interest in the 21st position.
Figure 5.14: Point clouds of the objects of interest in the first and second positions.
Figure 5.15: Point cloud of the objects of interest in the second position with the registered point cloud of the first position with respect to the second position.
From figures 5.14 and 5.15, it could be observed that the green points that are slightly misaligned with respect to the pink ones in figure 5.14 are perfectly aligned in figure 5.15.
Similarly, figures 5.16 and 5.17 show the point clouds in the 21st and 20th positions before and after the ICP algorithm is applied to align the point cloud of the 21st position with that of the 20th position.
Figure 5.16: Point clouds of the objects of interest in the 21st and the 20th positions.
Figure 5.17: Point cloud of the objects of interest in the 20th position with the registered point cloud of the 21st position with respect to the 20th position.
This process is continued until all the point clouds before and after the middle position are aligned and merged into it; a MATLAB sketch of this chain is given after the figure captions below. Figures 5.20 and 5.21 show the point clouds generated by aligning and merging the point clouds before and after the middle position into it.
Figure 5.20: Merged point cloud of the objects of interest in the 11th position obtained from positions 1 to 10.

Figure 5.21: Merged point cloud of the objects of interest in the 11th position obtained from positions 12 to 21.
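The alignment-and-merging chain described above can be sketched with the MATLAB functions pcregrigid [15] and pcmerge [16]. The cell array clouds, holding one pointCloud object per position, and the merge resolution gridSize are assumptions made for the example.

```matlab
% Sketch: pairwise registration and merging of the per-position point
% clouds into the middle (11th) position.
gridSize = 0.001;                                   % assumed 1 mm resolution
fwd = clouds{1};
for n = 2:11                                        % positions 1 -> 11
    [~, moved, rmse] = pcregrigid(fwd, clouds{n});  % register into position n
    fwd = pcmerge(moved, clouds{n}, gridSize);
    fprintf('position %d: RMSE = %.6f m\n', n, rmse);
end
bwd = clouds{21};
for n = 20:-1:11                                    % positions 21 -> 11
    [~, moved, rmse] = pcregrigid(bwd, clouds{n});
    bwd = pcmerge(moved, clouds{n}, gridSize);
end
finalCloud = pcmerge(fwd, bwd, gridSize);           % final model, position 11
```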
Table 5.2: The ICP registrations and their corresponding RMSE values.
The RMSE of each of the registrations performed with the ICP algorithm is calculated using the MATLAB function "pcregrigid". The RMSE values for each of the registrations are shown in Table 5.2. The average root mean square error of all the registrations is

$$RMSE_{avg} = \frac{\sum_{n=1}^{N-1} RMSE_n}{N-1} = \frac{0.00576958}{20} = 0.002885 \,\mathrm{m} \qquad (5.1)$$
Chapter 6
Discussion
6.1 Validation of Results

$$d = \sqrt{(23-4)^2 + (-10-(-10))^2 + (13-12)^2} = 19.02 \,\mathrm{mm}. \qquad (6.2)$$
On the surface of the point cloud, we try to locate the nearest matches to the two points considered earlier on the surface of the object. The points cannot be located exactly, as there still exists some loss of resolution. The two points from the point cloud are P3(4.163, −9.907, 11.85) mm and P4(22.98, −9.955, 13.35) mm, as shown in figure 6.4, and the distance d′ between them is calculated as

$$d' = \sqrt{(22.98-4.163)^2 + (-9.955-(-9.907))^2 + (13.35-11.85)^2} = 18.876 \,\mathrm{mm}. \qquad (6.3)$$
From d and d′, it could be observed that the variation in the measured distance is negligible. Based on this, it could be said that there is not much distortion in the generated point cloud. From figures 6.2, 6.4 and 6.6, it could be observed that the depth intensities of the points in the point cloud present around the edges of the OOI are quite erroneous due to the presence of flying pixels in the data.
Similarly, two points P5(55, −14, 15) mm and P6(65, −9, 22) mm on the real object are considered with a variation in the depth, as shown in figure 6.5, and the distance d1 between them is calculated:

$$d_1 = \sqrt{(65-55)^2 + (-9-(-14))^2 + (22-15)^2} = 13.19 \,\mathrm{mm}. \qquad (6.4)$$
Nearest matches to these two 3D points are then located on the point cloud of the object as P7(55.22, −14.21, 17) mm and P8(65.27, −9.02, 22.18) mm, and the distance d′1 between them is calculated:

$$d_1' = \sqrt{(65.27-55.22)^2 + (-9.02-(-14.21))^2 + (22.18-17)^2} = 12.44 \,\mathrm{mm}. \qquad (6.5)$$
From d1 and d′1, it could be observed that the difference between them is negligible. Based on this, it could be concluded that the point cloud of the object of interest is generated to a good accuracy.
Figure 6.7: Colour image of the scene showing the placement of the object of interest and the Kinect sensor on different levels.

Figure 6.8: Depth map of the object of interest showing the IR reflection in front of the object.
The minimum distance between the Kinect and the OOI for the OOI to be recognized by the Kinect depth sensor is 0.5 meters; hence we consider an optimal distance of approximately 0.85 meters between the Kinect and the OOI. If the OOI is placed nearer or farther than the optimal range (0.8 to 1.2 meters), the variance of the depth intensities over a period of time increases, in turn leading to an increase in the RMSE.
Two non-identical objects are considered along with the main OOI to simplify the detection of the points used for the alignment of the point clouds in two different positions using the ICP algorithm.
In order to remove the noise created during the conversion of a depth map into a point cloud, we consider two imaginary planes IP1 and IP2 parallel to the face of the Kinect sensor, such that plane IP1 is in front of the object's surface and plane IP2 is behind the object's surface, as shown in figure 4.2. The imaginary planes IP1 and IP2 also facilitate isolation of the depths of the OOI from the background.
6.2 Sources of Error in 3D Model Creation

The Kinect for Windows V2.0 depth sensor is vulnerable to various kinds of errors [5, 8]. The main sources of errors in the acquired depth data are:

• Temperature drift
• Depth inhomogeneity
Unlike depth sensors working based on the structured light principle, time of flight depth sensors produce a lot of heat. During the warm-up cycle, it could be observed that the acquired depth values from the Kinect sensor vary significantly over a period of time, as shown in figure 6.9. This variation of depth during the warm-up cycle can be avoided by allowing the Kinect sensor to heat up for at least twenty minutes before initializing the data acquisition [5, 8]. From figure 6.9, it could be observed that the depth intensities, i.e. the brightness of the depth map, vary significantly from one depth map to another.
Figure 6.9: Two depth maps of the same scene showing the depth variations over
a period of time due to the drift in the temperature.
Figure 6.10: Two depth maps of the same scene showing how the holes at the
edges of the objects vary over time.
At the object boundaries, the depth values become erroneous due to the superimposing of the signals reflecting from surfaces at different depths, causing the depth values of such pixels to vary significantly over a period of time. Hence such pixels are called flying or mixed pixels [5]. These can be eliminated using a proper outlier detection algorithm. From figure 6.10, it could be observed that the depths of certain pixels around the edges of the objects in the scene vary over a period of time, i.e. some of them become valid while some of them become holes in the depth maps.
Figure 6.11: Colour image and depth map of a scene depicting the intensity-related issues in depth maps.
The intensities recorded by the Kinect depth sensor for highly reflective surfaces or very dark surfaces are particularly low. Thus, the corresponding depth values are larger than expected. This mainly occurs due to the absorption of near-IR radiation by dark coloured surfaces. When the reflectivity of a surface is very high, it might result in multipath reflection of the signal, creating an increase in the time of flight and leading to an increase in the recorded depth for that pixel [5, 8]. From figure 6.11, it could be observed that even though the sole of the shoe is dark, the depth intensities of the sole are consistent with the rest of the depth map, whereas the depth intensities of the top half of the pen are not consistent with those of its other half. This is because of the high reflectivity of the surface of the top half of the pen.
6.3 Advantages and Limitations

Based on the results, it could be concluded that the 3D point cloud of an object with an irregular surface can be reconstructed using the Kinect for Windows V2.0 sensor in an indoor environment. The average RMSE of the reconstructed sole of the shoe is 0.002885 m. This system could be used effectively for 3D reconstruction in forensics, for preserving historic artefacts, etc.
References
[6] L. Shao, J. Han, P. Kohli, and Z. Zhang, Computer Vision and Machine Learning with RGB-D Sensors. Advances in Computer Vision and Pattern Recognition, Springer, 2014.
[10] Oracle Crystal Ball reference and examples guide - outlier detection methods.
[15] Register two point clouds using ICP algorithm - MATLAB pcregrigid - MathWorks India.

[16] Merge two 3-D point clouds - MATLAB pcmerge - MathWorks India.