
Master Thesis
Electrical Engineering with emphasis on Signal Processing
September 2016

3D Object Reconstruction
Using XBOX Kinect v2.0

Srikanth Varanasi
Vinay Kanth Devu

Department of Applied Signal Processing


Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden
This thesis is submitted to the Department of Applied Signal Processing at Blekinge Institute
of Technology in partial fulfilment of the requirements for the degree of Master of Science in
Electrical Engineering with Emphasis on Signal Processing.

Contact Information:
Author(s):
Srikanth Varanasi
E-mail: srva15@student.bth.se

Vinay Kanth Devu


E-mail: vide15@student.bth.se

Supervisor:
Irina Gertsovich

University Examiner:
Dr. Sven Johansson

Department of Applied Signal Processing    Internet : www.bth.se
Blekinge Institute of Technology           Phone    : +46 455 38 50 00
SE-371 79 Karlskrona, Sweden               Fax      : +46 455 38 50 57
Abstract

Three dimensional image processing and analysis, and in particular the
imaging and reconstruction of an environment in three dimensions, have
received significant attention and interest in recent years. Against this
background, this research study aims to provide an efficient way to
reconstruct an object with an irregular surface, for example the sole of
a shoe, with good precision and at a low cost using the XBOX Kinect V2.0
sensor. Three dimensional reconstruction can be achieved using either
active or passive methods. Active methods make use of a light source,
such as lasers or infra-red emitters, to scan a given environment and
measure the depth, creating a depth map. In contrast, passive methods use
colour images of the environment from different perspectives to create a
three dimensional model of the environment. In this study, an active
method using a set of depth maps of the object of interest is implemented,
where the object of interest is represented by the sole of a shoe. First,
a set of depth maps of the object of interest is acquired from different
perspectives. The acquired depth maps are pre-processed to remove any
outliers in the acquired data and are then enhanced. The enhanced depth
maps are converted into 3D point clouds using the intrinsic parameters of
the Kinect sensor. These point clouds are subsequently registered into a
single position using the Iterative Closest Point (ICP) algorithm. The
aligned point clouds of the object of interest are then merged to form a
single dense point cloud. Analysis of the generated dense point cloud
shows that an accurate 3D reconstruction of the real object of interest
has been achieved.

Keywords: XBOX Kinect, 3D Reconstruction, ICP.


Acknowledgements

Firstly, we would like to thank our thesis advisor Irina Gertsovich for her valuable
guidance, feedback and support throughout our thesis. We are indebted to our
parents, professors, colleagues and friends for their immense support and the help
they have offered during the various phases of our thesis.

Srikanth Varanasi
Vinay Kanth Devu

Contents

Abstract i
Acknowledgements ii
1 Introduction 1
1.1 Aims & Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Thesis Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Literature Review 6
3 Theoretical Background 8
3.1 Intrinsic Parameters of the Kinect Sensor . . . . . . . . . . . . . . 8
3.2 Depth Image Enhancement . . . . . . . . . . . . . . . . . . . . . . 8
3.3 Point Cloud Generation . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 Point Cloud Alignment & Merging . . . . . . . . . . . . . . . . . 10

4 Implementation 13
4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2.1 Data Acquisition & Depth Image Enhancement . . . . . . 15
4.2.2 Point Cloud Generation, Alignment and Merging . . . . . 16

5 Results 18
5.1 Intrinsic Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.2 Data Acquisition & Depth Image Enhancement . . . . . . . . . . 19

5.3 Point Cloud Generation . . . . . . . . . . . . . . . . . . . . . . . 26
5.4 Point Cloud Alignment & Merging . . . . . . . . . . . . . . . . . 28

6 Discussion 38
6.1 Validation of Results . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.2 Sources of Error in 3D Model Creation . . . . . . . . . . . . . . . 43
6.3 Advantages and Limitations . . . . . . . . . . . . . . . . . . . . . 47

7 Conclusions and Future Scope 48


7.1 Further Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

References 49

List of Figures

1.1 XBOX 360 Kinect showing its internal sensors.


Image courtesy: Microsoft Corporation. . . . . . . . . . . . . . . . 2
1.2 Active 3D imaging system using time of flight technology.
Image courtesy: [1] . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 XBOX One Kinect showing internal sensors.
Image courtesy: iFixit.com . . . . . . . . . . . . . . . . . . . . . . 4

3.1 Kinect for Windows V2.0 sensor showing the orientation of the
coordinate system.
Image courtesy: Microsoft Corporation. . . . . . . . . . . . . . . . 9

4.1 Experimental setup without the objects of interest being present. 13


4.2 Experimental setup with the objects of interest. . . . . . . . . . . 14

5.1 A set of depth maps of the scene without the objects of interest. . 19
5.2 A set of depth maps of the scene with the objects of interest in the
initial position(1st ). . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.3 A set of depth maps of the scene with the objects of interest in the
final position (21st). . . . . . . . . . . . . . . . . . . . . . 20
5.4 Averaged depth map of the scene without the objects of interest. . 21
5.5 Enhanced depth map of the scene without the objects of interest. 22
5.6 Averaged depth map of the scene with the objects of interest in
the initial position( 1st ). . . . . . . . . . . . . . . . . . . . . . . . . 23
5.7 Enhanced depth map of the scene with the objects of interest in
the initial position( 1st ). . . . . . . . . . . . . . . . . . . . . . . . . 23
5.8 Subtracted depth map of the scene with the objects of interest in
the 1st position. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

5.9 Depth map of the objects of interest in the 1st position. . . . . . . 25
5.10 Depth map of the objects of interest in the 21st position. . . . . . 25
5.11 Point cloud of the scene with the objects of interest in the 1st position. 26
5.12 Point cloud of the objects of interest in the 1st position. . . . . . . 27
5.13 Point cloud of the objects of interest in the 21st position. . . . . . 27
5.14 Point clouds of the objects of interest in the first and second positions. 28
5.15 Point cloud of the objects of interest in the second position with
the registered point cloud of the first position with respect to the
second position. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.16 Point clouds of the objects of interest in the 21st and the 20th position. 30
5.17 Point cloud of the objects of interest in the 20th position with the
registered point cloud of the 21st position with respect to the 20th
position. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.18 Merged point cloud of the objects of interest in the second position
with the registered point cloud of the first position with respect to
the second position. . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.19 Merged point cloud of the objects of interest in the 20th position
with the registered point cloud of the 21st position with respect to
the 20th position. . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.20 Merged point cloud of the objects of interest in the 11th position
obtained from the positions 1 to 10. . . . . . . . . . . . . . . . . . 32
5.21 Merged point cloud of the objects of interest in the 11th position
obtained from the positions 12 to 21. . . . . . . . . . . . . . . . . 32
5.22 Final point cloud of the objects of interest in the 11th position. . . 33
5.23 Final point cloud of the objects of interest in the 11th position. . . 34
5.24 Final point cloud of the objects of interest in the 11th position. . . 34
5.25 Final point cloud of the objects of interest in the 11th position. . . 35
5.26 Depth map of the objects of interest generated based on the final
point cloud. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6.1 A top view of the object on a graph paper. . . . . . . . . . . . . . 39


6.2 A top view of the point cloud of the object. . . . . . . . . . . . . 40
6.3 A scene containing the object on the graph paper. . . . . . . . . . 40
6.4 A point cloud of the object. . . . . . . . . . . . . . . . . . . . . . 41

6.5 Front view of the object on a graph paper. . . . . . . . . . . . . . 41
6.6 Front view of the point cloud of the object. . . . . . . . . . . . . . 42
6.7 Colour image of the scene showing the placement of the object of
interest and the Kinect sensor on different levels. . . . . . . . . . . 43
6.8 Depth map of the object of interest showing the IR reflection in
front of the object. . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.9 Two depth maps of the same scene showing the depth variations
over a period of time due to the drift in the temperature. . . . . . 45
6.10 Two depth maps of the same scene showing how the holes at the
edges of the objects vary over time. . . . . . . . . . . . . . . . . . 45
6.11 Colour image and depth map of a scene depicting the intensity
related issues in depth maps. . . . . . . . . . . . . . . . . . . . . 46

List of Tables

1.1 Comparison between XBOX 360 Kinect & XBOX One Kinect . . 4

4.1 The positions of the object of interest and their corresponding angle
of rotation with respect to the reference pointer. . . . . . . . . . . 17

5.1 Intrinsic parameters of the Kinect for Windows sensor. . . . . . . 18


5.2 The ICP registrations and their corresponding RMSE values. . . . 37

Chapter 1
Introduction

Computer vision is an interdisciplinary field that helps us gain a high-level
understanding of digital images or videos. In the last few decades computer vision
has grown into a very broad and diverse field. In particular, the concepts of three
dimensional (3D) imaging and reconstruction have received much attention in the
past few years. 3D data acquisition can be achieved mainly in two ways [2, 3].
They are:

• Passive 3D Imaging
• Active 3D Imaging

A passive 3D imaging system is a system that does not project its own source of
light or any other form of electromagnetic radiation to capture the 3D information
of a scene. A passive imaging system relies on ambient-lit images, i.e. colour
images of the scene from different perspectives, to generate a 3D model [2]. Passive
3D imaging can be achieved in two ways, namely multiple view and single view
approaches.

Multiple view approaches make use of more than one viewpoint, either by
using multiple cameras at the same time or by using a single moving camera
at different times. Systems that use more than one camera at a time to capture
multiple images of the scene are called stereo vision systems, and collectively all
the cameras are referred to as a stereo camera [2]. These methods use triangulation,
locating the same point of the environment in more than one of the images to
determine that point's 3D position. In contrast to stereo imaging, the approach in
which a single camera moves over a period of time and captures the scene from
different perspectives to create a 3D model is called structure from motion
(SfM) [2, 4].


Single view approaches, on the other hand, depend upon only one viewpoint
for inferring the 3D shape of an object in an environment. They mainly depend
upon information cues such as shading, texture and focus for 3D modelling of the
object [2].

In contrast to these, active 3D imaging systems make use of controlled artificial
illumination or other forms of electromagnetic radiation to acquire dense range
maps of the environment with minimum ambiguity [2, 3]. The use of artificial
illumination makes it easy to acquire dense and accurate depth maps of
texture-less objects. Active 3D imaging systems make use of a large variety of
methods for creating an accurate depth map of an environment. The range of
operation and the accuracy of the system vary with the technique being used.
Structured Light (SL) and Time of Flight (ToF) are some of the techniques used
in active 3D imaging systems.

In systems using structured light to measure depth, a sequence of known
patterns of electromagnetic radiation is projected onto the environment. The
patterns in the electromagnetic radiation get deformed by the geometry of the
objects in the environment. The distorted patterns are observed with a camera
and analysed, based on the disparity from the originally projected pattern and the
intrinsic parameters of the camera, to generate depth maps of the environment.
This system is similar to the binocular stereo vision system in passive 3D imaging,
where one of the cameras is replaced by a projector [2, 3, 5]. Hence this technique
is also called active stereo vision. The XBOX 360 Kinect is an active 3D imaging
system that works on this principle [6]. Figure 1.1 shows an XBOX 360 Kinect
with the different sensors present in it and their internal locations.

Figure 1.1: XBOX 360 Kinect showing its internal sensors.


Image courtesy: Microsoft Corporation.

ToF technology is based mainly on measuring the time taken by light emitted
from a source to travel to an object and back to a sensor [1, 2, 5]. In most cases
the illumination is a continuous wave, since this helps in the delay estimation.
The source of illumination and the sensor are assumed to be at the same location.
Since the distance between the object and the sensor is constant and the speed of
light c is finite, the time shift of the emitted signal is equivalent to a phase shift
in the received signal. Based on the phase shift (Δϕ) between the received and
the generated signal, the ToF is calculated, which in turn is used for generating
the depth maps. The XBOX One Kinect is an active 3D imaging system that
works on this principle [6]. Figure 1.2 shows the operating principle of a ToF
sensor, where Δϕ is the phase difference between the transmitted and received
signal. Figure 1.3 shows a teardown of the XBOX One Kinect sensor with the
locations of the different sensors present in it.

Figure 1.2: Active 3D imaging system using time of flight technology.


Image courtesy: [1]

Figure 1.3: XBOX One Kinect showing internal sensors.


Image courtesy: iFixit.com

Features                               XBOX 360 Kinect         XBOX One Kinect
Range of operation                     0.4 to 3 meters or      0.5 to 4.5 meters
                                       0.8 to 4 meters
Colour camera resolution (pixels)      640 x 480 @ 30 Hz       1920 x 1080 @ 30 Hz
Near IR camera resolution (pixels)     640 x 480 @ 30 Hz       512 x 424 @ 30 Hz
Field of view (FoV)
(horizontal x vertical)                57° x 43°               70° x 60°
Depth sensing technology               Structured Light        Time of Flight
Working conditions                     Indoor                  Indoor & Outdoor

Table 1.1: Comparison between XBOX 360 Kinect & XBOX One Kinect

Microsoft Corporation has released two Kinect sensors, namely the XBOX 360
Kinect and the XBOX One Kinect. The XBOX 360 Kinect is also called the Kinect
for Windows V1.0 sensor, while the XBOX One Kinect is also known as the Kinect
for Windows V2.0 sensor. Table 1.1 compares the two versions of the Kinect
sensor. It should also be noted that the IR sensor in the XBOX One Kinect is
lighting independent, i.e. the sensor used for capturing the IR data and the depth
information is not affected by the amount of lighting in the environment. The IR
sensor of the XBOX One Kinect has three times higher data fidelity than that of
the XBOX 360 Kinect, i.e. due to the change in depth sensing technology, the
depth maps acquired from the XBOX One Kinect are clearer, less noisy and more
reliable than those from the XBOX 360 Kinect.

1.1 Aims & Objectives


The main aim of this research is to create a reliable three dimensional (3D) model
of an object with irregular surfaces present in an indoor environment using an
XBOX One Kinect sensor. The XBOX One Kinect was primarily developed as a
gaming device, but it subsequently generated significant interest in the academic
and research world due to its relatively low cost IR RGB-D sensor that can work
with personal computers, its high data fidelity, depth resolution and accuracy. In
this research we make use of MATLAB and Microsoft Visual Studio for creating
a reliable 3D point cloud of the object of interest. The main objectives of this
research are to:

• Acquire sets of depth maps of the environment with the objects of interest.

• Enhance the acquired depth maps and separate the depths of the object of
interest.

• Create an accurate 3D point cloud of the object of interest from the enhanced
depth maps.

The research questions for this thesis are:

• Is the 3D reconstruction of an irregular surfaced object using the XBOX
Kinect V2.0 possible?

• How does the proposed algorithm compare with existing ones in terms of
speed of reconstruction and accuracy?

• Is this a realistic approach for performing online 3D reconstruction rather
than an offline one?

1.2 Thesis Organisation


Chapter 2 includes a brief literature review of different 3D reconstruction
approaches. Chapter 3 includes a brief description of the different algorithms
implemented in this thesis. In Chapter 4, a detailed explanation of the
experimental setup and the proposed solution for creating an accurate 3D point
cloud of the object of interest is given. Chapter 5 contains a detailed explanation
of the results, and Chapter 6 consists of a discussion of the results. Finally,
Chapter 7 consists of the conclusions and the scope for further work.
Chapter 2
Literature Review

The Kinect for Windows sensor is one of the most popular sensors used for low-cost
3D reconstruction of objects in an environment. It can also be used for 3D
reconstruction of a scene in real time, creating virtual reality, motion sensing,
making a display touch enabled, etc. Most of these applications make use of the
Microsoft Kinect SDK. The main issue with the use of the Kinect SDK for real-time
reconstruction of a scene is that the hardware requirements for processing such
large amounts of data are quite high. Due to these hardware requirements, one
must depend upon other programming tools such as MATLAB.

In [1] the authors give a thorough account of Time of Flight (ToF) cameras,
their operating principle, the calibration and alignment of ToF cameras, and
different ToF and structured light cameras such as the Kinect for Windows V1.0
and V2.0 sensors.

3D imaging techniques, the creation of 3D objects, their representations, the
registration of 3D point clouds in different positions, and applications of 3D
imaging and analysis in different areas of science are discussed by the authors
in [2].

In [3, 4], the authors give an account of different computer vision techniques
such as image segmentation, feature detection, matching and alignment, structure
from motion (SfM), 3D reconstruction, etc.

In [5], the authors give an objective comparison between the structured light
and ToF technologies for range sensing using a Kinect for Windows sensor. They
also discuss the different error sources encountered while using a Kinect sensor
and give a constructive framework for evaluating the performance of Kinect
sensors.

In [6], the authors give a brief account of the Kinect for Windows sensors,
3D reconstruction, segmentation, matching and recognition, and the different
algorithms used for these purposes.


In [7], the author suggests the real-time reconstruction of a 3D scene using
Kinect Fusion, which requires a very powerful GPU and a large amount of memory.
The main issues with the use of the Kinect for Windows V1.0 sensor for 3D
reconstruction are that the depth data acquired is not as reliable as the data
acquired using the Kinect for Windows V2.0 sensor, owing to the difference in
sensing technology, and that the fidelity of the data is very low. The usage of
Kinect Fusion for 3D reconstruction is very tedious and has a number of hardware
requirements.

A comprehensive evaluation of the Kinect for Windows V2.0 sensor for the
purpose of 3D reconstruction is explained in detail in [8]. In [9], the authors
introduce a Kinect for Windows V2.0 sensor toolbox for MATLAB versions prior
to MATLAB R2016a. This toolbox makes use of C++ and MATLAB Mex functions
to provide access to the colour, depth and infra-red streams of the Kinect sensor
using the Microsoft Kinect Fusion SDK V2.0.

In [10, 11], the authors give a brief account of different outlier detection
approaches. In [12], the authors give a comprehensive account of the different
variants of the iterative closest point algorithm that can be used for registration
of different 3D models of the same object into a single position. In [13], the
author gives a method for calculating the transformation matrix between two point
clouds based on least squares estimation.
Chapter 3
Theoretical Background

3.1 Intrinsic Parameters of the Kinect Sensor


The intrinsic parameters of a Kinect sensor's Near IR ( NIR ) camera are focal
length (Fx, Fy ), location of the focal point ( Cx, Cy ) and the radial distortion
parameters (K1, K2, K3). Each Kinect sensor has its own depth intrinsic pa-
rameters which are sensor and lens dependent. Each Kinect sensor is calibrated
in the factory and the intrinsic parameters are stored in the internal memory
of the Kinect sensor. The depth intrinsic parameters of the Kinect sensor can
be acquired and stored with the help of the function "getDepthIntrinsics" of the
Kin2 toolbox developed for MATLAB [9].
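As a rough illustration (not code from the thesis), the intrinsics might be read
and stored as in the sketch below; the constructor flags, the cleanup call and the
contents of the returned structure are assumptions based on the Kin2 toolbox [9]
and the Kinect SDK, and may differ from the exact toolbox version used here.

```matlab
% Hypothetical sketch: reading the depth intrinsics with the Kin2 toolbox [9].
% The constructor flag and the struct fields are assumptions, not verified
% against the exact toolbox version used in this work.
addpath('Mex');                      % Kin2 Mex binaries (path is illustrative)
k2 = Kin2('depth');                  % open the sensor with the depth stream
intr = k2.getDepthIntrinsics;        % focal lengths, focal centre and radial
                                     % distortion coefficients of the NIR camera
save('depthIntrinsics.mat', 'intr'); % store for later point cloud generation
k2.delete;                           % release the sensor
```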

3.2 Depth Image Enhancement


Consider a scenario in which N depth frames are acquired, each frame having a
resolution of J × K pixels. For each pixel location (j, k), there exist N data
samples, which also contain outliers due to the noise inherent to the sensor.
These outliers need to be removed from the N samples. The outliers in the
acquired data are removed based on the median absolute deviation (MAD) of the
data [10, 11].
The MAD of a normal distribution is the median of the absolute deviation from
the median, i.e.

    MAD = b · M(|x_i − M(X)|)                                        (3.1)

where M is the median of a given distribution, X is the set containing the N
samples of data and x_i is an individual sample in the data set X. Assuming that
the depth intensities follow a normal distribution, b = 1.4826 is used [11],
ignoring the abnormalities induced by the outliers in the data.


The criterion for detecting outliers is a threshold based on the value of the MAD.
The condition

    |x_i − M(X)| ≤ 3 · MAD                                           (3.2)

is the criterion for detecting outliers: if a given sample x_i of the data set X
satisfies this condition, the sample belongs to the data set. If more than 50% of
the data has the same value, the MAD becomes zero, and in such scenarios this
detection technique does not work.
After the outliers in the depth pixel intensities are detected and discarded for
each set X, the remaining samples of depth intensities are averaged to obtain a
value for the pixel location (j, k) in the averaged depth map.
A pixel in a depth map is said to be invalid if it does not hold any depth
information, i.e. if the intensity value of that pixel is undefined or zero. The
invalid pixels in a depth map are called holes in this work. These holes need to
be filled with valid depth values in order to avoid holes in the point clouds. The
holes in the depth data are filled using the eight nearest neighbour principle,
which estimates the depth value of a hole as the mean of the intensities of its 8
nearest neighbours. The holes in the averaged depth maps are filled and the
results are stored.
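A minimal MATLAB sketch of this enhancement step is given below. The stack name
"frames", the use of zero to mark invalid pixels and the final call to "imfill"
are illustrative assumptions, not the exact code used in this work.

```matlab
% Per-pixel MAD-based outlier removal and averaging (equations 3.1 and 3.2).
% 'frames' is assumed to be a J x K x N stack of depth maps (double), with 0
% marking invalid pixels.
b = 1.4826;                                  % consistency constant for normal data
avgDepth = zeros(size(frames,1), size(frames,2));

for j = 1:size(frames,1)
    for k = 1:size(frames,2)
        x = squeeze(frames(j,k,:));          % N samples for this pixel
        x = x(x > 0);                        % drop invalid (zero) samples
        if isempty(x), continue; end         % pixel remains a hole
        med = median(x);
        MAD = b * median(abs(x - med));      % equation (3.1)
        if MAD > 0
            x = x(abs(x - med) <= 3*MAD);    % keep inliers only, equation (3.2)
        end
        avgDepth(j,k) = mean(x);             % averaged depth for this pixel
    end
end

% The remaining zero-valued holes are then filled; imfill fills regions of the
% grayscale image that cannot be reached from the image border.
filledDepth = imfill(avgDepth);
```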

3.3 Point Cloud Generation

Figure 3.1: Kinect for Windows V2.0 sensor showing the orientation of the coor-
dinate system.
Image courtesy: Microsoft Corporation.

The depth maps of the environment are then converted into 3D point clouds using
the intrinsic parameters of the depth camera of the Kinect sensor, the acquired
depth data and the perspective projection relationship [8]. Each pixel p(j, k) in
these depth maps is converted into a physical location P(X, Y, Z) in the 3D point
cloud with respect to the location of the depth camera in the Kinect, i.e. the
origin of the generated point cloud is located at the position of the depth camera
of the Kinect sensor. The orientation of the X, Y, Z axes in this 3D system is
shown in figure 3.1. The X and Y coordinates of the point P(X, Y, Z) corresponding
to each pixel p(j, k) in a depth map are calculated using the equations

    X = ((j − Cx) / Fx) · Z                                          (3.3)

    Y = ((k − Cy) / Fy) · Z                                          (3.4)

Here (j, k) is the location of the pixel p in the depth map, (Cx, Cy) and (Fx, Fy)
are the intrinsic parameters of the depth camera, namely the location of the focal
centre and the focal length respectively, and Z is the depth value of the pixel
p(j, k) in the depth map.
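A minimal MATLAB sketch of this conversion is shown below; the depth map variable,
the intrinsics structure and its field names are illustrative assumptions.

```matlab
% Convert a depth map into a 3D point cloud using equations (3.3) and (3.4).
% 'depth' is assumed to be an H x W depth map (with 0 for holes) and 'intr' a
% struct holding the intrinsics of Table 5.1 (Fx, Fy, Cx, Cy).
[cols, rows] = meshgrid(1:size(depth,2), 1:size(depth,1));   % j (width), k (height)

Z = depth;
X = (cols - intr.Cx) ./ intr.Fx .* Z;        % equation (3.3)
Y = (rows - intr.Cy) ./ intr.Fy .* Z;        % equation (3.4)

valid = Z > 0;                               % ignore invalid pixels
pts = [X(valid), Y(valid), Z(valid)];        % N x 3 list of 3D points
pc = pointCloud(pts);                        % MATLAB point cloud object
```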

3.4 Point Cloud Alignment & Merging


The point clouds thus generated are then aligned to a particular position using
the iterative closest point (ICP) algorithm [2, 12]. The registration of one point
cloud with respect to another involves the calculation of the transformation
between the two point clouds, followed by the transformation of the input point
cloud into the reference point cloud's orientation.
Registration of two or more 3D objects is a process that involves the
approximation of the transformation between the different 3D objects. The ICP
algorithm is the standard process for the registration of 3D objects. Consider two
point clouds D and M such that D ⊂ M:

    D = {d_1, d_2, ..., d_{N_d}}
    M = {m_1, m_2, ..., m_{N_m}}                                      (3.5)

Here N_d and N_m are the number of points in the point clouds D and M
respectively.

The error function between the two point clouds is

    E_ICP(a, D, M) = E(T(a, D), M) = Σ_{u=1}^{N_d} ||(R·d_u + t) − m_v||²        (3.6)

where a is the transformation function that best aligns the point cloud D to M,
a = (R, t), R is the rotation matrix, t is the translation vector and (d_u, m_v)
are the corresponding points. Fixing d_u ∈ D, the corresponding point m_v ∈ M is
computed such that

    v = arg min_{w ∈ {1, 2, ..., N_m}} ||(R·d_u + t) − m_w||².                   (3.7)

Based on the point correspondences, the transformation function a = (R, t) can
be computed to minimize E_ICP. This can be accomplished in several ways; one of
the approaches is based on singular value decomposition (SVD).
Firstly, the cross covariance matrix C is formed from the N_d correspondences
(d_u, m_v) as

    C = (1/N_d) Σ_{u=1}^{N_d} [d_u − d̄][m_v − m̄]^T                              (3.8)

where d̄ and m̄ are the means formed over the N_d correspondences. Performing
the SVD of C, we get

    U S V^T = C                                                       (3.9)

where U and V are two orthogonal matrices and S is a diagonal matrix of singular
values. The rotation matrix R can be estimated from the pair of orthogonal
matrices using the equation [13]

    R = V S′ U^T,                                                     (3.10)

where

    S′ = I                        if det(U)·det(V) = 1
    S′ = diag(1, 1, ..., 1, −1)   if det(U)·det(V) = −1.              (3.11)
Here diag(·) denotes a diagonal matrix, I denotes the identity matrix and det(·)
denotes the determinant of a given matrix. The translation vector t can be
estimated as

    t = m̄ − R·d̄.                                                     (3.12)
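A minimal MATLAB sketch of this SVD-based estimation is given below, assuming the
established correspondences have been collected into two N x 3 matrices D and M
(row u of M is the match of row u of D); the variable names are illustrative.

```matlab
% Estimate R and t from point correspondences, equations (3.8)-(3.12).
dBar = mean(D, 1);                       % centroids over the correspondences
mBar = mean(M, 1);
Dc = bsxfun(@minus, D, dBar);            % centred point sets
Mc = bsxfun(@minus, M, mBar);
C  = (Dc' * Mc) / size(D, 1);            % 3x3 cross covariance, eq. (3.8)

[U, ~, V] = svd(C);                      % C = U*S*V', eq. (3.9)
Sp = eye(3);
if det(U) * det(V) < 0                   % reflection case, eq. (3.11)
    Sp(3,3) = -1;
end
R = V * Sp * U';                         % rotation, eq. (3.10)
t = mBar' - R * dBar';                   % translation, eq. (3.12)

% A point d (row vector) of D is then mapped into M's frame as (R*d' + t)'.
```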

The ICP algorithm is performed iteratively so as to improve the correspondences,
and hence minimize E_ICP, until the true correspondences are known. The root mean
square error of the difference between the reference and the transformed point
cloud is then calculated.
The point cloud can be converted back into a depth map, if required, using the
equations 3.13 and 3.14 [8]. If P(X, Y, Z) is a point in the 3D coordinate system,
then the pixel p(j, k) in the depth map is located using the equations

    j = (X · Fx) / Z + Cx                                             (3.13)

and

    k = (Y · Fy) / Z + Cy                                             (3.14)

and the depth value of the pixel p is given by

    p(j, k) = Z                                                       (3.15)
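A minimal MATLAB sketch of this back-projection is shown below; the 512 x 424
depth resolution, the rounding to the nearest pixel and the variable names are
assumptions for illustration.

```matlab
% Project an N x 3 point list 'pts' (columns X, Y, Z, in the depth camera
% frame) back into a depth map, equations (3.13)-(3.15).
H = 424;  W = 512;                        % depth resolution of the Kinect V2.0
depthMap = zeros(H, W);

j = round(pts(:,1) .* intr.Fx ./ pts(:,3) + intr.Cx);   % eq. (3.13)
k = round(pts(:,2) .* intr.Fy ./ pts(:,3) + intr.Cy);   % eq. (3.14)

inside = j >= 1 & j <= W & k >= 1 & k <= H & pts(:,3) > 0;
idx = sub2ind([H W], k(inside), j(inside));
depthMap(idx) = pts(inside, 3);           % eq. (3.15); if several points fall on
                                          % the same pixel, the last one is kept
```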
Chapter 4
Implementation

The hardware requirements of this research study are a Microsoft Kinect for Win-
dows V2.0 sensor, a PC that satises the requirements for using Microsoft Kinect
for Windows V2.0 sensor and a turntable. The software requirements for acquir-
ing and processing data from a Kinect sensor are Windows 8.1 operating system,
MATLAB 2016a with image processing toolbox, Kinect for Windows hardware
support package for MATLAB, Kin2 toolbox for MATLAB [9], Microsoft Kinect
SDK V2.0 and Microsoft Visual Studio 2013.

4.1 Experimental Setup


The experimental setup for data acquisition in this research is shown in figure
4.1.

Figure 4.1: Experimental setup without the objects of interest being present.


A scenario is considered in which a long table is placed against a clear white
background (i.e. a wall), with a turntable at one end and a Kinect sensor facing
it at the other end. A 360° protractor is placed at the centre of the turntable to
facilitate accurate rotation, as shown in figure 4.1. A paper pointer is set along
the 0° mark of the protractor to facilitate the measurement of angles with respect
to this reference.
The object of interest (OOI) is placed at the centre of the turntable. Two
non-identical objects are placed on either side of the OOI, as shown in figure 4.2,
where the two yellow lines mark the imaginary planes. The minimum distance between
the OOI and the plane of the Kinect sensor is kept longer than 0.5 meters, due to
the range limitations of the Kinect sensor. The OOI and the two non-identical
objects are assumed to lie between two fixed imaginary planes, parallel to the
face of the Kinect sensor, as shown in figure 4.2. These imaginary planes are
chosen in such a way that the object's surface does not cross them in any of the
perspectives, i.e. even when the turntable is rotated by an arbitrary angle. The
horizontal distance D1 between imaginary plane-1, in front of the OOI, and the
Kinect sensor is measured. The horizontal distance D2 between imaginary plane-2,
behind the surface of the OOI, and the wall is also measured. If the setup is
placed in the middle of a room, another imaginary plane needs to be considered as
a wall to allow the isolation of the OOI from the environment. The Kinect sensor
is connected to the PC via a USB 3.0 connection.

Figure 4.2: Experimental setup with the objects of interest.



4.2 Experiment
In this research study, depth maps of the scene with and without the OOI are
acquired to create a good quality 3D model of the OOI. This includes data
acquisition followed by various pre- and post-processing steps for obtaining the
final point cloud of the OOI with an appreciable accuracy.

4.2.1 Data Acquisition & Depth Image Enhancement


In the data acquisition stage, the Kinect object in MATLAB is initialized. The
intrinsic parameters of the Kinect sensor are acquired with the help of the
"getDepthIntrinsics" function of the Kin2 toolbox and stored. Initially, a set of
depth maps and colour images of the scene without the OOI, e.g. 10 of each, is
acquired and stored. The OOI is then placed at the centre of the turntable with
two non-identical objects, e.g. pens, on either side of the OOI. The horizontal
distance D1 between the face of the Kinect sensor and the OOI, and the distance D2
between the OOI and the wall, are measured. Sets of depth maps and colour images
of the scene, e.g. 10 per position, are acquired and stored while the turntable is
rotated. The total number of positions is chosen to be odd, e.g. 21, so that the
final position of the point cloud of the OOI can be taken to be the middle
position, i.e. "0°" with respect to the reference. The turntable is rotated by a
maximum of two degrees at a time, e.g. from −10° to +10° with a rotation of one
degree per position, so as to maintain the accuracy and minimize the root mean
square error.
The acquired depth maps of the environment are then enhanced and the holes in
the data are filled based on the MAD and the eight nearest neighbour principle.
For hole filling, the MATLAB function "imfill" is used, as it works based on the
same principle [14]. After the outliers and the holes in the depth maps of the
environment are discarded, a single averaged and hole-filled depth map is
constructed for each position of the OOI on the turntable.
To isolate the depths of the OOI from the rest of the environment, the depth map
of the OOI in each of the 21 positions is subtracted from that of the environment
without the OOI. The depths of the OOI in these subtracted outputs correspond to
the distances from the wall behind the OOI, as shown in figure 4.2. Based on these
depth intensities and the distance D2, the pixels with depth intensities greater
than D2 are identified, and their corresponding depths from the enhanced depth
maps of the OOI in the environment are copied into the same locations in a new,
empty depth map and stored.
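A minimal MATLAB sketch of this isolation step is given below; the variable names
are illustrative, and D2 is assumed to be expressed in the same unit as the depth
maps.

```matlab
% Isolate the depths of the OOI by subtraction against the empty scene.
% 'bg' is the enhanced depth map without the OOI, 'scene' the enhanced depth
% map with the OOI in one position, and D2 the distance between imaginary
% plane-2 and the wall.
diffMap = bg - scene;                 % change in depth caused by adding the OOI
ooiMask = diffMap > D2;               % pixels whose change exceeds D2 belong to the OOI

ooiDepth = zeros(size(scene));        % new, empty depth map
ooiDepth(ooiMask) = scene(ooiMask);   % copy the OOI depths into it
```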

4.2.2 Point Cloud Generation, Alignment and Merging


The depth maps of the environment with the OOI and those of the isolated OOI are
converted into 3D point clouds based on the intrinsic parameters of the Kinect's
depth sensor and the depth data, using the equations 3.3 and 3.4. The origin of
each of these point clouds is then translated along the Z-axis onto imaginary
plane-1, which is in front of the OOI, using the distance D1.
The generated point clouds of the OOI are then aligned to a particular position
using the ICP algorithm. The final position of the point cloud is considered to
be the middle position in the acquired data, i.e. 0° with respect to the reference
paper pointer. The MATLAB function "pcregrigid", with an increased maximum number
of iterations and a very low tolerance of error in terms of rotation and
translation, is used to register a given point cloud with respect to another [15].
Point clouds in the same orientation can be merged together to form a single point
cloud using the MATLAB function "pcmerge". This function merges the two point
clouds by using a box grid filter [16].
We consider a total of 21 rotated positions of the objects of interest, i.e. from
−10° to +10° with a difference of 1° per position. Table 4.1 shows the positions
of the OOI and their corresponding angles of rotation with respect to the
reference pointer.
The registration procedure is initially performed for the first two positions,
where the point cloud in position 2 is considered to be the reference; the
transformed first point cloud is then merged with the reference point cloud and
stored. This merged point cloud is then registered with respect to the point cloud
in the next position, and the transformed output is merged with the reference.
This procedure is continued until all the point clouds prior to the 0° position,
i.e. position 11, are registered and merged into it. Similarly, starting from
position 21, all the point clouds are registered with respect to their previous
positions and then merged, until all of them are registered and merged into
position 11. Thus, two point clouds for position 11 are generated. Finally, the
point cloud from the first set of registrations and that from the second set of
registrations are merged to obtain the dense 3D point cloud of the OOI. This point
cloud is converted back into a depth map using the equations 3.13, 3.14 and 3.15
to improve the visualization. The root mean square error (RMSE) between each pair
of point clouds, i.e. the transformed output and the reference, is calculated in
all of the cases, and the average RMSE of the reconstructed sole of the shoe is
also calculated.
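A minimal MATLAB sketch of this registration and merging chain, using "pcregrigid"
[15] and "pcmerge" [16], is shown below; the cell array of point clouds and the
tolerance, iteration and grid-step values are illustrative assumptions rather than
the exact settings used in this work.

```matlab
% Sequential ICP registration and merging into the middle (11th) position.
% 'clouds' is assumed to be a 1x21 cell array of pointCloud objects of the
% isolated OOI, ordered as in Table 4.1.
gridStep = 0.001;                                   % merge resolution (metres)

merged = clouds{1};
for n = 2:11                                        % positions 1 -> 11
    [~, movingReg, rmse] = pcregrigid(merged, clouds{n}, ...
        'MaxIterations', 200, 'Tolerance', [0.0001, 0.0009]);
    fprintf('registered into position %d, RMSE = %.4f\n', n, rmse);
    merged = pcmerge(clouds{n}, movingReg, gridStep);
end
firstHalf = merged;

merged = clouds{21};
for n = 20:-1:11                                    % positions 21 -> 11
    [~, movingReg, ~] = pcregrigid(merged, clouds{n}, ...
        'MaxIterations', 200, 'Tolerance', [0.0001, 0.0009]);
    merged = pcmerge(clouds{n}, movingReg, gridStep);
end
secondHalf = merged;

finalCloud = pcmerge(firstHalf, secondHalf, gridStep);   % dense point cloud of the OOI
```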

Position Number        Angle of Rotation w.r.t. Reference (degrees)
1(initial) -10
2 -9
3 -8
4 -7
5 -6
6 -5
7 -4
8 -3
9 -2
10 -1
11(middle) 0
12 1
13 2
14 3
15 4
16 5
17 6
18 7
19 8
20 9
21(nal) 10

Table 4.1: The positions of the object of interest and their corresponding angle
of rotation with respect to the reference pointer.
Chapter 5
Results

5.1 Intrinsic Parameters


The intrinsic parameters of the Kinect sensor used for this work were acquired
using the Kin2 toolbox for MATLAB [9] and are shown in table 5.1.

Intrinsic Parameters                        Kinect SDK Values (pixels)
Focal Length          Fx                    365.2946
                      Fy                    365.2946
Focal Centre          Cx                    259.7606
                      Cy                    205.8992
Radial Distortion     2nd Order             0.0923
                      4th Order             -0.2701
                      6th Order             0.0927

Table 5.1: Intrinsic parameters of the Kinect for Windows sensor.


5.2 Data Acquisition & Depth Image Enhancement
Initially, a set of depth maps and colour images of the scene without the OOI is
acquired. These acquired images are horizontally flipped versions of the original
scene that we perceive. Figure 5.1 shows depth maps highlighting the changes in
the location of the holes over a period of time.

Figure 5.1: A set of depth maps of the scene without the objects of interest.

The OOI, along with the two non-identical objects, is then placed on the
turntable, and the distance between the Kinect sensor and imaginary plane-1 is
noted down as D1. Similarly, the distance between the wall and imaginary plane-2
is noted down as D2. A set of depth maps and colour images of the scene with the
OOI is acquired and stored for each position of the scene according to table 4.1.

Figures 5.2, 5.3 show the depth maps and colour maps of the scene in the 1st
and 21st positions respectively.

Figure 5.2: A set of depth maps of the scene with the objects of interest in the
initial position(1st ).

Figure 5.3: A set of depth maps of the scene with the objects of interest in the
final position (21st).

It can be observed that the orientation of the sole slightly changes from fig.
5.3 to fig. 5.2.

The outliers in the acquired depth maps of the scene, with and without the OOI,
are detected based on the MAD using the equations 3.1 and 3.2 and are then
removed. The sets of depth maps of the scene in each position are then averaged,
and the holes in the averaged depth maps are filled using the MATLAB function
"imfill". Figures 5.4 and 5.5 show the averaged depth map and the enhanced depth
map of the scene without the OOI respectively. Compared to the originally captured
depth maps, the averaged depth maps have considerably fewer holes, due to the
averaging of the set of depth maps over a period of time. From figure 5.5, we can
observe that the holes present on the surface of the turntable in figure 5.4 are
filled. The black areas inside the circle in figure 5.4 show the holes.

Figure 5.4: Averaged depth map of the scene without the objects of interest.

Figure 5.5: Enhanced depth map of the scene without the objects of interest.

Figures 5.6 and 5.7 show the averaged and enhanced depth maps of the scene
with the OOI in the 1st position respectively.

Figure 5.6: Averaged depth map of the scene with the objects of interest in the
initial position(1st ).

Figure 5.7: Enhanced depth map of the scene with the objects of interest in the
initial position(1st ).

The enhanced depth maps of the scene with the OOI in the different positions
considered are subtracted from that of the scene without the OOI. Based on the
distance D2 from the wall, the pixels with depth values less than D2 are
discarded. For the remaining valid pixels, the corresponding depth values from the
enhanced depth maps of the scene with the OOI are copied into another empty depth
map of the same dimensions. Figure 5.8 shows the subtracted depth map of the scene
with the objects of interest in the initial position. Figures 5.9 and 5.10 show
the retrieved depth maps of the objects of interest in the 1st and 21st positions.

Figure 5.8: Subtracted depth map of the scene with the objects of interest in the
1st position.

Figure 5.9: Depth map of the objects of interest in the 1st position.

Figure 5.10: Depth map of the objects of interest in the 21st position.

5.3 Point Cloud Generation


The depth maps of the OOI and those of the scene with the OOI are subsequently
converted into point clouds based on the depth values and the intrinsic parameters
of the Kinect depth sensor (Table 5.1), using the equations 3.3 and 3.4. Figure
5.11 shows the point cloud of the whole scene with the OOI, generated by
converting the depth map in figure 5.7 into a point cloud. Figures 5.12 and 5.13
show the point clouds of the OOI in the 1st and 21st positions respectively,
generated from the depth maps in figures 5.9 and 5.10 respectively. The origin of
the 3D coordinate system of the point clouds in figures 5.11, 5.12 and 5.13 has
been translated onto imaginary plane-1, based on D1, and all the points with
negative depths have been discarded. From figure 5.11, it can be observed that the
whole scene is present along with the objects of interest, whereas in figures 5.12
and 5.13 only the OOI are present. From figures 5.12 and 5.13, it can also be
observed that the orientation of the OOI changes significantly.

Figure 5.11: Point cloud of the scene with the objects of interest in the 1st position.

Figure 5.12: Point cloud of the objects of interest in the 1st position.

Figure 5.13: Point cloud of the objects of interest in the 21st position.

5.4 Point Cloud Alignment & Merging


The point clouds thus generated for the different positions of the objects of
interest are then aligned into the middle position (11th), where the surface of
the object of interest is considered to be parallel to the face of the Kinect
sensor, iteratively using the ICP algorithm. Figures 5.14 and 5.15 show the point
clouds in the first and second positions, before and after the ICP algorithm is
applied to align the point cloud in the first position to the second.

Figure 5.14: Point clouds of the objects of interest in the first and second
positions.

Figure 5.15: Point cloud of the objects of interest in the second position with
the registered point cloud of the first position with respect to the second
position.

From figures 5.14 and 5.15, it can be observed that the green points, which are
slightly misaligned with respect to the pink ones in figure 5.14, are perfectly
aligned in figure 5.15.

Similarly, figures 5.16 and 5.17 show the point clouds in the 21st and 20th
positions, before and after the ICP algorithm is applied to align the point cloud
in the 21st position to the 20th position.

Figure 5.16: Point clouds of the objects of interest in the 21st and the 20th position.

Figure 5.17: Point cloud of the objects of interest in the 20th position with the
registered point cloud of the 21st position with respect to the 20th position.
Figure 5.18: Merged point cloud of the objects of interest in the second position
with the registered point cloud of the first position with respect to the second
position.

Figure 5.19: Merged point cloud of the objects of interest in the 20th position
with the registered point cloud of the 21st position with respect to the 20th
position.

This process is continued until all the point clouds before and after the middle
position are aligned and merged into it. Figures 5.20 and 5.21 show the point
clouds generated by aligning and merging the point clouds before and after the
middle position into it.

Figure 5.20: Merged point cloud of the objects of interest in the 11th position
obtained from the positions 1 to 10.

Figure 5.21: Merged point cloud of the objects of interest in the 11th position
obtained from the positions 12 to 21.
Figures 5.22 to 5.25 show the final point cloud of the objects of interest in the
11th position, and figure 5.26 shows the depth map of the objects of interest
generated based on the final point cloud.

ICP registration from position no. to position no.    Root Mean Square Error (RMSE)
1 to 2 0.0012
2 to 3 0.0011
3 to 4 0.0016
4 to 5 0.0017
5 to 6 0.0019
6 to 7 0.0024
7 to 8 0.0027
8 to 9 0.0027
9 to 10 0.0035
10 to 11 0.0030
21 to 20 0.0048
20 to 19 0.0046
19 to 18 0.0042
18 to 17 0.0043
17 to 16 0.0037
16 to 15 0.0038
15 to 14 0.0034
14 to 13 0.0032
13 to 12 0.0038
12 to 11 9.58 × 10⁻⁴

Table 5.2: The ICP registrations and their corresponding RMSE values.

The RMSE of each of the ICP registrations is calculated using the MATLAB function
"pcregrigid". The RMSE values for each of the registrations are shown in table
5.2. The average root mean square error of all the registrations is

    RMSE_avg = ( Σ_{n=1}^{N−1} RMSE_n ) / (N − 1) = 0.00576958 / 20 = 0.002885 m        (5.1)
Chapter 6
Discussion

6.1 Validation of Results


For validation of the results, a scenario is considered where the shoe is placed
on a graph paper as shown in figure 6.1. The X-axis of the 3D coordinate system is
considered to be along the length of the shoe, the Y-axis is considered to be
along the height of the shoe, perpendicular to the surface of the graph paper,
while the Z-axis is considered to be along the width of the shoe. Initially, we
locate the origin of the coordinate system on the graph paper in the XZ plane as
shown in figure 6.1. From figures 6.1 and 6.2, it can be observed that the length
of the shoe and the height of the sole are approximately the same in both cases.
Two points P1(4, −10, 12) mm and P2(23, −10, 13) mm are considered on the surface
of the real object as shown in figure 6.3, and the distance d between them is
calculated as

    d = √((x2 − x1)² + (y2 − y1)² + (z2 − z1)²)                                   (6.1)

    d = √((23 − 4)² + (−10 − (−10))² + (13 − 12)²) = 19.02 mm.                     (6.2)

On the surface of the point cloud, we try to locate the nearest matches to the two
points considered earlier on the surface of the object. The points cannot be
located exactly, as there still exists some loss of resolution. The two points
from the point cloud are P3(4.163, −9.907, 11.85) mm and P4(22.98, −9.955, 13.35)
mm, as shown in figure 6.4, and the distance d′ between them is calculated as

    d′ = √((22.98 − 4.163)² + (−9.955 − (−9.907))² + (13.35 − 11.85)²) = 18.876 mm.  (6.3)


From d and d′, it can be observed that the variation in the measured distance is
negligible. Based on this, it can be said that there is not much distortion in the
generated point cloud. From figures 6.2, 6.4 and 6.6, it can be observed that the
depth intensities of the points in the point cloud around the edges of the OOI are
quite erroneous, due to the presence of flying pixels in the data.

Figure 6.1: A top view of the object on a graph paper.



Figure 6.6: Front view of the point cloud of the object.

Similarly, two points P5(55, −14, 15) mm and P6(65, −9, 22) mm on the real object
are considered, with a variation in the depth, as shown in figure 6.5, and the
distance d1 between them is calculated:

    d1 = √((65 − 55)² + (−9 + 14)² + (22 − 15)²) = 13.19 mm.                       (6.4)

The nearest matches to these two 3D points are then located on the point cloud of
the object as P7(55.22, −14.21, 17) mm and P8(65.27, −9.02, 22.18) mm, and the
distance d1′ between them is calculated:

    d1′ = √((65.27 − 55.22)² + (−9.02 + 14.21)² + (22.18 − 17)²) = 12.44 mm.        (6.5)

From d1 and d1′, it can be observed that the difference between them is
negligible. Based on this, it can be concluded that the point cloud of the object
of interest has been generated to a good accuracy.

6.2 Sources of Error in 3D Model Creation


Initially, the objects of interest and the sensor were at different elevations, as
shown in figure 6.7. This led to the creation of an IR reflection of the object in
front of it, as seen in figure 6.8. The reflection can be avoided by placing both
the OOI and the Kinect sensor at the same level.

Figure 6.7: Colour image of the scene showing the placement of the object of
interest and the Kinect sensor on different levels.

Figure 6.8: Depth map of the object of interest showing the IR reflection in front
of the object.

The minimum distance between the Kinect and the OOI for the OOI to be recognized
by the Kinect depth sensor is 0.5 meters; hence we consider an optimal distance of
approximately 0.85 meters between the Kinect and the OOI. If the OOI is placed
nearer or farther than the optimal distance range (0.8 to 1.2 meters), the
variance of the depth intensities over a period of time increases, in turn leading
to an increase in the RMSE.
Two non-identical objects are considered along with the main OOI to simplify the
detection of the points used for the alignment of the point clouds in two
different positions using the ICP algorithm.
In order to remove the noise created during the conversion of a depth map into a
point cloud, we consider two imaginary planes IP1 and IP2, parallel to the face of
the Kinect sensor, such that plane IP1 is in front of the object's surface and
plane IP2 is behind the object's surface, as shown in figure 4.2. The imaginary
planes IP1 and IP2 also facilitate the isolation of the depths of the OOI from the
background.
The Kinect for Windows V2.0 depth sensor is vulnerable to various kinds of errors
[5, 8]. The main sources of error in the acquired depth data are:

• Temperature drift

• Depth inhomogeneity

• Intensity related issues

Unlike depth sensors based on the structured light principle, time of flight depth
sensors produce a lot of heat. During the warm-up cycle, the depth values acquired
from the Kinect sensor vary significantly over a period of time, as shown in
figure 6.9. This variation of depth during the warm-up cycle can be avoided by
allowing the Kinect sensor to warm up for at least twenty minutes before
initializing the data acquisition [5, 8]. From figure 6.9, it can be observed that
the depth intensities, i.e. the brightness of the depth map, vary significantly
from one depth map to another.

Figure 6.9: Two depth maps of the same scene showing the depth variations over
a period of time due to the drift in the temperature.

Figure 6.10: Two depth maps of the same scene showing how the holes at the
edges of the objects vary over time.

At the object boundaries, the depth values become erroneous due to the
superposition of the signals reflecting from surfaces with different depths,
causing the depth values of such pixels to vary significantly over a period of
time. Hence such pixels are called flying or mixed pixels [5]. These can be
eliminated using a proper outlier detection algorithm. From figure 6.10, it can be
observed that the depths of certain pixels around the edges of the objects in the
scene vary over a period of time, i.e. some of them become valid while others
become holes in the depth maps.

Figure 6.11: Colour image and depth map of a scene depicting the intensity related
issues in depth maps.

The intensities recorded by the Kinect depth sensor for highly reflective surfaces
or very dark surfaces are particularly low. Thus, the corresponding depth values
are larger than expected. For dark coloured surfaces this mainly occurs due to the
absorption of the Near IR radiation. When the reflectivity of a surface is very
high, it might result in multipath reflection of the signal, increasing the time
of flight and leading to an increase in the recorded depth for that pixel [5, 8].
From figure 6.11, it can be observed that even though the sole of the shoe is
dark, the depth intensities of the sole are consistent with the rest of the depth
map, whereas the depth intensities of the top half of the pen are not consistent
with those of its other half. This is because of the high reflectivity of the
surface of the top half of the pen.

6.3 Advantages and Limitations


Due to the change in the technology used for sensing depth in the Kinect for
Windows V2.0 sensor, the measurements have become more accurate and have three
times the fidelity of the earlier Kinect sensor. The point clouds generated with
this sensor have a higher resolution, as the depth is calculated individually for
each pixel in the depth maps, which makes it capable of capturing even small
artefacts in an environment. Its insensitivity to daylight also makes it possible
to use the Kinect for Windows in an open environment.
Due to the high accuracy of the depth maps, i.e. an error of less than 3 mm at a
distance of around one meter [5], the depth maps and point clouds generated using
a Kinect for Windows V2.0 sensor can be considered reliable. In this study, we
exploit this ability of the Kinect sensor to generate accurate and high resolution
point clouds at a low cost. By decreasing the rotational and translational error
tolerance of the ICP algorithm, it can be observed that the generated point cloud
of the OOI is perfectly registered and that the final average RMSE of the system
is 2.9 mm. This study emphasises the use of low cost 3D imaging sensors such as
the Kinect for Windows sensor for high resolution 3D reconstruction of an object
in an environment.
If the number of positions of the OOI considered is high, the registration of the
various point clouds using the ICP algorithm consumes a large amount of time, i.e.
around 45 minutes per registration between two consecutive positions. A trade-off
between accuracy and execution time can be taken into consideration when time is a
major constraint. The error in the registration process propagates and increases
if the difference between any two consecutive positions is more than two degrees.
This is due to one of the initial assumptions of the ICP algorithm, namely that
the point cloud being registered is a subset of the point cloud to which it is
being registered [2]. The sole of the shoe has to be rotated by at least ten
degrees, i.e. from −5° to 5°, to acquire the complete surface of the shoe.
Chapter 7
Conclusions and Future Scope

Based on the results, it can be concluded that the 3D point cloud of an irregular
surfaced object can be reconstructed using the Kinect for Windows V2.0 sensor in
an indoor environment. The average RMSE of the reconstructed sole of the shoe is
0.002885 m. This system could be effectively used for 3D reconstruction in
forensics, for preserving historic artefacts, etc.

7.1 Further Scope


The accuracy of the algorithm can be further improved by using image filters such
as joint bilateral filters for filling the holes in a depth map. Implementation of
a colour, depth and texture based ICP algorithm for the registration of the point
clouds in different positions may also lead to an increase in the accuracy, but it
has to be noted that the computational complexity, hardware and time requirements
of the process also increase significantly with the increase in accuracy.
The accuracy of the rotation system can be increased by automating the turntable
so that it can be rotated in steps of one degree, which in turn leads to an
increase in the accuracy of the system. By using efficient surface reconstruction
techniques, this system can be used to generate 3D printable point clouds.
Parallel computing could be used to effectively decrease the time taken for the
registration of the different point clouds into the same alignment.
A perfect 3D reconstruction of the sole of the shoe could be achieved by acquiring
more complete depth data of the sole, i.e. by acquiring the depth data while
rotating the shoe along both its horizontal and vertical axes, subsequently
converting the depth maps into point clouds, and aligning and merging them to form
a denser point cloud.

References

[1] M. Hansard, S. Lee, O. Choi, and R. Horaud, Time-of-Flight Cameras:
Principles, Methods and Applications. SpringerBriefs in Computer Science,
Springer London, 2012.

[2] N. Pears, Y. Liu, and P. Bunting, 3D Imaging, Analysis and Applications.
Springer London, 2014.

[3] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Pearson
Education Limited, 2015.

[4] R. Szeliski, Computer Vision: Algorithms and Applications. Texts in Computer
Science, Springer London, 2010.

[5] H. Sarbolandi, D. Lefloch, and A. Kolb, "Kinect range sensing: Structured-light
versus time-of-flight Kinect," CoRR, vol. abs/1505.05459, 2015.

[6] L. Shao, J. Han, P. Kohli, and Z. Zhang, Computer Vision and Machine Learning
with RGB-D Sensors. Advances in Computer Vision and Pattern Recognition, Springer
International Publishing, 2014.

[7] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J.
Shotton, S. Hodges, D. Freeman, A. Davison, and A. Fitzgibbon, "KinectFusion:
Real-time 3D reconstruction and interaction using a moving depth camera," ACM
Symposium on User Interface Software and Technology.

[8] E. Lachat, H. Macher, M.-A. Mittet, T. Landes, and P. Grussenmeyer, "First
Experiences with Kinect v2 Sensor for Close Range 3D Modelling," ISPRS -
International Archives of the Photogrammetry, Remote Sensing and Spatial
Information Sciences, pp. 93–100, Feb. 2015.

[9] J. R. Terven and D. M. Córdova-Esparza, "Kin2. A Kinect 2 toolbox for
MATLAB."

[10] Oracle Crystal Ball reference and examples guide - outlier detection methods.


[11] C. Leys, C. Ley, O. Klein, P. Bernard, and L. Licata, "Detecting outliers: Do
not use standard deviation around the mean, use absolute deviation around the
median," Journal of Experimental Social Psychology, vol. 49, no. 4, pp. 764–766,
2013.

[12] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in
Third International Conference on 3D Digital Imaging and Modeling (3DIM), June
2001.
[13] S. Umeyama, "Least-squares estimation of transformation parameters between
two point patterns," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 13, no. 4, pp. 376–380, 1991.
[14] Fill image regions and holes - MATLAB imfill - MathWorks Nordic.

[15] Register two point clouds using ICP algorithm - MATLAB pcregrigid - MathWorks
India.

[16] Merge two 3-D point clouds - MATLAB pcmerge - MathWorks India.
