
CALIBRATION BETWEEN DEPTH AND COLOR SENSORS FOR COMMODITY DEPTH CAMERAS

Cha Zhang and Zhengyou Zhang

Communication and Collaboration Systems Group, Microsoft Research


{chazhang, zhang}@microsoft.com

ABSTRACT

Commodity depth cameras have created many interesting new applications in the research community recently. These applications often require the calibration information between the color and the depth cameras. Traditional checkerboard-based calibration schemes fail to work well for the depth camera, since its corner features cannot be reliably detected in the depth image. In this paper, we present a maximum likelihood solution for the joint depth and color calibration based on two principles. First, in the depth image, points on the checkerboard shall be co-planar, and the plane is known from color camera calibration. Second, additional point correspondences between the depth and color images may be manually specified or automatically established to help improve calibration accuracy. Uncertainty in depth values has been taken into account systematically. The proposed algorithm is reliable and accurate, as demonstrated by extensive experimental results on simulated and real-world examples.

Index Terms - depth camera, calibration

Fig. 1. The calibration pattern used in this paper. The color image is shown on the left, and the depth image is shown on the right. The depth pixels inside the red rectangle shall lie on the model plane surface, though point correspondence is difficult to obtain. A few manually specified corresponding point pairs are also shown in the figure.

1. INTRODUCTION

Recently, there has been an increasing number of depth cameras available at commodity prices, such as those from 3DV Systems¹ and Microsoft Kinect². These cameras can usually capture both color and depth images in real time. They have created a lot of interesting new research applications, such as 3D shape scanning [1], foreground/background segmentation [2], facial expression tracking [3], etc.

For many applications that use the color and the depth images jointly, it is critical to know the calibration parameters of the sensor pair. Such parameters include the intrinsic parameters of the color camera, its radial distortion parameters, the rotation and translation between the depth camera and the color camera, and parameters that help determine the depth values (e.g., in meters) of pixels in the depth image. Although color camera calibration has been thoroughly studied in the literature [4, 5], the joint calibration of depth and color images presents a few new challenges:

• Feature points such as the corners of checkerboard patterns are often indistinguishable from other surface points in the depth image, as shown in Fig. 1.
• Although depth discontinuity can be easily observed in the depth image, the boundary points are usually unreliable due to the unknown depth reconstruction mechanisms used inside the depth camera.
• One may use the infrared image co-centered with the depth image to perform calibration. However, this may require external infrared illumination (e.g., for the Kinect camera). In addition, the depth mapping function (Eq. (22)) of the depth image may not be calibrated with such a method.
• Most commodity depth cameras produce noisy depth images. Such noise needs to be accurately modeled in order to obtain satisfactory results.

In this paper, we propose a maximum likelihood solution for joint depth and color calibration using commodity depth cameras. We use the popular checkerboard pattern adopted in color camera calibration (Fig. 1), thus no extra hardware is needed. We utilize the fact that points on the checkerboard shall lie on a common plane, thus their distance to the plane shall be minimized. Point correspondences between the depth and color images may be further added to help improve calibration accuracy. A maximum likelihood framework is presented with careful modeling of the sensor noise, particularly in the depth image. Extensive experimental results are presented to validate the proposed calibration method.

¹ 3DV Systems, http://www.3dvsystems.com/.
² Microsoft, http://www.xbox.com/en-US/kinect/.



Fig. 2. Illustration of the notations used in the paper.

2. NOTATIONS

Fig. 2 illustrates the notations used during our calibration procedure. We assume the color camera's 3D coordinate system coincides with the world coordinate system. In the homogeneous representation, a 3D point in the world coordinate system is denoted by M = [X, Y, Z, 1]^T, and its corresponding 2D projection in the color image is m = [u, v, 1]^T. We model the color camera by the usual pinhole model, i.e.,

    s m = A [I \; 0] M,    (1)

where I is the identity matrix, 0 is the zero vector, and s is a scale factor; in this particular case, s = Z. A is the camera's intrinsic matrix, given by [5]:

    A = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix},    (2)

where α and β are the scale factors in the image coordinate system, (u_0, v_0) are the coordinates of the principal point, and γ is the skewness of the two image axes.

The depth camera typically outputs an image with depth values, denoted by x = [u, v, z]^T, where (u, v) are the pixel coordinates and z is the depth value. The mapping from x to the point in the depth camera's 3D coordinate system M^d = [X^d, Y^d, Z^d, 1]^T is usually known, and is denoted as M^d = f(x). The rotation and translation between the color and the depth cameras are denoted by R and t, i.e.,

    M = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} M^d.    (3)
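To make the notation concrete, the following is a minimal numeric sketch (our own Python/NumPy illustration, not part of the paper) of Eqs. (1)-(3): a point in the depth camera's frame is transformed into the color/world frame and projected to a color pixel. The function name is ours, and the sample values are borrowed from the simulation in Section 4.1.

import numpy as np

def project_to_color(X_d, A, R, t):
    """Map a 3D point from the depth frame to a color pixel.

    Implements Eq. (3) in non-homogeneous form, M = R M^d + t,
    followed by the pinhole projection of Eq. (1) with s = Z."""
    M = R @ X_d + t             # depth frame -> color (world) frame
    p = A @ M                   # Eq. (1): s m = A [I 0] M
    return p[:2] / p[2]         # divide out the scale factor s = Z

# Intrinsic matrix in the form of Eq. (2) (illustrative values)
A = np.array([[750.0,   0.0, 315.0],
              [  0.0, 745.0, 245.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                       # depth-to-color rotation
t = np.array([25.0, 2.0, -2.0])     # depth-to-color translation (mm)

m = project_to_color(np.array([100.0, 50.0, 800.0]), A, R, t)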

3. JOINT DEPTH/COLOR CAMERA CALIBRATION

3.1. Problem Statement

During calibration, we assume the user moves a planar calibration board in front of the depth camera, similar to that in [5]. In total there are n image pairs (color and depth) captured by the depth camera. The positions of the calibration board in the n images are different, as shown in Fig. 2. We set up a local 3D coordinate system (X_i, Y_i, Z_i) for each position of the calibration model plane, such that the Z_i = 0 plane coincides with the model plane. In addition, we assume the model plane has a set of m feature points; usually they are the corners of a checkerboard pattern. We denote these feature points as P_j, j = 1, ..., m. Note the 3D coordinates of these feature points in each model plane's local coordinate system are identical. Each feature point's local 3D coordinate is associated with the world coordinate as:

    M_{ij} = \begin{bmatrix} R_i & t_i \\ 0^T & 1 \end{bmatrix} P_j,    (4)

where M_{ij} is the jth feature point of the ith image in the world coordinate system, and R_i and t_i are the rotation and translation from the ith model plane's local coordinate system to the world coordinate system. The feature points are observed in the color image as m_{ij}, which are associated with M_{ij} through Eq. (1).

Given the set of feature points P_j and their projections m_{ij}, our goal is to recover the intrinsic matrix A, the model plane rotations and translations R_i, t_i, and the transform between the color and the depth cameras, R and t. It is well known that given the set of color images, the intrinsic matrix A and the model plane positions R_i, t_i can be computed [5]. It is unclear, however, whether the depth images can be used to reliably determine R and t automatically.

3.2. A Maximum Likelihood Solution

The calibration solution to the color-image-only problem is well known [5]. Due to the pinhole camera model, we have:

    s_{ij} m_{ij} = A [R_i \; t_i] P_j.    (5)

In practice, the feature points on the color images are usually extracted with automatic algorithms, and may have errors. Assume that m_{ij} follows a Gaussian distribution with the ground truth position as its mean, i.e.,

    m_{ij} \sim N(\bar{m}_{ij}, \sigma_{ij}^2 I).    (6)

The log likelihood function can be written as:

    L_1 = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\epsilon_{ij}^T \epsilon_{ij}}{\sigma_{ij}^2},    (7)

where

    \epsilon_{ij} = m_{ij} - \frac{1}{s_{ij}} A [R_i \; t_i] P_j.    (8)

We next study terms related to the depth images. There is a set of points in the depth image that correspond to the model plane, such as those inside the red quadrilateral in Fig. 1. We randomly sample K_i points within the quadrilateral, denoted by M^d_{ik_i}, i = 1, ..., n; k_i = 1, ..., K_i. If the depth image is noise free, we shall have:

    [0 \; 0 \; 1 \; 0] \begin{bmatrix} R_i & t_i \\ 0^T & 1 \end{bmatrix}^{-1} \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} M^d_{ik_i} = 0,    (9)

which states that if we transform these points to the local coordinate system of each model plane, the Z_i coordinate shall be zero.

Since the depth images are usually noisy, we assume M^d_{ik_i} follows a Gaussian distribution as:

    M^d_{ik_i} \sim N(\bar{M}^d_{ik_i}, \Phi^d_{ik_i}).    (10)

The log likelihood function can thus be written as:

    L_2 = -\frac{1}{2 \sum_{i=1}^{n} K_i} \sum_{i=1}^{n} \sum_{k_i=1}^{K_i} \frac{(\epsilon^c_{ik_i})^2}{\sigma_{ik_i}^2},    (11)

where

    \epsilon^c_{ik_i} = a_i^T M^d_{ik_i},    (12)

with

    a_i^T = [0 \; 0 \; 1 \; 0] \begin{bmatrix} R_i & t_i \\ 0^T & 1 \end{bmatrix}^{-1} \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix},    (13)

and

    \sigma_{ik_i}^2 = a_i^T \Phi^d_{ik_i} a_i.    (14)

It is sometimes helpful to have a few corresponding point pairs in the color images and the depth images, as shown in Fig. 1. We denote such point pairs as (m_{ip_i}, M^d_{ip_i}), i = 1, ..., n; p_i = 1, ..., P_i. These point pairs shall satisfy:

    s_{ip_i} m_{ip_i} = A [R \; t] M^d_{ip_i}.    (15)

Whether the point correspondences are manually labeled or automatically established, they may not be accurate. Assume:

    m_{ip_i} \sim N(\bar{m}_{ip_i}, \Phi_{ip_i}), \quad M^d_{ip_i} \sim N(\bar{M}^d_{ip_i}, \Phi^d_{ip_i}),    (16)

where Φ_{ip_i} models the inaccuracy of the point in the color image, and Φ^d_{ip_i} models the uncertainty of the 3D point from the depth sensor. The log likelihood function can be written as:

    L_3 = -\frac{1}{2 \sum_{i=1}^{n} P_i} \sum_{i=1}^{n} \sum_{p_i=1}^{P_i} \tilde{\epsilon}_{ip_i}^T \tilde{\Phi}_{ip_i}^{-1} \tilde{\epsilon}_{ip_i},    (17)

where

    \tilde{\epsilon}_{ip_i} = m_{ip_i} - B_{ip_i} M^d_{ip_i},    (18)

with

    B_{ip_i} = \frac{1}{s_{ip_i}} A [R \; t],    (19)

and

    \tilde{\Phi}_{ip_i} = \Phi_{ip_i} + B_{ip_i} \Phi^d_{ip_i} B_{ip_i}^T.    (20)

Combining all the information together, we maximize the overall log likelihood as:

    \max \; \rho_1 L_1 + \rho_2 L_2 + \rho_3 L_3,    (21)

where ρ_i, i = 1, 2, 3 are weighting parameters. The above objective function is a nonlinear least squares problem, which can be solved using the Levenberg-Marquardt method [6].
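As a concrete illustration of how this objective can be handed to a Levenberg-Marquardt solver, the sketch below (our own Python/SciPy code, not from the paper; variable names are ours) implements only the plane-fitting term, rewriting the residual of Eq. (12) in the equivalent point-to-plane-distance form and assuming uniform noise weights. The color residuals of Eq. (8) and the point-pair residuals of Eq. (18), scaled by the weights ρ_i, would be appended to the same residual vector to realize the full Eq. (21).

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def plane_residuals(params, pts_depth, normals, biases, plane_ids):
    """Plane-fitting residuals of Eqs. (11)-(12) in the equivalent
    point-to-plane-distance form: transform each sampled depth point
    into the color frame with (R, t) and measure its distance to the
    known model plane n_i^T X = b_i. Uniform weights are assumed."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    t = params[3:6]
    X = pts_depth @ R.T + t                # depth frame -> color frame
    return np.sum(X * normals[plane_ids], axis=1) - biases[plane_ids]

# pts_depth: (N, 3) points sampled inside the red quadrilaterals;
# plane_ids: (N,) image index i of each point; normals/biases: the
# plane parameters known from color calibration (Eq. (24)).
# x0 = np.zeros(6)   # or, better, the initialization of Section 3.4
# res = least_squares(plane_residuals, x0, method='lm',
#                     args=(pts_depth, normals, biases, plane_ids))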
3.3. Estimate Other Parameters

There may be a few other parameters that need to be estimated during calibration. For instance, the color camera may exhibit significant lens distortions, thus it is necessary to estimate them based on the observed model planes. Another set of unknown parameters may be in the depth mapping function f(·). For instance, the structured-light-based Kinect depth camera may have a depth mapping function of the form:

    M^d = f(x) = \begin{bmatrix} (\mu z + \nu)(A^d)^{-1} \tilde{x} \\ 1 \end{bmatrix}, \quad \tilde{x} = [u, v, 1]^T,    (22)

where μ and ν are the scale and bias of the z value, and A^d is the intrinsic matrix of the depth sensor. Usually A^d is pre-determined. The other two parameters μ and ν can be used to model the depth sensor's decalibration due to temperature variation or mechanical vibration, and can be estimated within the same maximum likelihood framework.
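A minimal sketch of this depth mapping (our own Python illustration, assuming the back-projection form of Eq. (22); the function name is ours, and the intrinsics and μ, ν values are taken from Section 4.2):

import numpy as np

def depth_to_3d(u, v, z, A_d, mu=1.0, nu=0.0):
    """Depth mapping f(x) of Eq. (22): back-project pixel (u, v) with
    raw depth z through the depth intrinsics A^d, after correcting
    the depth by the scale/bias (mu, nu)."""
    zc = mu * z + nu                                  # corrected depth
    xy = np.linalg.solve(A_d, np.array([u, v, 1.0]))  # (A^d)^{-1} x~
    return zc * xy                                    # [X^d, Y^d, Z^d]

# Pre-set Kinect-style depth intrinsics from Section 4.2
A_d = np.array([[575.0,   0.0, 320.0],
                [  0.0, 575.0, 240.0],
                [  0.0,   0.0,   1.0]])
M_d = depth_to_3d(160, 120, 800.0, A_d, mu=0.9771, nu=16.1883)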
3.4. Solutions for Initialization

Since the overall likelihood function in Eq. (21) is nonlinear, it is very important to have good initialization for the unknown parameters. For the parameters related to the color camera, i.e., A, R_i and t_i, we may adopt the same initialization scheme as in [5]. In the following, we discuss methods to provide the initial estimate of the rotation R and translation t between the depth and color sensors. During the process, we assume A, R_i and t_i of the color camera are known.

3.4.1. Initialization with Model Plane Matching

For most commodity depth cameras, the color camera and the depth camera are positioned very closely. It is therefore simple to automatically identify a set of points in each depth image that lie on the model plane. Let these points be M^d_{ik_i}, i = 1, ..., n; k_i = 1, ..., K_i. For a given depth image i, if K_i ≥ 3, it is possible to fit a plane to the points in that image. That is, the points satisfy:

    n_i^{dT} \tilde{M}^d_{ik_i} - b_i^d = 0,    (23)

where \tilde{M}^d_{ik_i} denotes the inhomogeneous 3D coordinates, n_i^d is the normal of the model plane in the depth sensor's 3D coordinate system, \|n_i^d\|_2 = 1, and b_i^d is the bias from the origin. n_i^d and b_i^d can be easily found through least squares fitting.

In the color sensor's coordinate system, the model plane can also be described by a plane equation:

    n_i^T \tilde{M} - b_i = 0.    (24)

Since R_i and t_i are known, we can compute the plane's normal n_i, \|n_i\|_2 = 1, and its bias from the origin b_i.

We first solve for the rotation matrix R. Denote:

    R = [r_1 \; r_2 \; r_3]^T.    (25)

We minimize the following objective function with orthonormality constraints:

    J(R) = \sum_{i=1}^{n} \|n_i - R n_i^d\|^2 + \sum_{j=1}^{3} \lambda_j (r_j^T r_j - 1) + 2\lambda_4 r_1^T r_2 + 2\lambda_5 r_1^T r_3 + 2\lambda_6 r_2^T r_3.    (26)

This objective function can be solved in closed form [7]. Let:

    C = \sum_{i=1}^{n} n_i^d n_i^T.    (27)

The singular value decomposition of C can be written as:

    C = U D V^T,    (28)

where U and V are orthonormal matrices and D is a diagonal matrix. The rotation matrix is simply:

    R = V U^T.    (29)

The minimum number of images to determine the rotation matrix R is n = 2, provided that the two model planes are not parallel to each other.

For translation, we have the following relationship:

    n_i^T t = b_i - b_i^d.    (30)

Thus three non-parallel model planes will determine a unique t. If n > 3, we may solve for t through least squares fitting.
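The steps of this initialization can be summarized in the following sketch (our own Python/NumPy illustration; function and variable names are ours). It fits each depth-image plane by least squares (Eq. (23)), recovers R from the SVD of C (Eqs. (27)-(29)), and solves Eq. (30) for t; plane normals are assumed to be consistently oriented (e.g., facing the camera) before matching.

import numpy as np

def fit_plane(pts):
    """Least-squares plane fit of Eq. (23): returns a unit normal n
    and bias b such that n^T x ~= b for the points x in pts (N, 3)."""
    centroid = pts.mean(axis=0)
    # The right singular vector of the smallest singular value of the
    # centered points is the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]
    return n, float(n @ centroid)

def init_rotation(normals_color, normals_depth):
    """Closed-form rotation of Eqs. (27)-(29): R = V U^T from the SVD
    of C = sum_i n_i^d n_i^T [7]."""
    C = sum(np.outer(nd, nc) for nc, nd in zip(normals_color, normals_depth))
    U, _, Vt = np.linalg.svd(C)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection, cf. [7]
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R

def init_translation(normals_color, b_color, b_depth):
    """Least-squares translation from Eq. (30): n_i^T t = b_i - b_i^d."""
    N = np.stack(normals_color)
    d = np.asarray(b_color) - np.asarray(b_depth)
    t, *_ = np.linalg.lstsq(N, d, rcond=None)
    return t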
3.4.2. Initialization with Point Pair Matching

Another scheme to estimate the initial rotation R and translation t is through the knowledge of a set of point correspondences between the color images and the depth images. Denote such point pairs as (m_{ip_i}, M^d_{ip_i}), i = 1, ..., n; p_i = 1, ..., P_i. We have the relationship:

    s_{ip_i} m_{ip_i} = A [R \; t] M^d_{ip_i}.    (31)

Note the intrinsic matrix A is known. Such a problem has been studied extensively in the literature [8, 9]. It has been shown that given 3 point pairs, there are in general four solutions to the rotation and translation. When one has 4 or more non-coplanar point pairs, the so-called POSIT algorithm [10] can be used to find the initial values of R and t.
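POSIT itself is not sketched here; as a practical stand-in, a generic PnP solver such as OpenCV's cv2.solvePnP can provide the same initial values of R and t from the point pairs. The data below are synthetic stand-ins for illustration.

import cv2
import numpy as np

A = np.array([[528.32,   0.0, 320.10],
              [  0.0, 527.03, 257.57],
              [  0.0,    0.0,    1.0]])   # color intrinsics, Section 4.2

# Synthetic stand-in data: 3D points in the depth camera's frame and
# their projections in the color image (N >= 4, non-coplanar).
pts3d = np.array([[0, 0, 800], [100, 0, 820], [0, 100, 840],
                  [100, 100, 900], [50, 50, 1000]], dtype=np.float64)
pts2d, _ = cv2.projectPoints(pts3d, np.zeros(3),
                             np.array([25.0, 2.0, -2.0]), A, None)

ok, rvec, tvec = cv2.solvePnP(pts3d, pts2d, A, None)
R0, _ = cv2.Rodrigues(rvec)      # initial R; tvec is the initial t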
4. EXPERIMENTAL RESULTS

The maximum likelihood solution in Eq. (21) can be used to calibrate all unknown parameters for the depth and color sensors. Due to space limitations, in this paper we focus our attention on the parameters related to the depth sensor only, i.e., R, t and f(·), and assume A, R_i, t_i are known (or obtained separately from, say, maximizing Eq. (7) only [5]).

4.1. Simulated Results

The simulated depth/color camera has the following parameters. For the color camera, α = 750, β = 745, γ = 0, u_0 = 315, v_0 = 245. The image resolution is 640 × 480. The rotation and translation from the depth camera to the color camera are represented by the vector [θ_x, θ_y, θ_z, t_x, t_y, t_z]^T = [0.05, -0.01, 0.02, 25, 2, -2]^T, where [θ_x, θ_y, θ_z]^T in radians represents the rotation, which can be converted to R through the well-known Rodrigues rotation formula, and the last three elements represent the translation t = [t_x, t_y, t_z]^T in millimeters.

4.1.1. Performance W.r.t. the Noise Level

In this experiment we examine the impact of the depth camera's noise level on the calibration accuracy. Three model planes are used in the experiment. The checkerboard pattern has 10 × 7 corners on a regular grid. The distance between neighboring corners is 37. The three model planes are placed at distinct, non-parallel orientations, with translations [-300, 25, 750]^T, [-110, -100, 1150]^T and [120, -200, 800]^T, respectively. Only the plane fitting likelihood term (Eq. (11)) is maximized to determine R and t, with K_i = 1000. The covariance of the depth noise is assumed to be independent of the depth values (which is an acceptable assumption for time-of-flight based depth cameras). At each noise level, 500 trials were run and the standard deviations (STDs) of the errors are reported, as shown in Fig. 3. It can be seen that the STDs of the angular errors and translation errors increase linearly with the noise level. The means of the errors are generally very close to zero (not shown in Fig. 3), and the STDs are very small, indicating satisfactory calibration accuracy and algorithm stability.

Fig. 3. Calibration accuracy vs. depth camera noise level.
4.1.2. Performance W.r.t. the Number of Planes

In the second experiment we examine whether increasing the number of model planes could improve calibration accuracy. We tested between 3 and 15 planes, as shown in Fig. 4. The first 3 planes are the same as those in Section 4.1.1. From the fourth image on, we randomly generate a vector on the unit sphere, and apply a rotation about that vector by a fixed angle. Again, only the plane fitting likelihood term (Eq. (11)) is maximized to determine R and t. The covariance of the depth noise is set to 1 throughout the experiment. For a given number of planes, we run 500 trials and report the STDs of the errors. From Fig. 4, it is obvious that increasing the number of planes leads to smaller STDs of the errors, thus better calibration accuracy. We recommend at least 8-10 planes to achieve sufficient accuracy during calibration.

Fig. 4. Calibration accuracy vs. number of model planes.

4.1.3. Performance W.r.t. the Plane Orientations

Next we study the impact of the model plane orientations. We again use 3 planes for calibration. The planes are generated as follows. We first randomly choose three vectors on the unit circle in the color camera's imaging plane, and we make sure the smallest angle between the three vectors is greater than a minimum separation angle. We then apply a rotation about each of the three vectors by a varying angle between 10° and 80° to generate 3 model planes for calibration. A total of 500 trials were run for each configuration in Fig. 5, which contains three groups of curves.

In the first group, only the plane fitting likelihood term (Eq. (11)) is maximized to determine R and t. It can be seen that when the plane orientations are small, the calibration errors are big. This is intuitive, since according to Section 3.4.1, parallel planes would not be effective in determining the rotation/translation between the depth and color sensors.

In the second group, we use the point pair likelihood term (Eq. (17)) to determine R and t. For this purpose, we assume the 4 corners of each model plane are known in both the color and the depth images, thus we have a total of 12 point pairs. Noise of covariance 0.25² is added to the positions of the point pairs to mimic the real-world scenario. It can be seen from Fig. 5 that with so few point pairs, the calibration error STDs are generally bigger than those generated by plane fitting. An exception is the case of very small plane orientations, where the plane-fitting-only solution performs very poorly.

In the third group, we determine R and t based on the combination of the plane fitting and point pair likelihood terms. We use ρ_2 = 1 and ρ_3 = 0.2. It can be seen that for small plane orientations, combining the two likelihood terms results in better performance than using either one alone. When the plane orientations are large, however, the plane fitting likelihood term alone seems to perform better. In practice, we also need to consider the calibration accuracy of the color camera parameters. It has been shown in [5] that color camera calibration will perform poorly if the model planes are near perpendicular to the color image's imaging plane. Therefore, we recommend model planes oriented at about 30-50 degrees with respect to the color camera's imaging plane for better overall calibration quality.

Fig. 5. Calibration accuracy vs. plane orientations.

4.1.4. Performance W.r.t. the Correct Noise Model

So far we have assumed depth-independent noise in the depth image. This is acceptable for time-of-flight depth cameras such as the ZCam from 3DV Systems. However, for triangulation-based depth cameras such as the Kinect camera, the noise level is a quadratic function of the depth [3]. Both types of noise can be accommodated with our maximum likelihood solution in Section 3. On the other hand, applying the correct noise model would generally improve calibration performance. Here we use the same setup as Section 4.1.1, except that the depth noise follows the formula in [3]:

    \sigma \propto z^2,    (32)

and we assume the noise level at z = 1000 is known (used as the horizontal axis of Fig. 6). We ran two sets of experiments. In the first set, we assume the user is unaware of the sensor noise type, and blindly assumes that the noise is depth-independent. In the second set, the correct noise model is applied. It can be seen from Fig. 6 that using the correct noise model results in better calibration performance.
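In the maximum likelihood framework, choosing the noise model simply amounts to choosing the per-point σ_{ik_i} that weights the residuals in Eq. (11). A minimal sketch of the two alternatives compared here (our own notation; sigma_1000 denotes the known noise level at z = 1000):

def depth_sigma(z, sigma_1000, model="kinect"):
    """Per-point noise level used to weight the residuals in Eq. (11):
    depth-independent for time-of-flight sensors, quadratic in depth
    (Eq. (32), [3]) for triangulation-based sensors such as Kinect."""
    if model == "kinect":
        return sigma_1000 * (z / 1000.0) ** 2   # sigma grows with z^2
    return sigma_1000                           # depth-independent noise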
" r'---
, ---;.:-;:
, I,-===_:;:;;]I
� 10�

...... .....
ibration effective zone. It is well known that for calibra­
• a.,(z-lrNlrlan1 roiHmodeI)
tion to work well, the checkerboard shall be placed across
O. (Klnec:lroiHmodeI)
--() a.,(K'-:lno1s.modeI)
the whole workspace. In Fig. 7 (a), we notice the chair re­
O,(KInec:lroiHmodeI)
IS

gion is not aligned well, since no checkerboard was placed


there in the 12 images. We manually add 3-5 point corre­
spondences for each image pair, some of them lying on the
background of the scene (see Fig 1). After using both plane
fitting and point correspondences to calibrate, the results are
[-0.0089, -0.0071,0.004, -15.7919,8.2644,12.5897V for
Fig. 6. Calibration accuracy vs. correct noise model. rotation and translation, /1 0.9948, /J -6.5470 for depth
= =

scale and bias. A warped result is shown in Fig. 7 (b). The


improvement in the chair area is very obvious.
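The warp used for this qualitative check can be sketched as follows (our own illustration, assuming a float depth map in millimeters; invalid/zero depth pixels are not masked): back-project each depth pixel with f(·) (Eq. (22)), transform it into the color frame with the calibrated (R, t) (Eq. (3)), project it with A (Eq. (1)), and sample the color image there.

import numpy as np

def warp_color_to_depth(depth, color, A, A_d, R, t, mu, nu):
    """Overlay check of Fig. 7: sample the color image at the
    projection of every back-projected depth pixel."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = mu * depth + nu                    # depth correction, Eq. (22)
    rays = np.linalg.inv(A_d) @ np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
    X_d = rays * z.reshape(1, -1)          # 3D points in the depth frame
    X_c = R @ X_d + t[:, None]             # into the color frame, Eq. (3)
    p = A @ X_c                            # pinhole projection, Eq. (1)
    ui = np.clip((p[0] / p[2]).round().astype(int), 0, color.shape[1] - 1)
    vi = np.clip((p[1] / p[2]).round().astype(int), 0, color.shape[0] - 1)
    return color[vi, ui].reshape(h, w, -1)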

5. CONCLUSIONS AND FUTURE WORK

In this paper, we presented a novel algorithm to calibrate color and depth sensors jointly. The method is reliable and accurate, and it does not require additional hardware other than the easily available checkerboard pattern. Future work includes better modeling of the depth mapping function f(·), and better understanding of the depth cameras' noise models.

6. REFERENCES

[1] Y. Cui, S. Schuon, D. Chan, S. Thrun, and C. Theobalt, "3D shape scanning with a time-of-flight camera," in CVPR, 2010.

[2] R. Crabb, C. Tracey, A. Puranik, and J. Davis, "Real-time foreground segmentation via range and color imaging," in CVPR Workshop on ToF-Camera based Computer Vision, 2008.

[3] Q. Cai, D. Gallup, C. Zhang, and Z. Zhang, "3D deformable face tracking with a commodity depth camera," in ECCV, 2010.

[4] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE Journal of Robotics and Automation, vol. RA-3, no. 4, pp. 323-344, 1987.

[5] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330-1334, 2000.

[6] J. Moré, "The Levenberg-Marquardt algorithm: implementation and theory," Numerical Analysis, Lecture Notes in Mathematics, vol. 630, pp. 105-116, 1978.

[7] K. Arun, T. Huang, and S. Blostein, "Least-squares fitting of two 3-D point sets," IEEE Trans. PAMI, vol. 9, no. 5, pp. 698-700, 1987.

[8] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381-395, 1981.

[9] J. S.-C. Yuan, "A general photogrammetric method for determining object position and orientation," IEEE Trans. on Robotics and Automation, vol. 5, no. 2, pp. 129-142, 1989.

[10] D. DeMenthon and L. Davis, "Model-based object pose in 25 lines of code," International Journal of Computer Vision, vol. 15, no. 1, pp. 123-141, 1995.
