Firstly, some resources:
I think one must read all of them to understand the subtle art of calibrating cameras. In particular, I'd recommend the Microsoft technical report as well as the in-depth tutorial.
Proceeding with the blog article, I shall cover it in the following sequence.
Let’s begin!
Demystifying Geometric Camera Calibration for Intrinsic Matrix - Kushal Vyas's Blog (https://kushalvyas.github.io/calib.html, accessed 5/3/2019)
Source: Mathworks
As seen, the visual pipeline captures the object in 3D world coordinate space, passes it through the aperture (a pinhole, in this case), and projects it onto the camera's image plane. This leads to the formation of the image.
The concept to be understood is that any point in the 3D world coordinate space is represented by $P = (X, Y, Z)^T$. There is an essential conversion of the 3D world point $P$ to a local image coordinate space point, say $p = (u, v)^T$. Hence, for the conversion of points $P \to p$, there is an effective projection transform (just a matrix) which enables this. The aim of calibration is to find this effective projection transform, thereby yielding significant information regarding the vision system such as focal lengths, camera pose, camera center, etc. I'll get to it too. Formulating a basic equation for the above paragraph, we can write it as:
[p] = M . [P ]
where M is a projection matrix converting the World (X, Y , Z , 1) point to the Image (u, v, 1) point. This is a very casual
representation of the above process happening through the visual pipeline.
On a broad view, camera calibration yields an intrinsic camera matrix, extrinsic parameters, and distortion coefficients. The basic model for a camera is the pinhole camera model, but today's cheap cameras introduce high levels of noise/distortion in the images. For a simple visualization, I'll put 2 images below. The image on the left was captured by my Logitech webcam; the image on the right is the undistorted version. Straight lines appear bent (curved) in the left image, whereas in the right one they appear normal.
Hence, camera calibration is useful in providing an accurate input image to any computer vision system in the first place (specifically, to systems that deal with pixel/real-world measurements; for other applications this process is not needed).
Camera calibration
We have established that there is a transform converting a world 3D point to an image point. However, there is a series of sub-transforms in between that enable it. The 3D world coordinates undergo a Rigid Body Transform to obtain the same 3D coordinates w.r.t. the camera space. This newly obtained set of 3D coordinates is then projected onto the camera's image plane, yielding a 2D coordinate.

The conversion due to the rigid transformation comes from the "extrinsic parameters", which comprise rotation and translation vectors, namely R and T. On the other hand, the "intrinsic parameters" form the "camera matrix", which is a 3 × 3 matrix:

Camera Matrix (A):

$$A = \begin{bmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{bmatrix}$$
The essence of camera calibration starts with estimating a matrix/transform which maps the World Coordinates to Image Plane coordinates. As described above, it eventually ends up being an equation in matrix form. However, let us start with preparing the initial data.

To estimate the transform, Zhang's method requires images of a fixed geometric pattern, taken from multiple views. Let's say the total number of views is M. Each of the M views comprises a set of points for which image and world coordinates are established. Consider N points per view.
For the above, one can use OpenCV's cv2.findChessboardCorners function, which returns a list of chessboard corners in the image.
Let the observed points be denoted as $U$ and the model points as $X$. For the image/observed points $U$ extracted from the M views, let each point be denoted by $U_{i,j}$, where $i$ is the view and $j$ is the index of the extracted (chessboard) point. Hence, $U_{i,j} = (u, v)$. At the same time, $X$ has a similar structure to $U$, with each point $X_{i,j} = (X, Y, Z)$.
From each correspondence between model points and image points, compute an associated homography; one homography per view.
Once the intrinsics are computed, the Rotation and Translation vectors (extrinsics) are estimated.
Using the intrinsic and extrinsic parameters as the initial guess for the LM optimizer, refine all parameters.

I've described the complete algorithm for Zhang's camera calibration. However, this article will only cover the steps pertaining to the intrinsic params.
Implementation
We divide the implementation into the following parts.
The first step is to collect sample images (remember, there are M model views to be taken into account). That means one has to capture M images through the camera, such that each of the M images is at a unique position in the camera's field of view. Once those image sets are captured, we proceed to marking correspondences between the model and the images.
import numpy as np
import pprint

np.set_printoptions(suppress=True)
puts = pprint.pprint
Next, compute the chessboard corners using the cv2.findChessboardCorners function. Note there is an array image_points which holds the image coordinates of the chessboard corners, and an array object_points which holds the world coordinates for the same.
WHY CHESSBOARD? : Zhang's method, and camera calibration in general, is concerned with obtaining a transform from real-world 3D coordinates to image 2D coordinates. Since the grid pattern formed on a chessboard is a really simple, linear pattern, it is natural to go with it. That being said, geometric calibration also requires a mapping between world and image coordinates. The reason I emphasize this point is to understand the structure and "shape" (numpy users will be familiar with "shape") of the previously defined U and X data points.
Now, $U$ is an array/list/matrix/data structure containing all points of all views; $U_i$ holds the points of view $i$. So the points inside view $i$ are

$$U_i = \begin{bmatrix} u_{i,0} = (u_0, v_0) \\ u_{i,1} = (u_1, v_1) \\ \vdots \\ u_{i,N-1} = (u_{N-1}, v_{N-1}) \end{bmatrix}$$

and stacking all M views gives

$$U = \begin{bmatrix} u_{0,0} \\ \vdots \\ u_{0,N-1} \\ \vdots \\ u_{M-1,0} \\ \vdots \\ u_{M-1,N-1} \end{bmatrix}$$
Secondly, as mentioned in the introduction, correspondences have to be established before we compute the transform matrix. Every point belonging to the image plane has coordinates $(u, v)$. The real-world 3D point corresponding to it will be of the format $(X, Y, Z)$. So technically, there needs to be a transform that maps

$$U(u, v, 1)^T = [M] \cdot P(X, Y, Z, 1)^T$$

Hence, we also create an array for the model/real-world points which establishes the correspondences. I have mentioned a parameter SQUARE_SIZE previously, which is the size of a chessboard square (in cm). The next step is to create the P array of shape M × (N × 3): for each of the M views, an N × 3 array with N rows, each row holding $(X, Y, Z)$.
Since we are using a chessboard, and we know the chessboard square size, it is easy to virtually compute the physical locations of the chessboard corners in the real world. Assuming an origin point $A = (0, 0)$, every corner can be expressed as $(A\hat{i} + A\hat{j}) + (k \times \mathrm{SQUARE\_SIZE}\,(\hat{i} + \hat{j}))$, where $k$ ranges up to PATTERN_SIZE.
Below is the code for detecting chessboard corners and establishing correspondences between the image points (U) and model points (X).
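A minimal sketch of this step (the helper names `make_object_points` / `detect_image_points` and the PATTERN_SIZE / SQUARE_SIZE values here are illustrative, not necessarily the author's):

```python
import numpy as np

PATTERN_SIZE = (9, 6)   # inner corners per chessboard row and column (assumed)
SQUARE_SIZE = 2.5       # physical square size, e.g. in cm (assumed)

def make_object_points(pattern_size, square_size):
    """World coordinates of the chessboard corners; Z = 0 for the planar target."""
    cols, rows = pattern_size
    pts = np.zeros((rows * cols, 3), np.float64)
    xs, ys = np.meshgrid(np.arange(cols), np.arange(rows))
    pts[:, 0] = xs.ravel() * square_size   # X grows along the board rows
    pts[:, 1] = ys.ravel() * square_size   # Y grows along the board columns
    return pts                             # Z column stays 0

def detect_image_points(gray_image, pattern_size=PATTERN_SIZE):
    """Image coordinates of the same corners, via OpenCV."""
    import cv2  # imported here so the numpy-only part runs without OpenCV
    found, corners = cv2.findChessboardCorners(gray_image, pattern_size)
    if not found:
        return None
    return corners.reshape(-1, 2)
```

For each of the M captured images, detect_image_points fills one row of U, while make_object_points supplies the matching row of X (identical for every view, since the board is the same).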
Important: One point to be noted in Zhang's algorithm is that for any object point P(X, Y, Z), since it is a planar method, Z = 0. To visualize this, consider the following diagram: the origin of the chessboard's real-world system lies on the board, the X-Y axes lie inside the plane of the chessboard, and the Z-axis is normal to the chessboard.

Note that since the Z-axis is normal to the board, every real-world point on the board has Z = 0,
eventually leading to
p(u, v) = M . P (X, Y , Z )
where matrix M represents the required transformation from world point to image point. However, there are 2 aspects in the above conversion: first the rigid transform (the extrinsic parameters), whose output is then passed through the intrinsic camera transform.
Hence, we can split the M-matrix into sub-matrices, thus breaking the flow into multiple blocks. Also, note that the computations will now be carried out in homogeneous coordinates, so $p(u, v) \to p(u, v, 1)$ and $P(X, Y, Z) \to P(X, Y, Z, 1)$.
p(u, v, 1) = M . P (X, Y , Z , 1)
p = A. [R|t]. P
where A represents the intrinsic camera matrix (projective transform) and [R|t] the rotation and translation of the camera pose (extrinsics).
p is a 3 × 1 matrix,
A is a 3 × 3 matrix,
[R|t] is a 3 × 4 matrix,
P is a 4 × 1 matrix.
Therefore,

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} R_{00} & R_{01} & R_{02} & T_{03} \\ R_{10} & R_{11} & R_{12} & T_{13} \\ R_{20} & R_{21} & R_{22} & T_{23} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z = 0 \\ 1 \end{bmatrix}$$
Since Z = 0, we can eliminate the third column of [R|t]: that entire column multiplies Z = 0 and so contributes nothing. Hence, we drop Z from P and the third column from [R|t]:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} R_{00} & R_{01} & T_{03} \\ R_{10} & R_{11} & T_{13} \\ R_{20} & R_{21} & T_{23} \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$
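This reduction can be checked numerically; the values of A, R and t below are arbitrary, made up just for the check:

```python
import numpy as np

# Arbitrary intrinsics and pose (illustrative values only)
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 810.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.3  # rotation about the Z axis
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0],
              [s, c, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 10.0])
Rt = np.hstack([R, t.reshape(3, 1)])       # full 3x4 extrinsic matrix

P = np.array([2.0, 3.0, 0.0, 1.0])         # planar world point, Z = 0
full = A @ Rt @ P                          # full 3x4 projection

H = A @ Rt[:, [0, 1, 3]]                   # drop the third column of [R|t]
reduced = H @ np.array([2.0, 3.0, 1.0])    # drop Z from the point

assert np.allclose(full, reduced)          # identical projections
```

Both paths give the same homogeneous image point, confirming that for a planar target the 3×4 projection collapses to a 3×3 matrix.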
What is a homography? : I used the word homography in the above paragraph. A homography is a transform/matrix which converts points from one planar coordinate space to another, like how the world points P are being converted to image points p through the matrix [M]. Hence, for each view there is a homography associated with it which converts P to p.

Hence, $p \leftarrow [M] \cdot X$. This can be considered the base equation from which we will compute [M]. I'll actually write it as H:

$$p \leftarrow H \cdot X$$

Hence,
$$\begin{pmatrix} -X & -Y & -1 & 0 & 0 & 0 & uX & uY & u \\ 0 & 0 & 0 & -X & -Y & -1 & vX & vY & v \end{pmatrix} \begin{pmatrix} h_{00} \\ h_{01} \\ h_{02} \\ h_{10} \\ h_{11} \\ h_{12} \\ h_{20} \\ h_{21} \\ h_{22} \end{pmatrix} = 0$$

$$A \cdot x = 0$$
This is for only one point in one image. For N points per image, vertically stack the above matrix and solve Ax = 0 for the whole system. Each of the N points contributes 2 rows, so for N points there are 2 × N rows.
$$\begin{pmatrix} -X_0 & -Y_0 & -1 & 0 & 0 & 0 & u_0 X_0 & u_0 Y_0 & u_0 \\ 0 & 0 & 0 & -X_0 & -Y_0 & -1 & v_0 X_0 & v_0 Y_0 & v_0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -X_{N-1} & -Y_{N-1} & -1 & 0 & 0 & 0 & u_{N-1} X_{N-1} & u_{N-1} Y_{N-1} & u_{N-1} \\ 0 & 0 & 0 & -X_{N-1} & -Y_{N-1} & -1 & v_{N-1} X_{N-1} & v_{N-1} Y_{N-1} & v_{N-1} \end{pmatrix}_{(2 \times N,\, 9)} \cdot \vec{h} = 0$$
The above shows an Ax = 0 system, which admits two kinds of solution. The obvious trivial solution is x = 0, but we are not looking for that. The other is a non-trivial finite solution such that Ax ≈ 0, if not exactly zero. The explanation lies along the lines of the null space of the matrix A: we seek the x minimizing $\|Ax\|^2$. The solution of such a system is the right singular vector of A associated with its smallest singular value, obtained from the SVD $A = U S V^T$, with shapes:

U : (2 × N, 2 × N)
S : (2 × N, 9)
V_transpose : (9 × 9)
Below is the python snippet that computes the SVD and returns a normalized homography matrix. The homography then needs to be de-normalized as well, since the initial points were given in raw/de-normalized form. Normalization is used to make the DLT (direct linear transform) numerically well-conditioned, giving an optimal solution.
h_norm = vh[np.argmin(s)]
h_norm = h_norm.reshape(3, 3)
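Putting the pieces together, a self-contained sketch of the DLT step (omitting the normalization/de-normalization step; `estimate_homography` is an illustrative name, not the author's):

```python
import numpy as np

def estimate_homography(model_pts, image_pts):
    """DLT estimate of H mapping (X, Y, 1) -> (u, v, 1) from >= 4 correspondences."""
    rows = []
    for (X, Y), (u, v) in zip(model_pts, image_pts):
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    A = np.array(rows)                 # shape (2N, 9)
    _, _, vh = np.linalg.svd(A)
    H = vh[-1].reshape(3, 3)           # right singular vector of smallest sigma
    return H / H[2, 2]                 # fix the scale ambiguity

# Check on exact synthetic correspondences (values are made up for the demo)
H_true = np.array([[1.2, 0.1, 5.0],
                   [0.05, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])
model = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 3.], [4., 1.]])
proj = (H_true @ np.hstack([model, np.ones((6, 1))]).T).T
image = proj[:, :2] / proj[:, 2:]
H_est = estimate_homography(model, image)
```

On exact correspondences the homography is recovered up to the fixed scale; with noisy detections the SVD gives the least-squares minimizer of ‖Ax‖².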
Refining Homographies:
The homography obtained per view is refined using a non-linear Levenberg-Marquardt optimizer. This can be done using scipy.optimize. Refer to the source code on GitHub for the full minimizer function and the Jacobian.
import scipy.optimize as opt

N = normalized_object_points.shape[0]
X = object_points.flatten()
Y = image_points.flatten()
h = H.flatten()  # H is the homography for the given view
h_prime = opt.least_squares(fun=minimizer_func, x0=h, jac=jac_function,
                            method="lm", args=[X, Y, h, N], verbose=0)

if h_prime.success:
    H = h_prime.x.reshape(3, 3)
    H = H / H[2, 2]
return H
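The author's exact minimizer_func lives in the linked repository; a plausible sketch of such a reprojection-residual function (the name, argument order, and the flattened (x, y) point layout are assumptions) is:

```python
import numpy as np

def minimizer_func(h, X, Y, h_init, N):
    """Residuals between points projected by the current h and the observed ones.

    h      -- current homography estimate, flattened to 9 values
    X      -- model points, flattened as (x0, y0, x1, y1, ...)
    Y      -- observed image points, flattened the same way
    h_init -- initial homography (unused here, kept to match the call signature)
    N      -- number of points in the view
    """
    pts = X.reshape(N, 2)
    w = h[6] * pts[:, 0] + h[7] * pts[:, 1] + h[8]        # homogeneous scale
    u = (h[0] * pts[:, 0] + h[1] * pts[:, 1] + h[2]) / w  # projected u
    v = (h[3] * pts[:, 0] + h[4] * pts[:, 1] + h[5]) / w  # projected v
    projected = np.empty(2 * N)
    projected[0::2] = u
    projected[1::2] = v
    return projected - Y
```

With the true homography and exact points the residual is zero, which is exactly what least_squares drives it towards.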
Computing intrinsic params: For each view we compute a homography. Let us maintain an array of size M, M being the number of views (do not confuse M, the number of views, with the matrix M in M·h = 0). Hence, for the M views (i.e. M chessboard images), there are M homographies obtained.
thus,
for i in range(M):
H[i] = compute_view_homography(i)
But what was the homography in the first place? We said that

$$p(u, v, 1) \leftarrow H \cdot P(X, Y, Z, 1)$$

Hence, the homography computed per view comprises the intrinsic projection transform as well as the extrinsic rigid-body transform. Hence, we can say that:
H = A. [R|t]
$$p(u, v) = A [R|t] \cdot P(X, Y, Z)$$

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} A_0 & A_1 & A_2 \end{bmatrix} \begin{bmatrix} R_0 & R_1 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$

$$\begin{bmatrix} h_0 & h_1 & h_2 \end{bmatrix} = \lambda \times A \times \begin{bmatrix} R_0 & R_1 & t \end{bmatrix}$$

Given that $R_0$ and $R_1$ are orthonormal, their dot product is 0. Also, $h_0 = \lambda A R_0$ and $h_1 = \lambda A R_1$, so $R_0 = A^{-1} h_0 / \lambda$, and similarly for $R_1$. This yields $R_0$ and $R_1$, and their dot product gives $R_0^T \cdot R_1 = 0$, i.e.

$$h_0^T \cdot (A^{-1})^T \cdot (A^{-1}) \cdot h_1 = 0$$
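This constraint can be verified numerically: build H = A·[R0 R1 t] from a known A and rotation, and h0ᵀ·(A⁻¹)ᵀ(A⁻¹)·h1 comes out zero (the values below are made up just for the check):

```python
import numpy as np

A = np.array([[800.0, 0.5, 320.0],
              [0.0, 810.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.4
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0],
              [s, c, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, -2.0, 8.0])

H = A @ np.column_stack([R[:, 0], R[:, 1], t])  # homography of this view
B = np.linalg.inv(A).T @ np.linalg.inv(A)       # B = (A^-1)^T (A^-1)

constraint1 = H[:, 0] @ B @ H[:, 1]                          # h0^T B h1 -> 0
constraint2 = H[:, 0] @ B @ H[:, 0] - H[:, 1] @ B @ H[:, 1]  # equal norms -> 0
```

The second quantity is the companion constraint (||r0|| = ||r1||) that Zhang's method uses alongside the dot product; both vanish for a true homography.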
Let $B = (A^{-1})^T \cdot (A^{-1})$. According to Zhang's paper, we define the symmetric matrix B as:

$$B = \begin{pmatrix} B_0 & B_1 & B_3 \\ B_1 & B_2 & B_4 \\ B_3 & B_4 & B_5 \end{pmatrix} \text{ or } \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33} \end{pmatrix}$$

Expanding $h_i^T B h_j$ in terms of the distinct entries of B gives $h_i^T B h_j = v_{ij}^T \cdot b$, where $b = (B_0, B_1, B_2, B_3, B_4, B_5)^T$ and

$$v_{ij} = \begin{bmatrix} h_{i0} h_{j0} \\ h_{i0} h_{j1} + h_{i1} h_{j0} \\ h_{i1} h_{j1} \\ h_{i2} h_{j0} + h_{i0} h_{j2} \\ h_{i2} h_{j1} + h_{i1} h_{j2} \\ h_{i2} h_{j2} \end{bmatrix}$$
Therefore, using the dot-product and equal-norm constraints for B mentioned above, we get, per view,

$$\begin{bmatrix} v_{12}^T \\ (v_{11} - v_{22})^T \end{bmatrix} \cdot b = V \cdot b = 0$$

Again, the system is of the form Ax = 0, and the solution is computed using the SVD of V, which yields b and, by extension, B.
def get_intrinsic_parameters(H_r):
    M = len(H_r)
    V = np.zeros((2*M, 6), np.float64)

    def v_pq(p, q, H):
        v = np.array([
            H[0, p]*H[0, q],
            H[0, p]*H[1, q] + H[1, p]*H[0, q],
            H[1, p]*H[1, q],
            H[2, p]*H[0, q] + H[0, p]*H[2, q],
            H[2, p]*H[1, q] + H[1, p]*H[2, q],
            H[2, p]*H[2, q]
        ])
        return v

    for i in range(M):
        H = H_r[i]
        V[2*i] = v_pq(p=0, q=1, H=H)
        V[2*i + 1] = np.subtract(v_pq(p=0, q=0, H=H), v_pq(p=1, q=1, H=H))

    # solve V.b = 0
    u, s, vh = np.linalg.svd(V)
    b = vh[np.argmin(s)]
    print("V.b = 0 Solution : ", b.shape)
From b, the intrinsic parameters follow in closed form (Zhang's formulas; α = √(λ/B₁₁) completes the set used by gamma and uc):

vc = (b[1]*b[3] - b[0]*b[4]) / (b[0]*b[2] - b[1]**2)
l = b[5] - (b[3]**2 + vc*(b[1]*b[3] - b[0]*b[4])) / b[0]
alpha = np.sqrt(l / b[0])
beta = np.sqrt((l * b[0]) / (b[0]*b[2] - b[1]**2))
gamma = -1.0 * b[1] * (alpha**2) * (beta / l)
uc = (gamma * vc / beta) - (b[3] * (alpha**2) / l)
Hence, A is:

$$A = \begin{bmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{bmatrix}$$
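A quick round-trip check of these closed-form expressions: start from a known A, form B = (A⁻¹)ᵀ(A⁻¹), read off b, and the formulas recover the original parameters (the numbers are arbitrary test values, not calibration results):

```python
import numpy as np

A_true = np.array([[800.0, 0.5, 320.0],
                   [0.0, 810.0, 240.0],
                   [0.0, 0.0, 1.0]])

Ainv = np.linalg.inv(A_true)
B = Ainv.T @ Ainv
# b stacks the distinct entries of the symmetric B: B11, B12, B22, B13, B23, B33
b = np.array([B[0, 0], B[0, 1], B[1, 1], B[0, 2], B[1, 2], B[2, 2]])

vc = (b[1]*b[3] - b[0]*b[4]) / (b[0]*b[2] - b[1]**2)
l = b[5] - (b[3]**2 + vc*(b[1]*b[3] - b[0]*b[4])) / b[0]
alpha = np.sqrt(l / b[0])
beta = np.sqrt((l * b[0]) / (b[0]*b[2] - b[1]**2))
gamma = -1.0 * b[1] * (alpha**2) * (beta / l)
uc = (gamma * vc / beta) - (b[3] * (alpha**2) / l)
```

Since b is only determined up to scale by V·b = 0, it is worth noting that every recovered quantity above is a ratio that cancels that scale.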
Furthermore, A can be updated along with the complete set of intrinsic and extrinsic parameters using Levenberg-Marquardt.
Results

I implemented this using Python 2.7 and NumPy 1.12. For the given dataset of images, the following values are returned.

Camera Matrix:

[      …             …             …      ]
[     0.       826.80638173  223.27202318 ]
[     0.            0.            1.      ]

[ 532.79536563       0.       342.4582516 ]
[     0.       532.91928339  233.90060514 ]
[     0.            0.            1.      ]

[      …             …             …      ]
[     0.       537.44026588  235.75125989 ]
[     0.            0.            1.      ]