Firstly, some resources:
I think one must read all of them to understand the subtle art of calibrating cameras. In particular, I'd recommend the Microsoft technical report as well as the in-depth tutorial.
Proceeding with the blog article, I shall cover it in the following sequence.
Let’s begin!
Demystifying Geometric Camera Calibration for Intrinsic Matrix - Kushal Vyas's Blog (https://kushalvyas.github.io/calib.html, accessed 5/3/2019)
Source: Mathworks
As seen, the visual pipeline captures the object in 3D world coordinate space, passes it through the aperture (a pinhole, in this case), and projects it onto the camera's image plane. This leads to the formation of the image.
The concept to be understood is that any point in the 3D world coordinate space is represented by $P = (X, Y, Z)^T$. There is an essential conversion of the 3D world point $P$ to a local image coordinate space point, say $p = (u, v)^T$. Hence, for the conversion of points $P \to p$, there is an effective projection transform (just a matrix) which enables this. The aim of calibration is to find this effective projection transform, thereby yielding significant information regarding the vision system such as focal lengths, camera pose, camera center, etc. I'll get to it too. Formulating a basic equation for the above paragraph, we can write it as:
[p] = M . [P ]
where M is a projection matrix converting the World (X, Y , Z , 1) point to the Image (u, v, 1) point. This is a very casual
representation of the above process happening through the visual pipeline.
On a broad view, camera calibration yields an intrinsic camera matrix, extrinsic parameters, and distortion coefficients. The basic model for a camera is the pinhole camera model, but today's cheap cameras introduce high levels of noise/distortion in the images. For a simple visualization, I'll put 2 images below. The image on the left was captured by my Logitech webcam; the image on the right is the undistorted version. Straight lines appear bent (curved) in the left image, whereas in the right one they appear normal.
Hence, camera calibration is useful in providing an accurate input image to any computer vision system in the first place (specifically, to systems that deal with pixel/real-world measurements; for other applications this process is not needed).
Camera calibration
We have established that there is a transform converting a world 3D point to an image point. However, there is a series of sub-transforms in between that enable it. The 3D world coordinates undergo a Rigid Body Transform to obtain the same 3D coordinates w.r.t. the camera space. This newly obtained set of 3D coordinates is then projected onto the camera's image plane, yielding a 2D coordinate.

The conversion due to the rigid transformation comes from the "extrinsic parameters", which comprise rotation and translation vectors, namely R and T. On the other hand, the "intrinsic parameters" form the "camera matrix", which is a 3 × 3 matrix:

Camera Matrix (A):

$$A = \begin{bmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{bmatrix}$$
The essence of camera calibration starts with estimating a matrix/transform which maps the World Coordinates to Image Plane coordinates. As described above, it eventually ends up being an equation in matrix form. However, let us start with preparing the initial data.

To estimate the transform, Zhang's method requires images of a fixed geometric pattern, taken from multiple views. Let's say the total number of views is M. Each of the M views comprises a set of points for which image and world coordinates are established. Consider N points per view.
For the above, one can use OpenCV's cv2.findChessboardCorners function, which returns a list of chessboard corners in the image.
Let the observed points be denoted as $U$ and the model points as $X$. For the image/observed points $U$ extracted from the M views, let each point be denoted by $U_{i,j}$, where $i$ is the view and $j$ is the index of the extracted (chessboard) point. Hence, $U_{i,j} = (u, v)$. At the same time, $X$ has a similar structure to $U$, with each point $X_{i,j} = (X, Y, Z)$.
From each correspondence between model points and image points, compute an associated homography; one homography per view.
Once the intrinsics are computed, the Rotation and Translation vectors (extrinsics) are estimated.
Using the intrinsic and extrinsic parameters as the initial guess for the LM optimizer, refine all parameters.

I've described the complete algorithm for Zhang's camera calibration. However, this article will only cover the steps pertaining to the intrinsic params.
Implementation
We divide the implementation into the following parts.
The first step is to collect sample images (remember, there are M model views to be taken into account). That means one has to capture M images through the camera, such that each of the M images is at a unique position in the camera's field of view. Once those image sets are captured, we proceed to marking correspondences between the model and the images.
import numpy as np
import pprint

np.set_printoptions(suppress=True)
puts = pprint.pprint
Next, compute the chessboard corners using the cv2.findChessboardCorners function. Note there is an array image_points which holds the image coordinates of the chessboard corners, and an array object_points which holds the world coordinates for the same.
WHY CHESSBOARD? : Zhang's method, and camera calibration in general, is concerned with obtaining a transform from real-world 3D coordinates to image 2D coordinates. Since the grid pattern formed on a chessboard is a really simple, linear pattern, it is natural to go with it. That being said, geometric calibration also requires a mapping between world and image coordinates. The reason I emphasize this point is to understand the structure and "shape" (numpy users will be familiar with "shape") of the previously defined U and X data points.
Now, $U$ is an array/list/matrix/data structure containing all points of all views; $U_i$ holds the points of view $i$. So the points inside view $i$ are

$$U_i = \begin{bmatrix} u_{i,0} = (u_0, v_0) \\ u_{i,1} = (u_1, v_1) \\ \vdots \\ u_{i,N-1} = (u_{N-1}, v_{N-1}) \end{bmatrix}$$

and stacking all M views gives

$$U = \begin{bmatrix} u_{0,0} \\ \vdots \\ u_{0,N-1} \\ \vdots \\ u_{M-1,0} \\ \vdots \\ u_{M-1,N-1} \end{bmatrix}$$
Secondly, as mentioned in the introduction, correspondences have to be established before we compute the transform matrix. Every point belonging to the image plane has coordinates $(u, v)$. The real-world 3D point corresponding to it will be of the format $(X, Y, Z)$. So technically, there needs to be a transform that maps

$$U(u, v, 1)^T = [M] \cdot P(X, Y, Z, 1)^T$$

Hence, we also create an array for the model/real-world points which establishes the correspondences. I have mentioned a parameter SQUARE_SIZE previously, which is the size of a chessboard square (in cm). The next step is to create the P array of shape M × (N × 3): for each of the M views, an N × 3 array with N rows, each row holding $(X, Y, Z)$.
Since we are using a chessboard, and we know the chessboard square size, it is easy to virtually compute the physical locations of the chessboard corners in the real world. Assuming an origin point $A = (0, 0)$, every corner can be expressed as $(A\hat{i} + A\hat{j}) + (k \times \mathrm{SQUARE\_SIZE}\,(\hat{i} + \hat{j}))$, where $k$ ranges up to PATTERN_SIZE.
Below is the code for detecting chessboard corners and establishing correspondences between the image points (U) and model points (X).
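A minimal sketch of this step (the helper names `make_object_points` / `detect_image_points` and the PATTERN_SIZE / SQUARE_SIZE values here are illustrative, not necessarily the author's):

```python
import numpy as np

PATTERN_SIZE = (9, 6)   # inner corners per chessboard row and column (assumed)
SQUARE_SIZE = 2.5       # physical square size, e.g. in cm (assumed)

def make_object_points(pattern_size, square_size):
    """World coordinates of the chessboard corners; Z = 0 for the planar target."""
    cols, rows = pattern_size
    pts = np.zeros((rows * cols, 3), np.float64)
    xs, ys = np.meshgrid(np.arange(cols), np.arange(rows))
    pts[:, 0] = xs.ravel() * square_size   # X grows along the board rows
    pts[:, 1] = ys.ravel() * square_size   # Y grows along the board columns
    return pts                             # Z column stays 0

def detect_image_points(gray_image, pattern_size=PATTERN_SIZE):
    """Image coordinates of the same corners, via OpenCV."""
    import cv2  # imported here so the numpy-only part runs without OpenCV
    found, corners = cv2.findChessboardCorners(gray_image, pattern_size)
    if not found:
        return None
    return corners.reshape(-1, 2)
```

For each of the M captured images, detect_image_points fills one row of U, while make_object_points supplies the matching row of X (identical for every view, since the board is the same).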
Important: One point to be noted in Zhang's algorithm is that for any object point P(X, Y, Z), since it is a planar method, Z = 0. To visualize this, consider the following diagram: the origin of the chessboard's real-world system lies on the board, the X-Y axes lie inside the plane of the chessboard, and the Z-axis is normal to the chessboard.

Note that since the Z-axis is normal to the board, every real-world point on the board has Z = 0,
eventually leading to
p(u, v) = M . P (X, Y , Z )
where matrix M represents the required transformation from world point to image point. However, there are 2 aspects in the above conversion: first the rigid transform (the extrinsic parameters), whose output is then passed through the intrinsic camera transform.
Hence, we can split the M-matrix into sub-matrices, thus breaking the flow into multiple blocks. Also, note that the computations will now be carried out in homogeneous coordinates, so $p(u, v) \to p(u, v, 1)$ and $P(X, Y, Z) \to P(X, Y, Z, 1)$.
p(u, v, 1) = M . P (X, Y , Z , 1)
p = A. [R|t]. P
where A represents the intrinsic camera matrix (projective transform) and [R|t] the rotation and translation of the camera pose (extrinsics).
p is a 3 × 1 matrix,
A is a 3 × 3 matrix,
[R|t] is a 3 × 4 matrix,
P is a 4 × 1 matrix.
Therefore,

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} R_{00} & R_{01} & R_{02} & T_{03} \\ R_{10} & R_{11} & R_{12} & T_{13} \\ R_{20} & R_{21} & R_{22} & T_{23} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z = 0 \\ 1 \end{bmatrix}$$
Since Z = 0, we can eliminate the third column of [R|t]: that entire column multiplies Z = 0 and so contributes nothing. Hence, we drop Z from P and the third column from [R|t]:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} R_{00} & R_{01} & T_{03} \\ R_{10} & R_{11} & T_{13} \\ R_{20} & R_{21} & T_{23} \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$
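This reduction can be checked numerically; the values of A, R and t below are arbitrary, made up just for the check:

```python
import numpy as np

# Arbitrary intrinsics and pose (illustrative values only)
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 810.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.3  # rotation about the Z axis
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0],
              [s, c, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 10.0])
Rt = np.hstack([R, t.reshape(3, 1)])       # full 3x4 extrinsic matrix

P = np.array([2.0, 3.0, 0.0, 1.0])         # planar world point, Z = 0
full = A @ Rt @ P                          # full 3x4 projection

H = A @ Rt[:, [0, 1, 3]]                   # drop the third column of [R|t]
reduced = H @ np.array([2.0, 3.0, 1.0])    # drop Z from the point

assert np.allclose(full, reduced)          # identical projections
```

Both paths give the same homogeneous image point, confirming that for a planar target the 3×4 projection collapses to a 3×3 matrix.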
What is a homography? : I used the word homography in the above paragraph. A homography is a transform/matrix which converts points from one planar coordinate space to another, like how the world points P are being converted to image points p through the matrix [M]. Hence, for each view there is a homography associated with it which converts P to p.

Hence, $p \leftarrow [M] \cdot X$. This can be considered the base equation from which we will compute [M]. I'll actually write it as H:

$$p \leftarrow H \cdot X$$

Hence,
$$\begin{pmatrix} -X & -Y & -1 & 0 & 0 & 0 & uX & uY & u \\ 0 & 0 & 0 & -X & -Y & -1 & vX & vY & v \end{pmatrix} \begin{pmatrix} h_{00} \\ h_{01} \\ h_{02} \\ h_{10} \\ h_{11} \\ h_{12} \\ h_{20} \\ h_{21} \\ h_{22} \end{pmatrix} = 0$$

$$A \cdot x = 0$$
This is for only one point in one image. For N points per image, vertically stack the above matrix and solve Ax = 0 for the whole system. Each of the N points contributes 2 rows, so for N points there are 2 × N rows.
$$\begin{pmatrix} -X_0 & -Y_0 & -1 & 0 & 0 & 0 & u_0 X_0 & u_0 Y_0 & u_0 \\ 0 & 0 & 0 & -X_0 & -Y_0 & -1 & v_0 X_0 & v_0 Y_0 & v_0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -X_{N-1} & -Y_{N-1} & -1 & 0 & 0 & 0 & u_{N-1} X_{N-1} & u_{N-1} Y_{N-1} & u_{N-1} \\ 0 & 0 & 0 & -X_{N-1} & -Y_{N-1} & -1 & v_{N-1} X_{N-1} & v_{N-1} Y_{N-1} & v_{N-1} \end{pmatrix}_{(2 \times N,\, 9)} \cdot \vec{h} = 0$$
The above shows an Ax = 0 system, which admits two kinds of solution. The obvious trivial solution is x = 0, but we are not looking for that. The other is a non-trivial finite solution such that Ax ≈ 0, if not exactly zero. The explanation lies along the lines of the null space of the matrix A: we seek the x minimizing $\|Ax\|^2$. The solution of such a system is the right singular vector of A associated with its smallest singular value, obtained from the SVD $A = U S V^T$, with shapes:

U : (2 × N, 2 × N)
S : (2 × N, 9)
V_transpose : (9 × 9)
Below is the python snippet that computes the SVD and returns a normalized homography matrix. The homography then needs to be de-normalized as well, since the initial points were given in raw/de-normalized form. Normalization is used to make the DLT (direct linear transform) numerically well-conditioned, giving an optimal solution.
h_norm = vh[np.argmin(s)]
h_norm = h_norm.reshape(3, 3)
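Putting the pieces together, a self-contained sketch of the DLT step (omitting the normalization/de-normalization step; `estimate_homography` is an illustrative name, not the author's):

```python
import numpy as np

def estimate_homography(model_pts, image_pts):
    """DLT estimate of H mapping (X, Y, 1) -> (u, v, 1) from >= 4 correspondences."""
    rows = []
    for (X, Y), (u, v) in zip(model_pts, image_pts):
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    A = np.array(rows)                 # shape (2N, 9)
    _, _, vh = np.linalg.svd(A)
    H = vh[-1].reshape(3, 3)           # right singular vector of smallest sigma
    return H / H[2, 2]                 # fix the scale ambiguity

# Check on exact synthetic correspondences (values are made up for the demo)
H_true = np.array([[1.2, 0.1, 5.0],
                   [0.05, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])
model = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 3.], [4., 1.]])
proj = (H_true @ np.hstack([model, np.ones((6, 1))]).T).T
image = proj[:, :2] / proj[:, 2:]
H_est = estimate_homography(model, image)
```

On exact correspondences the homography is recovered up to the fixed scale; with noisy detections the SVD gives the least-squares minimizer of ‖Ax‖².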
Refining Homographies:
The homography obtained per view is refined using a non-linear Levenberg-Marquardt optimizer. This can be done using scipy.optimize. Refer to the source code on GitHub for the full minimizer function and the Jacobian.
import scipy.optimize as opt

N = normalized_object_points.shape[0]
X = object_points.flatten()
Y = image_points.flatten()
h = H.flatten()  # H is the homography for the given view
h_prime = opt.least_squares(fun=minimizer_func, x0=h, jac=jac_function,
                            method="lm", args=[X, Y, h, N], verbose=0)

if h_prime.success:
    H = h_prime.x.reshape(3, 3)
    H = H / H[2, 2]
return H
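The author's exact minimizer_func lives in the linked repository; a plausible sketch of such a reprojection-residual function (the name, argument order, and the flattened (x, y) point layout are assumptions) is:

```python
import numpy as np

def minimizer_func(h, X, Y, h_init, N):
    """Residuals between points projected by the current h and the observed ones.

    h      -- current homography estimate, flattened to 9 values
    X      -- model points, flattened as (x0, y0, x1, y1, ...)
    Y      -- observed image points, flattened the same way
    h_init -- initial homography (unused here, kept to match the call signature)
    N      -- number of points in the view
    """
    pts = X.reshape(N, 2)
    w = h[6] * pts[:, 0] + h[7] * pts[:, 1] + h[8]        # homogeneous scale
    u = (h[0] * pts[:, 0] + h[1] * pts[:, 1] + h[2]) / w  # projected u
    v = (h[3] * pts[:, 0] + h[4] * pts[:, 1] + h[5]) / w  # projected v
    projected = np.empty(2 * N)
    projected[0::2] = u
    projected[1::2] = v
    return projected - Y
```

With the true homography and exact points the residual is zero, which is exactly what least_squares drives it towards.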
Computing intrinsic params: For each view we compute a homography. Let us maintain an array of size M, M being the number of views (do not confuse M, the number of views, with the matrix M in M·h = 0). Hence, for the M views (i.e. M chessboard images), there are M homographies obtained.
thus,
for i in range(M):
H[i] = compute_view_homography(i)
But what was the homography in the first place? We said that

$$p(u, v, 1) \leftarrow H \cdot P(X, Y, Z, 1)$$

Hence, the homography computed per view comprises the intrinsic projection transform as well as the extrinsic rigid-body transform. Hence, we can say that:
H = A. [R|t]
$$p(u, v) = A [R|t] \cdot P(X, Y, Z)$$

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} A_0 & A_1 & A_2 \end{bmatrix} \begin{bmatrix} R_0 & R_1 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$

$$\begin{bmatrix} h_0 & h_1 & h_2 \end{bmatrix} = \lambda \times A \times \begin{bmatrix} R_0 & R_1 & t \end{bmatrix}$$

Given that $R_0$ and $R_1$ are orthonormal, their dot product is 0. Also, $h_0 = \lambda A R_0$ and $h_1 = \lambda A R_1$, so $R_0 = A^{-1} h_0 / \lambda$, and similarly for $R_1$. This yields $R_0$ and $R_1$, and their dot product gives $R_0^T \cdot R_1 = 0$, i.e.

$$h_0^T \cdot (A^{-1})^T \cdot (A^{-1}) \cdot h_1 = 0$$
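This constraint can be verified numerically: build H = A·[R0 R1 t] from a known A and rotation, and h0ᵀ·(A⁻¹)ᵀ(A⁻¹)·h1 comes out zero (the values below are made up just for the check):

```python
import numpy as np

A = np.array([[800.0, 0.5, 320.0],
              [0.0, 810.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.4
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0],
              [s, c, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, -2.0, 8.0])

H = A @ np.column_stack([R[:, 0], R[:, 1], t])  # homography of this view
B = np.linalg.inv(A).T @ np.linalg.inv(A)       # B = (A^-1)^T (A^-1)

constraint1 = H[:, 0] @ B @ H[:, 1]                          # h0^T B h1 -> 0
constraint2 = H[:, 0] @ B @ H[:, 0] - H[:, 1] @ B @ H[:, 1]  # equal norms -> 0
```

The second quantity is the companion constraint (||r0|| = ||r1||) that Zhang's method uses alongside the dot product; both vanish for a true homography.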
Let $B = (A^{-1})^T \cdot (A^{-1})$. According to Zhang's paper, we define the symmetric matrix B as:

$$B = \begin{pmatrix} B_0 & B_1 & B_3 \\ B_1 & B_2 & B_4 \\ B_3 & B_4 & B_5 \end{pmatrix} \text{ or } \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33} \end{pmatrix}$$

Expanding $h_i^T B h_j$ in terms of the distinct entries of B gives $h_i^T B h_j = v_{ij}^T \cdot b$, where $b = (B_0, B_1, B_2, B_3, B_4, B_5)^T$ and

$$v_{ij} = \begin{bmatrix} h_{i0} h_{j0} \\ h_{i0} h_{j1} + h_{i1} h_{j0} \\ h_{i1} h_{j1} \\ h_{i2} h_{j0} + h_{i0} h_{j2} \\ h_{i2} h_{j1} + h_{i1} h_{j2} \\ h_{i2} h_{j2} \end{bmatrix}$$
Therefore, using the dot-product and equal-norm constraints for B mentioned above, we get, per view,

$$\begin{bmatrix} v_{12}^T \\ (v_{11} - v_{22})^T \end{bmatrix} \cdot b = V \cdot b = 0$$

Again, the system is of the form Ax = 0, and the solution is computed using the SVD of V, which yields b and, by extension, B.
def get_intrinsic_parameters(H_r):
    M = len(H_r)
    V = np.zeros((2*M, 6), np.float64)

    def v_pq(p, q, H):
        v = np.array([
            H[0, p]*H[0, q],
            H[0, p]*H[1, q] + H[1, p]*H[0, q],
            H[1, p]*H[1, q],
            H[2, p]*H[0, q] + H[0, p]*H[2, q],
            H[2, p]*H[1, q] + H[1, p]*H[2, q],
            H[2, p]*H[2, q]
        ])
        return v

    for i in range(M):
        H = H_r[i]
        V[2*i] = v_pq(p=0, q=1, H=H)
        V[2*i + 1] = np.subtract(v_pq(p=0, q=0, H=H), v_pq(p=1, q=1, H=H))

    # solve V.b = 0
    u, s, vh = np.linalg.svd(V)
    b = vh[np.argmin(s)]
    print("V.b = 0 Solution : ", b.shape)
From b, the intrinsic parameters follow in closed form (Zhang's formulas; α = √(λ/B₁₁) completes the set used by gamma and uc):

vc = (b[1]*b[3] - b[0]*b[4]) / (b[0]*b[2] - b[1]**2)
l = b[5] - (b[3]**2 + vc*(b[1]*b[3] - b[0]*b[4])) / b[0]
alpha = np.sqrt(l / b[0])
beta = np.sqrt((l * b[0]) / (b[0]*b[2] - b[1]**2))
gamma = -1.0 * b[1] * (alpha**2) * (beta / l)
uc = (gamma * vc / beta) - (b[3] * (alpha**2) / l)
Hence, A is:

$$A = \begin{bmatrix} \alpha & \gamma & u_c \\ 0 & \beta & v_c \\ 0 & 0 & 1 \end{bmatrix}$$
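A quick round-trip check of these closed-form expressions: start from a known A, form B = (A⁻¹)ᵀ(A⁻¹), read off b, and the formulas recover the original parameters (the numbers are arbitrary test values, not calibration results):

```python
import numpy as np

A_true = np.array([[800.0, 0.5, 320.0],
                   [0.0, 810.0, 240.0],
                   [0.0, 0.0, 1.0]])

Ainv = np.linalg.inv(A_true)
B = Ainv.T @ Ainv
# b stacks the distinct entries of the symmetric B: B11, B12, B22, B13, B23, B33
b = np.array([B[0, 0], B[0, 1], B[1, 1], B[0, 2], B[1, 2], B[2, 2]])

vc = (b[1]*b[3] - b[0]*b[4]) / (b[0]*b[2] - b[1]**2)
l = b[5] - (b[3]**2 + vc*(b[1]*b[3] - b[0]*b[4])) / b[0]
alpha = np.sqrt(l / b[0])
beta = np.sqrt((l * b[0]) / (b[0]*b[2] - b[1]**2))
gamma = -1.0 * b[1] * (alpha**2) * (beta / l)
uc = (gamma * vc / beta) - (b[3] * (alpha**2) / l)
```

Since b is only determined up to scale by V·b = 0, it is worth noting that every recovered quantity above is a ratio that cancels that scale.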
Furthermore, A can be updated along with the complete set of intrinsic and extrinsic parameters using Levenberg-Marquardt.
Results

I implemented this using Python 2.7 and NumPy 1.12. For the given dataset of images, the following values are returned.

Camera Matrix:

[      …             …             …      ]
[     0.       826.80638173  223.27202318 ]
[     0.            0.            1.      ]

[ 532.79536563       0.       342.4582516 ]
[     0.       532.91928339  233.90060514 ]
[     0.            0.            1.      ]

[      …             …             …      ]
[     0.       537.44026588  235.75125989 ]
[     0.            0.            1.      ]