

Motion of an Uncalibrated Stereo Rig:
Self-Calibration and Metric Reconstruction

Zhengyou Zhang, Quang-Tuan Luong, and Olivier Faugeras, Senior Member, IEEE

Abstract- We address in this paper the problem of self-calibration and metric reconstruction (up to a scale factor) from one unknown motion of an uncalibrated stereo rig. The epipolar constraint is first formulated for two uncalibrated images. The problem then becomes one of estimating unknowns such that the discrepancy from the epipolar constraint, in terms of sum of squared distances between points and their corresponding epipolar lines, is minimized. Although the full self-calibration is theoretically possible, we assume in this paper that the coordinates of the principal point of each camera are known. Then, the initialization of the unknowns can be done based on our previous work on self-calibration of a single moving camera, which requires solving a set of so-called Kruppa equations. Redundancy of the information contained in a sequence of stereo images makes this method more robust than using a sequence of monocular images. Real data have been used to test the proposed method, and the results obtained are quite good. We also show experimentally that it is very difficult to estimate precisely the coordinates of the principal points of cameras. A variation of as high as several dozen pixels in the principal point coordinates does not affect significantly the 3-D reconstruction.

I. INTRODUCTION

IT IS well recognized [1]-[3] that stereoscopic cues are important in understanding the 3-D environment surrounding us and allow robust algorithms for 3-D reconstruction to be applied. In order for the 3-D reconstruction to be possible, one needs to know the relationship between the 3-D world coordinates and their corresponding 2-D image coordinates for each camera, and the relative geometry between the two cameras. This is the purpose of camera calibration. A wealth of work on camera calibration has been carried out by researchers either in Photogrammetry [4], [5] or in Computer Vision and Robotics [6]-[12] (see [13] for a review). The usual method of calibration is to compute camera parameters from one or more images of an object of known size and shape, for example, a flat plate with a regular pattern marked on it. One problem is that it is impossible to calibrate online, while the cameras are involved in a visual task [14]. Any change of camera calibration occurring during the performance of the task cannot be corrected without interrupting the task. The change may be deliberate, for example the focal length of a camera may be adjusted, or it may be accidental, for example the camera may undergo small mechanical or thermal changes. In many situations such as vision-based planetary exploration, it is not very practical to calibrate cameras with a calibration apparatus. We can either send a calibration apparatus together with the planetary rover and observe it each time we need to calibrate the cameras, which is not very realistic, or we can precalibrate the stereo system on the ground, which is not reliable.

Recently, a number of researchers in Computer Vision and Robotics have been trying to develop online camera calibration techniques, known as self-calibration. The idea is to calibrate a camera by just moving it in the surrounding environment. The motion rigidity provides several constraints on the camera intrinsic parameters. They are more commonly known as the epipolar constraint, and can be expressed as a 3 x 3, so-called fundamental matrix. Hartley [15] proposed a singular-value-decomposition method to compute the focal lengths from a pair of images if all other camera parameters are known. Trivedi [16] tried to determine only the coordinates of the principal point of a camera. Maybank and Faugeras [14] proposed a theory of self-calibration. They showed that a camera can in general be completely calibrated from three different displacements. At the same time, they proposed an algorithm using tools from algebraic geometry. However, the algorithm is very sensitive to noise, and is of no practical use. Luong, in cooperation with them, has developed a real practical system as long as the points of interest can be located with subpixel precision, say 0.2 pixels, in the image planes [17], [18].

In this paper, we describe a self-calibration method for a binocular stereo rig from one displacement using a simplified camera model (i.e., the principal points are known). We have made this simplification because it is very difficult to estimate precisely the position of the principal point by calibration, and it is in practice very close to the image center. We show this experimentally in Section VI. The formulation, however, can be easily extended to include all parameters. Because of the exploitation of information redundancy in the stereo system, our approach yields more robust calibration results than those which consider a single camera, as will be shown by experiments with real images. Section II describes the calibration problem to be addressed in this paper. Section III summarizes the epipolar constraint, which is the fundamental constraint underlying all self-calibration techniques. Section IV deals with the details of the problem solving, including additional constraints present in a stereo rig. As the problem is solved by a nonlinear optimization, an initial estimation of the camera parameters must be supplied, which is described in Section V. Experimental results with real data are provided in Section VI.

Manuscript received November 15, 1993; revised July 25, 1994.
Z. Zhang and O. Faugeras are with INRIA Sophia-Antipolis, F-06902 Sophia-Antipolis Cedex, France (e-mail: zzhang@sophia.inria.fr).
Q.-T. Luong is with SRI International, Menlo Park, CA 94025 USA.
Publisher Item Identifier S 1042-296X(96)01061-5.
1042-296X/96$05.00 (c) 1996 IEEE


In Section VII, we discuss the possibility to include cross correspondences and the possibility to estimate all camera parameters.

This work is an extension of our previous work described in [19], where the intrinsic parameters of the cameras were supposed to be known while the extrinsic ones were unknown.

II. PROBLEM STATEMENT AND NOTATIONS

A. Camera Model

A camera is described by the widely used pinhole model. The coordinates of a 3-D point M = [x, y, z]^T and its retinal image coordinates m = [u, v]^T are related by

$$s\,\tilde{m} = P \tilde{M}$$

where s is an arbitrary scale, and P is a 3 x 4 matrix, called the perspective projection matrix. Denoting the homogeneous coordinates of a vector x = [x, y, ...]^T by x̃, i.e., x̃ = [x, y, ..., 1]^T, we have s m̃ = P M̃.

The basic assumption behind this model is that the relationship between the world coordinates and the pixel coordinates is linear projective. This allows us to use the powerful tools of projective geometry, which is emerging as an attractive framework for computer vision [20]. With the state of the art technology, camera distortion is reasonably small, and the pinhole model is thus a good approximation.

The matrix P can be decomposed as

$$P = A\,[R \;\; t]$$

where A is a 3 x 3 matrix, mapping the normalized image coordinates to the retinal image coordinates, and (R, t) is the displacement (rotation and translation) from the world coordinate system to the camera coordinate system. The most general matrix A can be written as

$$A = \begin{bmatrix} -f k_u & f k_u \cot\theta & u_0 \\ 0 & -f k_v/\sin\theta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \quad (1)$$

where
• f is the focal length of the camera;
• k_u and k_v are the horizontal and vertical scale factors, whose inverses characterize the size of the pixel in the world coordinate unit;
• u_0 and v_0 are the coordinates of the principal point of the camera, i.e., the intersection between the optical axis and the image plane; and
• θ is the angle between the retinal axes. This parameter is introduced to account for the fact that the pixel grid may not be exactly orthogonal. In practice it is very close to π/2.

It is clear that we cannot separate f from k_u and k_v. In the following, we use the notations α_u = -f k_u and α_v = -f k_v. We thus have five intrinsic parameters for each camera: α_u, α_v, u_0, v_0, and θ.
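To make the model concrete, here is a minimal Python/NumPy sketch of (1) and of the projection s m̃ = P M̃. The helper names are ours and purely illustrative; only the notation α_u = -f k_u, α_v = -f k_v follows the text above.

```python
import numpy as np

def intrinsic_matrix(alpha_u, alpha_v, u0, v0, theta=np.pi / 2):
    # Matrix A of (1), written with alpha_u = -f*k_u and alpha_v = -f*k_v,
    # so that A[0,1] = f*k_u*cot(theta) = -alpha_u / tan(theta).
    return np.array([[alpha_u, -alpha_u / np.tan(theta), u0],
                     [0.0, alpha_v / np.sin(theta), v0],
                     [0.0, 0.0, 1.0]])

def project(A, R, t, M):
    # s * m_tilde = P * M_tilde with P = A [R t]; the arbitrary scale s
    # is removed by dividing by the third homogeneous coordinate.
    P = A @ np.hstack([R, t.reshape(3, 1)])
    m_tilde = P @ np.append(M, 1.0)
    return m_tilde[:2] / m_tilde[2]
```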
Fig. 1. Illustration of the problem to be studied.

B. Problem Statement

The problem is illustrated in Fig. 1. The left and right images at time t1 are respectively denoted by I1 and I2, and those at time t2 are denoted by I3 and I4. A point m in the image plane Ii is noted as mi, and a point M in 3-space expressed in the coordinate system attached to the i-th camera is noted as Mi. The second subscript, if any, will indicate the index of the point in consideration. Thus mij is the image point in Ii of the j-th 3-D point, and Mij is the j-th 3-D point expressed in the coordinate system attached to the i-th camera.

Without loss of generality, we choose as the world coordinate system the coordinate system attached to the left camera at t1. Let (Rs, ts) be the displacement between the left and right cameras of the stereo rig. Let (Rl, tl) be the displacement of the stereo rig between t1 and t2 with respect to the left camera. Let (Rr, tr) be the displacement of the stereo rig between t1 and t2 with respect to the right camera. Let Al and Ar be the intrinsic matrices of the left and right cameras, respectively. The problem can now be stated as follows.

Given
• m point correspondences between I1 and I2, noted by {m_{1i}^s, m_{2i}^s} (i = 1, ..., m);
• n point correspondences between I3 and I4, noted by {m_{3j}^s, m_{4j}^s} (j = 1, ..., n);
• p point correspondences between I1 and I3, noted by {m_{1k}^l, m_{3k}^l} (k = 1, ..., p);
• q point correspondences between I2 and I4, noted by {m_{2l}^r, m_{4l}^r} (l = 1, ..., q);
estimate the intrinsic matrices Al and Ar, the displacements (Rs, ts), (Rl, tl) and (Rr, tr), and the 3-D structure of these points if necessary.
Refer to Fig. 1. As the relative geometry of the two cameras, i.e., (Rs, ts), does not change between t1 and t2, the displacement of the left camera (Rl, tl) and that of the right camera (Rr, tr) are not independent from each other. Indeed, after some simple algebra, we have the following constraints:

$$R_r = R_s R_l R_s^T \quad (2)$$
$$t_r = t_s + R_s t_l - R_r t_s. \quad (3)$$

Furthermore, as the overall scale can never be recovered by this system, we can set one of the translations to have unit length, say, ||t_s|| = 1.

Thus, there are in total 21 unknowns in this system: 5 parameters for each intrinsic matrix, 5 parameters for (Rs, ts), and 6 parameters for (Rl, tl).
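Constraints (2) and (3) translate directly into code. The following sketch (the function name is ours) composes the right camera's motion from the rig geometry and the left camera's motion:

```python
import numpy as np

def right_camera_motion(Rs, ts, Rl, tl):
    """Displacement of the right camera between t1 and t2, derived from
    the rig geometry (Rs, ts) and the left camera motion (Rl, tl)."""
    Rr = Rs @ Rl @ Rs.T          # equation (2)
    tr = ts + Rs @ tl - Rr @ ts  # equation (3)
    return Rr, tr
```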
III. EPIPOLAR CONSTRAINT

Let us consider the case of two cameras as shown in Fig. 2. Let C1 and C2 be the optical centers of the first and second cameras, respectively. Given a point m1 in the first image, its corresponding point in the second image is constrained to lie on a line called the epipolar line of m1, denoted by l_{m1}. The line l_{m1} is the intersection of the plane Π, defined by m1, C1 and C2 (known as the epipolar plane), with the second image plane I2. This is because image point m1 may correspond to an arbitrary point on the semiline C1M (M may be at infinity) and the projection of C1M on I2 is the line l_{m1}. Furthermore, one observes that all epipolar lines of the points in the first image pass through a common point e2, which is called the epipole. e2 is the intersection of the line C1C2 with the image plane I2. This can be easily understood as follows. For each point m1k in the first image I1, its epipolar line l_{m1k} in I2 is the intersection of the plane Πk (defined by m1k, C1 and C2) with image plane I2. All epipolar planes Πk thus form a pencil of planes containing the line C1C2. They must intersect I2 at a common point, which is e2. Finally, one can easily see the symmetry of the epipolar geometry. The corresponding point in the first image of each point m2k lying on l_{m1k} must lie on the epipolar line l_{m2k}, which is the intersection of the same plane Πk with the first image plane I1. All epipolar lines form a pencil containing the epipole e1, which is the intersection of the line C1C2 with the image plane I1.

Fig. 2. The epipolar geometry.

If m1 (a point in I1) and m2 (a point in I2) correspond to a single physical point M in space, then m1, m2, C1 and C2 must lie in a single plane. This is the well-known co-planarity constraint or epipolar equation in solving motion and structure from motion problems when the intrinsic parameters of the cameras are known [21].

Let the displacement from the first camera to the second be (R, t). Let m1 and m2 be the images of a 3-D point M on the cameras. Under the pinhole model, we have the following two equations:

$$s_1 \tilde{m}_1 = A_1 [I \;\; 0] \tilde{M}, \qquad s_2 \tilde{m}_2 = A_2 [R \;\; t] \tilde{M}$$

where A1 and A2 are the intrinsic matrices of the first and second cameras, respectively. Eliminating M̃, s1, and s2 from the above equations, we obtain

$$\tilde{m}_2^T A_2^{-T} [t]_\times R A_1^{-1} \tilde{m}_1 = 0 \quad (4)$$

where [t]× is the antisymmetric matrix defined by t such that [t]× x = t × x for all 3-D vectors x (× denotes the cross product).

Equation (4) is a fundamental constraint underlying any two images if they are perspective projections of the same static scene. Let F = A2^{-T} [t]× R A1^{-1}; F is known as the fundamental matrix of the two images [17], [18]. Without considering 3-D metric entities, we can think of the fundamental matrix as providing the two epipoles (i.e., e1 and e2, the vertexes of the two pencils of epipolar lines) and the 3 parameters of the homography between these two pencils, and this is the only geometric information available from two uncalibrated images [14], [17]. This implies that the fundamental matrix has only seven degrees of freedom. Indeed, it is only defined up to a scale factor and its determinant is zero. Geometrically, F m̃1 defines the epipolar line of point m1 in the second image. Equation (4) says no more than that the correspondence in the second image of point m1 lies on the corresponding epipolar line. Transposing (4) yields the symmetric relation from the second image to the first image.
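The fundamental matrix is a direct transcription of (4). The sketch below (with our own helper names) builds F from the intrinsic matrices and the displacement, and evaluates the algebraic epipolar residual for a pair of corresponding pixels:

```python
import numpy as np

def skew(t):
    # [t]_x such that skew(t) @ x == np.cross(t, x)
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(A1, A2, R, t):
    # F = A2^{-T} [t]_x R A1^{-1}; then m2~^T F m1~ = 0 for corresponding points
    return np.linalg.inv(A2).T @ skew(t) @ R @ np.linalg.inv(A1)

def epipolar_residual(F, m1, m2):
    # the left-hand side of (4), for pixel coordinates m1 and m2
    return np.append(m2, 1.0) @ F @ np.append(m1, 1.0)
```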
IV. PROBLEM SOLVING

A. Formulation

From the preceding section, we see that each point correspondence provides one equation of the form (4). As we have in total M = m + n + p + q point correspondences (see Section II), we can estimate the intrinsic matrices Al and Ar, and the displacements (Rs, ts), (Rl, tl) and (Rr, tr) by solving a least-squares problem, which minimizes the discrepancy from the epipolar constraint (4), i.e.,
$$\min \Bigl[ \sum_{i=1}^{m} \bigl( (\tilde{m}_{2i}^{s})^T A_r^{-T} [t_s]_\times R_s A_l^{-1} \tilde{m}_{1i}^{s} \bigr)^2 + \sum_{j=1}^{n} \bigl( (\tilde{m}_{4j}^{s})^T A_r^{-T} [t_s]_\times R_s A_l^{-1} \tilde{m}_{3j}^{s} \bigr)^2 + \sum_{k=1}^{p} \bigl( (\tilde{m}_{3k}^{l})^T A_l^{-T} [t_l]_\times R_l A_l^{-1} \tilde{m}_{1k}^{l} \bigr)^2 + \sum_{l=1}^{q} \bigl( (\tilde{m}_{4l}^{r})^T A_r^{-T} [t_r]_\times R_r A_r^{-1} \tilde{m}_{2l}^{r} \bigr)^2 \Bigr] \quad (5)$$

The total number of unknowns is 21 (see Section II). On the other hand, we have three independent fundamental matrices, each providing seven constraints on the intrinsic and extrinsic parameters. We thus have in total 21 constraints. This implies that we can in principle solve for all the unknowns. However, a set of extremely nonlinear equations is involved, making the parameter initialization impossible¹.

¹ More correctly, we have not been able to work out such an algorithm.
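The residual vector behind (5) can be stacked from the four correspondence sets. The following sketch is only an illustration: the function and parameter names are ours, and a real implementation would hand these residuals (or the point-to-line distances introduced in Section IV-B) to a least-squares routine.

```python
import numpy as np

def stack_residuals(Fs, Fl, Fr, stereo_t1, stereo_t2, left_pairs, right_pairs):
    """Residual vector of (5): each correspondence (m, m') contributes
    m'~^T F m~ for the fundamental matrix linking its two images.
    Fs = Ar^{-T}[ts]x Rs Al^{-1}; Fl and Fr are built likewise. The pair
    lists hold (m1, m2), (m3, m4), (m1, m3) and (m2, m4) pixel pairs."""
    def res(F, pairs):
        return [np.append(mb, 1.0) @ F @ np.append(ma, 1.0) for ma, mb in pairs]
    return np.array(res(Fs, stereo_t1) + res(Fs, stereo_t2)
                    + res(Fl, left_pairs) + res(Fr, right_pairs))
```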
In this paper, we assume a simplified camera model: the angle between the retinal axes θ is π/2, and the location of the principal point (u0, v0) is known, and is assumed to be at the image center in this paper. What we need to estimate is then α_u, α_v for each camera. We have made this simplification for the following reasons.
• With the current technology, the angle θ can be made very close to π/2. Indeed, we have carried out a large number of experiments on our CCD cameras using a classical calibration method [7], and the differences between the estimated θ and π/2 are found to be in the order of 10⁻³ radians.
• The position of the principal point is in practice very close to the image center. On the other hand, it is very difficult to estimate it precisely. To show this, Fig. 3(a) shows three images taken from three different positions by the same camera. The corners on the grids have been used to estimate the intrinsic parameters using a classical calibration method [7]. The results, given in Fig. 3(b), are reproduced below. We see that the variation of u0 and v0 is quite large from position to position.

Fig. 3(b). Intrinsic parameters estimated from the three camera positions:

Image   u0       v0       α_u      α_v       |θ - π/2|
1       211.77   305.16   684.27   1012.25   3.25x10⁻³
2       297.11   294.23   697.05   1030.90   3.01x10⁻³
3       299.29   186.17   685.10   1014.49   1.40x10⁻³

Note that this simplification has also been adopted by many researchers [6], [22]. They claim that a deviation of the location of the principal point by a dozen pixels from the real location does not produce any severe distortion in 3-D reconstruction. This has been confirmed by our experimentation (see Section VI).

Under this simplification, we are able to compute an initial estimate of all remaining unknowns from the three fundamental matrices, as will be described in Section V.

B. Implementation Details

Now we describe some implementation details. The minimization problem formulated by (5) is based on the epipolar constraint (see (4)). However, it is clear from (4) that a translation can only be determined up to a scale. We thus normalize each translation such that its norm is 1. More precisely, each translation is represented by its spherical coordinates. A rotation is described by the Rodrigues matrix [23]:

$$R = \frac{1}{1 + (a^2 + b^2 + c^2)/4} \begin{bmatrix} 1 + (a^2 - b^2 - c^2)/4 & -c + ab/2 & b + ac/2 \\ c + ab/2 & 1 + (-a^2 + b^2 - c^2)/4 & -a + bc/2 \\ -b + ac/2 & a + bc/2 & 1 + (-a^2 - b^2 + c^2)/4 \end{bmatrix}$$
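The matrix above is the Cayley form of a rotation, and can be generated as follows. This is a sketch under the assumption S = [g/2]×, not the NAG-based code of the paper:

```python
import numpy as np

def rodrigues_rotation(g):
    """R = (I - S)^{-1} (I + S) with S = [g/2]_x, which expands to the
    rational matrix displayed above; valid for rotation angles != pi."""
    a, b, c = g
    S = 0.5 * np.array([[0.0, -c, b],
                        [c, 0.0, -a],
                        [-b, a, 0.0]])
    I = np.eye(3)
    return np.linalg.solve(I - S, I + S)
```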


A rotation is thus represented by three parameters g = [a, b, c]^T. Such a representation is valid for all rotations except for the one whose rotation angle is equal to π (which will not happen in the problem addressed in this paper). We have chosen this parameterization because of the relatively simple expression of its derivative and because there is no constraint on the parameters.

Regarding the constraints on the extrinsic parameters, it is rather easy to incorporate the constraint (2) in (5). We do it by simply replacing Rr by Rs Rl Rs^T. For the constraint (3), however, it is much more difficult. As the scale of one of the translations can never be recovered by this system, we can set, for example, ||ts|| = 1. The scales of the other two translations, tl and tr, cannot be recovered by our algorithm. This is because we try to estimate the unknowns by minimizing the discrepancy from the epipolar constraint (quantified by the distance of a point to its epipolar line, see below), and the scales of the translations do not influence the functional to be minimized. This is also clear from the fact that the fundamental matrix is only defined up to a scale factor. Two unknown scales are thus involved in (3), which implies that (3) provides only one scalar equation. To be more precise, the scalar equation is given by

$$\bigl| R_s \hat{t}_l \quad (I - R_r)\hat{t}_s \quad \hat{t}_r \bigr| = 0 \quad (6)$$

where |·| denotes the determinant of a 3 x 3 matrix, and t̂ denotes the unit translation direction vector, i.e., t̂ = t/||t||. The constraint (6) says nothing more than that the three vectors Rs t̂l, (I - Rr) t̂s and t̂r are coplanar. This implies that the cross product of two of the vectors, say, [(I - Rr) t̂s] × t̂r, should be orthogonal to the other vector, Rs t̂l. In our implementation, this constraint multiplied by a coefficient (Lagrange multiplier) is used as an additional measurement in the objective function. The constraint is then not satisfied exactly, but has a small value. This value, noted as c, depends on the value of the Lagrange multiplier, noted as λ. The larger the value of λ, the smaller the value of c. We set λ = 10⁵, which gives a negligibly small value of c. The unknown scales in tl and tr can be easily recovered in the 3-D reconstruction phase if, for example, one can identify one point in each image corresponding to a single point in 3-D space.
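Constraint (6) can be evaluated as a determinant. A sketch with the λ-weighting described above (names are ours):

```python
import numpy as np

def coplanarity_residual(Rs, Rr, ts, tl, tr, lam=1e5):
    """Weighted residual of (6): lam * |Rs t_l^  (I - Rr) t_s^  t_r^|,
    appended to the objective as an extra measurement. lam plays the
    role of the Lagrange-multiplier coefficient set to 1e5 above."""
    unit = lambda v: v / np.linalg.norm(v)
    cols = np.column_stack([Rs @ unit(tl),
                            (np.eye(3) - Rr) @ unit(ts),
                            unit(tr)])
    return lam * np.linalg.det(cols)
```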
The problem described by (5) is to minimize a sum of terms of the form

$$(\tilde{m}_2^T F_{ij} \tilde{m}_1)^2 \quad (7)$$

where F_ij = A_j^{-T} [t]× R A_i^{-1} (i, j = l, r) is a fundamental matrix. This criterion, however, does not have a direct interpretation in the measurement space, i.e., in the image planes. As l1 = F_ij m̃1 actually defines the epipolar line of m1 in the second image, we can replace (7) by the squared distance from m2 to l1, i.e., d²(m2, l1), where

$$d(m_2, l_1) = \frac{|\tilde{m}_2^T l_1|}{\sqrt{l_{11}^2 + l_{12}^2}}$$

and l11 and l12 are the first two elements of the vector l1. d(m2, l1) is the Euclidean distance of point m2 to line l1 in the image plane. In order to maintain the symmetry between the two cameras and to avoid any discrepancy in the epipolar geometry, the term (7) is in fact replaced by the following two terms:

$$d^2(\tilde{m}_2, F\tilde{m}_1) + d^2(\tilde{m}_1, F^T\tilde{m}_2).$$

The new criterion is more meaningful because Euclidean distances in the measurement space are used.
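The distance-based criterion is easy to state in code. This sketch (function names are ours) mirrors the two symmetric terms above:

```python
import numpy as np

def point_line_distance(F, m1, m2):
    # Euclidean distance from m2 to the epipolar line l = F m1~ in its image
    l = F @ np.append(m1, 1.0)
    return abs(np.append(m2, 1.0) @ l) / np.hypot(l[0], l[1])

def symmetric_epipolar_error(F, m1, m2):
    # the two squared distances that replace the algebraic term (7)
    return point_line_distance(F, m1, m2) ** 2 \
         + point_line_distance(F.T, m2, m1) ** 2
```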
The minimization is performed by the NAG routine E04GDF, which is a modified Gauss-Newton algorithm for finding an unconstrained minimum of a sum of squares of M nonlinear functions in N variables (M ≥ N) [24]. It requires the first derivative of the objective function and an initial estimate of the intrinsic and extrinsic parameters. In the next section, we shall describe how to compute the initial estimation. If we have appropriate knowledge of the cameras (e.g., from constructors' specifications), we can directly use it as the initial estimation.

In summary, there are N = 16 unknowns to be estimated (2 intrinsic parameters for each camera, 5 for (Rs, ts), 5 for (Rl, tl) and 2 for tr) under the constraint (6). We thus need at least 15 point correspondences. However, in order to use the initialization technique described in the next section, we must have m + n ≥ 8, p ≥ 8, and q ≥ 8. In that case, we have M ≥ 24 and the problem is always overdetermined.

V. INITIALIZATION OF THE PARAMETERS TO BE ESTIMATED

In this section, we briefly outline the process of the initialization of the parameters to be estimated, and provide appropriate references where necessary.

First, based on the epipolar constraint (4), we can estimate the fundamental matrices Fll, Frr, and Frl between the left images, the right images, and the left and right images, respectively, from the given point correspondences. Several methods have been proposed in [25]. We have used a linear least-squares method followed by a nonquadratic minimization to improve the results.

Consider now the case of the left camera, i.e., Fll. For reasons of simplicity, the subscript will be omitted. From the definition of F, we have E = A^T F A, where E = [t]× R is called the essential matrix. As is well known [26], E is subject to two independent polynomial constraints. This implies that the entries of A are subject to two independent polynomial constraints inherited from E. As there are only two unknowns, α_u and α_v, they can then be solved. Let us now follow the theory of self-calibration developed by Maybank and Faugeras [14]. Consider the absolute conic: x1² + x2² + x3² = 0, x4 = 0. If we define x = x1/x3 and y = x2/x3, then the equation can be rewritten as x² + y² = -1, which represents an imaginary circle of radius i = √-1 on the plane at infinity. The important property of the absolute conic is that its image ω on the camera does not change with the position of the camera as long as the same intrinsic parameters are maintained. In fact, the image ω is a conic described by the matrix B = A^{-T} A^{-1}. This implies that the two epipolar lines tangent to ω in the first image must correspond to the two epipolar lines tangent to ω in the second image under the epipolar transformation defined by the fundamental matrix F. This provides two independent equations, the so-called Kruppa equations. They are of degree 2 in the
entries of the matrix K,

$$K \simeq \begin{bmatrix} -\delta_{23} & \delta_{3} & \delta_{1} \\ \delta_{3} & -\delta_{13} & \delta_{2} \\ \delta_{1} & \delta_{2} & -\delta_{12} \end{bmatrix}$$

where K is the matrix defining the dual conic of ω, i.e., K = B* = AA^T, defined up to a scale factor. We thus have the following relations:

$$\delta_1 = u_0, \qquad \delta_2 = v_0, \qquad \delta_3 = u_0 v_0 - \alpha_u \alpha_v \frac{\cot\theta}{\sin\theta},$$
$$\delta_{12} = -1, \qquad \delta_{13} = -v_0^2 - \frac{\alpha_v^2}{\sin^2\theta}, \qquad \delta_{23} = -u_0^2 - \frac{\alpha_u^2}{\sin^2\theta}.$$
To be more precise, the Kruppa equations can be derived as follows. Consider the epipolar line l tangent to ω in the first image. It is defined by the epipole e1 and a point, say, y, and therefore l = e1 × y. The epipolar line l is tangent to ω if and only if

$$(e_1 \times y)^T K (e_1 \times y) = 0, \quad \text{or} \quad y^T [e_1]_\times K [e_1]_\times y = 0. \quad (8)$$

The epipolar line corresponding to the point y is Fy, and it is tangent to ω in the second image if and only if

$$y^T F^T K F y = 0. \quad (9)$$

Writing that (8) and (9) are equivalent yields the so-called Kruppa equations. One possible way to do so is to take y = (1, τ, 0)^T in the case where the epipoles are at finite distance. The relations (8) (respectively, (9)) take the form P1(τ) = k0 + k1 τ + k2 τ² = 0 (respectively, P2(τ) = k0' + k1' τ + k2' τ² = 0), and the Kruppa equations can be written as three proportionality conditions between these two polynomials, of which only two are independent:

$$k_2 k_1' - k_2' k_1 = 0$$
$$k_0 k_1' - k_0' k_1 = 0$$
$$k_0 k_2' - k_0' k_2 = 0.$$

The reader is referred to [14] and [17] for more details about the epipolar transformation and the Kruppa equations.

In our particular case (u0, v0 and θ are known), the Kruppa equations reduce to two simple equations of the form

$$a_1 x_1^2 + b_1 x_1 x_2 + c_1 x_1 + d_1 x_2 + e_1 = 0 \quad (10)$$
$$a_2 x_2^2 + b_2 x_1 x_2 + c_2 x_2 + d_2 x_1 + e_2 = 0 \quad (11)$$

where x1 = α_u², x2 = α_v², and a1, b1, ..., e2 are coefficients which can be computed from F, u0, v0 and θ. We must point out that we always have a1 a2 = b1 b2. We can thus solve x2 in terms of x1 from the first equation, and substitute it into the second equation to obtain an equation of degree 3 only in x1. So, we can have at most three real solutions, and usually only one physical solution such that x1 ≥ 0 and x2 ≥ 0.
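Since (10) is linear in x2, the elimination described above reduces to cubic root finding. A sketch, assuming the ten coefficients have already been computed from F, u0, v0 and θ:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def solve_reduced_kruppa(a1, b1, c1, d1, e1, a2, b2, c2, d2, e2):
    """Solve (10)-(11) for x1 = alpha_u^2 and x2 = alpha_v^2.
    From (10), x2 = N(x1)/D(x1) with N = -(a1 x1^2 + c1 x1 + e1) and
    D = b1 x1 + d1; substituting into (11) and multiplying by D^2 gives
    a polynomial whose x1^4 term cancels because a1 a2 = b1 b2."""
    N = np.array([-e1, -c1, -a1])   # coefficients in increasing powers
    D = np.array([d1, b1])
    x = np.array([0.0, 1.0])        # the monomial x1
    poly = a2 * P.polymul(N, N)
    poly = P.polyadd(poly, b2 * P.polymul(x, P.polymul(N, D)))
    poly = P.polyadd(poly, c2 * P.polymul(N, D))
    poly = P.polyadd(poly, d2 * P.polymul(x, P.polymul(D, D)))
    poly = P.polyadd(poly, e2 * P.polymul(D, D))
    poly = poly[:4]                 # drop the cancelled quartic term
    solutions = []
    for r in np.roots(poly[::-1]):  # np.roots expects decreasing powers
        if abs(r.imag) < 1e-9:
            x1 = r.real
            x2 = P.polyval(x1, N) / P.polyval(x1, D)
            if x1 >= 0 and x2 >= 0:
                solutions.append((np.sqrt(x1), np.sqrt(x2)))
    return solutions  # usually one physical (|alpha_u|, |alpha_v|) pair
```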
Once we have computed α_u and α_v, we can compute the essential matrix by E = A^T F A. The rotation and translation are then computed from E by any standard method such as those described in [27]-[29]. We thus compute α_ul, α_vl, Rl, and tl for the left camera from the fundamental matrix Fll. In the same way, we can compute α_ur, α_vr, Rr, and tr for the right camera from the fundamental matrix Frr.

It remains to compute the rotation Rs and the translation ts between the left and right cameras. Since the intrinsic matrices Al and Ar are now available, we can easily compute the essential matrix Es = Ar^T Frl Al. Rs and ts can then be solved from Es using the same method as for (Rl, tl) and (Rr, tr).
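For the decomposition of E into R and t, any of the standard methods in [27]-[29] can be used; the following SVD-based factorization is one common variant, not necessarily the exact routine used in the paper:

```python
import numpy as np

def decompose_essential(E):
    """Factor E ~ [t]_x R into the four (R, t) candidates; the physical
    one is the candidate that reconstructs points in front of both
    cameras, and t is recovered only up to scale."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    t = U[:, 2]
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]
```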
In summary, the initial estimates are obtained as follows.
• First, compute the fundamental matrices Frl, Fll, and Frr from the stereo correspondences, the temporal correspondences of the left camera, and the temporal correspondences of the right camera, respectively.
• Second, compute Al, Rl, and tl from Fll; compute Ar, Rr, and tr from Frr.
• Finally, compute Rs and ts from Frl.
VI. EXPERIMENTAL RESULTS

We show in Fig. 4 the two pairs of stereo images used in this experiment. Two CCD cameras with resolution 512 x 512 are used. The points of interest used for self-calibration are also shown, as indicated by the white crosses. These points are extracted with subpixel accuracy by an interactive program of Blaszka and Deriche [30]. However, a few points, especially those on the background, are not well localized.

We show in Table I a subset of results of the self-calibration on this sequence of images, where the distance means the root of mean squares of distances between points and their corresponding epipolar lines (i.e., the root of the objective function divided by the number of correspondences).

TABLE I

                   Left camera           Right camera         distance (pixels)
                   α_u       α_v         α_u       α_v
 initialization    597.15    786.99      598.31    902.26      11.3
 final estimate    610.99    910.13      617.13    916.58      0.5

We see that the average distance decreases from 11.3 pixels to 0.5 pixels. This shows the advantage of our approach, which takes into account the stereo correspondences, over the previous approach based on calibrating each camera separately. The two cameras are almost the same, as they have almost the same intrinsic parameters.


Fig. 4. Images with overlay of the points of interest used for self-calibration. (Four panels: left and right images at t1, left and right images at t2.)

After having estimated the intrinsic and extrinsic parameters, we can perform metric reconstruction (up to a scale factor). For comparison, we show both the 3-D reconstruction result with the initial estimate (Fig. 5) and that with the final estimate (Fig. 6). In order to have a better visualization, the 3-D reconstructed points are artificially linked, and shown as line segments. Figs. 5(a) and 6(a) show the back projection of the reconstructed 3-D points on the left image at t1, which are linked by black line segments, together with the original 2-D points as indicated by white crosses. The projected and original points with the technique described in this paper coincide very well, as shown in Fig. 6(a), while this is not the case with the previous technique used for parameter initialization, as shown in Fig. 5(a). Figs. 5(b) and 6(b) show the projection of the 3-D reconstruction on a plane perpendicular to the image plane, which is expected to be parallel to the ground plane. A closeup of the foreground is also given in Fig. 6(b). We see clearly that the reconstructed points of the foreground lie in two planes. The reconstruction of the background is however noisier due to the poor location of the 2-D points, especially with the previous technique as shown in Fig. 5(b). In Fig. 7, a stereogram of the final 3-D reconstruction is displayed. The reader can easily perceive the 3-D information by cross-eye fusion. In particular, he should see two planes in the foreground.

Fig. 7. Stereogram of the metric reconstruction for cross-eye fusion.

To give an idea of the quantitative performance, we consider the points on the grid pattern because we have manually measured their positions. Using an algorithm similar to that described in [31], we are able to compute the scale factor, the rotation matrix and the translation between the points reconstructed and those measured manually. The scale computed is 301.69. We then apply the estimated transformation to the points reconstructed, and eventually compute the distances between the transformed points and the manually measured ones. The error (the root of the mean squares of the distances) is found to be 0.86 mm (the grid size is about 300 mm), which is remarkably small remembering that no knowledge has been used except that the principal point is assumed to be located at the image center.


Fig. 5. Initial 3-D reconstruction result. (a) Back projection on the left image at t1. (b) Projection on a plane perpendicular to the image plane (top view).

Fig. 6. Final 3-D reconstruction result. (a) Back projection on the left image at t1. (b) Projection on a plane perpendicular to the image plane (top view).

Now let us consider the effect of the position of the principal point on the reconstruction. The process is as follows. We shift the coordinates of the principal point from the image center by (δu0, δv0), and then carry out the same calibration procedure. Finally we compare the reconstruction result with the manually measured one, as described just above. The results are shown in Table II, where the errors are still quantified as the root of the mean squares of the distances between the transformed points and the manually measured ones (in millimeters). The image center is at (255, 255) in our case. For example, the number given in the first row and the third column, 1.70 mm, corresponds to the error of the reconstruction with (δu0, δv0) = -15 x (1, 0), i.e., the principal point is assumed to be at (240, 255). From this table, we confirm that the 3-D reconstruction is not very sensitive to the location of the principal points of the cameras. Even with a deviation as large as 35 pixels from the image center, the 3-D reconstruction is still reasonable.

VII. DISCUSSIONS

In this section, we examine the exact number of geometric constraints which exist in the problem addressed in this paper.

A. Gain from the Correspondences Across Cameras

In this paper, we have ignored the correspondences between image 1 and image 4 and those between image 2 and image 3. In this section, we describe the benefit of using such cross correspondences².

² This problem was raised by one of the anonymous reviewers of this article.

Let Fij be the fundamental matrix between image i and image j. The epipole in image i, eij, satisfies Fij ẽij = 0; the epipolar line in image j of a point mi in image i is given by l_mi = Fij m̃i. Symmetrically, the epipole in image j, eji, satisfies Fij^T ẽji = 0; the epipolar line in image i of a point mj in image j is given by l_mj = Fij^T m̃j. Geometrically, the epipole eij is the projection in image i of the optical center of image j.
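The epipoles themselves are just the null vectors of a (rank-2) fundamental matrix, which suggests the following sketch (our own helper, following the conventions of the paragraph above):

```python
import numpy as np

def epipoles(Fij):
    """e_ij (in image i) and e_ji (in image j) as the right and left null
    vectors of F_ij, normalized to affine coordinates."""
    U, s, Vt = np.linalg.svd(Fij)
    e_i = Vt[-1]      # F_ij @ e_i ~ 0
    e_j = U[:, -1]    # F_ij.T @ e_j ~ 0
    return e_i / e_i[2], e_j / e_j[2]
```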
Assume we have obtained the fundamental matrices Fs, Fl and Fr. As said in Section III, each fundamental matrix depends on seven parameters, which can be interpreted geometrically as the coordinates of the two epipoles (four parameters) and the three parameters determining the homography between the two pencils of epipolar lines. Thus, the three fundamental matrices provide in total 21 constraints on the intrinsic and extrinsic parameters.


TABLE II
ERRORS IN RECONSTRUCTION (DISTANCES IN MILLIMETERS) VERSUS POSITIONS OF THE PRINCIPAL POINTS

(δu0, δv0)   -25     -20     -15     -10     -5      0       5       10      15      20      25
(1, 0)       1.57    1.73    1.70    1.47    1.11    0.86    0.97    1.29    1.76    2.29    2.81
(0, 1)       1.62    1.37    1.15    0.97    0.87    0.86    0.95    1.11    1.30    1.52    1.76
(1, 1)       2.73    2.75    2.51    2.00    1.33    0.86    1.13    1.74    2.49    3.26    3.99
(1, -1)      1.09    1.02    1.06    1.04    0.94    0.86    0.88    0.95    1.14    1.40    1.69

Fig. 8. Illustration of the epipolar geometry of a stereo rig in motion.

Consider now the epipolar geometry F14 between images 1 and 4 (see Fig. 8). In image I2, we know two epipoles, e21 and e24. Therefore we can build the epipolar line in I1 of e24, l_e24 = Fs^T ẽ24, and the epipolar line in I4 of e21, l_e21 = Fr ẽ21. These two epipolar lines are the intersections of the plane going through the three optical centers (the so-called trifocal plane [32]) with I1 and I4, respectively. Thus, we observe [32]:
• the two epipolar lines correspond to each other between I1 and I4; and
• the two epipoles e14 and e41 must lie on these two lines, respectively.
Exactly the same reasoning can be made with the triplet of cameras (I3, I1, I4): we build two epipolar lines, l_e34 = Fl^T ẽ34 and l_e31 = Fs ẽ31, which correspond to each other between I1 and I4. Thus, we have two pairs of corresponding epipolar lines. The epipoles e14 and e41 are the intersection of l_e24 and l_e34, and that of l_e21 and l_e31, respectively. Since, in addition to e14 and e41, three correspondences of epipolar lines are required to determine F14 completely, the three fundamental matrices Fs, Fl and Fr leave only one unknown parameter in F14.

Consider now the epipolar geometry F23 between I2 and I3. Making exactly the same reasoning, we build two pairs of corresponding epipolar lines between I2 and I3 (shown as solid lines in Fig. 8), which leaves one unknown parameter in F23. This unknown can, however, be determined as follows. In fact, we have not yet used the epipoles e14 and e41 in I1 and I4; they are corresponding points for these two images (they can be thought of as images of the point at infinity of the line going through the two optical centers of I1 and I4). Considering the triplet of cameras (I1, I2, I4), we can construct in I2 the point corresponding to e14 and e41 as the intersection of two epipolar lines (shown as dashed lines in Fig. 8). Similarly, considering the triplet of cameras (I1, I3, I4), we can construct in I3 the point corresponding to e14 and e41 as the intersection of two epipolar lines. These two points correspond to each other, and provide us with a third pair of corresponding epipolar lines (by linking them to the epipoles), which completes the determination of F23.

From the above discussion, it is clear that we can obtain, theoretically, only one additional constraint on the intrinsic and extrinsic parameters by using the correspondences between I1 and I4 and those between I2 and I3. In practice, we would expect to achieve a better self-calibration result because of data redundancy.

B. Estimation of All Camera Parameters

As described in Section IV, by counting the number of geometric constraints available and the number of intrinsic and extrinsic parameters to estimate, it is possible to achieve a full calibration of the stereo rig undergoing one motion. The formulation in Section IV is still valid for the full calibration; only the minimization should be carried out over a larger set of unknowns by including the location of the principal points. However, we have not been able until now to work out an algorithm to initialize the parameters from the fundamental matrices.

One possible solution could be the following: assume the principal point is at the image center, compute the other parameters exactly as described in Section V, and eventually conduct a minimization of the sum of squared distances between points and their epipolar lines, as described in Section IV, over all parameters. However, the self-calibration may be less stable because a larger set of unknowns is involved in the minimization, leaving less constraint on the solution, as noticed by Luong [17] in calibrating a single camera from three views. We have not yet implemented this method.
VIII. CONCLUSION

In this paper, we have described a new method for calibrating a stereo rig by moving it in an environment without using any reference points (self-calibration). The only geometric constraint between a pair of uncalibrated images is the epipolar constraint, which has been formulated in this paper from a point of view in Euclidean space. The problem of self-calibration has then been formulated as one of estimating unknowns such that the discrepancy from the epipolar constraint, in terms of sum of squared distances between points and their corresponding epipolar lines, is minimized. As the minimization problem is nonlinear, an initial estimate of the unknowns must be supplied. The initialization is done based on the work of Maybank, Luong, and Faugeras on self-calibration of a single moving camera, which requires solving a set of so-called Kruppa equations. One point which distinguishes our work from the previous ones is that our formulation is directly built in the measurement space and is thus physically more meaningful. Furthermore, a stereo setup is used in our work. Redundancy of the information contained in a sequence of stereo images makes this method more robust than using a sequence of monocular images. This has been demonstrated with real data. The results obtained are very good. We have also shown experimentally that it is very difficult to estimate precisely the principal point. A variation of as high as several dozen pixels in the principal point coordinates does not affect significantly the 3-D reconstruction.

Our future work will be on the extension of the current technique to take into account all camera parameters and to include more image views.

In order to apply the technique described in this paper, we must establish correspondences between two images (stereo or temporal) with unknown epipolar geometry. This is a difficult problem. We have recently developed an algorithm to solve it, and very good results have been obtained [33], [34]. The idea is to establish matches through the recovery of the unknown epipolar geometry. We first use classical techniques (correlation and relaxation methods in our particular implementation) to find an initial set of matches, and then use a robust technique, the Least Median of Squares (LMedS), to discard false matches in this set. The epipolar geometry is then accurately estimated using a meaningful image criterion. This algorithm produces the fundamental matrix as well as the correspondences between two images. The executable code is available through anonymous ftp on krakatoa.inria.fr in file pub/robotvis/BINAIRES/sun4OS4/image-matching.gz.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their insightful comments which improved this paper.

REFERENCES

[1] D. Marr and T. Poggio, "A computational theory of human stereo vision," in Proc. Royal Society London, 1979, B. 204, pp. 301-328.
[2] J. E. W. Mayhew and J. P. Frisby, "Psychophysical and computational studies toward a theory of human stereopsis," Artificial Intell., vol. 17, pp. 349-385, 1981.
[3] J. E. W. Mayhew and J. P. Frisby, Eds., 3D Model Recognition from Stereoscopic Cues. Cambridge, MA: MIT Press, 1991.
[4] D. C. Brown, "Close-range camera calibration," Photogrammetric Eng., vol. 37, no. 8, pp. 855-866, 1971.
[5] W. Faig, "Calibration of close range photogrammetry systems: Mathematical formulation," Photogrammetric Eng. Remote Sensing, vol. 41, no. 12, pp. 1479-1486, 1975.
[6] R. Tsai, "An efficient and accurate camera calibration technique for 3D machine vision," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Miami, FL, June 1986, pp. 364-374.
[7] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Miami, FL, June 1986, pp. 15-20.
[8] R. Lenz and R. Tsai, "Techniques for calibrating of the scale factor and image center for high accuracy 3D machine vision metrology," in Proc. Int. Conf. Robotics and Automation, Raleigh, NC, 1987, pp. 68-75.
[9] R. Tsai and R. Lenz, "A new technique for fully autonomous and efficient 3D robotics hand-eye calibration," in Proc. Int. Symp. Robotics Research, Santa Cruz, CA, Aug. 1987, pp. 281-297.
[10] G. Toscani, Système de calibration optique et perception du mouvement en vision artificielle, Ph.D. dissertation, Univ. Paris XI, Orsay, France, 1987.
[11] G. Q. Wei and S. D. Ma, "Two plane camera calibration: A unified model," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 1991, pp. 133-138.
[12] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," IEEE Trans. Pattern Anal. Machine Intell., vol. 14, no. 10, pp. 965-980, 1992.
[13] R. Y. Tsai, "Synopsis of recent progress on camera calibration for 3D machine vision," in The Robotics Review, O. Khatib, J. J. Craig, and T. Lozano-Perez, Eds. Cambridge, MA: MIT Press, 1989, pp. 147-159.
[14] S. J. Maybank and O. D. Faugeras, "A theory of self-calibration of a moving camera," Int. J. Comput. Vision, vol. 8, no. 2, pp. 123-152, Aug. 1992.
[15] R. I. Hartley, "Estimation of relative camera positions for uncalibrated cameras," in Proc. 2nd European Conf. Computer Vision, 1992, pp. 579-587.


[16] H. P. Trivedi, "Can multiple views make up for lack of camera registration?," Image Vision Comput., vol. 6, no. 1, pp. 29-32, Feb. 1988.
[17] Q.-T. Luong, Matrice fondamentale et calibration visuelle sur l'environnement: Vers une plus grande autonomie des systèmes robotiques, Ph.D. dissertation, Univ. Paris XI, Orsay, France, Dec. 1992.
[18] Q.-T. Luong and O. D. Faugeras, "Camera calibration, scene motion and structure recovery from point correspondences and fundamental matrices," Int. J. Comput. Vision, vol. 17, no. 12, Dec. 1995.
[19] Z. Zhang, "Motion and structure of four points from one motion of a stereo rig with unknown extrinsic parameters," IEEE Trans. Pattern Anal. Machine Intell., vol. 17, no. 12, Dec. 1995. Also in Proc. IEEE Conf. CVPR, June 1993, pp. 556-561.
[20] J. L. Mundy and A. Zisserman, Eds., Geometric Invariance in Computer Vision. Cambridge, MA: MIT Press, 1992.
[21] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature, vol. 293, pp. 133-135, 1981.
[22] K. Kanatani and Y. Onodera, "Anatomy of camera calibration using vanishing points," IEICE Trans. Information and Systems, vol. E74, no. 10, pp. 3369-3378, Oct. 1991.
[23] C. C. Slama, Ed., Manual of Photogrammetry, 4th ed. Amer. Soc. Photogrammetry, 1980.
[24] Numerical Algorithms Group, NAG FORTRAN Library Manual, Mark 14 ed., Oxford, U.K., 1990.
[25] Q.-T. Luong, R. Deriche, O. Faugeras, and T. Papadopoulo, "On determining the fundamental matrix: Analysis of different methods and experimental results," INRIA Sophia-Antipolis, France, Rapport de Recherche 1894, 1993.
[26] T. S. Huang and O. D. Faugeras, "Some properties of the E matrix in two-view motion estimation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 12, pp. 1310-1312, Dec. 1989.
[27] O. D. Faugeras, F. Lustman, and G. Toscani, "Motion and structure from motion from point and line matches," in Proc. 1st Int. Conf. Computer Vision, London, UK, 1987, pp. 25-34.
[28] J. Weng, T. S. Huang, and N. Ahuja, "Motion and structure from two perspective views: Algorithms, error analysis and error estimation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 5, pp. 451-476, May 1989.
[29] O. D. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. Cambridge, MA: MIT Press, 1993.
[30] R. Deriche and T. Blaszka, "Recovering and characterizing image features using an efficient model based approach," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, New York, NY, June 1993, pp. 530-535.
[31] B. K. P. Horn, "Closed-form solution of absolute orientation using unit quaternions," J. Opt. Soc. Amer. A, vol. 4, pp. 629-642, Apr. 1987.
[32] O. D. Faugeras and L. Robert, "What can two images tell us about a third one?," Int. J. Comput. Vision, 1994, accepted for publication. Also INRIA Res. Rep. 2018, 1993.
[33] Z. Zhang, R. Deriche, Q.-T. Luong, and O. Faugeras, "A robust approach to image matching: Recovery of the epipolar geometry," in Proc. Int. Symp. Young Investigators on Information\Computer\Control, Beijing, China, Feb. 1994, pp. 7-28.
[34] Z. Zhang, R. Deriche, O. Faugeras, and Q.-T. Luong, "A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry," Artificial Intell. J., 1995, to appear. Also INRIA Res. Rep. 2273, May 1994.

Zhengyou Zhang was born in Zhejiang, China, in 1965. He received the B.S. degree in electronic engineering from the University of Zhejiang, China, in 1985, the D.E.A. diploma in computer science from the University of Nancy, France, in 1987, and the Ph.D. degree in computer science and the diploma Habilitation à diriger des recherches from the University of Paris XI, Orsay, France, in 1990 and 1994, respectively. From 1987 to 1990 he was a Research Assistant in the Computer Vision and Robotics Group of the French National Institute for Research in Computer Science and Control (INRIA), France. Currently, he is a Research Scientist at INRIA. His current research interests include computer vision, mobile robotics, dynamic scene analysis and multisensor fusion. He is the author of the book (with O. Faugeras) 3D Dynamic Scene Analysis: A Stereo Based Approach (Berlin, Heidelberg: Springer, 1992), and the forthcoming book (with S. Ma) Computer Vision (in Chinese) (Chinese Academy of Sciences, 1996).

Quang-Tuan Luong was born in Paris in 1964. He graduated from École Polytechnique in 1987. He received the M.S. ("Magistère") degree from École Normale Supérieure in 1988, and the Ph.D. degree from the University of Paris XI, Orsay, France, in 1992. Since then, he has been a Visiting Postdoctoral Researcher with the University of California at Berkeley before joining SRI International as a Computer Scientist in 1995. His research interests are in computer vision, and include color, calibration, 3-D reconstruction, projective geometry, autonomous vehicles, and virtual reality.

Olivier Faugeras (SM'95) is currently Research Director at INRIA, where he leads the Computer Vision and Robotics group. His research interests include the application of mathematics to computer vision, robotics, shape representation, computational geometry, and the architectures for artificial vision systems, as well as the links between artificial and biological vision. He is an Associate Professor of Applied Mathematics at the École Polytechnique in Palaiseau, France, where he teaches computer science, computer vision, and computational geometry. Dr. Faugeras is an Associate Editor of several international scientific journals including the International Journal of Robotics Research, Pattern Recognition Letters, Signal Processing, Robotics and Autonomous Systems, and Vision Research. He is Editor-in-Chief of the International Journal of Computer Vision. He served as Associate Editor for the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE from 1987 to 1990. In April 1989 he received the "Institut de France-Fondation Fiat" prize from the French Science Academy for his work in vision and robotics.
