International Journal of Computer Vision KL553-03-ZHANG March 2, 1998 15:16

International Journal of Computer Vision 27(2), 161–195 (1998)


© 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.

Determining the Epipolar Geometry and its Uncertainty: A Review

ZHENGYOU ZHANG
INRIA, 2004 route des Lucioles, BP 93, F-06902 Sophia-Antipolis Cedex, France
zzhang@sophia.inria.fr

Received July 16, 1996; Accepted February 13, 1997

Abstract. Two images of a single scene/object are related by the epipolar geometry, which can be described by a
3 × 3 singular matrix called the essential matrix if the images' internal parameters are known, or the fundamental
matrix otherwise. It captures all geometric information contained in two images, and its determination is very important in
many applications such as scene modeling and vehicle navigation. This paper gives an introduction to the epipolar
geometry, and provides a complete review of the current techniques for estimating the fundamental matrix and its
uncertainty. A well-founded measure is proposed to compare these techniques. Projective reconstruction is also
reviewed. The software which we have developed for this review is available on the Internet.

Keywords: epipolar geometry, fundamental matrix, calibration, reconstruction, parameter estimation, robust
techniques, uncertainty characterization, performance evaluation, software

1. Introduction

Two perspective images of a single rigid object/scene are related by the so-called epipolar geometry, which can be described by a 3 × 3 singular matrix. If the internal (intrinsic) parameters of the images (e.g., the focal length, the coordinates of the principal point, etc.) are known, we can work with the normalized image coordinates (Faugeras, 1993), and the matrix is known as the essential matrix (Longuet-Higgins, 1981); otherwise, we have to work with the pixel image coordinates, and the matrix is known as the fundamental matrix (Luong, 1992; Faugeras, 1995; Luong and Faugeras, 1996). It contains all geometric information that is necessary for establishing correspondences between two images, from which the three-dimensional structure of the perceived scene can be inferred. In a stereovision system where the camera geometry is calibrated, it is possible to calculate such a matrix from the camera perspective projection matrices through calibration (Ayache, 1991; Faugeras, 1993). When the intrinsic parameters are known but the extrinsic ones (the rotation and translation between the two images) are not, the problem is known as motion and structure from motion, and has been extensively studied in Computer Vision; two excellent reviews are already available in this domain (Aggarwal and Nandhakumar, 1988; Huang and Netravali, 1994). We are interested here in different techniques for estimating the fundamental matrix from two uncalibrated images, i.e., the case where both the intrinsic and extrinsic parameters of the images are unknown. From this matrix, we can reconstruct a projective structure of the scene, defined up to a 4 × 4 matrix transformation.

The study of uncalibrated images has many important applications. The reader may wonder about the usefulness of such a projective structure. We cannot obtain any metric information from a projective structure: measurements of lengths and angles do not make sense. However, a projective structure still contains rich information, such as coplanarity, collinearity, and cross ratios (ratios of ratios of distances), which is sometimes sufficient for artificial systems, such as robots, to perform tasks such as navigation and object recognition (Shashua, 1994a; Zeller and Faugeras, 1994; Beardsley et al., 1994).

In many applications, such as the reconstruction of the environment from a sequence of video images where the parameters of the video lens are subject to continuous modification, camera calibration in the classical sense is not possible. We cannot extract any metric information, but a projective structure is still possible if the camera can be considered as a pinhole. Furthermore, if we can introduce some knowledge of the scene into the projective structure, we can obtain a more specific structure of the scene. For example, by specifying a plane at infinity (in practice, we need only specify a plane sufficiently far away), an affine structure can be computed, which preserves parallelism and ratios of distances (Quan, 1993; Faugeras, 1995). Hartley et al. (1992) first reconstruct a projective structure, and then use eight ground reference points to obtain the Euclidean structure and the camera parameters. Mohr et al. (1993) embed constraints such as location of points, parallelism and vertical planes (e.g., walls) directly into a minimization procedure to determine a Euclidean structure. Robert and Faugeras (1993) show that the 3D convex hull of an object can be computed from a pair of images whose epipolar geometry is known.

If we assume that the camera parameters do not change between successive views, the projective invariants can even be used to calibrate the cameras in the classical sense without using any calibration apparatus (known as self-calibration) (Maybank and Faugeras, 1992; Faugeras et al., 1992; Luong, 1992; Zhang et al., 1996; Enciso, 1995).

Recently, we have shown (Zhang, 1996a) that even in the case where images are calibrated, more reliable results can be obtained if we use the constraints arising from uncalibrated images as an intermediate step.

This paper gives an introduction to the epipolar geometry, provides a new formula of the fundamental matrix which is valid for both perspective and affine cameras, and reviews different methods reported in the literature for estimating the fundamental matrix. Furthermore, a new method is described to compare two estimations of the fundamental matrix. It is based on a measure obtained through sampling the whole visible 3D space. Projective reconstruction is also reviewed. The software called FMatrix, which implements the reviewed methods, and the software called Fdiff, which computes the difference between two fundamental matrices, are both available from my home page:

http://www.inria.fr/robotvis/personnel/zzhang/zzhang-eng.html

FMatrix detects false matches, computes the fundamental matrix and its uncertainty, and performs the projective reconstruction of the points as well. Although not reviewed, a software AffineF which computes the affine fundamental matrix (see Section 5.3) is also made available.

2. Epipolar Geometry and Problem Statement

2.1. Notation

A camera is described by the widely used pinhole model. The coordinates of a 3D point M = [x, y, z]^T in a world coordinate system and its retinal image coordinates m = [u, v]^T are related by

s [u, v, 1]^T = P [x, y, z, 1]^T,

where s is an arbitrary scale, and P is a 3 × 4 matrix, called the perspective projection matrix. Denoting the homogeneous coordinates of a vector x = [x, y, ...]^T by x̃, i.e., x̃ = [x, y, ..., 1]^T, we have s m̃ = P M̃. The matrix P can be decomposed as

P = A [R t],

where A is a 3 × 3 matrix, mapping the normalized image coordinates to the retinal image coordinates, and (R, t) is the 3D displacement (rotation and translation) from the world coordinate system to the camera coordinate system.

The quantities related to the second camera are indicated by ′. For example, if m_i is a point in the first image, m′_i denotes its corresponding point in the second image.

A line l in the image passing through point m = [u, v]^T is described by the equation au + bv + c = 0. Let l = [a, b, c]^T; then the equation can be rewritten as l^T m̃ = 0 or m̃^T l = 0. Multiplying l by any non-zero scalar defines the same 2D line, so a 2D line is represented by a homogeneous 3D vector. The distance from point m_0 = [u_0, v_0]^T to line l = [a, b, c]^T is given by

d(m_0, l) = (a u_0 + b v_0 + c) / √(a² + b²).

Note that we here use the signed distance.
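The projection model and the signed distance above can be made concrete with a short NumPy sketch (our own illustration; the camera values below are made up for the example, not taken from the paper):

```python
import numpy as np

def project(P, M):
    """Project a 3D point M (3-vector) through a 3x4 matrix P: s m~ = P M~."""
    m_h = P @ np.append(M, 1.0)   # homogeneous image point, scale s included
    return m_h[:2] / m_h[2]       # divide out the arbitrary scale s

def signed_distance(m, l):
    """Signed distance from 2D point m to the line l = [a, b, c]^T."""
    a, b, c = l
    return (a * m[0] + b * m[1] + c) / np.hypot(a, b)

# P = A [R t] with made-up intrinsics A, R = I and t = 0.
A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
P = A @ np.hstack([R, t[:, None]])

m = project(P, np.array([0.1, -0.2, 2.0]))
print(m)                                                 # retinal coordinates
print(signed_distance(m, np.array([1.0, 0.0, -100.0])))
```

Note that `signed_distance` keeps the sign, as the text requires, rather than returning an absolute value.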

Finally, we use the concise notation A^{-T} = (A^{-1})^T = (A^T)^{-1} for any invertible square matrix A.

2.2. Epipolar Geometry and Fundamental Matrix

The epipolar geometry exists between any two camera systems. Consider the case of two cameras as shown in Fig. 1. Let C and C′ be the optical centers of the first and second cameras, respectively. Given a point m in the first image, its corresponding point in the second image is constrained to lie on a line called the epipolar line of m, denoted by l′_m. The line l′_m is the intersection of the plane Π, defined by m, C and C′ (known as the epipolar plane), with the second image plane I′. This is because the image point m may correspond to an arbitrary point on the semi-line CM (M may be at infinity), and the projection of CM on I′ is the line l′_m. Furthermore, one observes that all epipolar lines of the points in the first image pass through a common point e′, which is called the epipole. Epipole e′ is the intersection of the line CC′ with the image plane I′. This can be easily understood as follows. For each point m_k in the first image I, its epipolar line l′_{m_k} in I′ is the intersection of the plane Π_k, defined by m_k, C and C′, with image plane I′. All epipolar planes Π_k thus form a pencil of planes containing the line CC′. They must intersect I′ at a common point, which is e′. Finally, one can easily see the symmetry of the epipolar geometry. The corresponding point in the first image of each point m′_k lying on l′_{m_k} must lie on the epipolar line l_{m′_k}, which is the intersection of the same plane Π_k with the first image plane I. All epipolar lines form a pencil containing the epipole e, which is the intersection of the line CC′ with the image plane I. The symmetry leads to the following observation. If m (a point in I) and m′ (a point in I′) correspond to a single physical point M in space, then m, m′, C and C′ must lie in a single plane. This is the well-known coplanarity constraint in solving motion and structure from motion problems when the intrinsic parameters of the cameras are known (Longuet-Higgins, 1981).

The computational significance in matching different views is that, for a point in the first image, its correspondence in the second image must lie on the epipolar line in the second image, so the search space for a correspondence is reduced from two dimensions to one dimension. This is called the epipolar constraint. Algebraically, in order for m in the first image and m′ in the second image to be matched, the following equation must be satisfied:

m̃′^T F m̃ = 0 with F = A′^{-T} [t]_× R A^{-1},   (1)

where (R, t) is the rigid transformation (rotation and translation) which brings points expressed in the first camera coordinate system to the second one, and [t]_× is the antisymmetric matrix defined by t such that [t]_× x = t × x for all 3D vectors x. This equation can be derived as follows. Without loss of generality, we assume that the world coordinate system coincides with the first camera coordinate system. From the pinhole model, we have

s m̃ = A [I 0] M̃ and s′ m̃′ = A′ [R t] M̃.

Eliminating M̃, s and s′ in the above two equations, we obtain Eq. (1). Geometrically, F m̃ defines the epipolar line l′_m of point m in the second image. Equation (1) says no more than that the correspondence in the second image of point m lies on the corresponding epipolar line l′_m. Transposing (1) yields the symmetric relation from the second image to the first image: m̃^T F^T m̃′ = 0.

The 3 × 3 matrix F is called the fundamental matrix. Since det([t]_×) = 0,

det(F) = 0.   (2)

F is of rank 2. Besides, it is only defined up to a scalar factor, because if F is multiplied by an arbitrary scalar, Eq. (1) still holds. Therefore, a fundamental matrix has only seven degrees of freedom: there are only seven independent parameters among the nine elements of the fundamental matrix.

Figure 1. The epipolar geometry.
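Equation (1) is easy to verify numerically. The sketch below (our own code; all camera parameters are invented for the example) builds F = A′^{-T} [t]_× R A^{-1} and checks the epipolar constraint on a synthetic point pair:

```python
import numpy as np

def skew(t):
    """Antisymmetric matrix [t]_x such that skew(t) @ x = t x x (cross product)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Made-up intrinsics for both cameras and a small rigid displacement (R, t).
A = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
Ap = A.copy()                               # intrinsics A' of the second camera
th = 0.1                                    # rotation about the y-axis
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([1.0, 0.0, 0.2])

F = np.linalg.inv(Ap).T @ skew(t) @ R @ np.linalg.inv(A)   # Eq. (1)

# World frame = first camera frame: s m~ = A [I 0] M~, s' m~' = A' [R t] M~.
M = np.array([0.5, -0.3, 4.0])
m = A @ M;             m /= m[2]
mp = Ap @ (R @ M + t); mp /= mp[2]

print(mp @ F @ m)    # m~'^T F m~: zero up to rounding error
```

One can also check that det(F) ≈ 0, as Eq. (2) requires, since det([t]_×) = 0.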

Convention Note. We use the first camera coordinate system as the world coordinate system. In (Faugeras, 1993; Xu and Zhang, 1996), the second camera coordinate system is chosen as the world one. In this case, (1) becomes m̃^T F′ m̃′ = 0 with F′ = A^{-T} [t′]_× R′ A′^{-1}, where (R′, t′) transforms points from the second camera coordinate system to the first. The relation between (R, t) and (R′, t′) is given by R′ = R^T and t′ = −R^T t. The reader can easily verify that F = F′^T.

2.3. A General Form of Epipolar Equation for Any Projection Model

In this section we derive a general form of the epipolar equation which does not assume whether the cameras follow the perspective or affine projection model (Xu and Zhang, 1996).

A point m in the first image is matched to a point m′ in the second image. From the camera projection model (orthographic, weak perspective, affine, or full perspective), we have s m̃ = P M̃ and s′ m̃′ = P′ M̃, where P and P′ are 3 × 4 matrices. An image point m actually defines an optical ray, on which every space point M̃ projects on the first image at m̃. This optical ray can be written in parametric form as

M̃ = s P^+ m̃ + p^⊥,   (3)

where P^+ is the pseudo-inverse of matrix P:

P^+ = P^T (P P^T)^{-1},   (4)

and p^⊥ is any 4-vector that is perpendicular to all the row vectors of P, i.e.,

P p^⊥ = 0.

Thus, p^⊥ is a null vector of P. As a matter of fact, p^⊥ indicates the position of the optical center (to which all optical rays converge). We show later how to determine p^⊥. For a particular value of s, Eq. (3) corresponds to a point on the optical ray defined by m. Equation (3) is easily justified by projecting M onto the first image, which indeed gives m.

Similarly, an image point m′ in the second image also defines an optical ray. Requiring that the two rays intersect in space implies that a point M corresponding to a particular s in (3) must project onto the second image at m′, that is,

s′ m̃′ = s P′ P^+ m̃ + P′ p^⊥.

Performing a cross product with P′ p^⊥ yields

s′ (P′ p^⊥) × m̃′ = s (P′ p^⊥) × (P′ P^+ m̃).

Eliminating s and s′ by multiplying by m̃′^T from the left (equivalent to a dot product), we have

m̃′^T F m̃ = 0,   (5)

where F is a 3 × 3 matrix, called the fundamental matrix:

F = [P′ p^⊥]_× P′ P^+.   (6)

Since p^⊥ is the optical center of the first camera, P′ p^⊥ is actually the epipole in the second image. It can also be shown that this expression is equivalent to (1) for the full perspective projection (see Xu and Zhang, 1996), but it is more general. Indeed, (1) assumes that the first 3 × 3 sub-matrix of P is invertible, and thus is only valid for full perspective projection but not for affine cameras (see Section 5.3), while (6) makes use of the pseudoinverse of the projection matrix, which is valid for full perspective projection as well as for affine cameras. Therefore, the equation does not depend on any specific knowledge of the projection model. Replacing the projection matrix in the equation by the specific projection matrix of each projection model (e.g., orthographic, weak perspective, affine or full perspective) produces the epipolar equation for that projection model. See (Xu and Zhang, 1996) for more details.

The vector p^⊥ still needs to be determined. We first note that such a vector must exist because the difference between the row dimension and the column dimension is one, and the row vectors are generally independent from each other. Indeed, one way to obtain p^⊥ is

p^⊥ = (I − P^+ P) ω,   (7)

where ω is an arbitrary 4-vector. To show that p^⊥ is perpendicular to each row of P, we multiply p^⊥ by P from the left: P p^⊥ = (P − P P^T (P P^T)^{-1} P) ω = 0, which is indeed a zero vector. The action of I − P^+ P is to transform an arbitrary vector to a vector that is perpendicular to every row vector of P. If P is of rank 3 (which is the

case for both perspective and affine cameras), then p^⊥ is unique up to a scale factor.

2.4. Problem Statement

The problem considered in the sequel is the estimation of F from a sufficiently large set of point correspondences {(m_i, m′_i) | i = 1, ..., n}, where n ≥ 7. The point correspondences between two images can be established by a technique such as that described in (Zhang et al., 1995). We allow, however, that a fraction of the matches may be incorrectly paired, and thus the estimation techniques should be robust.

3. Techniques for Estimating the Fundamental Matrix

Let a point m_i = [u_i, v_i]^T in the first image be matched to a point m′_i = [u′_i, v′_i]^T in the second image. They must satisfy the epipolar Eq. (1), i.e., m̃′_i^T F m̃_i = 0. This equation can be written as a linear and homogeneous equation in the 9 unknown coefficients of matrix F:

u_i^T f = 0,   (8)

where

u_i = [u_i u′_i, v_i u′_i, u′_i, u_i v′_i, v_i v′_i, v′_i, u_i, v_i, 1]^T
f = [F_11, F_12, F_13, F_21, F_22, F_23, F_31, F_32, F_33]^T.

F_ij is the element of F at row i and column j. If we are given n point matches, by stacking (8) we have the following linear system to solve:

U_n f = 0,

where

U_n = [u_1, ..., u_n]^T.

This set of linear homogeneous equations, together with the rank constraint on the matrix F, allows us to estimate the epipolar geometry.

3.1. Exact Solution with Seven Point Matches

As described in Section 2.2, a fundamental matrix F has only 7 degrees of freedom. Thus, 7 is the minimum number of point matches required for having a solution of the epipolar geometry.

In this case, n = 7 and rank(U_7) = 7. Through singular value decomposition, we obtain vectors f_1 and f_2 which span the null space of U_7. The null space is a linear combination of f_1 and f_2, which correspond to matrices F_1 and F_2, respectively. Because of its homogeneity, the fundamental matrix is a one-parameter family of matrices αF_1 + (1 − α)F_2. Since the determinant of F must be null, i.e.,

det[αF_1 + (1 − α)F_2] = 0,

we obtain a cubic polynomial in α. The maximum number of real solutions is 3. For each solution α, the fundamental matrix is then given by

F = αF_1 + (1 − α)F_2.

Actually, this technique had already been used in estimating the essential matrix when seven point matches in normalized coordinates are available (Huang and Netravali, 1994). It is also used in (Hartley, 1994; Torr et al., 1994) for estimating the fundamental matrix.

As a matter of fact, the result that there may be three solutions given seven matches has been known since the 1800s (Hesse, 1863; Sturm, 1869). Sturm's algorithm (Sturm, 1869) computes the epipoles and the epipolar transformation (see Section 2.2) from seven point matches. It is based on the observation that the epipolar lines in the two images are related by a homography, and thus the cross-ratios of four epipolar lines are invariant. In each image, the seven points define seven lines going through the unknown epipole, thus providing four independent cross-ratios. Since these cross-ratios should remain the same in the two images, one obtains four cubic polynomial equations in the coordinates of the epipoles (four independent parameters). It is shown that there may exist up to three solutions for the epipoles.

3.2. Analytic Method with Eight or More Point Matches

In practice, we are given more than seven matches. If we ignore the rank-2 constraint, we can use a least-squares method to solve

min_F Σ_i (m̃′_i^T F m̃_i)²,   (9)

which can be rewritten as

min_f ‖U_n f‖².   (10)

The vector f is only defined up to an unknown scale factor. The trivial solution to the above problem is f = 0, which is not what we want. To avoid it, we need to impose some constraint on the coefficients of the fundamental matrix. Several methods are possible and are presented below. We will call them the eight-point algorithm, although more than eight point matches can be used.

3.2.1. Linear Least-Squares Technique. The first method sets one of the coefficients of F to 1, and then solves the above problem using linear least-squares techniques. Without loss of generality, we assume that the last element of vector f (i.e., f_9 = F_33) is not equal to zero, and thus we can set f_9 = −1. This gives

‖U_n f‖² = ‖U′_n f′ − c_9‖²
         = f′^T U′_n^T U′_n f′ − 2 c_9^T U′_n f′ + c_9^T c_9,

where U′_n is the n × 8 matrix composed of the first eight columns of U_n, c_9 is the ninth column of U_n, and f′ is the vector of the first eight elements of f. The solution is obtained by requiring the first derivative to be zero, i.e.,

∂‖U_n f‖² / ∂f′ = 0.

By definition of vector derivatives, ∂(a^T x)/∂x = a for any vector a. We thus have

2 U′_n^T U′_n f′ − 2 U′_n^T c_9 = 0,

or

f′ = (U′_n^T U′_n)^{-1} U′_n^T c_9.

The problem with this method is that we do not know a priori which coefficient is not zero. If we set to 1 an element which is actually zero or much smaller than the other elements, the result will be catastrophic. A remedy is to try all nine possibilities by setting each of the nine coefficients of F to 1 in turn and retain the best estimation.

3.2.2. Eigen Analysis. The second method consists in imposing a constraint on the norm of f; in particular, we can set ‖f‖ = 1. Compared to the previous method, no coefficient of F prevails over the others. In this case, the problem (10) becomes a classical one:

min_f ‖U_n f‖² subject to ‖f‖ = 1.   (11)

It can be transformed into an unconstrained minimization problem through Lagrange multipliers:

min_f F(f, λ),   (12)

where

F(f, λ) = ‖U_n f‖² + λ(1 − ‖f‖²)   (13)

and λ is the Lagrange multiplier. By requiring the first derivative of F(f, λ) with respect to f to be zero, we have

U_n^T U_n f = λ f.

Thus, the solution f must be a unit eigenvector of the 9 × 9 matrix U_n^T U_n and λ is the corresponding eigenvalue. Since matrix U_n^T U_n is symmetric and positive semi-definite, all its eigenvalues are real and positive or zero. Without loss of generality, we assume the nine eigenvalues of U_n^T U_n are in non-increasing order:

λ_1 ≥ ··· ≥ λ_i ≥ ··· ≥ λ_9 ≥ 0.

We therefore have nine potential solutions: λ = λ_i for i = 1, ..., 9. Back-substituting the solution into (13) gives

F(f, λ_i) = λ_i.

Since we are seeking to minimize F(f, λ), the solution to (11) is evidently the unit eigenvector of matrix U_n^T U_n associated with the smallest eigenvalue, i.e., λ_9.

3.2.3. Imposing the Rank-2 Constraint. The advantage of the linear criterion is that it yields an analytic solution. However, we have found that it is quite sensitive to noise, even with a large set of data points. One reason is that the rank-2 constraint (i.e., det F = 0) is not satisfied. We can impose this constraint a posteriori.

The most convenient way is to replace the matrix F estimated with any of the above methods by the matrix F̂ which minimizes the Frobenius norm (see

Appendix, Section A.3.3) of F − F̂ subject to the constraint det F̂ = 0. Let

F = U S V^T

be the singular value decomposition of matrix F, where S = diag(σ_1, σ_2, σ_3) is a diagonal matrix satisfying σ_1 ≥ σ_2 ≥ σ_3 (σ_i is the ith singular value), and U and V are orthogonal matrices. It can be shown that

F̂ = U Ŝ V^T

with Ŝ = diag(σ_1, σ_2, 0) minimizes the Frobenius norm of F − F̂ (see Appendix B for the proof). This method was used by Tsai and Huang (1984) in estimating the essential matrix and by Hartley (1995) in estimating the fundamental matrix.

3.2.4. Geometric Interpretation of the Linear Criterion. Another problem with the linear criterion is that the quantity we are minimizing is not physically meaningful. A physically meaningful quantity should be something measured in the image plane, because the available information (2D points) is extracted from images. One such quantity is the distance from a point m′_i to its corresponding epipolar line l′_i = F m̃_i ≡ [l′_1, l′_2, l′_3]^T, which is given by (see Section 2.1)

d(m′_i, l′_i) = m̃′_i^T l′_i / √(l′_1² + l′_2²) = (1/c′_i) m̃′_i^T F m̃_i,   (14)

where c′_i = √(l′_1² + l′_2²). Thus, the criterion (9) can be rewritten as

min_F Σ_{i=1}^{n} c′_i² d²(m′_i, l′_i).

This means that we are minimizing not only the physical quantity d(m′_i, l′_i), but also c′_i, which is not physically meaningful. Luong (1992) shows that the linear criterion introduces a bias and tends to bring the epipoles towards the image center.

3.2.5. Normalizing Input Data. Hartley (1995) has analyzed, from a numerical computation point of view, the high instability of this linear method if pixel coordinates are directly used, and proposed to perform a simple normalization of the input data prior to running the eight-point algorithm. This technique indeed produces much better results, and is summarized below.

Suppose that coordinates m_i in one image are replaced by m̂_i = T m̃_i, and coordinates m′_i in the other image are replaced by m̂′_i = T′ m̃′_i, where T and T′ are any 3 × 3 matrices. Substituting in the equation m̃′_i^T F m̃_i = 0, we derive the equation m̂′_i^T T′^{-T} F T^{-1} m̂_i = 0. This relation implies that T′^{-T} F T^{-1} is the fundamental matrix corresponding to the point correspondences m̂_i ↔ m̂′_i. Thus, an alternative method of finding the fundamental matrix is as follows:

1. Transform the image coordinates according to the transformations m̂_i = T m̃_i and m̂′_i = T′ m̃′_i.
2. Find the fundamental matrix F̂ corresponding to the matches m̂_i ↔ m̂′_i.
3. Retrieve the original fundamental matrix as F = T′^T F̂ T.

The question now is how to choose the transformations T and T′.

Hartley (1995) has analyzed the problem with the eight-point algorithm, and shows that its poor performance is due mainly to the poor conditioning of the problem when the pixel image coordinates are directly used (see Appendix C). Based on this, he has proposed an isotropic scaling of the input data:

1. As a first step, the points are translated so that their centroid is at the origin.
2. Then, the coordinates are scaled so that on average a point m̃_i is of the form m̃_i = [1, 1, 1]^T. Such a point will lie at a distance √2 from the origin. Rather than choosing different scale factors for the u and v coordinates, we choose to scale the points isotropically so that the average distance from the origin to these points is equal to √2.

Such a transformation is applied to each of the two images independently.

An alternative to the isotropic scaling is an affine transformation so that the two principal moments of the set of points are both equal to unity. However, Hartley (1995) found that the results obtained were little different from those obtained using the isotropic scaling method.

Beardsley et al. (1994) mention a normalization scheme which assumes some knowledge of the camera parameters. Actually, if approximate intrinsic parameters (i.e., the intrinsic matrix A) of a camera are available, we can apply the transformation T = A^{-1} to obtain a "quasi-Euclidean" frame.
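The pieces of Sections 3.2.2, 3.2.3 and 3.2.5 combine naturally into the normalized eight-point algorithm. The following is a minimal sketch of that combination (our own code, not the FMatrix software; the synthetic cameras at the end are invented for the check):

```python
import numpy as np

def isotropic_T(m):
    """Hartley normalization: centroid to origin, mean distance sqrt(2)."""
    c = m.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(m - c, axis=1).mean()
    return np.array([[s, 0.0, -s * c[0]],
                     [0.0, s, -s * c[1]],
                     [0.0, 0.0, 1.0]])

def eight_point(m, mp):
    """m, mp: nx2 matched points (n >= 8). Returns a rank-2 F."""
    T, Tp = isotropic_T(m), isotropic_T(mp)
    mh = np.column_stack([m, np.ones(len(m))]) @ T.T        # m^_i  = T  m~_i
    mph = np.column_stack([mp, np.ones(len(mp))]) @ Tp.T    # m^'_i = T' m~'_i
    u, v, up, vp = mh[:, 0], mh[:, 1], mph[:, 0], mph[:, 1]
    Un = np.column_stack([u * up, v * up, up, u * vp, v * vp, vp,
                          u, v, np.ones(len(u))])
    _, _, Vt = np.linalg.svd(Un)
    F = Vt[-1].reshape(3, 3)        # minimizes ||Un f|| subject to ||f|| = 1
    U, S, V2t = np.linalg.svd(F)    # rank 2: F^ = U diag(s1, s2, 0) V^T
    F = U @ np.diag([S[0], S[1], 0.0]) @ V2t
    return Tp.T @ F @ T             # denormalize: F = T'^T F^ T

# Synthetic check: 20 noiseless matches from two perspective cameras.
rng = np.random.default_rng(2)
A = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
th = 0.15
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([0.5, 0.1, 0.1])
M = rng.uniform(-1.0, 1.0, (20, 3))
M[:, 2] = rng.uniform(2.0, 5.0, 20)          # keep the points in front
x = M @ A.T;              x /= x[:, 2:]      # first camera  P  = A [I 0]
xp = (M @ R.T + t) @ A.T; xp /= xp[:, 2:]    # second camera P' = A [R t]

F = eight_point(x[:, :2], xp[:, :2])
print(np.linalg.det(F))                      # ~0: rank 2 was enforced
```

With noiseless data the recovered F satisfies all the epipolar equations; with noisy pixel data the normalization step is what keeps the linear system well conditioned.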

Boufama and Mohr (1995) use data normalization implicitly by selecting four points, which are largely spread in the image (i.e., most distant from each other), to form a projective basis.

3.3. Analytic Method with Rank-2 Constraint

The method described in this section is due to Faugeras (1995); it imposes the rank-2 constraint during the minimization but still yields an analytic solution. Without loss of generality, let f = [g^T, f_8, f_9]^T, where g is a vector containing the first seven components of f. Let c_8 and c_9 be the last two column vectors of U_n, and B be the n × 7 matrix composed of the first seven columns of U_n. From U_n f = 0, we have

B g = −f_8 c_8 − f_9 c_9.

Assuming that the rank of B is 7, we can solve for g by least-squares as

g = −f_8 (B^T B)^{-1} B^T c_8 − f_9 (B^T B)^{-1} B^T c_9.

The solution depends on the two free parameters f_8 and f_9. As in Section 3.1, we can use the constraint det(F) = 0, which gives a third-degree homogeneous equation in f_8 and f_9, and we can solve for their ratio. Because a third-degree equation has at least one real root, we are guaranteed to obtain at least one solution for F. This solution is defined up to a scale factor, and we can normalize f such that its vector norm is equal to 1. If there are three real roots, we choose the one that minimizes the vector norm of U_n f subject to ‖f‖ = 1. In fact, we can do the same computation for any of the 36 choices of pairs of coordinates of f and choose, among the possibly 108 solutions, the one that minimizes the previous vector norm.

The difference between this method and those described in Section 3.2 is that the latter impose the rank-2 constraint after application of the linear least-squares. We have experimented with this method on a limited number of data sets, and found the results comparable with those obtained by the previous ones.

3.4. Nonlinear Method Minimizing Distances of Points to Epipolar Lines

As discussed in Section 3.2.4, the linear method (10) does not minimize a physically meaningful quantity. A natural idea is then to minimize the distances between points and their corresponding epipolar lines: min_F Σ_i d²(m̃′_i, F m̃_i), where d(·, ·) is given by (14). However, unlike the case of the linear criterion, the two images do not play a symmetric role, because the above criterion determines only the epipolar lines in the second image. As we have seen in Section 2.2, by exchanging the roles of the two images, the fundamental matrix is changed to its transpose. To avoid this inconsistency of the epipolar geometry between the two images, we minimize the following criterion

min_F Σ_i (d²(m̃′_i, F m̃_i) + d²(m̃_i, F^T m̃′_i)),   (15)

which operates simultaneously in the two images.

Let l′_i = F m̃_i ≡ [l′_1, l′_2, l′_3]^T and l_i = F^T m̃′_i ≡ [l_1, l_2, l_3]^T. Using (14) and the fact that m̃′_i^T F m̃_i = m̃_i^T F^T m̃′_i, the criterion (15) can be rewritten as

min_F Σ_i w_i² (m̃′_i^T F m̃_i)²,   (16)

where

w_i = (1/(l_1² + l_2²) + 1/(l′_1² + l′_2²))^{1/2}
    = ((l_1² + l_2² + l′_1² + l′_2²) / ((l_1² + l_2²)(l′_1² + l′_2²)))^{1/2}.

We now present two methods for solving this problem.

3.4.1. Iterative Linear Method. The similarity between (16) and (9) leads us to solve the above problem by a weighted linear least-squares technique. Indeed, if we can compute the weight w_i for each point match, the corresponding linear equation can be multiplied by w_i (which is equivalent to replacing u_i in (8) by w_i u_i), and exactly the same eight-point algorithm can be run to estimate the fundamental matrix, which minimizes (16).

The problem is that the weights w_i depend themselves on the fundamental matrix. To overcome this difficulty, we apply an iterative linear method. We first assume that all w_i = 1 and run the eight-point algorithm to obtain an initial estimation of the fundamental matrix. The weights w_i are then computed from this initial solution. The weighted linear least-squares is then run for an improved solution. This procedure can be repeated several times.
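A sketch of this iterative reweighting (our own implementation, not the paper's code; the check at the end uses invented noiseless matches in normalized image coordinates, in the spirit of Section 3.2.5, to keep the linear step well conditioned):

```python
import numpy as np

def weighted_eight_point(mh, mph, w):
    """mh, mph: nx3 homogeneous matched points; w: n weights."""
    u, v, up, vp = mh[:, 0], mh[:, 1], mph[:, 0], mph[:, 1]
    Un = np.column_stack([u * up, v * up, up, u * vp, v * vp, vp,
                          u, v, np.ones(len(u))])
    _, _, Vt = np.linalg.svd(w[:, None] * Un)   # u_i replaced by w_i u_i
    return Vt[-1].reshape(3, 3)                 # rank 2 is NOT enforced here

def weights(F, mh, mph):
    """w_i of Eq. (16), from the lines l'_i = F m~_i and l_i = F^T m~'_i."""
    lp = mh @ F.T                               # rows are l'_i
    l = mph @ F                                 # rows are l_i
    return np.sqrt(1.0 / (l[:, 0]**2 + l[:, 1]**2)
                   + 1.0 / (lp[:, 0]**2 + lp[:, 1]**2))

def iterative_linear(mh, mph, n_iter=3):
    w = np.ones(len(mh))                        # first pass: all w_i = 1
    for _ in range(n_iter):
        F = weighted_eight_point(mh, mph, w)
        w = weights(F, mh, mph)                 # reweight from the current F
    return F

# Noiseless synthetic matches in normalized image coordinates.
rng = np.random.default_rng(3)
th = 0.15
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([0.5, 0.1, 0.1])
M = rng.uniform(-1.0, 1.0, (20, 3))
M[:, 2] = rng.uniform(2.0, 5.0, 20)
xh = np.column_stack([M[:, :2] / M[:, 2:], np.ones(20)])
Xc = M @ R.T + t
xph = np.column_stack([Xc[:, :2] / Xc[:, 2:], np.ones(20)])

F = iterative_linear(xh, xph)
print(np.abs(np.einsum('ij,jk,ik->i', xph, F, xh)).max())  # residuals
```

As the text points out, this improves the criterion being minimized but still does not enforce the rank-2 constraint; that is the subject of Section 3.4.2.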

Although this algorithm is simple to implement and minimizes a physical quantity, our experience shows that there is no significant improvement compared to the original linear method. The main reason is that the rank-2 constraint of the fundamental matrix is not taken into account.

3.4.2. Nonlinear Minimization in Parameter Space. From the above discussions, it is clear that the right thing to do is to search for a matrix among the 3 × 3 matrices of rank 2 which minimizes (16). There are several possible parameterizations of the fundamental matrix (Luong, 1992); e.g., we can express one row (or column) of the fundamental matrix as a linear combination of the other two rows (or columns). The parameterization described below is based directly on the parameters of the epipolar transformation (see Section 2.2).

Parameterization of Fundamental Matrix. Let us denote the columns of F by the vectors c_1, c_2 and c_3. The rank-2 constraint on F is equivalent to the following two conditions:

    ∃ λ_1, λ_2 such that c_{j0} + λ_1 c_{j1} + λ_2 c_{j2} = 0          (17)
    ∄ λ such that c_{j1} + λ c_{j2} = 0                                 (18)

for j_0, j_1, j_2 ∈ [1, 3], where λ_1, λ_2 and λ are scalars. Condition (18), as a non-existence condition, cannot be expressed by a parameterization: we shall only keep condition (17) and so extend the parameterized set to all the 3 × 3 matrices of rank strictly less than 3. Indeed, the rank-2 matrices of, for example, the following forms:

    [c_1  c_2  λc_2],   [c_1  0_3  c_3]   and   [c_1  c_2  0_3]

do not have any parameterization if we take j_0 = 1. A parameterization of F is then given by (c_{j1}, c_{j2}, λ_1, λ_2). This parameterization implies dividing the parameterized set among three maps, corresponding to j_0 = 1, j_0 = 2 and j_0 = 3.

If we construct a 3-vector such that λ_1 and λ_2 are the j_1th and j_2th coordinates and 1 is the j_0th coordinate, then it is obvious that this vector is the eigenvector of F associated with the null eigenvalue, and is thus the epipole in the case of the fundamental matrix. Using such a parameterization implies computing directly the epipole, which is often a useful quantity, instead of the matrix itself.

To make the problem symmetrical, and since the epipole in the other image is also worth being computed, the same decomposition as for the columns is used for the rows, which now divides the parameterized set into nine maps, corresponding to the choice of a column and a row as linear combinations of the two columns and two rows left. A parameterization of the matrix is then formed by the two coordinates x and y of the first epipole, the two coordinates x′ and y′ of the second epipole, and the four elements a, b, c and d left by c_{i1}, c_{i2}, l_{j1} and l_{j2}, which in turn parameterize the epipolar transformation mapping an epipolar line of the second image to its corresponding epipolar line in the first image. In that way, the matrix is written, for example, for i_0 = 3 and j_0 = 3:

        [ a            b            −ax − by  ]
    F = [ c            d            −cx − dy  ]                         (19)
        [ −ax′ − cy′   −bx′ − dy′   F_33      ]

with

    F_33 = (ax + by)x′ + (cx + dy)y′.

At last, to take into account the fact that the fundamental matrix is defined only up to a scale factor, the matrix is normalized by dividing the four elements (a, b, c, d) by the largest in absolute value. We have thus in total 36 maps to parameterize the fundamental matrix.

Choosing the Best Map. Given a matrix F and the epipoles, or an approximation to them, we must be able to choose, among the different maps of the parameterization, the most suitable one for F. Denoting by f_{i0 j0} the vector of the elements of F once decomposed as in Eq. (19), i_0 and j_0 are chosen in order to maximize the rank of the 9 × 8 Jacobian matrix:

    J = df_{i0 j0} / dp,   where p = [x, y, x′, y′, a, b, c, d]^T.      (20)

This is done by maximizing the norm of the vector whose coordinates are the determinants of the nine 8 × 8 submatrices of J. An easy calculation shows that this norm is equal to

    (ad − bc)² √(x² + y² + 1) √(x′² + y′² + 1).

At the expense of dealing with different maps, the above parameterization works equally well whether the epipoles are at infinity or not. This is not the case with the original proposition in Luong (1992). More details can be found in (Csurka et al., 1996).
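As a quick sanity check of this parameterization, the sketch below (plain Python, our own illustration; the helper names are ours) builds F from the eight parameters (x, y, x′, y′, a, b, c, d) according to Eq. (19) for the map i_0 = j_0 = 3, and verifies that (x, y, 1) and (x′, y′, 1) are indeed the right and left null vectors of F, i.e., the two epipoles, so the rank-2 constraint holds by construction.

```python
def fundamental_from_params(x, y, xp, yp, a, b, c, d):
    """F of Eq. (19) for the map i0 = j0 = 3 (primes written as p)."""
    f33 = (a * x + b * y) * xp + (c * x + d * y) * yp
    return [[a, b, -a * x - b * y],
            [c, d, -c * x - d * y],
            [-a * xp - c * yp, -b * xp - d * yp, f33]]

def mat_vec(F, v):
    return [sum(F[i][j] * v[j] for j in range(3)) for i in range(3)]

def vec_mat(v, F):
    return [sum(v[i] * F[i][j] for i in range(3)) for j in range(3)]

# arbitrary epipoles and epipolar transformation (a, b, c, d)
x, y, xp, yp = 2.0, -1.0, 0.5, 3.0
a, b, c, d = 1.0, 2.0, -1.0, 0.5
F = fundamental_from_params(x, y, xp, yp, a, b, c, d)

e = [x, y, 1.0]        # epipole in the first image:  F e   = 0
ep = [xp, yp, 1.0]     # epipole in the second image: e'^T F = 0
assert all(abs(t) < 1e-12 for t in mat_vec(F, e))
assert all(abs(t) < 1e-12 for t in vec_mat(ep, F))
```

The score (ad − bc)² √(x² + y² + 1) √(x′² + y′² + 1) given above can then be evaluated for each candidate map to pick the best-conditioned one.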
Minimization. The minimization of (16) can now be performed by any minimization procedure. The Levenberg-Marquardt method (as implemented in MINPACK from NETLIB (More, 1977) and in the Numerical Recipes in C (Press et al., 1988)) is used in our program. During the process of minimization, the parameterization of F can change: the parameterization chosen for the matrix at the beginning of the process is not necessarily the most suitable for the final matrix. The nonlinear minimization method demands an initial estimate of the fundamental matrix, which is obtained by running the eight-point algorithm.

3.5. Gradient-Based Technique

Let f_i = m̃′_i^T F m̃_i. Minimizing Σ_i f_i² does not yield a good estimate of the fundamental matrix, because the variance of each f_i is not the same. The least-squares technique produces an optimal solution if each term has the same variance. Therefore, we can minimize the following weighted sum of squares:

    min_F  Σ_i f_i² / σ²_{f_i},                                         (21)

where σ²_{f_i} is the variance of f_i, whose computation will be given shortly. This criterion now has the desirable property that f_i / σ_{f_i} follows, under the first order approximation, the standard Gaussian distribution. In particular, all f_i / σ_{f_i} have the same variance, equal to 1. The same parameterization of the fundamental matrix as that described in the previous section is used.

Because points are extracted independently by the same algorithm, we make the reasonable assumption that the image points are corrupted by independent and identically distributed Gaussian noise, i.e., their covariance matrices are given by

    Λ_{m_i} = Λ_{m′_i} = σ² diag(1, 1),

where σ is the noise level, which may not be known. Under the first order approximation, the variance of f_i is then given by

    σ²_{f_i} = (∂f_i/∂m_i)^T Λ_{m_i} (∂f_i/∂m_i) + (∂f_i/∂m′_i)^T Λ_{m′_i} (∂f_i/∂m′_i)
             = σ² [l_1² + l_2² + l′_1² + l′_2²],

where l′_i = F m̃_i ≡ [l′_1, l′_2, l′_3]^T and l_i = F^T m̃′_i ≡ [l_1, l_2, l_3]^T. Since multiplying each term by a constant does not affect the minimization, the problem (21) becomes

    min_F  Σ_i (m̃′_i^T F m̃_i)² / g_i²,

where g_i = √(l_1² + l_2² + l′_1² + l′_2²) is simply the norm of the gradient of f_i. Note that g_i depends on F.

It is shown in (Luong, 1992) that f_i / g_i is a first order approximation of the orthogonal distance from (m_i, m′_i) to the quadratic surface defined by m̃′^T F m̃ = 0.

3.6. Nonlinear Method Minimizing Distances Between Observation and Reprojection

If we can assume that the coordinates of the observed points are corrupted by additive noise and that the noises in different points are independent but with equal standard deviation (the same assumption as that used in the previous technique), then the maximum likelihood estimate of the fundamental matrix is obtained by minimizing the following criterion:

    F(f, M) = Σ_i ( ||m_i − h(f, M_i)||² + ||m′_i − h′(f, M_i)||² ),    (22)

where f represents the parameter vector of the fundamental matrix, such as the one described in Section 3.4, M = [M_1^T, …, M_n^T]^T are the structure parameters of the n points in space, and h(f, M_i) and h′(f, M_i) are the projection functions in the first and second image for the given space coordinates M_i and the fundamental matrix between the two images represented by vector f. Simply speaking, F(f, M) is the sum of squared distances between the observed points and the reprojections of the corresponding points in space. This implies that we estimate not only the fundamental matrix but also the structure parameters of the points in space. The estimation of the structure parameters, or 3D reconstruction, in the uncalibrated case is an important subject and needs a separate section to describe it in sufficient detail (see Appendix A). In the remainder of this subsection, we assume that a procedure is available for 3D reconstruction.

A generalization of (22) is to take into account different uncertainties, if available, in the image points. If a point m_i is assumed to be corrupted by Gaussian noise with mean zero and covariance matrix Λ_{m_i} (a 2 × 2
symmetric positive-definite matrix), then the maximum likelihood estimate of the fundamental matrix is obtained by minimizing the following criterion:

    F(f, M) = Σ_i ( Δm_i^T Λ_{m_i}^{-1} Δm_i + Δm′_i^T Λ_{m′_i}^{-1} Δm′_i )

with

    Δm_i = m_i − h(f, M_i)   and   Δm′_i = m′_i − h′(f, M_i).

Here we still assume that the noises in different points are independent, which is quite reasonable.

When the number of points n is large, the nonlinear minimization of F(f, M) must be carried out in a huge parameter space (3n + 7 dimensions, because each space point has 3 degrees of freedom), and the computation is very expensive. As a matter of fact, the structure of each point can be estimated independently given an estimate of the fundamental matrix. We thus conduct the optimization of the structure parameters within each optimization iteration for the parameters of the fundamental matrix, that is:

    min_f { Σ_i min_{M_i} ( ||m_i − h(f, M_i)||² + ||m′_i − h′(f, M_i)||² ) }.   (23)

Therefore, a problem of minimization over a (3n + 7)-D space (22) becomes a problem of minimization over a 7-D space, where each iteration of the latter contains n independent optimizations of three structure parameters each. The computation is thus considerably reduced. As will be seen in Section 5.5, the optimization of the structure parameters is nonlinear. To speed up the computation further, it can be approximated by an analytic method; when this optimization procedure converges, we then restart it with the nonlinear optimization method.

The idea underlying this method is already well known in motion and structure from motion (Faugeras, 1993; Zhang, 1995) and camera calibration (Faugeras, 1993). Similar techniques have also been reported for uncalibrated images (Mohr et al., 1993; Hartley, 1993). Because of the independence of the structure estimation (see the last paragraph), the Jacobian matrix has a simple block structure in the Levenberg-Marquardt algorithm. Hartley (1993) exploits this property to simplify the computation of the pseudo-inverse of the Jacobian.

3.7. Robust Methods

Up to now, we have assumed that point matches are given. They can be obtained by techniques such as correlation and relaxation (Zhang et al., 1995). These all exploit some heuristics in one form or another, for example, intensity similarity or a rigid/affine transformation in the image plane, which are not applicable to most cases. Among the matches established, we may find two types of outliers, due to bad locations and to false matches.

Bad Locations. In the estimation of the fundamental matrix, the location error of a point of interest is assumed to exhibit Gaussian behavior. This assumption is reasonable since the error in localization for most points of interest is small (within one or two pixels), but a few points are possibly incorrectly localized (more than three pixels). The latter points will severely degrade the accuracy of the estimation.

False Matches. In the establishment of correspondences, only heuristics have been used. Because the only geometric constraint, i.e., the epipolar constraint in terms of the fundamental matrix, is not yet available, many matches are possibly false. These will completely spoil the estimation process, and the final estimate of the fundamental matrix will be useless.

The outliers will severely affect the precision of the fundamental matrix if we directly apply the methods described above, which are all least-squares techniques. Least-squares estimators assume that the noise corrupting the data is of zero mean, which yields an unbiased parameter estimate. If the noise variance is known, a minimum-variance parameter estimate can be obtained by choosing appropriate weights on the data. Furthermore, least-squares estimators implicitly assume that the entire set of data can be interpreted by only one parameter vector of a given model. Numerous studies have been conducted which clearly show that least-squares estimators are vulnerable to the violation of these assumptions. Sometimes, even when the data contain only one bad datum, least-squares estimates may be completely perturbed. During the last three decades, many robust techniques have been proposed which are not very sensitive to departures from the assumptions on which they depend.

Recently, computer vision researchers have paid much attention to the robustness of vision algorithms because the data are unavoidably error prone (Haralick, 1986; Zhuang et al., 1992). Many so-called robust
regression methods have been proposed that are not so easily affected by outliers (Huber, 1981; Rousseeuw and Leroy, 1987). The reader is referred to (Rousseeuw and Leroy, 1987, Chap. 1) for a review of different robust methods. The two most popular robust methods are the M-estimators and the least-median-of-squares (LMedS) method, which will be presented below. More details, together with a description of other parameter estimation techniques commonly used in computer vision, are provided in (Zhang, 1996c). Recent works on the application of robust techniques to motion segmentation include (Torr and Murray, 1993; Odobez and Bouthemy, 1994; Ayer et al., 1994), and those on the recovery of the epipolar geometry include (Olsen, 1992; Shapiro and Brady, 1995; Torr, 1995).

3.7.1. M-Estimators. Let r_i be the residual of the ith datum, i.e., the difference between the ith observation and its fitted value. The standard least-squares method tries to minimize Σ_i r_i², which is unstable if there are outliers present in the data. Outlying data give an effect so strong in the minimization that the parameters thus estimated are distorted. The M-estimators try to reduce the effect of outliers by replacing the squared residuals r_i² by another function of the residuals, yielding

    min  Σ_i ρ(r_i),                                                    (24)

where ρ is a symmetric, positive-definite function with a unique minimum at zero, and is chosen to be less increasing than the square. Instead of solving this problem directly, we can implement it as an iterated reweighted least-squares one. Now let us see how.

Let p = [p_1, …, p_p]^T be the parameter vector to be estimated. The M-estimator of p based on the function ρ(r_i) is the vector p which is the solution of the following p equations:

    Σ_i ψ(r_i) ∂r_i/∂p_j = 0,   for j = 1, …, p,                        (25)

where the derivative ψ(x) = dρ(x)/dx is called the influence function. If we now define a weight function

    w(x) = ψ(x) / x,                                                    (26)

then Eq. (25) becomes

    Σ_i w(r_i) r_i ∂r_i/∂p_j = 0,   for j = 1, …, p.                    (27)

This is exactly the system of equations that we obtain if we solve the following iterated reweighted least-squares problem

    min  Σ_i w(r_i^{(k−1)}) r_i²,                                       (28)

where the superscript (k) indicates the iteration number. The weight w(r_i^{(k−1)}) should be recomputed after each iteration in order to be used in the next iteration.

The influence function ψ(x) measures the influence of a datum on the value of the parameter estimate. For example, for least-squares with ρ(x) = x²/2, the influence function is ψ(x) = x; that is, the influence of a datum on the estimate increases linearly with the size of its error, which confirms the non-robustness of the least-squares estimate. When an estimator is robust, it may be inferred that the influence of any single observation (datum) is insufficient to yield any significant offset (Rey, 1983). There are several constraints that a robust M-estimator should meet:

• The first is of course to have a bounded influence function.
• The second is naturally the requirement that the robust estimator be unique. This implies that the objective function of the parameter vector p to be minimized should have a unique minimum. This requires that the individual ρ-function be convex in the variable p, because only requiring a ρ-function to have a unique minimum is not sufficient. This is the case with maxima when considering mixture distributions; the sum of unimodal probability distributions is very often multimodal. The convexity constraint is equivalent to imposing that ∂²ρ(·)/∂p² be non-negative definite.
• The third one is a practical requirement. Whenever ∂²ρ(·)/∂p² is singular, the objective should have a gradient, i.e., ∂ρ(·)/∂p ≠ 0. This avoids having to search through the complete parameter space.

There are a number of different M-estimators proposed in the literature. The reader is referred to (Zhang, 1996c) for a comprehensive review. It seems difficult to select a ρ-function for general use without being rather arbitrary. The result reported in Section 4 uses the Tukey function:

    ρ(r_i) = (c²/6) { 1 − [1 − (r_i/(cσ))²]³ }   if |r_i| ≤ cσ,
    ρ(r_i) = c²/6                                otherwise,
where σ is some estimated standard deviation of the errors and c = 4.6851 is the tuning constant. The corresponding weight function is

    w_i = [1 − (r_i/(cσ))²]²   if |r_i| ≤ cσ,
    w_i = 0                    otherwise.

Another commonly used function is the following tri-weight one:

    w_i = 1         if |r_i| ≤ σ,
    w_i = σ/|r_i|   if σ < |r_i| ≤ 3σ,
    w_i = 0         if 3σ < |r_i|.

In (Olsen, 1992; Luong, 1992), this weight function was used for the estimation of the epipolar geometry.

Inherent in the different M-estimators is the simultaneous estimation of σ, the standard deviation of the residual errors. If we can make a good estimate of the standard deviation of the errors of the good data (inliers), then data whose error is larger than a certain number of standard deviations can be considered as outliers. Thus, the estimation of σ itself should be robust. The results of the M-estimators will depend on the method used to compute it. The robust standard deviation estimate is related to the median of the absolute values of the residuals, and is given by

    σ̂ = 1.4826 [1 + 5/(n − p)] median_i |r_i|.                          (29)

The constant 1.4826 is a coefficient to achieve the same efficiency as least-squares in the presence of only Gaussian noise (actually, the median of the absolute values of random numbers sampled from the Gaussian normal distribution N(0, 1) is equal to Φ^{-1}(3/4) ≈ 1/1.4826); 5/(n − p) (where n is the size of the data set and p is the dimension of the parameter vector) is to compensate for the effect of a small data set. The reader is referred to (Rousseeuw and Leroy, 1987, p. 202) for the details of these magic numbers.

Our experience shows that M-estimators are robust to outliers due to bad localization. They are, however, not robust to false matches, because they depend heavily on the initial guess, which is usually obtained by least-squares. This leads us to use other, more robust techniques.

3.7.2. Least Median of Squares (LMedS). The LMedS method estimates the parameters by solving the nonlinear minimization problem:

    min  median_i r_i².

That is, the estimator must yield the smallest value for the median of the squared residuals computed for the entire data set. It turns out that this method is very robust to false matches as well as to outliers due to bad localization. Unlike the M-estimators, however, the LMedS problem cannot be reduced to a weighted least-squares problem. It is probably impossible to write down a straightforward formula for the LMedS estimator. It must be solved by a search in the space of possible estimates generated from the data. Since this space is too large, only a randomly chosen subset of the data can be analyzed. The algorithm which we have implemented (the original version was described in (Zhang et al., 1994; Deriche et al., 1994; Zhang et al., 1995)) for robustly estimating the fundamental matrix follows the one structured in (Rousseeuw and Leroy, 1987, Chap. 5), as outlined below.

Given n point correspondences {(m_i, m′_i) | i = 1, …, n}, we proceed through the following steps:

1. A Monte Carlo type technique is used to draw m random subsamples of p = 7 different point correspondences (recall that 7 is the minimum number to determine the epipolar geometry).
2. For each subsample, indexed by J, we use the technique described in Section 3.1 to compute the fundamental matrix F_J. We may have at most three solutions.
3. For each F_J, we determine the median of the squared residuals, denoted by M_J, with respect to the whole set of point correspondences, i.e.,

       M_J = median_{i=1,…,n} [ d²(m̃′_i, F_J m̃_i) + d²(m̃_i, F_J^T m̃′_i) ].

   Here, the distances between points and epipolar lines are used, but we could use other error measures.
4. Retain the estimate F_J for which M_J is minimal among all m M_J's.

The question now is: How do we determine m? A subsample is "good" if it consists of p good correspondences. Assuming that the whole set of correspondences may contain up to a fraction ε of outliers, the probability that at least one of the m subsamples is
good is given by

    P = 1 − [1 − (1 − ε)^p]^m.                                          (30)

By requiring that P be near 1, one can determine m for given values of p and ε:

    m = log(1 − P) / log[1 − (1 − ε)^p].

In our implementation, we assume ε = 40% and require P = 0.99; thus m = 163. Note that the algorithm can be sped up considerably by means of parallel computing, because the processing of each subsample can be done independently.

As noted in (Rousseeuw and Leroy, 1987), the LMedS efficiency is poor in the presence of Gaussian noise. The efficiency of a method is defined as the ratio between the lowest achievable variance for the estimated parameters and the actual variance provided by the given method. To compensate for this deficiency, we further carry out a weighted least-squares procedure. The robust standard deviation estimate is given by (29), that is,

    σ̂ = 1.4826 [1 + 5/(n − p)] √M_J,

where M_J is the minimal median estimated by the LMedS. Based on σ̂, we can assign a weight to each correspondence:

    w_i = 1   if r_i² ≤ (2.5 σ̂)²,
    w_i = 0   otherwise,

where

    r_i² = d²(m̃′_i, F m̃_i) + d²(m̃_i, F^T m̃′_i).

The correspondences having w_i = 0 are outliers and should not be taken into account any further. We thus conduct an additional step:

5. Refine the fundamental matrix F by solving the weighted least-squares problem:

       min  Σ_i w_i r_i².

The fundamental matrix is now robustly and accurately estimated because outliers have been detected and discarded by the LMedS method.

Figure 2. Illustration of a bucketing technique.

As said previously, the computational efficiency of the LMedS method is achieved by applying a Monte Carlo type technique. However, the seven points of a subsample thus generated may be very close to one another. Such a situation should be avoided because the estimation of the epipolar geometry from such points is highly unstable and the result is useless. It is a waste of time to evaluate such a subsample. In order to achieve higher stability and efficiency, we have developed a regularly random selection method based on bucketing techniques, which works as follows. We first calculate the min and max of the coordinates of the points in the first image. The region is then evenly divided into b × b buckets (see Fig. 2). In our implementation, b = 8. To each bucket is attached a set of points, and indirectly a set of matches, which fall in it. Buckets having no matches attached are excluded. To generate a subsample of seven points, we first randomly select seven mutually different buckets, and then randomly choose one match in each selected bucket.

One question remains: How many subsamples are required? If we assume that bad matches are uniformly distributed in space, and if each bucket has the same number of matches and the random selection is uniform, formula (30) still holds. However, the number of matches in one bucket may be quite different from that in another. As a result, a match belonging to a bucket having fewer matches has a higher probability of being selected. It is thus preferable that a bucket having many matches have a higher probability of being selected than a bucket having few matches, in order for each match to have almost the same probability of being selected. This can be realized by the following procedure. If we have in total l buckets, we divide
the range [0, 1] into l intervals such that the width of the ith interval is equal to n_i / Σ_i n_i, where n_i is the number of matches attached to the ith bucket (see Fig. 3). During the bucket selection procedure, a number produced by a [0, 1] uniform random generator falling in the ith interval implies that the ith bucket is selected.

Figure 3. Interval and bucket mapping.

Together with the matching technique described in (Zhang et al., 1995), we have implemented this robust method and successfully solved, in an automatic way, the matching and epipolar geometry recovery problem for different types of scenes such as indoor, rocks, road, and textured dummy scenes. The corresponding software image-matching has been made available on the Internet since 1994.

3.8. Characterizing the Uncertainty of the Fundamental Matrix

Since the data points are always corrupted by noise, and sometimes the matches are even spurious or incorrect, one should model the uncertainty of the estimated fundamental matrix in order to exploit its underlying geometric information correctly and effectively. For example, one can use the covariance of the fundamental matrix to compute the uncertainty of the projective reconstruction or of the projective invariants, or to improve the results of Kruppa's equation for a better self-calibration of a camera (Zeller, 1996).

In order to quantify the uncertainty related to the estimation of the fundamental matrix by the methods described in the previous sections, we model the fundamental matrix as a random vector f ∈ IR^7 (the vector space of real 7-vectors) whose mean is the exact value we are looking for. Each estimate is then considered as a sample of f, and the uncertainty is given by the covariance matrix of f.

In the remainder of this subsection, we consider a general random vector y ∈ IR^p, where p is the dimension of the vector space. The same discussion applies, of course, directly to the fundamental matrix. The covariance of y is defined by the positive symmetric matrix

    Λ_y = E[(y − E[y])(y − E[y])^T],                                    (31)

where E[y] denotes the mean of the random vector y.

3.8.1. The Statistical Method. The statistical method consists in using the well-known law of large numbers to approximate the mean: if we have a sufficiently large number N of samples y_i of a random vector y, then E[y] can be approximated by the sample mean

    E_N[y_i] = (1/N) Σ_{i=1}^N y_i,

and Λ_y is then approximated by

    (1/(N − 1)) Σ_{i=1}^N (y_i − E_N[y_i])(y_i − E_N[y_i])^T.           (32)

A rule of thumb is that this method works reasonably well when N > 30. It is especially useful for simulation. For example, through simulation, we have found that the covariance of the fundamental matrix estimated by the analytical method through a first order approximation (see below) is quite good when the noise level in the data points is moderate (a standard deviation not larger than one pixel) (Csurka et al., 1996).

3.8.2. The Analytical Method

The Explicit Case. We now consider the case where y is computed from another random vector x of IR^m using a C^1 function ϕ:

    y = ϕ(x).

Writing the first order Taylor expansion of ϕ in the neighborhood of E[x] yields

    ϕ(x) = ϕ(E[x]) + D_ϕ(E[x]) · (x − E[x]) + O(x − E[x])²,             (33)
where O(x)² denotes the terms of order 2 or higher in x, and D_ϕ(x) = ∂ϕ(x)/∂x is the Jacobian matrix. Assuming that any sample of x is sufficiently close to E[x], we can approximate ϕ by the first order terms of (33), which yields:

    E[y] ≈ ϕ(E[x]),
    ϕ(x) − ϕ(E[x]) ≈ D_ϕ(E[x]) · (x − E[x]).

The first order approximation of the covariance matrix of y is then given as a function of the covariance matrix of x by

    Λ_y = E[(ϕ(x) − ϕ(E[x]))(ϕ(x) − ϕ(E[x]))^T] = D_ϕ(E[x]) Λ_x D_ϕ(E[x])^T.   (34)

The Case of an Implicit Function. In some cases like ours, the parameter is obtained through a minimization. Therefore, ϕ is implicit, and we have to make use of the well-known implicit function theorem to obtain the following result (see Faugeras, 1993, Chap. 6).

Proposition 1. Let a criterion function C: IR^m × IR^p → IR be a function of class C^∞, x_0 ∈ IR^m be the measurement vector and y_0 ∈ IR^p be a local minimum of C(x_0, z). If the Hessian H of C with respect to z is invertible at (x, z) = (x_0, y_0), then there exist an open set U′ of IR^m containing x_0, an open set U″ of IR^p containing y_0, and a C^∞ mapping ϕ: IR^m → IR^p such that for (x, y) in U′ × U″ the two relations "y is a local minimum of C(x, z) with respect to z" and y = ϕ(x) are equivalent. Furthermore, we have the following equation:

    D_ϕ(x) = −H^{-1} ∂Φ/∂x,                                             (35)

where

    Φ = (∂C/∂z)^T   and   H = ∂Φ/∂z.

Taking x_0 = E[x] and y_0 = E[y], Eq. (34) then becomes

    Λ_y = H^{-1} (∂Φ/∂x) Λ_x (∂Φ/∂x)^T H^{-T}.                          (36)

The Case of a Sum of Squares of Implicit Functions. Here we study the case where C is of the form:

    Σ_{i=1}^n C_i²(x_i, z)

with x = [x_1^T, …, x_i^T, …, x_n^T]^T. Then we have

    Φ = 2 Σ_i (∂C_i/∂z)^T C_i,
    H = ∂Φ/∂z = 2 Σ_i (∂C_i/∂z)^T (∂C_i/∂z) + 2 Σ_i C_i ∂²C_i/∂z².

Now, it is usual practice to neglect the terms C_i ∂²C_i/∂z² with respect to the terms (∂C_i/∂z)^T (∂C_i/∂z) (see classical books of numerical analysis (Press et al., 1988)), and the numerical tests we did confirm that we can do this because the former is much smaller than the latter. We can then write:

    H = ∂Φ/∂z ≈ 2 Σ_i (∂C_i/∂z)^T (∂C_i/∂z).

In the same way, we have:

    ∂Φ/∂x ≈ 2 Σ_i (∂C_i/∂z)^T (∂C_i/∂x).

Therefore, Eq. (36) becomes:

    Λ_y = 4 H^{-1} Σ_{i,j} (∂C_i/∂z)^T (∂C_i/∂x) Λ_x (∂C_j/∂x)^T (∂C_j/∂z) H^{-T}.   (37)

Assume that the noise in x_i and that in x_j (j ≠ i) are independent (which is quite reasonable because the points are extracted independently); then Λ_{x_{i,j}} = E[(x_i − x̄_i)(x_j − x̄_j)^T] = 0 and Λ_x = diag(Λ_{x_1}, …, Λ_{x_n}). Equation (37) can then be written as

    Λ_y = 4 H^{-1} Σ_i (∂C_i/∂z)^T (∂C_i/∂x_i) Λ_{x_i} (∂C_i/∂x_i)^T (∂C_i/∂z) H^{-T}.

Since Λ_{C_i} = (∂C_i/∂x_i) Λ_{x_i} (∂C_i/∂x_i)^T by definition (up to the first order approximation), the above equation reduces to

    Λ_y = 4 H^{-1} Σ_i (∂C_i/∂z)^T Λ_{C_i} (∂C_i/∂z) H^{-T}.            (38)
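On the simplest instance, a straight-line fit v_i ≈ z_1 u_i + z_2 with C_i = v_i − z_1 u_i − z_2, this machinery reproduces the textbook covariance σ̂² (A^T A)^{-1} with σ̂² = S/(n − p), since here H = 2 A^T A exactly (the C_i are linear in z). The sketch below (our own, plain Python) computes it explicitly.

```python
def fit_line_with_cov(pts):
    """Least-squares line v = z1*u + z2 and its covariance Lambda_z,
    computed as 2S/(n-p) * H^{-1} with H = 2 A^T A (H symmetric, so
    H^{-T} = H^{-1})."""
    n, p = len(pts), 2
    # normal equations A^T A z = A^T b, rows of A being (u_i, 1)
    Suu = sum(u * u for u, _ in pts)
    Su = sum(u for u, _ in pts)
    Suv = sum(u * v for u, v in pts)
    Sv = sum(v for _, v in pts)
    D = Suu * n - Su * Su
    z1 = (Suv * n - Su * Sv) / D
    z2 = (Suu * Sv - Su * Suv) / D
    S = sum((v - z1 * u - z2) ** 2 for u, v in pts)  # criterion at the minimum
    # H = 2 * sum_i (dC_i/dz)^T (dC_i/dz) = 2 A^T A
    H = [[2.0 * Suu, 2.0 * Su], [2.0 * Su, 2.0 * n]]
    dH = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    Hinv = [[H[1][1] / dH, -H[0][1] / dH], [-H[1][0] / dH, H[0][0] / dH]]
    scale = 2.0 * S / (n - p)
    cov = [[scale * Hinv[i][j] for j in range(2)] for i in range(2)]
    return (z1, z2), cov

pts = [(0, 0.1), (1, 0.9), (2, 2.2), (3, 2.9), (4, 4.1), (5, 4.8)]
(z1, z2), cov = fit_line_with_cov(pts)   # slope ~ 1, intercept ~ 0
```

For the fundamental matrix, the same scaling 2S/(n − p) is applied to the inverse Hessian obtained as a by-product of the nonlinear minimization, with p = 7.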
Considering that the mean of the value of Ci at the minimum is zero, and under the somewhat strong assumption that the Ci's are independent and identically distributed (note: it is under this assumption that the solution given by the least-squares technique is optimal), we can then approximate ΛCi by its sample variance (see e.g., Anderson, 1958):

ΛCi = (1/(n − p)) Σi Ci² = S/(n − p),

where S is the value of the criterion C at the minimum, and p is the number of parameters, i.e., the dimension of y. Although it has little influence when n is big, the inclusion of p in the formula above aims at correcting the effect of a small sample set. Indeed, for n = p, we can almost always find an estimate of y such that Ci = 0 for all i, and it is not meaningful to estimate the variance. Equation (38) finally becomes

Λy = (2S/(n − p)) H−1 H H−T = (2S/(n − p)) H−T .    (39)

The Case of the Fundamental Matrix. As explained in Section 3.4, F is computed using a sum of squares of implicit functions of n point correspondences. Thus, referring to the previous paragraph, we have p = 7, and the criterion function C(m̂, f7) (where m̂ = [m1, m′1, …, mn, m′n]T and f7 is the vector of the seven chosen parameters for F) is given by (15). Λf7 is thus computed by (39) using the Hessian obtained as a by-product of the minimization of C(m̂, f7).

According to (34), ΛF is then computed from Λf7:

ΛF = (∂F(f7)/∂f7) Λf7 (∂F(f7)/∂f7)T .    (40)

Here, we actually consider the fundamental matrix F(f7) as a 9-vector composed of the nine coefficients, which are functions of the seven parameters f7.

The reader is referred to (Zhang and Faugeras, 1992, Chap. 2) for a more detailed exposition on uncertainty manipulation.

3.9. Other Techniques

To close the review section, we present two analytical techniques and one robust technique based on RANSAC.

3.9.1. Virtual Parallax Method. If two sets of image points are the projections of a plane in space (see Section 5.2), then they are related by a homography H. Points not on the plane do not verify the homography, i.e., m̃′ ≠ ρHm̃, where ρ is an arbitrary non-zero scalar. The difference (i.e., the parallax) allows us to estimate an epipole directly if H is known. Indeed, Luong and Faugeras (1996) show that the fundamental matrix and the homography are related by F = [ẽ′]× H. For a point which does not belong to the plane, l′ = m̃′ × Hm̃ defines an epipolar line, which provides one constraint on the epipole: ẽ′T l′ = 0. Therefore, two such points are sufficient to estimate the epipole e′. Generate-and-test methods (see e.g., Faugeras and Lustman, 1988) can be used to detect the coplanar points.

The virtual parallax method proposed by Boufama and Mohr (1995) does not require the prior identification of a plane. To simplify the computations, without loss of generality, we can perform a change of projective coordinates in each image such that

m̃1 = [1, 0, 0]T , m̃2 = [0, 1, 0]T , m̃3 = [0, 0, 1]T , m̃4 = [1, 1, 1]T ;    (41)
m̃′1 = [1, 0, 0]T , m̃′2 = [0, 1, 0]T , m̃′3 = [0, 0, 1]T , m̃′4 = [1, 1, 1]T .    (42)

These points are chosen such that no three of them are collinear. The first three points define a plane in space. Under such a choice of coordinate systems, the homography matrix such that m̃′i = ρHm̃i (i = 1, 2, 3) is diagonal, i.e., H = diag(a, b, c), and depends only on two parameters. Let the epipole be ẽ′ = [e′u, e′v, e′t]T . As we have seen in the last paragraph, for each additional point (mi, m′i) (i = 4, …, n), we have ẽ′T (m̃′i × Hm̃i) = 0, i.e.,

v′i e′u c − vi e′u b + ui e′v a − u′i e′v c + u′i vi e′t b − v′i ui e′t a = 0.    (43)

This is the basic epipolar equation based on virtual parallax. Since (a, b, c) and (e′u, e′v, e′t) are each defined up to a scale factor, the above equation is a polynomial of degree two in four unknowns. To simplify the problem, we make the following reparameterization. Let

x1 = e′u c, x2 = e′u b, x3 = e′v a, x4 = e′v c, x5 = e′t b, and x6 = e′t a,
which are defined up to a common scale factor. Equation (43) now becomes

v′i x1 − vi x2 + ui x3 − u′i x4 + u′i vi x5 − v′i ui x6 = 0.    (44)

Unlike (43), we here have five independent variables, one more than necessary. The unknowns xi (i = 1, …, 6) can be solved for linearly if we have five or more point matches. Thus, we need in total eight point correspondences, like the eight-point algorithm. The original unknowns can be computed, for example, as

e′u = e′t x2/x5 , e′v = e′t x3/x6 , a = c x3/x4 , b = c x2/x1 .    (45)

The fundamental matrix is finally obtained as [ẽ′]× diag(a, b, c), and the rank constraint is automatically satisfied. However, note that

• the computation (45) is not optimal, because each intermediate variable xi is not used equally;
• the rank-2 constraint in the linear Eq. (44) is not necessarily satisfied, because of the introduction of an intermediate parameter.

Therefore, the rank-2 constraint is also imposed a posteriori, similar to the eight-point algorithm (see Section 3.2).

The results obtained with this method depend on the choice of the four basis points. The authors indicate that a good choice is to take them largely spread in the image.

Experiments show that this method produces good results. Factors which contribute to this are the fact that the dimensionality of the problem has been reduced, and the fact that the change of projective coordinates achieves a data renormalization comparable to the one described in Section 3.2.5.

3.9.2. Linear Subspace Method. Ponce and Genc (1996), through a change of projective coordinates, set up a set of linear constraints on one epipole using the linear subspace method proposed by Heeger and Jepson (1992). A change of projective coordinates in each image as described in (41) and (42) is performed. Furthermore, we choose the corresponding four scene points Mi (i = 1, …, 4) and the optical center of each camera as a projective basis in space. We assign to the basis points for the first camera the following coordinates:

M̃1 = [1, 0, 0, 0]T , M̃2 = [0, 1, 0, 0]T , C̃ = [0, 0, 1, 0]T ,    (46)
M̃3 = [0, 0, 0, 1]T , M̃4 = [1, 1, 1, 1]T .

The same coordinates are assigned to the basis points for the second camera. Therefore, the camera projection matrix for the first camera is given by

P = | 1 0 0 0 |
    | 0 1 0 0 |    (47)
    | 0 0 0 1 |

Let the coordinates of the optical center C of the first camera be [α, β, γ, 1]T in the projective basis of the second camera, and let the coordinates of the four scene points remain the same in both projective bases, i.e., M′i = Mi (i = 1, …, 4). Then, the coordinate transformation H from the projective basis of the first camera to that of the second camera is given by

H = | γ−α   0    α    0  |
    |  0   γ−β   β    0  |    (48)
    |  0    0    γ    0  |
    |  0    0    1   γ−1 |

It is then straightforward to obtain the projection matrix of the first camera with respect to the projective basis of the second camera:

P′ = PH = | γ−α   0    α    0  |
          |  0   γ−β   β    0  |    (49)
          |  0    0    1   γ−1 |

According to (6), the epipolar equation is m̃′iT Fm̃i = 0, while the fundamental matrix is given by F = [P′p⊥]× P′P+ . Since

p⊥ = C = [0, 0, 1, 0]T ,   P+ = PT (PPT)−1 = | 1 0 0 |
                                             | 0 1 0 |
                                             | 0 0 0 |
                                             | 0 0 1 | ,
we obtain the fundamental matrix:

F = [ẽ′]× diag(γ − α, γ − β, γ − 1),    (50)

where ẽ′ ≡ P′p⊥ = [α, β, 1]T is just the projection of the first optical center in the second camera, i.e., the second epipole.

Consider now the remaining point matches {(mi, m′i) | i = 5, …, n}, where m̃i = [ui, vi, 1]T and m̃′i = [u′i, v′i, 1]T . From (50), after some simple algebraic manipulation, the epipolar equation can be rewritten as

γ giT ẽ′ = qiT f,

where f = [α, β, αβ]T , gi = m̃′i × m̃i = [v′i − vi , ui − u′i , −v′i ui + u′i vi ]T and qi = [v′i (1 − ui ), −u′i (1 − vi ), ui − vi ]T . Consider a linear combination of the above equations. Let us define the coefficient vector ξ = [ξ5 , …, ξn ]T and the vectors τ(ξ) = Σi ξi gi and χ(ξ) = Σi ξi qi (sums over i = 5, …, n). It follows that

γ τ(ξ)T ẽ′ = χ(ξ)T f.    (51)

The idea of the linear subspace method is that for any value ξτ such that τ(ξτ) = 0, Eq. (51) provides a linear constraint on f, i.e., χ(ξτ)T f = 0, while for any value ξχ such that χ(ξχ) = 0, the same equation provides a linear constraint on ẽ′, i.e., τ(ξχ)T ẽ′ = 0. Because of the particular structure of gi and qi , it is easy to show (Ponce and Genc, 1996) that the vectors τ(ξχ) and χ(ξτ) are both orthogonal to the vector [1, 1, 1]T . Since the vectors τ(ξχ) are also orthogonal to ẽ′, they span only a one-dimensional line, and their representative vector is denoted by τ0 = [aτ , bτ , cτ ]T . Likewise, the vectors χ(ξτ) span a line orthogonal to both f and [1, 1, 1]T , and their representative vector is denoted by χ0 = [aχ , bχ , cχ ]T . Assume for the moment that we know τ0 and χ0 (their computation will be described shortly); from [aτ , bτ , −aτ − bτ ]T ẽ′ = 0 and [aχ , bχ , −aχ − bχ ]T f = 0, the solution for the epipole is given by

α = bχ (aτ + bτ ) / (aτ (aχ + bχ )),   β = aχ (aτ + bτ ) / (bτ (aχ + bχ )).    (52)

Once the epipole has been computed, the remaining parameters of the fundamental matrix can easily be computed.

We now turn to the estimation of τ0 and χ0 . From the above discussion, we see that the set of linear combinations Σi ξi gi such that Σi ξi qi = 0 is one-dimensional. Construct two 3 × (n − 4) matrices:

G = [g5 , …, gn ] and Q = [q5 , …, qn ].

The set of vectors ξ such that Σi ξi qi = 0 is simply the null space of Q. Let Q = U1 S1 V1T be the singular value decomposition (SVD) of Q; then the null space is formed by the rightmost n − 4 − 3 = n − 7 columns of V1 , which will be denoted by V0 . The set of vectors Σi ξi gi such that Σi ξi qi = 0 is thus the subspace spanned by the matrix GV0 , which is 3 × (n − 7). Let GV0 = U2 S2 V2T be its SVD. According to our assumptions, this matrix has rank 1, thus τ0 is the range of GV0 , which is simply the leftmost column of U2 up to a scale factor. Vector χ0 can be computed following the same construction by reversing the roles of τ and χ.

The results obtained with this method depend on the choice of the four basis points. The authors show experimentally that a good result can be obtained by trying 30 random basis choices and picking the solution resulting in the smallest epipolar distance error.

Note that although, unlike the virtual parallax method, the linear subspace technique provides a linear algorithm without introducing an extraneous parameter, this is achieved in (52) by simply dropping the estimated information in cτ and cχ . In the presence of noise, τ0 and χ0 computed through singular value decomposition do not necessarily satisfy τ0T 1 = 0 and χ0T 1 = 0, where 1 = [1, 1, 1]T .

Experiments show that this method produces good results. The same reasons as for the virtual parallax method apply here.

3.9.3. RANSAC. Random sample consensus (RANSAC) (Fischler and Bolles, 1981) is a paradigm that originated in the Computer Vision community for robust parameter estimation. The idea is to find, through random sampling of minimal subsets of the data, the parameter set which is consistent with as large a subset of the data as possible. The consistency check requires the user to supply a threshold on the errors, which reflects the a priori knowledge of the precision of the expected estimation. This technique is used by Torr (1995) to estimate the fundamental matrix. As is clear, RANSAC is very similar to LMedS both in ideas and in implementation, except that
• RANSAC needs a threshold to be set by the user for the consistency check, while the threshold is computed automatically in LMedS;
• in Step 3 of the LMedS implementation described in Section 3.7.2, the number of point matches which are consistent with FJ is computed, instead of the median of the squared residuals.

However, LMedS cannot deal with the case where the percentage of outliers is higher than 50%, while RANSAC can. Torr and Murray (1993) compared both LMedS and RANSAC. RANSAC is usually cheaper because it can exit the random sampling loop once a consistent solution is found.

If one knows that the number of outliers is more than 50%, one can easily adapt LMedS by using an appropriate percentile, say 40%, instead of the median. (When we do this, however, the solution obtained may not be globally optimal if the number of outliers is actually less than 50%.) If there is a large set of images of the same type of scene to be processed, one can first apply LMedS to one pair of images in order to find an appropriate threshold, and then apply RANSAC to the remaining images because it is cheaper.

4. An Example of Fundamental Matrix Estimation with Comparison

The pair of images is a pair of calibrated stereo images (see Fig. 4). By "calibrated" is meant that the intrinsic parameters of both cameras and the displacement between them were computed off-line through stereo calibration. There are 241 point matches, which were established automatically by the technique described in (Zhang et al., 1995). Outliers have been discarded. The calibrated parameters of the cameras are of course not used, but the fundamental matrix computed from these parameters serves as a ground truth. This is shown in Fig. 5, where the four epipolar lines are displayed, corresponding, from left to right, to the point matches 1, 220, 0 and 183, respectively. The intersection of these lines is the epipole, which is clearly very far from the image. This is because the two cameras are placed almost in the same plane.

The epipolar geometry estimated with the linear method is shown in Fig. 6 for the same set of point matches. One can see that the epipole is now in the image, which is completely different from what we have seen with the calibrated result. If we perform a data normalization before applying the linear method, the result is considerably improved, as shown in Fig. 7. This is very close to the calibrated one.

The nonlinear method gives an even better result, as shown in Fig. 8. A comparison with the "true" epipolar geometry is shown in Fig. 9. There is only a small difference in the orientation of the epipolar lines. We have also tried the normalization method followed by the nonlinear method, and the same result was obtained. Other methods have also been tested, and visually almost no difference is observed.

Quantitative results are provided in Table 1, where the elements in the first column indicate the methods used in estimating the fundamental matrix:
Figure 4. Image pair used for comparing different estimation techniques of the fundamental matrix.
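The RANSAC paradigm of Section 3.9.3 can be sketched generically. The toy model below fits a line rather than a fundamental matrix (for F, the minimal solver would be the seven-point algorithm and the residual an epipolar distance); the data, threshold and helper names are made up for illustration:

```python
import random

def ransac(data, fit, residual, n_min, threshold, n_iter=200, seed=0):
    """Generic RANSAC: draw random minimal samples, keep the model
    consistent with the largest subset of the data, then refit on it."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        model = fit(rng.sample(data, n_min))
        if model is None:                # degenerate minimal sample
            continue
        inliers = [d for d in data if residual(model, d) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return fit(best_inliers), best_inliers

def fit_line(pts):
    """Least-squares line y = a*x + b; None if the sample is degenerate."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    den = n * sxx - sx * sx
    if abs(den) < 1e-12:
        return None
    a = (n * sxy - sx * sy) / den
    return a, (sy - a * sx) / n

# Ten points on y = 2x + 1 plus two gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(3.0, 40.0), (7.0, -25.0)]
model, inliers = ransac(points, fit_line,
                        lambda m, p: abs(m[0] * p[0] + m[1] - p[1]),
                        n_min=2, threshold=0.5)
```

Unlike LMedS, the loop above needs the user-supplied `threshold`.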
Figure 5. Epipolar geometry estimated through classical stereo calibration, which serves as the ground truth.
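The ground truth of Fig. 5 is derived from the calibrated projection matrices. A minimal sketch of the formula F = [P′p⊥]× P′P+ quoted in Section 3.9.2 (NumPy; the two projection matrices below are a made-up pure-translation pair, not the calibrated stereo rig of the experiment):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_projections(P1, P2):
    """F = [e']_x P2 P1^+, where e' = P2 c1 and c1, the optical center of
    camera 1, is the null vector of P1."""
    c1 = np.linalg.svd(P1)[2][-1]        # P1 @ c1 = 0
    return skew(P2 @ c1) @ P2 @ np.linalg.pinv(P1)

# Made-up example: identity camera and a unit translation along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[1.0], [0.0], [0.0]])])
F = fundamental_from_projections(P1, P2)
```

For this configuration F is proportional to [t]× with t = (1, 0, 0), so epipolar lines satisfy v′ = v.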
Figure 6. Epipolar geometry estimated with the linear method.
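The estimate of Fig. 6 comes from the linear method with eigen analysis. A sketch of that solution, with the rank-2 constraint imposed a posteriori (NumPy; the synthetic points below are made up, generated from a pure translation so the result can be checked):

```python
import numpy as np

def linear_fundamental(x1, x2):
    """Linear (eight-point style) estimate: each match (m, m') gives one row
    of U in U f = 0; f is the right singular vector of U for the smallest
    singular value; rank 2 is then enforced by zeroing the smallest
    singular value of F."""
    U = np.array([[u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(x1, x2)])
    F = np.linalg.svd(U)[2][-1].reshape(3, 3)
    Uf, s, Vt = np.linalg.svd(F)
    s[2] = 0.0                            # impose the rank-2 constraint
    return Uf @ np.diag(s) @ Vt

# Made-up synthetic data: second camera translated by (1, 0, 0), K = I.
pts3d = [(0.5, 0.2, 4.0), (-1.0, 0.3, 5.0), (0.8, -0.7, 3.0), (0.1, 1.1, 6.0),
         (-0.4, -0.9, 4.5), (1.2, 0.6, 3.5), (-0.8, 1.0, 5.5), (0.3, -0.2, 2.5),
         (0.9, 0.9, 4.2)]
x1 = [(X / Z, Y / Z) for X, Y, Z in pts3d]
x2 = [((X + 1.0) / Z, Y / Z) for X, Y, Z in pts3d]
F = linear_fundamental(x1, x2)
```

With noisy pixel data this estimator degrades quickly, which is exactly what the prior normalization of Fig. 7 repairs.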
they are, respectively, the classical stereo calibration (Calib.), the linear method with eigen analysis (linear), the linear method with prior data normalization (normal.), the nonlinear method based on minimization of distances between points and epipolar lines (nonlinear), the nonlinear method based on minimization of gradient-weighted epipolar errors (gradient), the M-estimator with the Tukey function (M-estim.), the nonlinear method based on minimization of distances between observed points and reprojected ones (reproj.), and the LMedS technique (LMedS). The fundamental matrix of Calib. is used as a reference. The second column shows the difference between the fundamental matrix estimated by each method and that of Calib. The difference is measured as the Frobenius norm: ΔF = ‖F − FCalib‖ × 100%. Since each F is normalized by its Frobenius norm, ΔF is directly related to the angle between two unit vectors. It can be seen that, although we have observed that method normal. considerably improves the result of the linear method,
Figure 7. Epipolar geometry estimated with the linear method with prior data normalization.
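The improvement of Fig. 7 comes from the prior data normalization (Section 3.2.5): estimate in translated and scaled coordinates, then denormalize. A sketch in the spirit of that idea (NumPy; the scale convention used here, average distance √2 from the centroid as in Hartley's variant, and the sample points are assumptions):

```python
import numpy as np

def normalize_points(pts):
    """Translate points to their centroid and scale isotropically so the
    average distance to the origin is sqrt(2).  Returns the normalized
    points and the 3x3 transformation T; an F estimated in normalized
    coordinates is denormalized as F = T2^T Fhat T1."""
    pts = np.asarray(pts, dtype=float)
    c = pts.mean(axis=0)
    d = np.linalg.norm(pts - c, axis=1).mean()
    s = np.sqrt(2.0) / d
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    return (pts - c) * s, T

# Made-up pixel coordinates.
pixels = [(0.0, 0.0), (512.0, 0.0), (512.0, 512.0), (0.0, 512.0), (100.0, 200.0)]
npts, T = normalize_points(pixels)
```

The point of the exercise is conditioning: rows of the linear system then have comparable magnitudes instead of mixing 1's with 10^5-sized products of pixel coordinates.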
Figure 8. Epipolar geometry estimated with the nonlinear method.
its ΔF is the largest. It seems that ΔF is not an appropriate measure of the difference between two fundamental matrices. We will describe another one in the next paragraph. The third and fourth columns show the positions of the two epipoles. The fifth column gives the root of the mean of squared distances between points and their epipolar lines. We can see that even with Calib., the RMS is as high as 1 pixel. There are two possibilities: either the stereo system is not very well calibrated, or the points are not well localized; we think the latter is the major reason, because the corner detector we use only extracts points with pixel precision. The last column shows the approximate CPU time in seconds when the program is run on a Sparc 20 workstation. Nonlinear, gradient and reproj. give essentially the same result (but the latter is much more time consuming). The M-estimator and LMedS techniques give the best results. This is because the influence of poorly
Figure 9. Comparison between the epipolar geometry estimated through classical stereo calibration (shown in Red/Dark lines) and that estimated with the nonlinear method (shown in Green/Grey lines).
Figure 12. Epipolar bands for several point matches.
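The bands of Fig. 12 are the hyperbolas described in Section 4.2: the uncertainty ellipse of the line parameters, written projectively as x̃TAx̃ = 0, is mapped to the image conic m̃TA∗m̃ = 0 through the adjoint. A sketch of the construction of Eqs. (54)-(55) (NumPy; the numeric line parameters, covariance and k in the usage line are made up):

```python
import numpy as np

def epipolar_band_conic(x0, C, k):
    """Build A from the ellipse (x - x0)^T C^{-1} (x - x0) = k^2 and return
    (A, A*), where A* = det(A) A^{-1} is the adjoint of A; the epipolar
    band is bounded by the image conic m^T A* m = 0."""
    Ci = np.linalg.inv(C)
    v = Ci @ x0
    A = np.block([[Ci, -v[:, None]],
                  [-v[None, :], np.array([[x0 @ v - k ** 2]])]])
    return A, np.linalg.det(A) * np.linalg.inv(A)

# Made-up line parameters x0, covariance C and confidence factor k.
A, Astar = epipolar_band_conic(np.array([0.3, -0.2]), np.diag([0.01, 0.02]), 2.41)
```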
Table 1. Comparison of different methods for estimating the fundamental matrix.

Method      ΔF       e                     e′                    RMS    CPU
Calib.               (5138.18, −8875.85)   (1642.02, −2528.91)   0.99
Linear      5.85%    (304.018, 124.039)    (256.219, 230.306)    3.40   0.13 s
Normal.     7.20%    (−3920.6, 7678.71)    (8489.07, −15393.5)   0.89   0.15 s
Nonlinear   0.92%    (8135.03, −14048.3)   (1896.19, −2917.11)   0.87   0.38 s
Gradient    0.92%    (8166.05, −14104.1)   (1897.80, −2920.12)   0.87   0.40 s
M-estim.    0.12%    (4528.94, −7516.3)    (1581.19, −2313.72)   0.87   1.05 s
Reproj.     0.92%    (8165.05, −14102.3)   (1897.74, −2920.01)   0.87   19.1 s
LMedS       0.13%    (3919.12, −6413.1)    (1500.21, −2159.65)   0.75   2.40 s
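The RMS column above is the root mean of squared point-to-epipolar-line distances, accumulated over both images. A sketch (NumPy; the F and matches below are a made-up pure-translation example with one badly localized point, not the 241 matches of the experiment):

```python
import numpy as np

def rms_epipolar_distance(F, x1, x2):
    """RMS distance between each point and the epipolar line of its match,
    accumulated over both images."""
    d2 = []
    for (u1, v1), (u2, v2) in zip(x1, x2):
        m1 = np.array([u1, v1, 1.0])
        m2 = np.array([u2, v2, 1.0])
        for m, l in ((m2, F @ m1), (m1, F.T @ m2)):
            d2.append((m @ l) ** 2 / (l[0] ** 2 + l[1] ** 2))
    return float(np.sqrt(np.mean(d2)))

# Made-up data: F = [t]_x for t = (1, 0, 0); epipolar lines are v' = v.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
x1 = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (0.3, -0.2)]
x2 = [(0.4, 0.0), (1.5, 1.0), (2.2, 0.5), (0.8, 0.3)]  # last v is off by 0.5
rms = rms_epipolar_distance(F, x1, x2)
```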
localized points has been reduced in the M-estimator, or they are simply discarded in LMedS. Actually, LMedS has detected five matches as outliers: 226, 94, 17, 78 and 100. Of course, these two methods are more time consuming than the nonlinear method.

4.1. A Measure of Comparison Between Fundamental Matrices

From the above discussion, the Frobenius norm of the difference between two normalized fundamental matrices is clearly not an appropriate measure of comparison. In the following, we describe a measure proposed by Stéphane Laveau from INRIA Sophia-Antipolis, which we think characterizes well the difference between two fundamental matrices. Let the two given fundamental matrices be F1 and F2. The measure is computed as follows (see Fig. 10):

Step 1: Choose randomly a point m in the first image.
Step 2: Draw the epipolar line of m in the second image using F1. The line is shown as a dashed line, and is defined by F1 m.
Step 3: If the epipolar line does not intersect the second image, go to Step 1.
Step 4: Choose randomly a point m′ on the epipolar line. Note that m and m′ correspond to each other exactly with respect to F1.
Step 5: Draw the epipolar line of m in the second image using F2, i.e., F2 m, and compute the distance, denoted by d′1, between point m′ and line F2 m.
Step 6: Draw the epipolar line of m′ in the first image using F2, i.e., F2T m′, and compute the distance, denoted by d1, between point m and line F2T m′.
Step 7: Conduct the same procedure from Step 2 through Step 6, but reversing the roles of F1 and F2, and compute d2 and d′2.
Step 8: Repeat Step 1 through Step 7 N times.
Step 9: Compute the average of all the distances d; this average is the measure of difference between the two fundamental matrices.

In this procedure, a random number generator based on a uniform distribution is used. The two fundamental matrices play a symmetric role, and the two images also play a symmetric role, although this is not obvious at first sight. The reason is that m and m′ are chosen randomly and the epipolar lines are symmetric (line F1T m′ goes through m). Clearly, the measure computed as above, in pixels, is physically meaningful, because it is defined in the image space in which we observe the surrounding environment. Furthermore, as N tends to infinity, we sample uniformly the whole 3D space visible from the given epipolar geometry. If the image resolution is 512 × 512 and if we consider pixel resolution, then the visible 3D space can be approximately sampled by 512³ points. In our experiment, we set N = 50000. Using this method, we can compute the distance between each pair of fundamental matrices, and we obtain a symmetric matrix.

Figure 10. Definition of the difference between two fundamental matrices in terms of image distances.
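Steps 1-9 above can be sketched as follows (NumPy). The sketch simplifies Step 3: instead of testing whether the line crosses the image, it draws m′ by sampling u′ directly and skips near-vertical lines; image size and trial count are parameters:

```python
import numpy as np

def point_line_distance(m, l):
    """Distance in pixels between homogeneous point m and line l."""
    return abs(m @ l) / np.hypot(l[0], l[1])

def fm_distance(F1, F2, width=512, height=512, n_trials=1000, seed=0):
    """Average image distance between the epipolar geometries of F1 and F2
    (both matrices and both images play symmetric roles)."""
    rng = np.random.default_rng(seed)
    dists = []
    for _ in range(n_trials):
        for Fa, Fb in ((F1, F2), (F2, F1)):
            m = np.array([rng.uniform(0, width), rng.uniform(0, height), 1.0])
            l = Fa @ m                       # epipolar line of m under Fa
            if abs(l[1]) < 1e-12:            # near-vertical line: skip
                continue
            u = rng.uniform(0, width)        # random m' on that line
            mp = np.array([u, -(l[0] * u + l[2]) / l[1], 1.0])
            dists.append(point_line_distance(mp, Fb @ m))    # d'
            dists.append(point_line_distance(m, Fb.T @ mp))  # d
    return float(np.mean(dists))

# Made-up pair of fundamental matrices for illustration.
F1 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
F2 = np.array([[0.0, 0.0, 0.1], [0.0, 0.0, -1.0], [-0.1, 1.0, 0.0]])
```

The measure is zero when the two matrices describe the same epipolar geometry, and is expressed in pixels otherwise.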
Table 2. Distances between the fundamental matrices estimated by different techniques.

            Linear   Normal.   Nonlinear   Gradient   M-estim.   Reproj.   LMedS
Calib.      116.4    5.97      2.66        2.66       2.27       2.66      1.33
Linear               117.29    115.97      116.40     115.51     116.25    115.91
Normal.                        4.13        4.12       5.27       4.11      5.89
Nonlinear                                  0.01       1.19       0.01      1.86
Gradient                                              1.19       0.00      1.86
M-estim.                                                         1.20      1.03
Reproj.                                                                    1.88
The result is shown in Table 2, where only the upper triangle is displayed (because of symmetry). We arrive at the following conclusions:

• The linear method is very bad.
• The linear method with prior data normalization gives quite a reasonable result.
• The nonlinear method based on point-line distances and that based on gradient-weighted epipolar errors give results very similar to those obtained by minimizing the distances between observed points and reprojected ones. The latter should be avoided because it is too time consuming.
• M-estimators and the LMedS method give still better results, because they try to limit or eliminate the effect of poorly localized points. The epipolar geometry estimated by LMedS is closer to the one computed through stereo calibration.

The LMedS method should definitely be used if the given set of matches contains false matches.

4.2. Epipolar Band

Due to space limitations, the result on the uncertainty of the fundamental matrix is not shown here; it can be found in (Csurka et al., 1996), together with its use in computing the uncertainty of the projective reconstruction and in improving the self-calibration based on the Kruppa equations. We show in this section how to use the uncertainty to define the epipolar band for matching.

We only consider the epipolar lines in the second image (the same can be done for the first). For a given point m0 = [u0, v0]T in the first image together with its covariance matrix

Λm0 = | σuu  σuv |
      | σuv  σvv | ,

its epipolar line in the second image is given by l′0 = Fm̃0. From (34), the covariance matrix of l′0 is computed by

Λl′0 = (∂l′0/∂F) ΛF (∂l′0/∂F)T + F [Λm0 , 02 ; 02T , 0] FT ,    (53)

where F in the first term of the right-hand side is treated as a 9-vector, and 02 = [0, 0]T .

Any point m′ = [u′, v′]T on the epipolar line l′0 ≡ [l′1, l′2, l′3]T must satisfy m̃′T l′0 = l′0T m̃′ = l′1 u′ + l′2 v′ + l′3 = 0 (we see the duality between points and lines). The vector l′0 is defined up to a scale factor. It is a projective point in the dual space of the image plane, and is the dual of the epipolar line. We consider the vector of parameters x0 = (x0, y0)T = (l′1/l′3, l′2/l′3)T (if l′3 = 0 we can choose (l′1/l′2, l′3/l′2) or (l′2/l′1, l′3/l′1)). The covariance matrix of x0 is computed in the same way as (34): C = (∂x0/∂l′0) Λl′0 (∂x0/∂l′0)T . The uncertainty of x0 can be represented in the usual way by an ellipse C in the dual space (denoted by x) of the image plane:

(x − x0)T C−1 (x − x0) = k²,    (54)

where k is a confidence factor determined by the χ² distribution with 2 degrees of freedom. The probability that x lies in the interior of the ellipse defined by (54) is equal to Pχ²(k, 2). Equation (54) can be rewritten in projective form as

x̃T Ax̃ = 0 with A = [C−1 , −C−1 x0 ; −x0T C−T , x0T C−1 x0 − k²].

The dual of this ellipse, denoted by C∗, defines a conic in the image plane. It is given by

m̃T A∗ m̃ = 0,    (55)

where A∗ is the adjoint of matrix A (i.e., A∗A = det(A) I). Because of the duality between the parameter space x and the image plane m (see Fig. 11), for a point x on C, it defines an epipolar line in the
image plane, line(x), which is tangent to conic C∗ at a point m, while the latter defines a line in the parameter space, line(m), which is tangent to C at x. It can be shown (Csurka, 1996) that, for a point in the interior of the ellipse C, the corresponding epipolar line lies outside the conic C∗ (i.e., it does not cut the conic). Therefore, for a given k, the outside of this conic defines the region in which the epipolar line should lie with probability Pχ²(k, 2). We call this region the epipolar band. For a given point in one image, its match should be searched for in this region. Although, theoretically, the uncertainty conic defining the epipolar band could be an ellipse or a parabola, in practice it is always a hyperbola (except when ΛF is extremely large).

Figure 11. Duality between the image plane and the parameter space of the epipolar lines.

We have estimated the uncertainty of the fundamental matrix for the image pair shown in Fig. 4. In Fig. 12, we show the epipolar bands of matches 1, 220, 0 and 183 in the second image, computed as described above. The displayed hyperbolas correspond to a probability of 70% (k = 2.41) with image point uncertainty of σuu = σvv = 0.52 and σuv = 0. We also show in Fig. 12 the epipolar lines, drawn as dashed lines, and the matched points, indicated by +. An interesting observation is that the matched points are located in the area where the two branches of the hyperbolas are closest to each other. This suggests that the covariance matrix of the fundamental matrix actually captures, to some extent, the matching information (disparity in stereo terminology). Such areas should be examined first when searching for point matches. This may, however, not be true if a significant depth discontinuity is present in the scene and if the point matches used in computing the fundamental matrix do not sufficiently represent the depth variation.

5. Discussion

In this paper, we have reviewed a number of techniques for estimating the epipolar geometry between two images. Point matches are assumed to be given, but some of them may have been incorrectly paired. How to establish point matches is the topic of the paper (Zhang et al., 1995).

5.1. Summary

For two uncalibrated images under full perspective projection, at least seven point matches are necessary to determine the epipolar geometry. When only seven matches are available, there are possibly three solutions, which can be obtained by solving a cubic equation. If more data are available, then the solution is in general unique, and several linear techniques have been developed. The linear techniques are usually sensitive to noise and not very stable, because they ignore the constraints on the nine coefficients of the fundamental matrix and the criterion they minimize is not physically meaningful. The results, however, can be considerably improved by first normalizing the data points, instead of using pixel coordinates directly, such that their new coordinates are on average equal to unity. Even better results can be obtained within a nonlinear optimization framework by

• using an appropriate parameterization of the fundamental matrix to take the rank-2 constraint into account explicitly, and
• minimizing a physically meaningful criterion.

Three choices are available for the latter: the distances between points and their corresponding epipolar lines, the gradient-weighted epipolar errors, and the distances between points and the reprojections of their corresponding points reconstructed in space. Experiments show that the results given by the optimization based on the first criterion are slightly worse than those of the last two, which give essentially the same results. However, the third is much more time consuming and is therefore not recommended, although it is statistically optimal under certain conditions. One can, however, use it as a last step to refine the results obtained with the first or second technique. To summarize, we recommend the second criterion (gradient-weighted epipolar errors), which is actually a very good approximation to the third one.

Point matches are obtained by using some heuristic techniques such as correlation and relaxation, and they usually contain false matches. Also, due to the limited performance of a corner detector or low contrast of an image, a few points are possibly poorly localized. These outliers (sometimes even a single one) will severely affect the precision of the fundamental matrix if we
directly apply the methods described above, which are all least-squares techniques. We have thus presented in detail two commonly used robust techniques: M-estimators and least median of squares (LMedS). M-estimators try to reduce the effect of outliers by replacing the squared residuals with another function of the residuals which increases less rapidly than the square. They can be implemented as an iteratively reweighted least-squares. Experiments show that they are robust to outliers due to bad localization, but not robust to false matches. This is because they depend tightly on the initial estimate of the fundamental matrix. The LMedS method solves a nonlinear minimization problem which yields the smallest value for the median of the squared residuals computed over the entire data set. It turns out that this method is very robust to false matches as well as to outliers due to bad localization. Unfortunately, there is no straightforward formula for the LMedS estimator. It must be solved by a search in the space of possible estimates generated from the data. Since this space is too large, only a randomly chosen subset of the data can be analyzed. We have proposed a regularly random selection method to improve the efficiency.

Since the data points are always corrupted by noise, one should model the uncertainty of the estimated fundamental matrix in order to exploit its underlying geometric information correctly and effectively. We have modeled the fundamental matrix as a random vector in its parameterization space and described methods to estimate the covariance matrix of this vector under a first-order approximation. This uncertainty measure can be used to define the epipolar band for matching, as shown in Section 4.2. In (Csurka et al., 1996), we also show how it can be used to compute the uncertainty of the projective reconstruction and to improve the self-calibration based on the Kruppa equations.

Techniques for projective reconstruction will be reviewed in Appendix A. Although we cannot obtain any metric information from a projective structure (measurements of lengths and angles do not make sense), it still contains rich information, such as coplanarity, collinearity, and ratios, which is sometimes sufficient for artificial systems, such as robots, to perform tasks such as navigation and object recognition.

5.2. Degenerate Configurations

Up to now, we have only considered situations where no ambiguity arises in interpreting a set of point matches (i.e., they determine a unique fundamental matrix), except for the case of seven point matches, where three solutions may exist. Sometimes, however, even with a large set of point matches, there exist many solutions for the fundamental matrix which explain the data equally well, and we call such situations degenerate for the determination of the fundamental matrix.

Maybank (1992) has thoroughly studied the degenerate configurations:

• The 3D points lie on a quadric surface passing through the two optical centers (called the critical surface, or Maybank quadric by Longuet-Higgins). We may have three different fundamental matrices compatible with the data. The two sets of image points are related by a quadratic transformation:

m′ = F1 m × F2 m,

where F1 and F2 are two of the fundamental matrices.
• The two sets of image points are related by a homography:

m̃′ = ρHm̃,

where ρ is an arbitrary non-zero scalar, and H is a 3 × 3 matrix defined up to a scale factor. This is a degenerate case of the previous situation. It arises when the 3D points lie on a plane or when the camera undergoes a pure rotation around the optical center (equivalent to the case when all points lie on the plane at infinity).
• The 3D points are in an even more special position, for example on a line.

The stability of the fundamental matrix with respect to the degenerate configurations is analyzed in (Luong and Faugeras, 1996). A technique which automatically detects the degeneracy based on a χ² test when the noise level of the data points is known is reported in (Torr et al., 1995, 1996).

5.3. Affine Cameras

So far, we have only considered images under perspective projection, which is a nonlinear mapping from 3D space to 2D. This makes many vision problems difficult to solve, and more importantly, they can become ill-conditioned when the perspective effects are small. Sometimes, if certain conditions are satisfied, for example, when the camera field of view is small and the object size is small enough with respect to the
distance from the camera to the object, the projection can be approximated by a linear mapping (Aloimonos, 1990). The affine camera introduced in (Mundy and Zisserman, 1992) is a generalization of the orthographic and weak perspective models. Its projection matrix has the following special form:

     [ P11  P12  P13  P14 ]
PA = [ P21  P22  P23  P24 ]
     [  0    0    0   P34 ]

defined up to a scale factor. The epipolar constraint (5) is still valid, but the fundamental matrix (6) takes the following simple form (Xu and Zhang, 1996):

     [  0    0   a13 ]
FA = [  0    0   a23 ] .
     [ a31  a32  a33 ]

This is known as the affine fundamental matrix (Zisserman, 1992; Shapiro et al., 1994). Thus, the epipolar equation is linear in the image coordinates under affine cameras, and the determination of the epipolar geometry is much easier. This has been thoroughly studied by the Oxford group (Shapiro, 1993; Shapiro et al., 1994) (see also Xu and Zhang, 1996), and thus is not addressed here. A software called AffineF is available from my Web home page.

5.4. Cameras with Lens Distortion

With the current formulation of the epipolar geometry (under either full perspective or affine projection), the homogeneous coordinates of a 3D point and those of the image point are related by a 3 × 4 matrix. That is, the lens distortion is not addressed. This statement does not imply, though, that lens distortion has never been accounted for in the previous work. Indeed, distortion has usually been corrected off-line using classical methods, by observing for example straight lines, if it is not weak enough to be neglected. A preliminary investigation has been conducted (Zhang, 1996b), which considers lens distortion as an integral part of a camera. In this case, for a point in one image, its corresponding point does not lie on a line anymore; as a matter of fact, it lies on the so-called epipolar curve. Preliminary results show that the distortion can be corrected on-line if cameras have a strong lens distortion. More work still needs to be done to understand better the epipolar geometry with lens distortion.

5.5. Multiple Cameras

The study of the epipolar geometry is naturally extended to more images. When three images are considered, trilinear constraints exist between point/line correspondences (Spetsakis and Aloimonos, 1989). "Trilinear" means that the constraints are linear in the point/line coordinates of each image, while the epipolar constraint (5) is a bilinear relation. The trilinear constraints have been rediscovered in (Shashua, 1994b) in the context of uncalibrated images. Similar to the fundamental matrix for two images, the constraints between three images can be described by a 3 × 3 × 3 matrix defined up to a scale factor (Spetsakis and Aloimonos, 1989; Hartley, 1994). There exist at most four linearly independent constraints in the elements of the above matrix, and seven point matches are required to have a linear solution (Shashua, 1994b). However, the 27 elements are not algebraically independent. There are only 18 parameters to describe the geometry between three uncalibrated images (Faugeras and Robert, 1994), and we have three algebraically independent constraints. Therefore, we need at least six point matches to determine the geometry of three images (Quan, 1995).

When more images are considered, quadrilinear relations arise among four-tuples of images; they are, however, algebraically dependent on the trilinear and bilinear ones (Faugeras and Mourrain, 1995). That is, they do not bring in any new information. Recently, quite a lot of effort has been directed towards the study of the geometry of N images (see Luong and Viéville, 1994; Carlsson, 1994; Triggs, 1995; Weinshall et al., 1995; Viéville et al., 1996; Laveau, 1996, to name a few). A complete review of the work on multiple cameras is beyond the scope of this paper.

Appendix A: Projective Reconstruction

We show in this section how to estimate the position of a point in space, given its projections in two images whose epipolar geometry is known. The problem is known as 3D reconstruction in general, and triangulation in particular. In the calibrated case, the relative position (i.e., the rotation and translation) of the two cameras is known, and 3D reconstruction has already been extensively studied in stereo (Ayache, 1991). In the uncalibrated case, like the one considered here, we assume that the fundamental matrix between the
two images is known (e.g., computed with the methods described in Section 3), and we say that they are weakly calibrated.

A.1. Projective Structure from Two Uncalibrated Images

In the calibrated case, a 3D structure can be recovered from two images only up to a rigid transformation and an unknown scale factor (this transformation is also known as a similarity), because we can choose an arbitrary coordinate system as a world coordinate system (although one usually chooses it to coincide with one of the camera coordinate systems). Similarly, in the uncalibrated case, a 3D structure can only be recovered up to a projective transformation of the 3D space (Maybank, 1992; Faugeras, 1992; Hartley et al., 1992; Faugeras, 1995).

At this point, we have to introduce a few notions from Projective Geometry (a good introduction can be found in the appendix of (Mundy and Zisserman, 1992) or in (Faugeras, 1995)). For a 3D point M = [X, Y, Z]ᵀ, its homogeneous coordinates are x̃ = [U, V, W, S]ᵀ = λM̃, where λ is any nonzero scalar and M̃ = [X, Y, Z, 1]ᵀ. This implies: U/S = X, V/S = Y, W/S = Z. If we include the possibility that S = 0, then x̃ = [U, V, W, S]ᵀ are called the projective coordinates of the 3D point M, which are not all equal to zero and are defined up to a scale factor. Therefore, x̃ and λx̃ (λ ≠ 0) represent the same projective point. When S ≠ 0, x̃ = SM̃. When S = 0, we say that the point is at infinity. A 4 × 4 nonsingular matrix H defines a linear transformation from one projective point to another, and is called a projective transformation. The matrix H, of course, is also defined up to a nonzero scale factor, and we write

ρỹ = Hx̃,   (1)

if x̃ is mapped to ỹ by H. Here ρ is a nonzero scale factor.

Proposition 2. Given two (perspective) images with unknown intrinsic parameters of a scene, the 3D structure of the scene can be reconstructed up to an unknown projective transformation as soon as the epipolar geometry (i.e., the fundamental matrix) between the two images is known.

Assume that the true camera projection matrices are P and P′. From (6), we have the following relation

F = [P′p⊥]× P′P⁺,

where F is the known fundamental matrix. The 3D structure thus reconstructed is M. The proposition says that the 3D structure H⁻¹M̃, where H is any projective transformation of the 3D space, is still consistent with the observed image points and the fundamental matrix. Following the pinhole model, the camera projection matrices corresponding to the new structure H⁻¹M̃ are

P̂ = PH and P̂′ = P′H,

respectively. In order to show the above proposition, we only need to prove

[P̂′p̂⊥]× P̂′P̂⁺ = λF ≡ λ[P′p⊥]× P′P⁺,   (2)

where p̂⊥ = (I − P̂⁺P̂)ω̂ with ω̂ any 4-vector, and λ is a scalar since F is defined up to a scale factor. The above result has been known for several years. In (Xu and Zhang, 1996), we provide a simple proof through pure linear algebra.

A.2. Computing Camera Projection Matrices

The projective reconstruction is very similar to the 3D reconstruction when cameras are calibrated. First, we need to compute the camera projection matrices from the fundamental matrix F with respect to a projective basis, which can be arbitrary because of Proposition 2.

A.2.1. Factorization Method. Let F be the fundamental matrix for the two cameras. There are an infinite number of projective bases which all satisfy the epipolar geometry. One possibility is to factor F as the product of an antisymmetric matrix [e′]× (e′ is in fact the epipole in the second image) and a matrix M, i.e., F = [e′]×M. A canonical representation can then be used:

P = [I 0] and P′ = [M e′].

It is easy to verify that the above P and P′ do yield the fundamental matrix. The factorization of F into [e′]×M is in general not unique, because if M is a solution then M + e′vᵀ is also a solution for any vector v (indeed, we always have [e′]×e′vᵀ = 0). One way to do the factorization is as follows (Luong and Viéville, 1994). Since Fᵀe′ = 0, the epipole in the second image is given by the eigenvector of matrix FFᵀ associated to the smallest eigenvalue.
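To make this factorization concrete, the following sketch (a hypothetical numpy implementation, not part of the software distributed with this review) computes e′ as the left singular vector of F associated with the smallest singular value (equivalently, the least eigenvector of FFᵀ), takes the particular solution M = −[e′]×F (valid because the singular vector has unit norm; any M + e′vᵀ would serve equally well), and assembles the canonical pair P = [I 0], P′ = [M e′]:

```python
import numpy as np

def skew(v):
    """Antisymmetric matrix [v]x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def canonical_cameras(F):
    """Factor F = [e']x M and return (e', P, P') with P = [I 0], P' = [M e'].

    e' is the unit left null vector of F (F^T e' = 0), i.e., the least
    eigenvector of F F^T; M = -[e']x F is one particular solution.
    """
    U, _, _ = np.linalg.svd(F)
    e = U[:, -1]                      # unit norm by construction
    M = -skew(e) @ F                  # -(1/||e'||^2) [e']x F with ||e'|| = 1
    P = np.hstack([np.eye(3), np.zeros((3, 1))])
    Pp = np.hstack([M, e.reshape(3, 1)])
    return e, P, Pp
```

A quick sanity check of the decomposition is to apply canonical_cameras to any rank-2 matrix F and to verify that skew(e) @ Pp[:, :3] reproduces F; with ‖e′‖ = 1 the agreement is exact, since −[e′]×[e′]×F = (I − e′e′ᵀ)F = F when Fᵀe′ = 0.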
Once we have e′, using the relation

‖v‖² I₃ = vvᵀ − [v]²×   for all v,

we have

F = (1/‖e′‖²)(e′e′ᵀ − [e′]²×)F
  = (1/‖e′‖²) e′(e′ᵀF) + [e′]× (−(1/‖e′‖²)[e′]×F).

The first term on the right-hand side is equal to 0 because Fᵀe′ = 0. We can thus define the matrix M as

M = −(1/‖e′‖²)[e′]×F.

This decomposition is used in (Beardsley et al., 1994). Numerically, better results of 3D reconstruction are obtained when the epipole e′ is normalized such that ‖e′‖ = 1.

A.2.2. Choosing a Projective Basis. Another possibility is to choose effectively five pairs of points, no four of the points being coplanar, between the two cameras as a projective basis. We can of course choose five corresponding points we have identified. However, the precision of the final projective reconstruction will depend heavily upon the precision of the pairs of points. In order to overcome this problem, we have chosen in (Zhang et al., 1995) the following solution. We first choose five arbitrary points in the first image, noted mᵢ (i = 1, …, 5). Although they could be chosen arbitrarily, they are chosen such that they are well distributed in the image, to have a good numerical stability. For each point mᵢ, its corresponding epipolar line in the second image is given by l′ᵢ = Fmᵢ. We can now choose an arbitrary point on l′ᵢ as m′ᵢ, the corresponding point of mᵢ. Finally, we should verify that no four of the points are coplanar, which can be easily done using the fundamental matrix (Faugeras, 1992, credited to Roger Mohr). The advantage of this method is that the five pairs of points satisfy exactly the epipolar constraint.

Once we have five pairs of points (mᵢ, m′ᵢ) (i = 1, …, 5), we can compute the camera projection matrices as described in (Faugeras, 1992). Assigning the projective coordinates (somewhat arbitrarily) to the five reference points, we have five image points and space points in correspondence, which provides 10 constraints on each camera projection matrix, leaving only one unknown parameter. This unknown can then be solved for using the known fundamental matrix.

A.3. Reconstruction Techniques

Now that the camera projection matrices of the two images with respect to a projective basis are available, we can reconstruct 3D structures with respect to that projective basis from point matches.

A.3.1. Linear Methods. Given a pair of points in correspondence, m = [u, v]ᵀ and m′ = [u′, v′]ᵀ, let x̃ = [x, y, z, t]ᵀ be the corresponding 3D point in space with respect to the projective basis chosen before. Following the pinhole model, we have:

s [u, v, 1]ᵀ = P [x, y, z, t]ᵀ,   (3)
s′ [u′, v′, 1]ᵀ = P′ [x, y, z, t]ᵀ,   (4)

where s and s′ are two arbitrary scalars. Let pᵢ and p′ᵢ be the vectors corresponding to the ith row of P and P′, respectively. The two scalars can then be computed as s = p₃ᵀx̃ and s′ = p′₃ᵀx̃. Eliminating s and s′ from (3) and (4) yields the following equation:

Ax̃ = 0,   (5)

where A is the 4 × 4 matrix given by

A = [p₁ − up₃, p₂ − vp₃, p′₁ − u′p′₃, p′₂ − v′p′₃]ᵀ.

As the projective coordinates x̃ are defined up to a scale factor, we can impose ‖x̃‖ = 1; the solution to (5) is then well known (see also the description in Section 3.2.2) to be the eigenvector of the matrix AᵀA associated to the smallest eigenvalue.

If we assume that no point is at infinity, then we can impose t = 1, and the projective reconstruction can be done exactly in the same way as for the Euclidean reconstruction. The set of homogeneous equations, Ax̃ = 0, is reduced to a set of four non-homogeneous equations in three unknowns (x, y, z). A linear least-squares technique can be used to solve this problem.

A.3.2. Iterative Linear Methods. The previous approach has the advantage of providing a closed-form solution, but it has the disadvantage that the criterion that is minimized does not have a good physical interpretation. Let us consider the first equation of (5). In
general, the point x̃ found will not satisfy this equation exactly; rather, there will be an error ε₁ = p₁ᵀx̃ − up₃ᵀx̃. What we really want to minimize is the difference between the measured image coordinate u and the projection of x̃, which is given by p₁ᵀx̃/p₃ᵀx̃. That is, we want to minimize

ε′₁ = p₁ᵀx̃/p₃ᵀx̃ − u = ε₁/(p₃ᵀx̃).

This means that if the equation had been weighted by the factor 1/w₁, where w₁ = p₃ᵀx̃, then the resulting error would have been precisely what we wanted to minimize. Similarly, the weight for the second equation of (5) would be 1/w₂ = 1/w₁, while the weights for the third and fourth equations would be 1/w₃ = 1/w₄ = 1/(p′₃ᵀx̃). Finally, the solution could be found by applying exactly the same method described in the last subsection (either eigenvector computation or linear least-squares).

Like the method for estimating the fundamental matrix described in Section 3.4, the problem is that the weights wᵢ depend themselves on the solution x̃. To overcome this difficulty, we apply an iterative linear method. We first assume that all wᵢ = 1 and run a linear algorithm to obtain an initial estimate of x̃. The weights wᵢ are then computed from this initial solution. The weighted linear least-squares is then run for an improved solution. This procedure can be repeated several times until convergence (i.e., either the solution or the weights do not change between successive iterations). Two iterations are usually sufficient.

A.3.3. Nonlinear Methods. As said in the last paragraph, the quantity we want to minimize is the error measured in the image plane between the observation and the projection of the reconstruction, that is

(u − p₁ᵀx̃/p₃ᵀx̃)² + (v − p₂ᵀx̃/p₃ᵀx̃)² + (u′ − p′₁ᵀx̃/p′₃ᵀx̃)² + (v′ − p′₂ᵀx̃/p′₃ᵀx̃)².

However, there does not exist any closed-form solution, and we must use a standard iterative minimization technique, such as the Levenberg-Marquardt method. The initial estimate of x̃ can be obtained by using any linear technique described before.

Hartley and Sturm (1994) reformulate the above criterion in terms of the distance between a point and its corresponding epipolar line defined by the ideal space point being sought. By parameterizing the pencil of epipolar lines in one image by a parameter t (which also defines the corresponding epipolar line in the other image through the fundamental matrix), they are able to transform the minimization problem into the resolution of a polynomial of degree 6 in t. There may exist up to 6 real roots, and the global minimum can be found by evaluating the minimization function for each real root.

More projective reconstruction techniques can be found in (Hartley and Sturm, 1994; Rothwell et al., 1995), but it seems to us that the iterative linear or the nonlinear techniques based on the image errors are the best that one can recommend.

Appendix B: Approximate Estimation of Fundamental Matrix from a General Matrix

We first introduce the Frobenius norm of a matrix A = [aᵢⱼ] (i = 1, …, m; j = 1, …, n), which is defined by

‖A‖ = ( Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ aᵢⱼ² )^(1/2).   (1)

It is easy to show that for all orthogonal matrices U and V of appropriate dimensions, we have

‖UAVᵀ‖ = ‖A‖.

Proposition 3. We are given a 3 × 3 matrix F, whose singular value decomposition (SVD) is

F = USVᵀ,

where S = diag(σ₁, σ₂, σ₃) and the σᵢ (i = 1, 2, 3) are the singular values, satisfying σ₁ ≥ σ₂ ≥ σ₃ ≥ 0. Let Ŝ = diag(σ₁, σ₂, 0); then

F̂ = UŜVᵀ

is the closest matrix to F that has rank 2. Here, "closest" is quantified by the Frobenius norm of F − F̂, i.e., ‖F − F̂‖.

Proof: We show this in two parts.
First, the Frobenius norm of F − F̂ is given by

‖F − F̂‖ = ‖Uᵀ(F − F̂)V‖ = ‖diag(0, 0, σ₃)‖ = σ₃.

Second, for any 3 × 3 matrix G of rank 2, we can always find a unit vector z such that Gz = 0, i.e., z is the null vector of matrix G. Since

Fz = Σᵢ₌₁³ σᵢ (vᵢᵀz) uᵢ,

where uᵢ and vᵢ are the ith column vectors of U and V, we have

‖F − G‖² ≥ ‖(F − G)z‖² = ‖Fz‖² = Σᵢ₌₁³ σᵢ² (vᵢᵀz)² ≥ σ₃².

This implies that F̂ is indeed the closest to F, which completes the proof. □

In the above derivation, we have used the following inequality, which relates the Frobenius norm to the vector norm:

‖A‖ ≥ max_{‖z‖=1} ‖Az‖ ≥ ‖Az‖   with ‖z‖ = 1.

The reader is referred to (Golub and van Loan, 1989) for more details.

Appendix C: Image Coordinates and Numerical Conditioning of Linear Least-Squares

This section describes the relation between the numerical conditioning of linear least-squares problems and the image coordinates, based on the analysis given in (Hartley, 1995).

Consider the method described in Section 3.2.2, which consists in finding the eigenvector of the 9 × 9 matrix UₙᵀUₙ associated with the least eigenvalue (for simplicity, this vector is called the least eigenvector in the sequel). This matrix can be expressed as UₙᵀUₙ = UDUᵀ, where U is orthogonal and D is diagonal, whose diagonal entries λᵢ (i = 1, …, 9) are assumed to be in non-increasing order. In this case, the least eigenvector of UₙᵀUₙ is the last column of U. The ratio λ₁/λ₈, denoted by κ, is the condition number of the matrix UₙᵀUₙ (because λ₉ is expected to be 0). This parameter is well known to be an important factor in the analysis of the stability of linear problems (Golub and van Loan, 1989). If κ is large, then very small changes to the data can cause large changes to the solution. The sensitivity of invariant subspaces is discussed in detail in (Golub and van Loan, 1989, p. 413).

The major reason for the poor conditioning of the matrix UₙᵀUₙ ≡ X is the lack of homogeneity in the image coordinates. In an image of dimension 200 × 200, a typical image point will be of the form (100, 100, 1). If both m̃ᵢ and m̃′ᵢ are of this form, then uᵢ will be of the form [10⁴, 10⁴, 10², 10⁴, 10⁴, 10², 10², 10², 1]ᵀ. The contribution to the matrix X is of the form uᵢuᵢᵀ, which will contain entries ranging between 10⁸ and 1. The diagonal entries of X will be of the form [10⁸, 10⁸, 10⁴, 10⁸, 10⁸, 10⁴, 10⁴, 10⁴, 1]ᵀ. Summing over all point matches will result in a matrix X whose diagonal entries are approximately in this proportion. We denote by Xᵣ the trailing r × r principal submatrix (that is, the last r columns and rows) of X, and by λᵢ(Xᵣ) its ith largest eigenvalue. Thus X₉ = X = UₙᵀUₙ and κ = λ₁(X₉)/λ₈(X₉). First, we consider the eigenvalues of X₂. Since the sum of the two eigenvalues is equal to the trace, we see that λ₁(X₂) + λ₂(X₂) = trace(X₂) = 10⁴ + 1. Since eigenvalues are non-negative, we know that λ₁(X₂) ≤ 10⁴ + 1. From the interlacing property (Golub and van Loan, 1989, p. 411), we obtain

λ₈(X₉) ≤ λ₇(X₈) ≤ ··· ≤ λ₁(X₂) ≤ 10⁴ + 1.

On the other hand, also from the interlacing property, we know that the largest eigenvalue of X is not less than the largest diagonal entry, i.e., λ₁(X₉) ≥ 10⁸. Therefore, the ratio κ = λ₁(X₉)/λ₈(X₉) ≥ 10⁸/(10⁴ + 1). In fact, λ₈(X₉) will usually be much smaller than 10⁴ + 1 and the condition number will be far greater. This analysis shows that scaling the coordinates so that they are on average equal to unity will improve the conditioning of the matrix UₙᵀUₙ.

Now consider the effect of translation. A usual practice is to fix the origin of the image coordinates at the top left-hand corner of the image, so that all the image coordinates are positive. In this case, an improvement in the conditioning of the matrix may be achieved by translating the points so that their centroid is at the origin. Informally, if the first image coordinates (the u-coordinates) of a set of points are {101.5, 102.3, 98.7, . . .}, then the significant values of
the coordinates are obscured by the coordinate offset of 100. By translating by 100, these numbers are changed to {1.5, 2.3, −1.3, . . .}, and the significant values now become prominent.

Thus, the conditioning of the linear least-squares process will be considerably improved by translating and scaling the image coordinates, as described in Section 3.2.5.

Acknowledgments

The author gratefully acknowledges the contributions of Gabriella Csurka, Stéphane Laveau, Gang Xu (Ritsumeikan University, Japan), and Cyril Zeller. The comments from Tuan Luong and Andrew Zisserman have helped the author to improve the paper.

References

Aggarwal, J. and Nandhakumar, N. 1988. On the computation of motion from sequences of images—A review. In Proc. IEEE, Vol. 76, No. 8, pp. 917–935.
Aloimonos, J. 1990. Perspective approximations. Image and Vision Computing, 8(3):179–192.
Anderson, T. 1958. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, Inc.
Ayache, N. 1991. Artificial Vision for Mobile Robots. MIT Press.
Ayer, S., Schroeter, P., and Bigün, J. 1994. Segmentation of moving objects by robust motion parameter estimation over multiple frames. In Proc. of the 3rd European Conf. on Computer Vision, J.-O. Eklundh (Ed.), Vols. 800–801 of Lecture Notes in Computer Science, Springer-Verlag: Stockholm, Sweden, Vol. II, pp. 316–327.
Beardsley, P., Zisserman, A., and Murray, D. 1994. Navigation using affine structure from motion. In Proc. of the 3rd European Conf. on Computer Vision, J.-O. Eklundh (Ed.), Vol. 2 of Lecture Notes in Computer Science, Springer-Verlag: Stockholm, Sweden, pp. 85–96.
Boufama, B. and Mohr, R. 1995. Epipole and fundamental matrix estimation using the virtual parallax property. In Proc. of the 5th Int. Conf. on Computer Vision, IEEE Computer Society Press: Boston, MA, pp. 1030–1036.
Carlsson, S. 1994. Multiple image invariance using the double algebra. In Applications of Invariance in Computer Vision, J.L. Mundy, A. Zisserman, and D. Forsyth (Eds.), Vol. 825 of Lecture Notes in Computer Science, Springer-Verlag, pp. 145–164.
Csurka, G. 1996. Modélisation projective des objets tridimensionnels en vision par ordinateur. Ph.D. Thesis, University of Nice, Sophia-Antipolis, France.
Csurka, G., Zeller, C., Zhang, Z., and Faugeras, O. 1996. Characterizing the uncertainty of the fundamental matrix. Computer Vision and Image Understanding, 68(1):18–36, 1997. Updated version of INRIA Research Report 2560, 1995.
Deriche, R., Zhang, Z., Luong, Q.-T., and Faugeras, O. 1994. Robust recovery of the epipolar geometry for an uncalibrated stereo rig. In Proc. of the 3rd European Conf. on Computer Vision, J.-O. Eklundh (Ed.), Vols. 800–801 of Lecture Notes in Computer Science, Springer-Verlag: Stockholm, Sweden, Vol. 1, pp. 567–576.
Enciso, R. 1995. Auto-calibration des capteurs visuels actifs. Reconstruction 3D active. Ph.D. Thesis, University Paris XI Orsay.
Faugeras, O. 1992. What can be seen in three dimensions with an uncalibrated stereo rig. In Proc. of the 2nd European Conf. on Computer Vision, G. Sandini (Ed.), Vol. 588 of Lecture Notes in Computer Science, Springer-Verlag: Santa Margherita Ligure, Italy, pp. 563–578.
Faugeras, O. 1993. Three-Dimensional Computer Vision: A Geometric Viewpoint. The MIT Press.
Faugeras, O. 1995. Stratification of 3-D vision: Projective, affine, and metric representations. Journal of the Optical Society of America A, 12(3):465–484.
Faugeras, O. and Lustman, F. 1988. Motion and structure from motion in a piecewise planar environment. International Journal of Pattern Recognition and Artificial Intelligence, 2(3):485–508.
Faugeras, O., Luong, T., and Maybank, S. 1992. Camera self-calibration: Theory and experiments. In Proc. 2nd ECCV, G. Sandini (Ed.), Vol. 588 of Lecture Notes in Computer Science, Springer-Verlag: Santa Margherita Ligure, Italy, pp. 321–334.
Faugeras, O. and Robert, L. 1994. What can two images tell us about a third one? In Proc. of the 3rd European Conf. on Computer Vision, J.-O. Eklundh (Ed.), Vols. 800–801 of Lecture Notes in Computer Science, Springer-Verlag: Stockholm, Sweden. Also INRIA Technical Report 2018.
Faugeras, O. and Mourrain, B. 1995. On the geometry and algebra of the point and line correspondences between n images. In Proc. of the 5th Int. Conf. on Computer Vision, IEEE Computer Society Press: Boston, MA, pp. 951–956.
Fischler, M. and Bolles, R. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24:381–385.
Golub, G. and van Loan, C. 1989. Matrix Computations. The Johns Hopkins University Press.
Haralick, R. 1986. Computer vision theory: The lack thereof. Computer Vision, Graphics, and Image Processing, 36:372–386.
Hartley, R. 1993. Euclidean reconstruction from uncalibrated views. In Applications of Invariance in Computer Vision, J. Mundy and A. Zisserman (Eds.), Vol. 825 of Lecture Notes in Computer Science, Springer-Verlag: Berlin, pp. 237–256.
Hartley, R. 1994. Projective reconstruction and invariants from multiple images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(10):1036–1040.
Hartley, R. 1995. In defence of the 8-point algorithm. In Proc. of the 5th Int. Conf. on Computer Vision, IEEE Computer Society Press: Boston, MA, pp. 1064–1070.
Hartley, R., Gupta, R., and Chang, T. 1992. Stereo from uncalibrated cameras. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, Urbana Champaign, IL, pp. 761–764.
Hartley, R. and Sturm, P. 1994. Triangulation. In Proc. of the ARPA Image Understanding Workshop, Defense Advanced Research Projects Agency, Morgan Kaufmann Publishers, Inc., pp. 957–966.
Heeger, D.J. and Jepson, A.D. 1992. Subspace methods for recovering rigid motion I: Algorithm and implementation. The International Journal of Computer Vision, 7(2):95–117.
Hesse, O. 1863. Die cubische Gleichung, von welcher die Lösung des Problems der Homographie von M. Chasles abhängt. J. Reine Angew. Math., 62:188–192.
Huang, T. and Netravali, A. 1994. Motion and structure from feature correspondences: A review. In Proc. IEEE, 82(2):252–268.
Huber, P. 1981. Robust Statistics. John Wiley & Sons: New York.
Laveau, S. 1996. Géométrie d'un système de N caméras. Théorie. Estimation. Applications. Ph.D. Thesis, École Polytechnique.
Longuet-Higgins, H. 1981. A computer algorithm for reconstructing a scene from two projections. Nature, 293:133–135.
Luong, Q.-T. 1992. Matrice Fondamentale et Calibration Visuelle sur l'Environnement—Vers une plus grande autonomie des systèmes robotiques. Ph.D. Thesis, Université de Paris-Sud, Centre d'Orsay.
Luong, Q.-T. and Viéville, T. 1994. Canonic representations for the geometries of multiple projective views. In Proc. of the 3rd European Conf. on Computer Vision, J.-O. Eklundh (Ed.), Vols. 800–801 of Lecture Notes in Computer Science, Springer-Verlag: Stockholm, Sweden, Vol. 1, pp. 589–599.
Luong, Q.-T. and Faugeras, O.D. 1996. The fundamental matrix: Theory, algorithms and stability analysis. The International Journal of Computer Vision, 17(1):43–76.
Maybank, S. 1992. Theory of Reconstruction from Image Motion. Springer-Verlag.
Maybank, S.J. and Faugeras, O.D. 1992. A theory of self-calibration of a moving camera. The International Journal of Computer Vision, 8(2):123–152.
Mohr, R., Boufama, B., and Brand, P. 1993a. Accurate projective reconstruction. In Applications of Invariance in Computer Vision, J. Mundy and A. Zisserman (Eds.), Vol. 825 of Lecture Notes in Computer Science, Springer-Verlag: Berlin, pp. 257–276.
Mohr, R., Veillon, F., and Quan, L. 1993b. Relative 3D reconstruction using multiple uncalibrated images. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 543–548.
Moré, J. 1977. The Levenberg-Marquardt algorithm, implementation and theory. In Numerical Analysis, G.A. Watson (Ed.), Lecture Notes in Mathematics 630, Springer-Verlag.
Mundy, J.L. and Zisserman, A. (Eds.) 1992. Geometric Invariance in Computer Vision. MIT Press.
Odobez, J.-M. and Bouthemy, P. 1994. Robust multiresolution estimation of parametric motion models applied to complex scenes. Publication Interne 788, IRISA-INRIA Rennes, France.
Olsen, S. 1992. Epipolar line estimation. In Proc. of the 2nd European Conf. on Computer Vision, Santa Margherita Ligure, Italy, pp. 307–311.
Ponce, J. and Genc, Y. 1996. Epipolar geometry and linear subspace methods: A new approach to weak calibration. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, pp. 776–781.
Press, W.H., Flannery, B.P., Teukolsky, S.A., and Vetterling, W.T. 1988. Numerical Recipes in C. Cambridge University Press.
Quan, L. 1993. Affine stereo calibration for relative affine shape reconstruction. In Proc. of the Fourth British Machine Vision Conf., Surrey, England, pp. 659–668.
Quan, L. 1995. Invariants of six points and projective reconstruction from three uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(1).
Rey, W.J. 1983. Introduction to Robust and Quasi-Robust Statistical Methods. Springer: Berlin, Heidelberg.
Robert, L. and Faugeras, O. 1993. Relative 3d positioning and 3d Proc. of the 4th Int. Conf. on Computer Vision, IEEE Computer Society Press: Berlin, Germany, pp. 540–544. Also INRIA Technical Report 2349.
Rothwell, C., Csurka, G., and Faugeras, O. 1995. A comparison of projective reconstruction methods for pairs of views. In Proc. of the 5th Int. Conf. on Computer Vision, IEEE Computer Society Press: Boston, MA, pp. 932–937.
Rousseeuw, P. and Leroy, A. 1987. Robust Regression and Outlier Detection. John Wiley & Sons: New York.
Shapiro, L. 1993. Affine analysis of image sequences. Ph.D. Thesis, University of Oxford, Department of Engineering Science, Oxford, UK.
Shapiro, L., Zisserman, A., and Brady, M. 1994. Motion from point matches using affine epipolar geometry. In Proc. of the 3rd European Conf. on Computer Vision, J.-O. Eklundh (Ed.), Vol. II of Lecture Notes in Computer Science, Springer-Verlag: Stockholm, Sweden, pp. 73–84.
Shapiro, L. and Brady, M. 1995. Rejecting outliers and estimating errors in an orthogonal-regression framework. Phil. Trans. Royal Soc. of Lon. A, 350:407–439.
Shashua, A. 1994a. Projective structure from uncalibrated images: Structure from motion and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(8):778–790.
Shashua, A. 1994b. Trilinearity in visual recognition by alignment. In Proc. of the 3rd European Conf. on Computer Vision, J.-O. Eklundh (Ed.), Vols. 800–801 of Lecture Notes in Computer Science, Springer-Verlag: Stockholm, Sweden, pp. 479–484.
Spetsakis, M. and Aloimonos, J. 1989. A unified theory of structure from motion. Technical Report CAR-TR-482, Computer Vision Laboratory, University of Maryland.
Sturm, R. 1869. Das Problem der Projektivität und seine Anwendung auf die Flächen zweiten Grades. Math. Ann., 1:533–574.
Torr, P. 1995. Motion segmentation and outlier detection. Ph.D. Thesis, Department of Engineering Science, University of Oxford.
Torr, P. and Murray, D. 1993. Outlier detection and motion segmentation. In Sensor Fusion VI, SPIE Vol. 2059, P. Schenker (Ed.), Boston, pp. 432–443.
Torr, P., Beardsley, P., and Murray, D. 1994. Robust vision. British Machine Vision Conf., University of York, UK, pp. 145–154.
Torr, P., Zisserman, A., and Maybank, S. 1995. Robust detection of degenerate configurations for the fundamental matrix. In Proc. of the 5th Int. Conf. on Computer Vision, IEEE Computer Society Press: Boston, MA, pp. 1037–1042.
Torr, P., Zisserman, A., and Maybank, S. 1996. Robust detection of degenerate configurations whilst estimating the fundamental matrix. Technical Report OUEL 2090/96, Oxford University, Dept. of Engineering Science.
Triggs, B. 1995. Matching constraints and the joint image. In Proc. of the 5th Int. Conf. on Computer Vision, IEEE Computer Society Press: Boston, MA, pp. 338–343.
Tsai, R. and Huang, T. 1984. Uniqueness and estimation of three-dimensional motion parameters of rigid objects with curved surface. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1):13–26.
Viéville, T., Faugeras, O.D., and Luong, Q.-T. 1996. Motion of points and lines in the uncalibrated case. The International Journal of Computer Vision, 17(1):7–42.
Weinshall, D., Werman, M., and Shashua, A. 1995. Shape tensors for efficient and learnable indexing. IEEE Workshop on Represen-
convex hull computation from a weakly calibrated stereo pair. In tation of Visual Scenes, IEEE, pp. 58–65.
Xu, G. and Zhang, Z. 1996. Epipolar Geometry in Stereo, Motion and Object Recognition: A Unified Approach. Kluwer Academic Publishers.

Zeller, C. 1996. Calibration projective affine et euclidienne en vision par ordinateur. Ph.D. Thesis, École Polytechnique.

Zeller, C. and Faugeras, O. 1994. Applications of non-metric vision to some visual guided tasks. In Proc. of the Int. Conf. on Pattern Recognition, Computer Society Press: Jerusalem, Israel, pp. 132–136. A longer version appeared as INRIA Tech. Report RR2308.

Zhang, Z. 1995. Motion and structure of four points from one motion of a stereo rig with unknown extrinsic parameters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(12):1222–1227.

Zhang, Z. 1996a. A new multistage approach to motion and structure estimation: From essential parameters to Euclidean motion via fundamental matrix. Research Report 2910, INRIA Sophia-Antipolis, France. Also appeared in Journal of the Optical Society of America A, 14(11):2938–2950, 1997.

Zhang, Z. 1996b. On the epipolar geometry between two images with lens distortion. In Proc. of the Int. Conf. on Pattern Recognition, Vienna, Austria, Vol. I, pp. 407–411.

Zhang, Z. 1996c. Parameter estimation techniques: A tutorial with application to conic fitting. Image and Vision Computing, 15(1):59–76, 1997. Also INRIA Research Report No. 2676, Oct. 1995.

Zhang, Z. and Faugeras, O.D. 1992. 3D Dynamic Scene Analysis: A Stereo Based Approach. Springer: Berlin, Heidelberg.

Zhang, Z., Deriche, R., Luong, Q.-T., and Faugeras, O. 1994. A robust approach to image matching: Recovery of the epipolar geometry. In Proc. International Symposium of Young Investigators on Information/Computer/Control, Beijing, China, pp. 7–28.

Zhang, Z., Deriche, R., Faugeras, O., and Luong, Q.-T. 1995a. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence Journal, 78:87–119.

Zhang, Z., Faugeras, O., and Deriche, R. 1995b. Calibrating a binocular stereo through projective reconstruction using both a calibration object and the environment. In Proc. Europe-China Workshop on Geometrical Modelling and Invariants for Computer Vision, R. Mohr and C. Wu (Eds.), Xi'an, China, pp. 253–260. Also appeared in Videre: A Journal of Computer Vision Research, 1(1):58–68, Fall 1997.

Zhang, Z., Luong, Q.-T., and Faugeras, O. 1996. Motion of an uncalibrated stereo rig: Self-calibration and metric reconstruction. IEEE Trans. Robotics and Automation, 12(1):103–113.

Zhuang, X., Wang, T., and Zhang, P. 1992. A highly robust estimator through partially likelihood function modeling and its application in computer vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1):19–34.

Zisserman, A. 1992. Notes on geometric invariants in vision. BMVC92 Tutorial.