
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, June 1997, pages 457-462

Ego-Motion Estimation Using Optical Flow Fields Observed from Multiple Cameras

An-Ting Tsao†, Yi-Ping Hung‡, Chiou-Shann Fuh†, and Yong-Sheng Chen†
† Dept. of Computer Science and Information Engineering, National Taiwan University, Taiwan
‡ Institute of Information Science, Academia Sinica, Taipei, Taiwan
Email: hung@iis.sinica.edu.tw

Abstract

In this paper, we consider a multi-camera vision system mounted on a moving object in a static three-dimensional environment. By using the motion flow fields seen by all of the cameras, an algorithm which does not need to solve the point-correspondence problem among the cameras is proposed to estimate the 3D ego-motion parameters of the moving object. Our experiments have shown that using multiple optical flow fields obtained from different cameras can be very helpful for ego-motion estimation.

1 Introduction

Three-dimensional ego-motion estimation has been one of the most important problems in the application of computer vision to mobile robots [10]. Accurate estimation of ego-motion is very helpful for human-computer interaction and for short-term control such as braking, steering, and navigation.

In the past, there have been many methods [1, 6, 7, 12] which use flow vectors as the basis of their derivations for motion estimation. Whether their derivations are linear or nonlinear, the flow vectors are observed by a single camera. However, there are some drawbacks to using only one camera. First, one can only solve the translation up to its direction, i.e., the absolute scale cannot be determined. This is the well-known scaling factor problem. Second, the size of the field of view substantially affects the accuracy of 3D ego-motion estimation. Third, the solution is not unique, which is the most serious problem: a flow field observed from one camera may be interpreted as two different kinds of motion.

Let us consider a moving vehicle with two cameras mounted on the left and right sides. Assume there are only two types of motion: one is pure translation toward the front direction, and the other is pure rotation around the vertical axis. Now, suppose we have only the left camera. The flow fields generated by the pure translation and the pure rotation are very similar if the field of view of the camera is not large enough. It is hard to distinguish from this single flow field whether the motion is pure translation or pure rotation.

Next, let us consider the left and right cameras together on this moving vehicle. If there is only translation, the optical flows observed from the two cameras will be the same in scale but opposite in direction. If there is only rotation, the optical flows will be the same in both scale and direction. Therefore, if we can combine the information contained in the two flow fields appropriately, it will be easier to resolve the ambiguity problem.

There are many methods [2, 9, 14] that use multi-camera vision systems to estimate the 3D motion parameters. Most of these systems need to solve the spatial point correspondence problem among all of the cameras. Hence, in a multi-camera vision system, the view fields of the cameras are usually arranged to overlap in order to determine the 3D positions of feature points by triangulation. Thus, this kind of approach does not enjoy the fact that the estimation accuracy can be improved by increasing the field of view. Besides, it is not easy to achieve a high correction rate when trying to solve the spatial correspondence problem.

We propose to solve the 3D ego-motion estimation problem with a multi-camera system without overlapping view fields. This idea implies that: 1) we can obtain a very large view field by using several low-cost small-view-angle cameras, and 2) we do not have to solve the spatial correspondence problem. Based on this idea, we have developed a new algorithm for 3D ego-motion estimation.

In this paper, we consider a multi-camera vision system mounted on a moving object (e.g., a human or an aircraft) in a static three-dimensional environment. A special case of 3D ego-motion, vehicle-type 3D motion, has been discussed in a previous paper [10]. By using the motion flow fields seen by multiple cameras, an algorithm that does not solve the spatial point correspondence problem is proposed to estimate the 3D ego-motion parameters of the moving object.
2 Ego-Motion Estimation Algorithms

Let us consider an arbitrary multi-camera configuration as shown in Fig. 1. Without loss of generality, the focal lengths f_k (for k = 1 to K) are all set to 1. A global coordinate system C_g = {O, I = [e_1 | e_2 | e_3]} attached to the moving object is used to describe the relative position of each camera. The point O is the origin, e_1 = (1, 0, 0)^T, e_2 = (0, 1, 0)^T, and e_3 = (0, 0, 1)^T. Then, the k-th camera coordinate system can be expressed by C_k = {b_k, R_k = [u_{k1} | u_{k2} | u_{k3}]}. The 3-by-1 vector b_k denotes the position of O_k, and R_k is a 3-by-3 orthonormal matrix. The positions of all the K cameras are assumed to have been calibrated beforehand.

Figure 1: An arbitrary configuration of multiple cameras.

At any time instant, we can compute a flow field for each of the K cameras. The k-th flow field is composed of N_k image feature points and their corresponding flow vectors. Our goal is to compute the 3D ego-motion from the K motion flow fields.

A point P in 3D space can be denoted by P_k and P_g in the two coordinate systems C_k and C_g, respectively. These coordinate vectors must satisfy

    P_k \equiv [P_{xk}, P_{yk}, P_{zk}]^T, \qquad P_g = R_k P_k + b_k.    (1)

Due to the 3D ego-motion, a point P in the static environment will have an instantaneous 3D motion relative to the global coordinate system C_g [4]:

    \dot{P}_g = -\omega \times P_g - t,    (2)

where ω and t denote the 3D angular velocity and translational velocity of the ego-motion. This 3D relative motion can be expressed with respect to the k-th camera coordinate system C_k:

    \dot{P}_k = -\omega_k \times P_k - t_k,    (3)

where

    \omega_k \equiv R_k^T \omega \quad\text{and}\quad t_k \equiv R_k^T [(\omega \times b_k) + t].    (4)

According to the perspective projection model, the image point p_k of the 3D point P_k is

    p_k \equiv [p_{xk}, p_{yk}, 1]^T = \frac{1}{P_{zk}} P_k,    (5)

where P_{zk} is the z-component of P_k. After differentiating both sides of Eq. (5) with respect to time t and substituting Eq. (3) into \dot{P}_k, we have

    \dot{p}_k = -(\omega_k \times p_k) - \frac{\dot{P}_{zk}}{P_{zk}} p_k - \frac{1}{P_{zk}} t_k.    (6)

Applying the cross product (×) of p_k to both sides of Eq. (6), we have

    p_k \times [\dot{p}_k + (\omega_k \times p_k)] = -\frac{1}{P_{zk}} (p_k \times t_k).    (7)

If we further apply the inner product of t_k to both sides of Eq. (7), we obtain the following fundamental equation, which does not contain the unknown depth P_{zk}:

    \{ p_k \times [\dot{p}_k + (\omega_k \times p_k)] \} \cdot t_k = 0.    (8)

Since we want to estimate ω and t using the observations from all the K cameras, the above fundamental equation is re-expressed in terms of ω and t by using Eq. (4):

    \big[ R_k \{ p_k \times [\dot{p}_k + ((R_k^T \omega) \times p_k)] \} \big]^T (\omega \times b_k + t) = 0.    (9)

It is Eq. (9) that we use to determine the 3D motion parameters without solving the point correspondence problem.

There are N_k flow vectors associated with the k-th camera. We use p_{ki} and \dot{p}_{ki} to represent the i-th point and flow vector associated with the k-th camera. Eq. (9) can then be rewritten as

    m_{ki}^T (h_k + t) = 0,    (10)

where

    m_{ki} \equiv R_k \{ p_{ki} \times [\dot{p}_{ki} + ((R_k^T \omega) \times p_{ki})] \}    (11)

and

    h_k \equiv \omega \times b_k.    (12)
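To make the construction of Eqs. (10)-(12) concrete, the following sketch shows how the per-feature vectors m_ki and the per-camera vectors h_k could be computed with NumPy. The function and variable names are ours (not from the paper), and we assume image points are given in normalized coordinates with unit focal length, as in the derivation above.

```python
import numpy as np

def m_vector(R_k, p_ki, pdot_ki, omega):
    """Eq. (11): m_ki = R_k { p_ki x [ pdot_ki + (R_k^T omega) x p_ki ] }.

    R_k     : 3x3 orthonormal matrix of the k-th camera (columns u_k1, u_k2, u_k3)
    p_ki    : normalized image point (p_xk, p_yk, 1) of the i-th feature
    pdot_ki : its optical-flow vector (pdot_xk, pdot_yk, 0)
    omega   : candidate angular velocity expressed in the global frame
    """
    omega_k = R_k.T @ omega                       # Eq. (4): rotation seen by camera k
    return R_k @ np.cross(p_ki, pdot_ki + np.cross(omega_k, p_ki))

def h_vector(b_k, omega):
    """Eq. (12): h_k = omega x b_k, with b_k the position of the k-th camera."""
    return np.cross(omega, b_k)
```

For the true (ω, t), each residual m_ki^T (h_k + t) should vanish in the noise-free case, which is exactly the constraint of Eq. (10).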
Algorithm 1: Non-Degenerate Case

According to Eq. (10), we may define an error function J_1' which depends on the unknowns ω and t:

    J_1'(\omega, t) \equiv \sum_{k=1}^{K} \sum_{i=1}^{N_k} \big[ m_{ki}^T (h_k + t) \big]^2.    (13)

We can search for the optimal estimates of ω and t by minimizing J_1'. By letting ∂J_1'/∂t = 0, we have

    t = -M^{-1} c,    (14)

where

    M \equiv \sum_{k=1}^{K} \sum_{i=1}^{N_k} m_{ki} m_{ki}^T \quad\text{and}\quad c \equiv \sum_{k=1}^{K} \sum_{i=1}^{N_k} m_{ki} m_{ki}^T h_k.    (15)

In the following, the matrix M is sometimes written as M(ω) to emphasize that M is a function of ω. By substituting Eq. (14) into the t of Eq. (13), we obtain a new error function J_1 which depends only on the unknown rotation parameter ω:

    J_1(\omega) \equiv -c^T M^{-1} c + \sum_{k=1}^{K} \sum_{i=1}^{N_k} (m_{ki}^T h_k)^2.    (16)

Therefore, the optimal estimate of ω (denoted by ω̂) is the one that minimizes the error function J_1(ω). Once we have ω̂, the optimal estimate of t, denoted by t̂, can be easily obtained by using Eqs. (14) and (15).
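Below is a minimal sketch of Algorithm 1, assuming the non-degenerate case (M(ω) well-conditioned) and reusing the hypothetical m_vector and h_vector helpers above. The paper does not specify the nonlinear minimizer over ω, so a generic simplex search is used here purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def J1_and_t(omega, cameras):
    """Evaluate J1(omega) of Eq. (16) and the closed-form t of Eq. (14).

    `cameras` is a list of (R_k, b_k, points, flows), where `points` and
    `flows` are N_k x 3 arrays of normalized image points and flow vectors.
    """
    M, c, J_h = np.zeros((3, 3)), np.zeros(3), 0.0
    for R_k, b_k, points, flows in cameras:
        h_k = h_vector(b_k, omega)
        for p_ki, pdot_ki in zip(points, flows):
            m = m_vector(R_k, p_ki, pdot_ki, omega)
            M += np.outer(m, m)                   # Eq. (15): M(omega)
            c += m * float(m @ h_k)               # Eq. (15): c(omega)
            J_h += float(m @ h_k) ** 2
    t = -np.linalg.solve(M, c)                    # Eq. (14)
    J1 = J_h - c @ np.linalg.solve(M, c)          # Eq. (16)
    return J1, t

def estimate_ego_motion(cameras, omega0=np.zeros(3)):
    """Minimize J1 over omega, then recover t from the minimizer (Algorithm 1)."""
    res = minimize(lambda w: J1_and_t(w, cameras)[0], omega0, method="Nelder-Mead")
    _, t_hat = J1_and_t(res.x, cameras)
    return res.x, t_hat
```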
Algorithm 2: Degenerate Case

When M(ω) is not full-rank, Eq. (14) fails to solve for the corresponding t, which also causes a singularity when calculating J_1. By considering Eqs. (10) and (14), we know that the following conditions will make the matrix M(ω) have a close-to-zero eigenvalue:

(1) ω ≈ 0, i.e., there is almost no rotational motion.

(2) All of the h_k's are very close to zero. In this case, the whole multi-camera vision system is just like a monocular vision system with K separate view fields, which provides a much larger field of view.

(3) All of the h_k's are parallel to t_true.

Because the optimal solution ω̂ cannot be obtained by searching a close-to-singular error function, we have to define a new error function to handle these degenerate cases. In these degenerate cases, we found that all of the (h_k + t)'s are almost parallel to the true translating direction t_{n,true} (≡ t_true / ||t_true||). So Eq. (10) can be reduced to the following form:

    m_{ki}^T t = 0, \quad\text{or}\quad m_{ki}^T t_n = 0.    (17)

The second form of Eq. (17) indicates that only the translating direction t_n is recoverable in these degenerate cases.

Similarly, we can define an error function J_2' as

    J_2'(\omega, t_n) \equiv \sum_{k=1}^{K} \sum_{i=1}^{N_k} (m_{ki}^T t_n)^2.    (18)

Expanding Eq. (18), we have

    J_2'(\omega, t_n) = t_n^T \Big( \sum_{k=1}^{K} \sum_{i=1}^{N_k} m_{ki} m_{ki}^T \Big) t_n = t_n^T M t_n = \lambda \| t_n \|^2 = \lambda,    (19)

where the matrix M is defined in Eq. (15), and λ is an eigenvalue of M(ω) when t_n is the corresponding unit eigenvector.

Given an estimate of ω, the best estimate of t_n should be the eigenvector of M(ω) corresponding to the smallest eigenvalue. If we substitute this optimal t_n into J_2', a new error function J_2 which depends only on the unknown ω can be defined as

    J_2(\omega) \equiv \text{the smallest eigenvalue of } M(\omega).    (20)

Here, the 3-by-3 matrix M is a function of ω. Therefore, the optimal estimate of ω, denoted by ω̂, is the one which minimizes the error function J_2(ω), and the optimal estimate of t_n (denoted by t̂_n) is the eigenvector of M(ω̂) corresponding to the smallest eigenvalue.
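For the degenerate case, a corresponding sketch of Algorithm 2 is shown below: J_2(ω) is the smallest eigenvalue of M(ω), and the recovered translation direction is the associated unit eigenvector. This again reuses the hypothetical helpers from the earlier sketches, and the search over ω is left to a generic optimizer such as the one used above.

```python
import numpy as np

def J2_and_tn(omega, cameras):
    """Algorithm 2: return J2(omega) of Eq. (20) and the estimated direction t_n."""
    M = np.zeros((3, 3))
    for R_k, _b_k, points, flows in cameras:      # b_k is not needed in this case
        for p_ki, pdot_ki in zip(points, flows):
            m = m_vector(R_k, p_ki, pdot_ki, omega)
            M += np.outer(m, m)                   # Eq. (15): M(omega)
    eigvals, eigvecs = np.linalg.eigh(M)          # ascending eigenvalues of symmetric M
    return eigvals[0], eigvecs[:, 0]              # smallest eigenvalue and its eigenvector
```

Minimizing J_2 over ω yields ω̂, and t̂_n is then the eigenvector returned for that ω̂.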
3 Experimental Results

This section shows some results of real experiments. We use a binocular head (referred to as the IIS head) to simulate a moving object with two cameras mounted on it. The IIS head is built for experiments in active vision; it has four revolute joints and two prismatic joints, as shown in Fig. 2. The two joints on top of the IIS head are for camera vergence or gazing. The next two joints below them are for tilting and panning the stereo cameras. All of the above four joints are revolute and are mounted on an X-Y table which is composed of two prismatic joints. The lenses of the binocular head are motorized to focus on objects at different distances.
Figure 2: A picture of the IIS head.

To simplify the coordinate transform, we let the global coordinate system and the left camera coordinate system be identical. Then the left camera coordinate system (LCCS) can be expressed by LCCS = {b_l, R_l}, where b_l = 0 and R_l = I. We let the angle between the optical axes of the left and right cameras be about 90°. Notice that the z-axis of the LCCS is the same as the optical axis of the left camera, the x-axis points toward the left side of the left camera, and the y-axis points toward the upper side of the left camera. The focal lengths of both cameras are 25 mm, and the fields of view are 15°. The coordinate systems and camera configuration are illustrated in Fig. 3.

Figure 3: The coordinate systems and camera configuration of the real experiments.
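As a rough illustration of how this experimental configuration enters the estimation, the snippet below assembles the camera list used by the earlier sketches for a two-camera setup like the one in Fig. 3: the left camera coincides with the global frame, and the right camera's optical axis is rotated by roughly 90° about the y-axis. The sign of that rotation and the baseline vector are not stated explicitly in the text, so the values here are illustrative assumptions only.

```python
import numpy as np

def rot_y(theta_deg):
    """Rotation matrix about the y-axis (angle in degrees)."""
    th = np.radians(theta_deg)
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def build_cameras(points_left, flows_left, points_right, flows_right,
                  baseline=np.array([-0.3, 0.0, 0.0])):
    """Assemble the (R_k, b_k, points, flows) list for a two-camera IIS-head-like setup.

    The left camera defines the global frame (b_l = 0, R_l = I); the right
    camera is assumed to be rotated about -90 deg around the y-axis, and
    `baseline` is a hypothetical right-camera position in meters.
    """
    left = (np.eye(3), np.zeros(3), points_left, flows_left)
    right = (rot_y(-90.0), baseline, points_right, flows_right)
    return [left, right]
```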
3.1 Experiment 1

We let the IIS head move forward, such that the left camera of the IIS head looks ahead and the right camera looks to the right. Table 1 lists the true motion parameters used in this experiment. We estimate the ego-motion for three cases: using the left camera only, using the right camera only, and using both the left and right cameras. The scenes viewed from the left and right cameras are shown in Figs. 4(a) and 4(b), respectively. The flow fields observed from the left and right cameras are shown in Figs. 4(c) and 4(d), respectively. Notice that we did not optimize the optical flow, so some matching errors may occur. Also, the depth of the scene viewed from the left camera is in the range from 1.3 m to 1.5 m, while the depth of the scene viewed from the right camera is about 5 m.

Table 1: True motion parameters used in Experiment 1.

            rotation ω (deg/frame)        translation direction         mag.
            ω_x      ω_y      ω_z         t_xn      t_yn      t_zn      ||t|| (mm)
            0.00     0.00     0.00        -0.017    0.045     1.00      20.00

Tables 2 and 3 list the estimates of the rotational and translational parameters. The results show that using both cameras performs better than using only one camera. The performance of using only the left camera is also acceptable, because the translation direction is close to the optical axis of the left camera [10]. If we use the right camera only, the ambiguity mentioned in Section 1 occurs: the rotational motion about the y-axis can easily be mis-classified as a translational motion when the field of view is relatively small.

Table 2: Rotational parameters estimated in Experiment 1.

            ω̂ (deg/frame)                          error
            ω̂_x       ω̂_y       ω̂_z              ||ω̂ - ω_true||
   Both     0.017     -0.034    0.012             0.040
   Left     0.00      -0.052    -0.052            0.073
   Right    -0.012    0.22      0.00              0.22

Table 3: Translational parameters estimated in Experiment 1. ∠(t̂_n, t_{n,true}) is defined as the angle between t̂_n and t_{n,true}.

            t̂_n (translational direction)          error
            t̂_xn      t̂_yn      t̂_zn             ∠(t̂_n, t_{n,true})
   Both     0.040     0.054     1.00              3.30°
   Left     0.062     0.049     1.00              4.54°
   Right    -0.99     -0.051    0.10              83.39°
Figure 4: The images and the flow fields used in Experiment 1: (a) The scene viewed from the left camera. (b) The scene viewed from the right camera. (c) The optical flow field obtained from the left camera. (d) The optical flow field obtained from the right camera. The flow vectors in the figures are enlarged by a factor of two.

3.2 Experiment 2

In Experiment 2, we let the IIS head pan with a small angle. Table 4 lists the true motion parameters used in this experiment. Again, we estimate the ego-motion by using the left camera only, the right camera only, and both the left and right cameras, respectively. The scenes viewed from the left and right cameras are the same as Figs. 4(a) and 4(b). The flow fields observed from the left and right cameras are shown in Figs. 5(a) and 5(b), respectively.

Table 4: True motion parameters used in Experiment 2.

            rotation ω (deg/frame)        translation direction          mag.
            ω_x      ω_y      ω_z         t_xn      t_yn      t_zn       ||t|| (mm)
            0.017    0.50     -0.023      -0.62     -0.012    -0.78      1.89

Figure 5: The flow fields used in Experiment 2: (a) The optical flow field obtained from the left camera. (b) The optical flow field obtained from the right camera. The flow vectors in the figures are also enlarged by a factor of two.

The results of this experiment are given in Tables 5 and 6. As expected, the performance of using both the left and right cameras is the best. The ambiguity problem occurs again when using the right camera only, because the depth of the scene viewed from the right camera is as far as five meters, and hence the optical flow field looks very similar to one caused by a small pure translation. Notice that the errors of the translational direction are larger than the ones obtained in Experiment 1. This is because the translation signals in this experiment are very small. Whenever there is noise, the translation signals will be seriously corrupted, and it is hard to estimate the translational parameters with high accuracy.

Table 5: Rotational parameters estimated in Experiment 2.

            ω̂ (deg/frame)                          error
            ω̂_x       ω̂_y        ω̂_z             ||ω̂ - ω_true||
   Both     0.00      0.52       -0.029           0.025
   Left     0.0057    0.57       0.0057           0.086
   Right    0.017     -0.0057    -0.011           0.50

Table 6: Translational parameters estimated in Experiment 2. ∠(t̂_n, t_{n,true}) is defined as the angle between t̂_n and t_{n,true}.

            t̂_n (translational direction)          error
            t̂_xn      t̂_yn      t̂_zn             ∠(t̂_n, t_{n,true})
   Both     -0.052    -0.034    -1.00             36.00°
   Left     -0.049    -0.055    -1.00             36.00°
   Right    0.083     0.99      -0.095            89.00°

4 Conclusion

In this paper, we propose a method for ego-motion estimation using a multiple-camera vision system. By combining the information contained in the multiple optical flows observed from different cameras, some ambiguity problems can be avoided and the accuracy can be improved. Our algorithm considers two cases separately: the non-degenerate case and the degenerate case.
One potential application of our multiple-camera approach is the "inside-out" (or "outward-looking") head tracker for virtual reality. Current outward-looking head trackers require structured environments, e.g., a regular pattern on the ceiling. Our approach does not require a specially-designed environment, as long as the environment has enough features for computing optical flow.

Different camera configurations have different performance on ego-motion estimation. In this paper, we have not analyzed the performance of different camera configurations. Some analysis on finding the optimal camera configuration can be found in another paper [10].

Acknowledgements

This work was supported in part by the National Science Council of Taiwan, under Grants NSC 86-2745-E-001-007 and NSC 86-2212-E-002-025, and by Mechanical Industry Research Laboratories, Industrial Technology Research Institute, under Grant MIRL 863K67BN2.

References

[1] G. Adiv, "Inherent Ambiguities in Recovering 3D Motion and Structure from a Noisy Flow Field," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, pp. 477-489, 1989.

[2] N. Ayache and F. Lustman, "Trinocular Stereo Vision for Robotics," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, pp. 73-85, 1991.

[3] T. Broida and R. Chellappa, "Estimation of Object Motion Parameters from Noisy Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, pp. 90-99, 1986.

[4] T. Broida and R. Chellappa, "Estimating the Kinematics and Structure of a Rigid Object from a Sequence of Monocular Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, pp. 497-512, 1991.

[5] W. Burger and B. Bhanu, "Estimating 3D Egomotion from Perspective Image Sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, pp. 1040-1058, 1990.

[6] D. J. Heeger and A. D. Jepson, "Subspace Methods for Recovering Rigid Motion I: Algorithm and Implementation," International Journal of Computer Vision, Vol. 7, pp. 95-117, 1992.

[7] R. Hummel and V. Sundareswaran, "Motion Parameter Estimation from Global Flow Field Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, pp. 459-476, 1993.

[8] M. K. Leung and T. S. Huang, "Estimating Three-Dimensional Vehicle Motion in an Outdoor Scene Using Stereo Image Sequences," International Journal of Imaging Systems and Technology, Vol. 4, pp. 80-97, 1992.

[9] L. Li and J. H. Duncan, "3D Translational Motion and Structure from Binocular Image Flows," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, pp. 739-745, 1993.

[10] L. G. Liou and Y. P. Hung, "Vehicle-Type Ego-Motion Estimation Using Multiple Cameras," Proceedings of the Asian Conference on Computer Vision, pp. 166-170, Singapore, 1995.

[11] Y. Liu and T. S. Huang, "Vehicle-Type Motion Estimation from Multi-frame Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, pp. 802-808, 1993.

[12] K. Prazdny, "Determining the Instantaneous Direction of Motion from Optical Flow Generated by a Curvilinearly Moving Observer," Computer Graphics and Image Processing, Vol. 17, pp. 238-248, 1981.

[13] T. Vieville, E. Clergue, and P. E. D. S. Facao, "Computation of Ego-Motion and Structure from Visual and Inertial Sensors Using the Vertical Cue," Proceedings of the International Conference on Computer Vision, pp. 591-598, Berlin, Germany, 1993.

[14] J. Weng, P. Cohen, and N. Rebibo, "Motion and Structure Estimation from Stereo Image Sequence," IEEE Transactions on Robotics and Automation, Vol. 8, pp. 362-382, 1992.
