
Copyright © IFAC Intelligent Autonomous Vehicles,
Sapporo, Japan, 2001

IFAC Publications
www.elsevier.com/locate/ifac

INTER-FRAME MOTION ESTIMATION FOR MOBILE VISION SYSTEMS

M.B. van Leeuwen*  F.C.A. Groen*

* Intelligent Autonomous Systems Group, Faculty of Science,


University of Amsterdam
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Phone: +31 20 5257524, Fax: +31 20 5257490
E-mail: {rien,groen}@wins.uva.nl

Abstract: One important issue when dealing with mobile vision systems is that small
rotations of the camera can seriously complicate the interpretation of the observations.
In this paper we present an approach that estimates these rotations in practice. The
motion of our vehicle is predominantly translational. However, irregularities in the
road and small steering actions cause additional roll, pitch and yaw components to
this egomotion. We estimate these rotational components from the observed optical
flow field of background points, using advanced least squares estimation techniques. In
this paper we illustrate the performance of our method in practice. The performance
of different orthogonal least squares techniques is compared. The results indicate the
robustness and high accuracy of the approach in practice. Copyright © 2001 IFAC

Keywords: Vision, Motion Estimation, Egomotion, Intelligent Vehicles

1. INTRODUCTION

Artificial vision systems are often used for autonomous vehicle guidance (Bertozzi et al., 2000). One important issue when dealing with mobile systems is that small rotations of the camera can seriously complicate the interpretation of the observations. These small rotations introduce additional motion components to the motion observed in the image plane. To enable efficient tracking and accurate interpretation of the recordings of a camera, these disturbances must be taken into account. In this paper we present an approach that estimates these rotations in practice.

We consider the situation where observations are obtained from a vehicle driving on a highway. The egomotion will therefore be predominantly translational. In practice, one has to allow for small roll, pitch and yaw components in the egomotion. Roll and pitch are mainly the result of irregularities in the road surface. Yaw can be caused by small steering actions of the driver. Small steering actions occur when following a (generally smooth) curve in the road and even in the situation of driving in a straight lane. Information about the environment is obtained by a single camera, looking through the windscreen of the car.

In the past decade, a lot of research has been done on egomotion estimation. Irani (Irani and Anandan, 1998) and Zhu (Zhu et al., 1998) explain how difficult this problem still is nowadays. Egomotion estimation methods can be divided into three groups: direct approaches (Aloimonos and Duric, 1992)(Negahdaripour, 1996), instantaneous approaches and feature based techniques (Arbogast and Mohr, 1992)(Wu et al., 1995). In direct approaches the motion is determined directly from the variations of image brightness patterns in a motion sequence, without explicitly estimating the flow field. Instantaneous approaches are based on the estimation of the optical flow field. Feature based approaches use

tracks of a small number of image features (points, lines, contours) as input for their motion estimation algorithm.

In both instantaneous approaches and direct techniques the information is gathered all over the image. An important advantage of these approaches over feature based techniques is that the large amount of data can be employed for redundancy. A main disadvantage is that real-time implementation is still hard to reach. Although the theory of direct techniques seems promising, they still have serious difficulties to establish stability in real-world applications.

Feature based approaches are generally based on a small number of data points. This makes them efficient and already enables real-time implementation nowadays. However, the small number of data points easily leads to instability. Therefore, heavy weight is put on additional temporal smoothing methods (motion filtering techniques (Zhu et al., 1998)) in order to reach robustness in practical applications.

Within instantaneous approaches, an important group uses motion parallax, for example affine motion parallax (Lawn and Cipolla, 1994) or plane+parallax (Irani and Anandan, 1998). These methods exploit the fact that at depth discontinuities it is easy to distinguish between the influences of camera rotations and camera translation. However, within these approaches assumptions are put on the scene structure that seem to be unfit for our application. For example, for our application we can't guarantee the typical requirement of the presence of one dominant planar surface in the background, or an (observable) distant part of the background where its depth changes don't play a role.

The estimation of the (small) inter-frame rotations of the camera can only be based on a small number of images, making the solution more sensitive to inaccuracies in its input. Therefore we base our method on the estimation of the optical flow field of background points, and not on only a small number of tracks. The redundancy in the estimated motion field is exploited to achieve a robust estimation of the inter-frame rotations. The requirement of the optical flow field of background points leads to a computationally expensive solution for our problem. But it is the price we have to pay in order to achieve robust results.

No estimation method will be free of errors. Accumulation of errors in the estimation of the small rotations will lead to biased results, and therefore should be controlled. This is generally done by using an application specific motion filtering method. Examples are Kalman based techniques (Dickmanns and Graefe, 1988) and techniques based on inertial motion filtering (Zhu et al., 1998). Instead of presenting another new problem specific motion filtering technique, we illustrate the accuracy of our method by limiting the time interval over which accumulation of errors occurs. Therefore, in the experiments the inter-frame motion will be predicted over time-intervals of 30 samples.

Like most other solutions in vision, the work presented in this paper depends on the application and its environment. Our solution doesn't put many constraints on the scene structure (like e.g. plane+parallax approaches). We put the constraints on the characteristics of the egomotion and concentrate on the robustness and accuracy of our solution. This makes the approach valid and useful for many applications.

2. INTER-FRAME ROTATIONS

2.1 The optical motion of background points

We start by defining the observer frame of the camera, (X, Y, Z). The origin of this frame is located at the nodal point of the camera. The Z-axis points in the viewing direction, which corresponds with the driving direction of the vehicle. The relation between the position of points in the image plane, (r_x, r_y), and their position in observer coordinates is approximated by the perspective projection model. The camera model is illustrated in figure 1. The projection of observed points on the image plane is given by:

    r_x = F\,X/Z, \qquad r_y = F\,Y/Z    (1)

where F denotes the focal length of the camera.

[Figure 1 shows a scene point P projected through the nodal point O onto the image plane, with the Z-axis in the viewing direction.]
Fig. 1. Perspective projection model.
The camera motion has 6 degrees of freedom: 3 translational components and 3 rotational components. We represent the 3 rotation angles (yaw, pitch and roll) by (α, β, γ) and the 3 translational components by (T_rx, T_ry, T_rz). For our application the only translational motion component unequal to 0 is T_rz. We consider small rotations between succeeding frames (less than 5 degrees) and denote
these rotations by {Δα, Δβ, Δγ}. The relative motion of an observed background point (X, Y, Z) to the point (X', Y', Z'), due to this egomotion, is approximated by:

    \begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} =
    \begin{bmatrix}
      1 & \Delta\gamma & -\Delta\alpha \\
      -\Delta\gamma & 1 & \Delta\beta \\
      \Delta\alpha & -\Delta\beta & 1
    \end{bmatrix}
    \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} -
    \begin{bmatrix} 0 \\ 0 \\ T_{rz} \end{bmatrix}    (2)

A background point is observed in the first frame at position (r_x, r_y). Its position in the next frame is denoted by (r_x + ṙ_x, r_y + ṙ_y). Using equations 1 and 2 we find the following relation between (r_x, r_y) and (r_x + ṙ_x, r_y + ṙ_y):

    r_x + \dot{r}_x = F\,\frac{r_x - F\Delta\alpha + r_y\Delta\gamma}{r_x\Delta\alpha - r_y\Delta\beta + F - F\,T_{rz}/Z}
                                                                        (3)
    r_y + \dot{r}_y = F\,\frac{r_y + F\Delta\beta - r_x\Delta\gamma}{r_x\Delta\alpha - r_y\Delta\beta + F - F\,T_{rz}/Z}

From these equations we will estimate the small rotations {Δα, Δβ, Δγ}.
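To make the mapping of equation 3 concrete, the following minimal Python sketch predicts where a background point reappears in the next frame, given assumed inter-frame rotations and an assumed time-to-contact Z/T_rz. It is an illustration written for this text (the function name and parameterization are ours), not code from the original system.

```python
import numpy as np

def predict_point_motion(rx, ry, d_alpha, d_beta, d_gamma, F, ttc):
    """Predict the next-frame position of a background point (equation 3).

    rx, ry  : current image position [pixels]
    d_*     : small inter-frame yaw, pitch and roll angles [rad]
    F       : focal length [pixels]
    ttc     : time-to-contact Z / T_rz [frames], so T_rz / Z = 1 / ttc
    """
    denom = rx * d_alpha - ry * d_beta + F - F / ttc
    rx_next = F * (rx - F * d_alpha + ry * d_gamma) / denom
    ry_next = F * (ry + F * d_beta - rx * d_gamma) / denom
    return rx_next, ry_next

# Example: point at (100, -50) px, F = 800 px, 0.2 degrees of pitch,
# time-to-contact of 250 frames.
print(predict_point_motion(100.0, -50.0, 0.0, np.deg2rad(0.2), 0.0, 800.0, 250.0))
```

With all rotations set to zero the point simply drifts away from the focus of expansion, the purely translational case; the rotation terms superimpose the disturbances we want to estimate.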
2.2 Derivation of {Δα, Δβ, Δγ}

In equation 3 the term T_rz/Z represents the time-to-contact of the background point under consideration. This depth dependency can differ for each background point and won't be known. We remove this term from the previous equations by means of substitution. By doing so, we obtain a single expression that relates the parameters {Δα, Δβ, Δγ} to the measurement data of the optical motion of each background point, {r_x, r_y, ṙ_x, ṙ_y}. If we estimate the optic flow field of the background we can estimate the parameters {Δα, Δβ, Δγ} from the following (over-determined) system of linear equations:

    F(r_y + \dot{r}_y)\,\Delta\alpha + F(r_x + \dot{r}_x)\,\Delta\beta
      - \left( r_x(r_x + \dot{r}_x) + r_y(r_y + \dot{r}_y) \right)\Delta\gamma
      = r_x\dot{r}_y - r_y\dot{r}_x,
    \quad \forall \text{ background points}    (4)

We estimate the optical flow using the method of Dev (Dev, 1998). This method, based on the assumptions of conservation of intensity and local smoothness, provides us with an estimation of the movement of a point together with a reliability measure for each estimation. We used the reliability measure to reject very unreliable flow vectors from the field. The remaining flow vectors are used to solve the rotation parameters from equation 4 in a weighted orthogonal least squares sense.
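To illustrate how equation 4 turns the remaining flow vectors into an over-determined linear system, the sketch below stacks one row per background point. It assumes the flow field is already available as arrays; the names are ours, not from the original implementation.

```python
import numpy as np

def build_system(rx, ry, rdx, rdy, F):
    """Stack equation 4 for m background points: A @ [da, db, dg] ~ b.

    rx, ry   : (m,) image positions of the points [pixels]
    rdx, rdy : (m,) estimated optic flow vectors [pixels/frame]
    F        : focal length [pixels]
    """
    A = np.column_stack([
        F * (ry + rdy),                        # coefficient of d_alpha
        F * (rx + rdx),                        # coefficient of d_beta
        -(rx * (rx + rdx) + ry * (ry + rdy)),  # coefficient of d_gamma
    ])
    b = rx * rdy - ry * rdx
    return A, b
```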
In equation 4, measurement errors exist in the measured flow vectors (ṙ_x, ṙ_y). Solving the rotation parameters from this equation in a simple orthogonal least squares sense will result in a sub-optimal estimate of the parameters. The variance of each measurement is heteroscedastic; it depends on the location in the image plane where the flow is estimated. For similar problems see e.g. (Sampson, 1982) (fitting residuals to conics) and the articles (Torr and Murray, 1997) and (Weng et al., 1989) (both for estimating the parameters of the so called fundamental matrix).

The rotation parameters are found by minimizing the following re-weighted squared error in an (orthogonal) least squares sense:

    E(\Delta\alpha, \Delta\beta, \Delta\gamma) =
      \sum_{i=1}^{m} \left( \frac{A_{i1}\Delta\alpha + A_{i2}\Delta\beta + A_{i3}\Delta\gamma - b_i}{w_i} \right)^2    (5)

with

    A_{i1} = F(r_{yi} + \dot{r}_{yi}), \qquad A_{i2} = F(r_{xi} + \dot{r}_{xi})
    A_{i3} = -\,r_{xi}(r_{xi} + \dot{r}_{xi}) - r_{yi}(r_{yi} + \dot{r}_{yi})    (6)
    b_i = r_{xi}\dot{r}_{yi} - r_{yi}\dot{r}_{xi}
    w_i = \sqrt{(r_{xi} - F\Delta\alpha + r_{yi}\Delta\gamma)^2 + (r_{yi} + F\Delta\beta - r_{xi}\Delta\gamma)^2}

See the appendix for details about the derivation of this error function. If the weights w_i are known, minimizing the previous error in a simple least squares sense assumes only measurement errors in b, and no inaccuracies in A. Having measurement errors in both A and b, our problem requires an orthogonal least squares solution. (Golub and Van Loan, 1980) and (van Huffel and Vandewalle, 1988) show how this solution is found using singular value decomposition of the combined matrix [A'; b']. Here, the accents indicate that A and b are re-weighted by w. Denote the SVD of this combined matrix by

    [A' \; b'] = U \Sigma V^T
    U = [u_1, \ldots, u_m], \quad u_i \in \mathbb{R}^m, \quad U^T U = I_m
    V = [v_1, \ldots, v_4], \quad v_i \in \mathbb{R}^4, \quad V^T V = I_4    (7)
    \Sigma = \begin{bmatrix} \mathrm{diag}(\lambda_1, \ldots, \lambda_4) \\ 0_{(m-4) \times 4} \end{bmatrix},
      \qquad \lambda_1 \ge \cdots \ge \lambda_4

The orthogonal least squares solution is obtained by scaling v_4 until its last component is -1:

    (\Delta\alpha, \Delta\beta, \Delta\gamma, -1)^T = -\frac{1}{v_{4,4}}\, v_4    (8)

where v_{4,4} denotes the last component of v_4.
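Equations 7 and 8 amount to a weighted total least squares solve. A compact numerical realization (a sketch using standard numpy routines, not the authors' implementation) is:

```python
import numpy as np

def orthogonal_ls(A, b, w):
    """Weighted orthogonal (total) least squares, equations 7 and 8.

    The rows of [A; b] are divided by the weights w, and the solution
    is read from the right singular vector belonging to the smallest
    singular value, scaled so that its last component equals -1.
    """
    Ab = np.column_stack([A, b]) / w[:, None]   # re-weighted [A'; b']
    _, _, Vt = np.linalg.svd(Ab, full_matrices=False)
    v4 = Vt[-1]                                 # vector of the smallest singular value
    return -v4[:3] / v4[3]                      # (d_alpha, d_beta, d_gamma)
```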
Because the previous approximation requires knowledge about the weights w, and vice versa, an iterative method is required. We therefore initially start with an approximation of the weights, w_i = \sqrt{r_{xi}^2 + r_{yi}^2}. This first guess doesn't depend on the rotation parameters and is an acceptable initial guess for our application with only small rotation parameters (see equation 6 for the definition of the weights). Only a few iterations are necessary (for the experiments we used 4 iteration steps).
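Putting the pieces together, the iterative re-weighting might look as follows; `build_system` and `orthogonal_ls` are the illustrative helpers sketched above, and the four iterations follow the choice used in the experiments.

```python
import numpy as np

def estimate_rotations(rx, ry, rdx, rdy, F, n_iter=4):
    """Iteratively re-weighted orthogonal least squares (equations 5-8)."""
    A, b = build_system(rx, ry, rdx, rdy, F)
    w = np.sqrt(rx**2 + ry**2)       # initial guess: equation 6 with zero rotations
    for _ in range(n_iter):
        da, db, dg = orthogonal_ls(A, b, w)
        w = np.sqrt((rx - F * da + ry * dg) ** 2     # weights of equation 6
                    + (ry + F * db - rx * dg) ** 2)
    return da, db, dg
```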
Even the best least squares method performs feebly in the presence of outliers (see e.g. (Torr and Murray, 1997)). Therefore, in the experiments we also present the results of estimation methods that involve outlier suppression. We implemented 2 different approaches. One is a method of the category of M-estimators that was presented by Huber (1981). The other method is Rousseeuw's least median of squares (LMS) method (1987). This second method uses a random sampling scheme to identify outliers in the data. Both methods are described and compared in detail in (Torr and Murray, 1997). We will not repeat their work in this paper.
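For orientation, an LMS-style outlier screen in the spirit of Rousseeuw's random sampling scheme could be sketched as follows. This is our illustration, not the implementation used in the experiments; the inlier threshold of 2.5 scaled residuals and the scale constant 1.4826 are conventional choices assumed here.

```python
import numpy as np

def lms_inliers(A, b, n_trials=200, seed=0):
    """Flag inliers via least median of squares on random minimal samples."""
    rng = np.random.default_rng(seed)
    m = len(b)
    best_x, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(m, size=3, replace=False)   # minimal sample: 3 unknowns
        try:
            x = np.linalg.solve(A[idx], b[idx])
        except np.linalg.LinAlgError:
            continue                                  # degenerate sample, skip
        med = np.median((A @ x - b) ** 2)
        if med < best_med:
            best_x, best_med = x, med
    scale = 1.4826 * np.sqrt(best_med)                # robust residual scale
    return np.abs(A @ best_x - b) < 2.5 * scale       # boolean inlier mask
```

The surviving rows would then be passed to the re-weighted orthogonal least squares estimator described above.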

Fig. 2. One frame out of each of the sequences used in the experiments. The background point(s) we
tracked are also indicated.
3. RESULTS

In this section we present some experimental results. A camera was solidly placed at the passenger seat, looking through the windscreen of the vehicle. This provided us with 50 non-interlaced frames per second, of size 568x768 pixels. We will discuss the results obtained with 2 sequences of around 500 frames (10 seconds) each. Figure 2 shows one frame from each sequence.

We estimated the flow field plus reliability for background points in both sequences. From this we derived the inter-frame rotations {Δα(t), Δβ(t), Δγ(t)}. We don't have a ground truth for the inter-frame rotations. Therefore we used the following approach to test the accuracy of the estimations.

Through each sequence we tracked 1 background point. The observed motion of this point is the result of the egomotion of our vehicle and is modeled by equation 3. We use the estimated rotations {Δα(t), Δβ(t), Δγ(t)}, together with an approximation of the parameter Z(t)/T_rz(t), to reproduce the movement of the tracked background point in the image plane. The agreement between the reproduction and the observed motion is a measure for the accuracy of the estimated rotations in practice.

The parameter Z/T_rz represents the so called time-to-contact: the time, measured at time t, that the tracked point needs to reach the image plane of the camera. During the recordings of both sequences our vehicle was driving at approximately constant speed. Therefore, we could use a linear approximation for the time-to-contact of the background points.

The influence of the accumulation of errors, together with the influence of small model errors related to the linear approximation of the time-to-contact, is suppressed by limiting the time-interval over which the reproduction of the track is calculated. As already explained in the introduction, each reproduction is based on the position of the point as it was observed 30 samples before, plus the 30 previously estimated rotations.
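In code, one reproduction of this evaluation could look like the sketch below, where `predict_point_motion` is the illustrative equation-3 helper from section 2.1 and the per-frame time-to-contact values come from the linear approximation; the names are ours, not the authors'.

```python
def reproduce_track(rx0, ry0, rotations, ttcs, F):
    """Propagate a point 30 samples ahead by chaining equation 3.

    rx0, ry0  : position observed 30 samples earlier [pixels]
    rotations : 30 estimated (d_alpha, d_beta, d_gamma) triples [rad]
    ttcs      : 30 time-to-contact values Z / T_rz [frames],
                taken from the linear approximation
    """
    rx, ry = rx0, ry0
    for (da, db, dg), ttc in zip(rotations, ttcs):
        rx, ry = predict_point_motion(rx, ry, da, db, dg, F, ttc)
    return rx, ry
```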
In the first sequence we sequentially tracked 2 points (see also figure 2). Figure 3 shows the results of this measured track together with its reconstruction. The upper figure illustrates the horizontal position r_x(t) in the image plane; the lower figure the vertical position r_y(t). The measured track is represented by a straight line. The reconstruction in this figure is based on re-weighted orthogonal least squares combined with outlier suppression using LMS.

In the second sequence we tracked one single point. Figure 4 shows the results for this sequence. The reconstruction in this figure is based on re-weighted orthogonal least squares without outlier suppression.

Table 1 presents the mean squared error (MSE; in x- and y-direction) between the observed track and the reconstruction of this track, for both sequences and for different complexities of the estimation procedure. The results of the biased estimations are also given. The results show that the main part of the inaccuracy of the biased reproduction is caused by only a small number of inaccurate estimates.

We see that for both sequences, even with the biased reproduction, pixel accuracy can be obtained. For sequence 1, the unbiased estimation obtains sub-pixel accuracy if the influence of outliers is suppressed. The results obtained without outlier suppression are also promising. Applying a more advanced motion filtering technique will be enough to reach sub-pixel accuracy based on these estimations. For sequence 2, we see that the unbiased estimation is of sub-pixel accuracy. In this sequence the vehicle reaches an irregularity in the surface of the road after 7.3 seconds. This results in serious vertical vibrations of the camera. Figure 4 shows that the least accurate estimation is derived at around 7.4 seconds¹. But after this inaccurate measurement the vertical vibration caused by the irregularity is estimated correctly, which illustrates the robustness of the algorithm.

¹ Note that the inaccuracy of this estimation affects the accuracy of the reconstruction over a period of 30 frames (0.6 seconds) after this event.

[Figure 3 plot: horizontal position r_x(t) (top panel) and vertical position r_y(t) (bottom panel) in pixels, against t [sec]; estimated track {r_x, r_y} (solid) and its reproduction (dashed).]

Fig. 3. Observed motion through sequence 1 and the reconstruction of this motion based on the
estimation of the inter-frame motion, using re-weighted orthogonal least squares combined with
outlier suppression using LMS. The straight lines represent the observed motion of the sequentially
tracked background points. The dashed lines represent the reconstruction of these tracks based on
the estimated rotational parameters.
[Figure 4 plot: r_x(t) (top panel) and r_y(t) (bottom panel) in pixels, against t [sec]; estimated track {r_x, r_y} (solid) and its reproduction (dashed).]

Fig. 4. Observed motion through sequence 2 and the unbiased reconstruction of this motion based on
the estimation of the inter-frame motion, using re-weighted orthogonal least squares without outlier
suppression.

4. CONCLUSIONS

We presented a method for estimating the small rotations roll, pitch and yaw. The experiments illustrate the robustness and high accuracy of our method in practice. Without putting severe demands on additional motion filtering, sub-pixel accuracy is obtained in the experiments. It will depend on the characteristics of the flow field whether outlier suppression should be added to the estimation scheme or not. For our application, the necessity of outlier suppression seems to be little, especially if motion filtering is added to the algorithm.

ACKNOWLEDGMENTS

The authors would like to acknowledge the broadcasting station TV-Noord for providing the video data.

Table 1. Overview of the MSE. All values are in pixels.

SEQUENCE 1            unbiased          biased
                    MSE_x   MSE_y    MSE_x   MSE_y
Orthogonal LS        0.52    2.20     1.53    9.94
Re-weighted OLS      0.43    1.34     0.73    5.04
+ outliers (Huber)   0.41    0.81     0.71    1.47
+ outliers (LMS)     0.42    0.67     0.71    1.01

SEQUENCE 2            unbiased          biased
                    MSE_x   MSE_y    MSE_x   MSE_y
Orthogonal LS        0.87    0.64     1.42    1.11
Re-weighted OLS      0.55    0.61     0.57    1.01
+ outliers (Huber)   0.59    0.64     0.83    1.22
+ outliers (LMS)     0.75    1.07     1.41    3.78
5. REFERENCES

Aloimonos, J. and Z. Duric (1992). Active egomotion estimation: A qualitative approach. In: Proc. ECCV-2. pp. 497-510.
Arbogast, E. and R. Mohr (1992). An egomotion algorithm based on the tracking of arbitrary curves. In: ECCV92. pp. 467-475.
Bertozzi, M., A. Broggi and A. Fascioli (2000). Vision-based intelligent vehicles: State of the art and perspectives. J. of Robotics and Autonomous Systems 32, 1-16.
Dev, A. (1998). Visual Navigation on Optical Flow. PhD thesis. University of Amsterdam.
Dickmanns, E. D. and V. Graefe (1988). Applications of dynamic monocular machine vision. Machine Vision and Applications 1, 241-261.
Golub, G. H. and C. F. Van Loan (1980). An analysis of the total least squares problem. SIAM J. Numer. Anal. 17(6), 883-893.
Irani, M. and P. Anandan (1998). A unified approach to moving object detection in 2d and 3d scenes. IEEE Trans. on Patt. Anal. and Mach. Intell. 20(6), 577-589.
Lawn, J. M. and R. Cipolla (1994). Robust egomotion estimation from affine motion parallax. In: ECCV94. pp. 205-210.
Negahdaripour, S. (1996). Direct computation of the FOE with confidence measures. Computer Vision and Image Understanding 64, 323-350.
Sampson, P. D. (1982). Fitting conic sections to 'very scattered' data: An iterative refinement of the Bookstein algorithm. Computer Graphics and Image Processing 18, 97-108.
Torr, P. H. S. and D. W. Murray (1997). The development and comparison of robust methods for estimating the fundamental matrix. Int. J. Comp. Vision 24(3), 271-300.
van Huffel, S. and J. Vandewalle (1988). The application of the total least squares technique in linear parameter estimation. Journal A 29(1), 17-26.
Weng, J., T. S. Huang and N. Ahuja (1989). Motion and structure from two perspective views: Algorithms, error analysis and error estimation. IEEE Trans. on Patt. Anal. and Mach. Intell. 11, 451-476.
Wu, T. H., R. Chellappa and Q. F. Zheng (1995). Experiments on estimating egomotion and structure parameters using long monocular image sequences. Int. J. Comp. Vision 15(1-2), 77-103.
Zhu, Z., G. Xu, Y. Yang and J. S. Jin (1998). Camera stabilization based on 2.5d motion estimation and inertial motion filtering. In: IEEE Int. Conf. on Intelligent Vehicles. pp. 329-334.
Appendix A. DERIVATION OF THE ERROR FUNCTION

In this appendix we will derive the error function given by equation 5. We start by repeating equation 4:

    F(r_y + \dot{r}_y)\,\Delta\alpha + F(r_x + \dot{r}_x)\,\Delta\beta
      - \left( r_x(r_x + \dot{r}_x) + r_y(r_y + \dot{r}_y) \right)\Delta\gamma
      = r_x\dot{r}_y - r_y\dot{r}_x,
    \quad \forall \text{ background points}    (A.1)

Measurement inaccuracies are present in the optic flow estimations (ṙ_xi, ṙ_yi). The solution {Δα, Δβ, Δγ} we are looking for minimizes the geometric squared distances between the final solution and the m measurements in the {ṙ_xi, ṙ_yi} space. For each point i, the final solution describes the following function in the {ṙ_xi, ṙ_yi} space: a line with slope A and offset r (obtained by rearranging equation A.1, both depending on the image position (r_xi, r_yi) and on the rotation parameters),

    \dot{r}_{yi} = A\,\dot{r}_{xi} + r,    (A.2)
    A = \frac{r_{yi} + F\Delta\beta - r_{xi}\Delta\gamma}{r_{xi} - F\Delta\alpha + r_{yi}\Delta\gamma}, \qquad
    r = \frac{F r_{yi}\Delta\alpha + F r_{xi}\Delta\beta - (r_{xi}^2 + r_{yi}^2)\Delta\gamma}{r_{xi} - F\Delta\alpha + r_{yi}\Delta\gamma}

We now define the following line l_i in the {ṙ_xi, ṙ_yi} space:

    l_i: \;\; \dot{r}_y - \dot{r}_{yi} = -\frac{1}{A}\,(\dot{r}_x - \dot{r}_{xi})    (A.3)

This line goes through the measurement (ṙ_xi, ṙ_yi) and perpendicularly intersects the function of the final solution (equation A.2) in a point (ṙ_sxi, ṙ_syi). The geometric distance between the measurement data and the final solution is measured along this line and is defined by the distance between (ṙ_xi, ṙ_yi) and (ṙ_sxi, ṙ_syi). The point of intersection is easily derived and given by:

    (\dot{r}_{sxi}, \dot{r}_{syi}) =
      \frac{1}{A^2 + 1}\left( A(\dot{r}_{yi} - r) + \dot{r}_{xi},\;\;
                              A^2\dot{r}_{yi} + A\dot{r}_{xi} + r \right)    (A.4)

We want to minimize the geometrical distances between the m measurements and the final solution. This means minimizing the following error function:

    E = \sum_{i=1}^{m} (\dot{r}_{xi} - \dot{r}_{sxi})^2 + (\dot{r}_{yi} - \dot{r}_{syi})^2    (A.5)

which leads us to the error function of equation 5.
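As a quick numerical sanity check of equation A.4 (an illustration added for this text, not part of the original paper), one can verify that the computed intersection lies on the line of equation A.2 and that the offset to the measurement is perpendicular to that line:

```python
import numpy as np

# Assumed example values for the line slope/offset and a measurement.
A_slope, r_off = 0.7, -2.0
p = np.array([3.0, 5.0])          # measurement (rdot_xi, rdot_yi)

# Foot of the perpendicular according to equation A.4.
foot = np.array([A_slope * (p[1] - r_off) + p[0],
                 A_slope**2 * p[1] + A_slope * p[0] + r_off]) / (A_slope**2 + 1)

assert np.isclose(foot[1], A_slope * foot[0] + r_off)          # foot lies on the line
assert np.isclose((p - foot) @ np.array([1.0, A_slope]), 0.0)  # offset is perpendicular
```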

